Equivalence of Statistics on a Pair of Gaussian Channels
This post is a departure from our usual discussions. It relates to statistics and information theory, as applied to a somewhat limited model of communication. In this model, we have two variables with normally distributed amplitudes. One variable is the "true" signal, and the other contains the "true" signal mixed with some noise. This is a model of a noisy Gaussian communication channel. The main purpose is to show that, under these conditions, correlation, squared error, mutual information, and signal-to-noise ratio are all equally good and interconvertible measurements of how related the two signals are. This topic was of interest because many people apply all of these statistics separately to the same data. These notes are not written for a general audience; I'm just putting this out here in case someone, somewhere, finds it interesting.
For a pair of Gaussian channels (continuous random variables whose values follow a normal distribution), the mutual information, correlation, mean squared error, and signal-to-noise ratio are all equivalent and can be computed from each other. Without loss of generality, we restrict this discussion to zero-mean, unit-variance channels. These notes elaborate on the treatment of mutual information between Gaussian channels presented in the third chapter of Spikes.
Correlation & Mutual Information
Consider a single Gaussian channel $y = g x + n$, where $x$ is the input, $y$ is the output, $g$ is the gain, and $n$ is additive Gaussian noise. Without loss of generality, assume that $x$, $n$, and $y$ have been converted to z-scores. (Reconstructed z-scores can always be mapped back to the original Gaussian variables by multiplying by the original standard deviations and adding back the original means.) This means that all random variables have zero mean and unit variance. If we do this, we need separate gains for the signal and the noise, say $a$ and $b$:
\[y = a x + b n\]
Since the signal and noise are independent, their variances add:
\[\sigma^2_{y} = \sigma^2_{a x} + \sigma^2_{b n}\] and the gain parameters can be factored out
\[\sigma^2_{y} = a^2 \sigma^2_{x} + b^2 \sigma^2_{n}.\]
Since $\sigma^2_{y}=\sigma^2_{x}=\sigma^2_{n}=1$,
\[a^2+b^2=1\]
This can be parameterized as
\[\sigma^2_{y} = \alpha \sigma^2_{x} + (1-\alpha) \sigma^2_{n},\,\,\alpha=a^2\in[0,1]\]
and
\[y = x\sqrt{\alpha} + n\sqrt{1-\alpha}\]
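As a quick sanity check, here is a minimal numerical sketch of this channel in Python (NumPy assumed; the value $\alpha=0.6$ and the sample size are arbitrary illustrative choices):

```python
# Minimal sketch of the channel y = sqrt(alpha)*x + sqrt(1-alpha)*n,
# with z-scored (zero-mean, unit-variance) signal and noise.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.6              # arbitrary signal fraction in [0, 1]
num_samples = 1_000_000

x = rng.standard_normal(num_samples)   # signal
n = rng.standard_normal(num_samples)   # independent noise
y = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * n

# Since x and n are independent, their variances add (a^2 + b^2 = 1),
# so y should also have approximately unit variance.
print(np.var(y))  # ~1.0
```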
The relationships between mutual information $I$ and signal-to-noise ratio $SNR$ come from Spikes, chapter 3:
\[I=\frac{1}{2}lg(1+\frac{\sigma^2_{a x}}{\sigma^2_{b n}})=\frac{1}{2}lg(1+SNR)\]
Where $lg(\dots)$ is the base-2 logarithm.
The $SNR$ simplifies as:
\[SNR=\frac{\sigma^2_{a x}}{\sigma^2_{b n}}=\frac{\alpha \sigma^2_x}{(1-\alpha) \sigma^2_n}=\frac{\alpha}{1-\alpha}\]
Mutual information simplifies as:
\[I=\frac{1}{2}lg(1+SNR)=\frac{1}{2}lg{\frac{\sigma^2_y}{\sigma^2_{b n}}}=\frac{1}{2}lg{\frac{\sigma^2_y}{(1-\alpha)\sigma^2_n}}=\frac{1}{2}lg{\frac{1}{1-\alpha}}\]
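Both closed forms are easy to evaluate directly; this small sketch (same illustrative $\alpha$ as above) just confirms that the two expressions for $I$ agree:

```python
# SNR and mutual information of the channel as functions of alpha.
import numpy as np

alpha = 0.6
snr = alpha / (1 - alpha)                     # SNR = alpha / (1 - alpha)
info = 0.5 * np.log2(1 + snr)                 # I = (1/2) lg(1 + SNR)
info_direct = 0.5 * np.log2(1 / (1 - alpha))  # I = (1/2) lg(1 / (1 - alpha))

print(snr, info, info_direct)  # the two information values agree
```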
The correlation $\rho$ is the standard Pearson product-moment correlation coefficient, which can be viewed as the cosine of the angle $\theta$ between vectors defined by the samples of the random variables $x$ and $y$:
\[\rho=cos(\theta)=\frac{x \cdot y}{|x||y|}\]
Since $x$ and $n$ are independent, the samples of $x$ and $n$ can be viewed as an orthonormal basis for the samples of $y$, where the weights of the components are just the previously defined $a$ and $b$, respectively. This relates our gain parameters to the correlation coefficient: the tangent of the angle between $y$ and $x$ is just the ratio of the noise gain to the signal gain:
\[tan(\theta)=\frac{b}{a}=\frac{\sqrt{1-\alpha}}{\sqrt{\alpha}}\]
Then $tan(\theta)$ can be expressed in terms of the correlation coefficient $\rho$:
\[tan(\theta)=\frac{sin(\theta)}{cos(\theta)}=\frac{\sqrt{1-cos(\theta)^2}}{cos(\theta)}=\frac{\sqrt{1-\rho^2}}{\rho}\]
This gives the relationship $\sqrt{1-\alpha}/\sqrt{\alpha}=\sqrt{1-\rho^2}/\rho$, which implies that $\alpha=\rho^2$, or $a=\rho$. (There is a slight problem here in that correlation can be negative, but it is the magnitude of the correlation that really matters. As a temporary fix, "correlation" now means the absolute value of the correlation.) This can be used to relate $\rho$ to $SNR$ and mutual information:
\[SNR=\frac{\rho^2}{1-\rho^2}\]
\[I=\frac{1}{2}lg{\frac{1}{1-\rho^2}}=-\frac{1}{2}lg(1-\rho^2)\]
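These identities can be checked on the simulated channel from before: the sample correlation between $x$ and $y$ should land near $\sqrt{\alpha}$, and the $SNR$ and information computed from it should match the $\alpha$-based formulas (a sketch, with the same arbitrary $\alpha$ and NumPy assumed):

```python
# Check that the empirical correlation recovers sqrt(alpha), and that
# SNR and mutual information computed from rho match the alpha-based forms.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.6
num_samples = 1_000_000

x = rng.standard_normal(num_samples)
n = rng.standard_normal(num_samples)
y = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * n

rho = np.corrcoef(x, y)[0, 1]            # should be close to sqrt(alpha)
snr_from_rho = rho**2 / (1 - rho**2)     # SNR = rho^2 / (1 - rho^2)
info_from_rho = -0.5 * np.log2(1 - rho**2)

print(rho, np.sqrt(alpha))
print(snr_from_rho, alpha / (1 - alpha))
print(info_from_rho, -0.5 * np.log2(1 - alpha))
```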
As a corollary, if $\phi=\sqrt{1-\rho^2}$ is the correlation between $y$ and the noise $n$, then the information is simply $I=-lg(\phi)$. The mean squared error ($MSE$) between the z-scored signals $x$ and $y$ is also related, since $y-x=(\rho-1)x+\sqrt{1-\rho^2}\,n$ and the two terms are independent:
\[MSE=(1-\rho)^2+(1-\rho^2)=1-2\rho+1=2(1-\rho)\]
which implies that
\[\rho=1-\frac{1}{2}MSE\]
and gives a relationship between mutual information and mean squared error:
\[I=-\frac{1}{2}lg(1-\rho^2)=-\frac{1}{2}lg(1-(1-MSE/2)^2)\]
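The $MSE$ relationships can be checked the same way; here the $MSE$ is taken between the two z-scored channels, as above:

```python
# MSE relationships for z-scored signals: MSE = 2(1 - rho), rho = 1 - MSE/2,
# and the mutual information recovered from the MSE alone.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.6
num_samples = 1_000_000

x = rng.standard_normal(num_samples)
n = rng.standard_normal(num_samples)
y = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * n

mse = np.mean((y - x) ** 2)
rho = np.corrcoef(x, y)[0, 1]
print(mse, 2 * (1 - rho))              # MSE ~ 2(1 - rho)

rho_from_mse = 1 - mse / 2
info_from_mse = -0.5 * np.log2(1 - rho_from_mse**2)
print(info_from_mse, -0.5 * np.log2(1 - alpha))
```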
The mappings between correlation $\rho$, mean squared error $MSE$, mutual information $I$, and signal-to-noise ratio $SNR$ are all monotonic, so correlation, $SNR$, and mutual information all give the same quality ranking for a collection of channels (with $MSE$ ranking in the reverse order).
Further Speculation
This can be generalized (as in chapter 3 of Spikes) to vector-valued Gaussian variables by transforming into a space where the channel $Y=AX+BN$ decouples into independent components, treating each component as an independent scalar channel, and then transforming back into the original space.
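Here is a rough numerical sketch of that idea, assuming the standard determinant form of the Gaussian vector-channel information as a cross-check; the dimension, the covariances, and the helper names below are arbitrary illustrative choices, not anything taken from Spikes:

```python
# Sketch of the vector-valued case: whiten the noise, diagonalize the signal
# covariance in the whitened space, treat each eigen-component as an
# independent scalar Gaussian channel, and sum the per-component informations.
import numpy as np

rng = np.random.default_rng(0)
d = 3  # arbitrary dimension

def random_cov(d, rng):
    # An arbitrary well-conditioned positive-definite covariance, for illustration.
    m = rng.standard_normal((d, d))
    return m @ m.T + d * np.eye(d)

signal_cov = random_cov(d, rng)  # covariance of the signal term A X at the output
noise_cov = random_cov(d, rng)   # covariance of the noise term B N at the output

# Whitening transform: in the whitened space the noise covariance is the identity.
evals, evecs = np.linalg.eigh(noise_cov)
whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
whitened_signal = whiten @ signal_cov @ whiten.T

# Each eigenvalue of the whitened signal covariance is the SNR of one
# independent scalar channel, so the per-component informations add.
snrs = np.linalg.eigvalsh(whitened_signal)
total_info = np.sum(0.5 * np.log2(1 + snrs))

# Cross-check against the determinant form:
# I = (1/2) lg( det(signal_cov + noise_cov) / det(noise_cov) ).
info_det = 0.5 * np.log2(np.linalg.det(signal_cov + noise_cov)
                         / np.linalg.det(noise_cov))
print(total_info, info_det)  # these agree
```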
Similarly to how chapter 3 of Spikes generalizes the mutual information of a Gaussian channel into a bound on the mutual information of possibly non-Gaussian, vector-valued channels, these relationships can be generalized to inequalities for non-Gaussian channels:
\[I\geq-lg(\Phi)=-\frac{1}{2}lg(1-\Sigma^2)=-\frac{1}{2}lg(1-(1-MSE/2)^2)\]
Where, for vector-valued variables, $\phi$, $\rho$, and $MSE$ become matrices $\Phi$, $\Sigma$, and $MSE$.