This is a continuation of the exercises in "Machine Learning: A Probabilistic Perspective" by Kevin Murphy. Chapter 4 is on "Gaussian Models". Let's get started!

4.1 Uncorrelated does not imply independent

Let $$X \sim U(-1,1)$$ and $$Y = X^2$$. Clearly Y depends on X; show that $$\rho(X,Y)=0$$.

$$\rho(X,Y)$$ is just the covariance normalised by the product of the standard deviations, so we only need to show that the covariance is zero, where:

$$\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$$

Clearly $$\mathbb{E}[X] = 0$$ and so we just need to calculate $$\mathbb{E}[XY]$$ and show this is zero. We can write:

$$\mathbb{E}[XY] = \int_{-1}^1 dx \int_0^1 dy \ xy p(x,y)$$

Then we write $$p(x,y) = p(y|x) p(x)$$, where $$p(y|x) = \delta(y - x^2)$$ is a Dirac delta function and $$p(x)=1/2$$ is the constant uniform density on $$(-1,1)$$. This means we can evaluate the integral over y to get:

$$\mathbb{E}[XY] = \frac{1}{2} \int_{-1}^1 x^3 \, dx$$

This is the integral of an odd function over an interval symmetric about zero, and so it is equal to zero.
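As a quick sanity check, here is a small Monte Carlo sketch (assuming NumPy is available; the seed and sample size are arbitrary choices). It samples X uniformly on (-1, 1), sets Y = X², and computes the sample correlation, which should come out close to zero even though Y is completely determined by X.

```python
import numpy as np

# Arbitrary seed and sample size, chosen only for reproducibility
rng = np.random.default_rng(0)
n = 1_000_000

x = rng.uniform(-1.0, 1.0, size=n)  # X ~ U(-1, 1)
y = x**2                            # Y = X^2, fully determined by X

# Sample Pearson correlation; np.corrcoef returns the 2x2 correlation matrix
rho = np.corrcoef(x, y)[0, 1]
print(f"sample rho(X, Y) = {rho:.4f}")
```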

4.2 Uncorrelated and Gaussian does not imply independent, unless jointly Gaussian

Let $$X \sim \mathcal{N}(0,1)$$ and $$Y=WX$$, where W takes values $$\pm 1$$ with equal probability. Clearly X and Y are not independent, since $$|Y| = |X|$$: knowing X determines Y up to its sign.

(a) Show that $$Y \sim \mathcal{N}(0,1)$$

This is fairly obvious from symmetry, because $$\mathcal{N}(0,1)$$ is symmetric, i.e. $$\mathcal{N}(x|0,1) = \mathcal{N}(-x|0,1)$$. Since W is chosen independently of X, we can write the density of Y as:

$$p_Y(y) = P(W=1)\,p_X(y) + P(W=-1)\,p_X(-y) = \tfrac{1}{2}\mathcal{N}(y|0,1) + \tfrac{1}{2}\mathcal{N}(-y|0,1) = \mathcal{N}(y|0,1)$$
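A small simulation can illustrate this (a sketch assuming NumPy, with W drawn independently of X and an arbitrary seed; `std_normal_cdf` is a helper defined just for this example). It compares the empirical CDF of Y = WX with the standard normal CDF at a few points.

```python
import math
import numpy as np

# Arbitrary seed and sample size for this sketch
rng = np.random.default_rng(1)
n = 1_000_000

x = rng.standard_normal(n)           # X ~ N(0, 1)
w = rng.choice([-1.0, 1.0], size=n)  # W = +1 or -1 with probability 1/2 each, independent of X
y = w * x                            # Y = W X

def std_normal_cdf(t):
    """CDF of N(0, 1), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Largest gap between the empirical CDF of Y and the N(0, 1) CDF at a few test points
test_points = (-1.5, -0.5, 0.0, 0.5, 1.5)
max_gap = max(abs(np.mean(y <= t) - std_normal_cdf(t)) for t in test_points)
print(f"max |F_Y(t) - Phi(t)| over test points: {max_gap:.4f}")
```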

(b) Show covariance between X and Y is zero

We know that $$\mathbb{E}[X] = \mathbb{E}[Y] = 0$$, so we just need to evaluate $$\mathbb{E}[XY]$$:

$$\mathbb{E}[XY] = \int \int dx \ dy \ xy p(x,y)$$

But again $$p(x,y) = p(y|x)p(x)$$, and we can write $$p(y|x) = 0.5 \delta(y-x) + 0.5 \delta(y+x)$$. Integrating over y first gives $$\int y \, p(y|x) \, dy = 0.5x - 0.5x = 0$$, so we are left with:

$$\mathbb{E}[XY] = \int_{-\infty}^{\infty} x \mathcal{N}(x|0,1)(0.5x - 0.5x) \, dx = 0$$

which proves the result.
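The same kind of simulation (again a NumPy sketch with arbitrary seed and sample size) shows both halves of the story: the sample covariance of X and Y is close to zero, yet X² and Y² are identical, so the variables are clearly dependent.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

x = rng.standard_normal(n)           # X ~ N(0, 1)
w = rng.choice([-1.0, 1.0], size=n)  # random sign, independent of X
y = w * x                            # Y = W X

# Sample covariance of X and Y: should be close to 0
cov_xy = np.cov(x, y)[0, 1]

# But X and Y are dependent: Y^2 = X^2 exactly, so the squares are perfectly correlated
corr_sq = np.corrcoef(x**2, y**2)[0, 1]

print(f"Cov(X, Y) ~ {cov_xy:.4f}")
print(f"Corr(X^2, Y^2) = {corr_sq:.4f}")
```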

4.3 Prove $$-1 \le \rho(X,Y) \le 1$$

$$\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}}$$

$$\text{Cov}(X,Y) = \mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])]$$

$$\text{Var}(X) = \mathbb{E}[(X-\mathbb{E}[X])^2]$$

Let us write $$\mu_X = \mathbb{E}[X]$$ and $$\mu_Y = \mathbb{E}[Y]$$, for notational convenience. Now, for any constants a and b, consider:

$$\mathbb{E}[(a(X-\mu_X) + b(Y-\mu_Y))^2]$$

which is non-negative, since it is the expectation of a square. Expanding the square, this inequality gives:

$$a^2 \mathbb{E}[(X-\mu_X)^2] + b^2 \mathbb{E}[(Y-\mu_Y)^2] + 2ab \mathbb{E}[(X-\mu_X)(Y-\mu_Y)] \ge 0$$

which we can rewrite as:

$$2ab \text{Cov}(X,Y) \ge -a^2 \text{Var}(X) - b^2 \text{Var}(Y)$$

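Now let us choose $$a = \sqrt{\text{Var}(Y)}$$ and $$b = \sqrt{\text{Var}(X)}$$, so that $$a^2 = \text{Var}(Y)$$, $$b^2 = \text{Var}(X)$$ and $$ab = \sqrt{\text{Var}(X)\text{Var}(Y)}$$: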

$$2 \sqrt{\text{Var}(X) \text{Var}(Y)} \text{Cov}(X,Y) \ge -2 \text{Var}(X) \text{Var}(Y)$$

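Assuming both variances are strictly positive, we can divide both sides by $$2\,\text{Var}(X)\,\text{Var}(Y)$$ to get:

$$\frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}} = \rho(X,Y) \ge -1$$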

If we do the same thing, but instead consider $$\mathbb{E}[(a(X-\mu_X) - b(Y-\mu_Y))^2] \ge 0$$ with the same choice of a and b, the cross term changes sign and we get $$-2ab\,\text{Cov}(X,Y) \ge -2\,\text{Var}(X)\text{Var}(Y)$$, i.e. $$\rho(X,Y) \le 1$$.
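To see the argument in action, the sketch below (assuming NumPy; `quadratic_forms` is a hypothetical helper and the test data are made up) evaluates sample versions of both expectations with $$a = \sqrt{\text{Var}(Y)}$$ and $$b = \sqrt{\text{Var}(X)}$$. Expanding as above, they equal $$2\,\text{Var}(X)\text{Var}(Y)(1+\rho)$$ and $$2\,\text{Var}(X)\text{Var}(Y)(1-\rho)$$, so their non-negativity is exactly the statement $$-1 \le \rho \le 1$$.

```python
import numpy as np

rng = np.random.default_rng(3)

def quadratic_forms(x, y):
    """Sample versions of the two non-negative expectations used in the proof.

    Uses a = sqrt(Var(Y)) and b = sqrt(Var(X)), with moments computed from the sample.
    """
    dx, dy = x - x.mean(), y - y.mean()
    var_x, var_y = np.mean(dx**2), np.mean(dy**2)
    rho = np.mean(dx * dy) / np.sqrt(var_x * var_y)
    a, b = np.sqrt(var_y), np.sqrt(var_x)
    plus = np.mean((a * dx + b * dy) ** 2)   # equals 2 Var(X) Var(Y) (1 + rho)
    minus = np.mean((a * dx - b * dy) ** 2)  # equals 2 Var(X) Var(Y) (1 - rho)
    return rho, plus, minus, var_x, var_y

# Hypothetical test data: Y = c X + noise for a range of slopes c
results = []
for c in (-3.0, -0.5, 0.0, 0.5, 3.0):
    x = rng.standard_normal(10_000)
    y = c * x + rng.standard_normal(10_000)
    results.append(quadratic_forms(x, y))
    rho, plus, minus, _, _ = results[-1]
    print(f"c={c:+.1f}  rho={rho:+.3f}  plus={plus:.3f}  minus={minus:.3f}")
```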

4.4 Correlation coefficient for linearly related variables

If $$Y=aX + b$$, show that $$\rho(X,Y)=1$$ when $$a > 0$$ and $$\rho(X,Y) = -1$$ when $$a < 0$$.

Let's say $$\mathbb{E}[X] = \mu_X$$ and $$\text{Var}(X) = \sigma_X^2$$. It follows that:

$$\mathbb{E}[Y] = a \mu_X + b$$ and $$\text{Var}(Y) = a^2 \sigma_X^2$$.

Now, to evaluate the correlation we need $$\mathbb{E}[XY] = \mathbb{E}[aX^2 + bX] = a \mathbb{E}[X^2] + b \mu_X$$

This means that the covariance is:

$$\text{Cov}(X,Y) = a \mathbb{E}[X^2] + b \mu_X - \mu_X(a \mu_X + b) = a \sigma_X^2$$

This allows us to get the correlation:

$$\rho(X,Y) = \frac{ \text{Cov}(X,Y)}{ \sqrt{\sigma_X^2 \sigma_Y^2}} = \frac{a \sigma_X^2}{\sqrt{a^2 \sigma_X^4}} = \frac{a \sigma_X^2}{|a| \sigma_X^2} = \operatorname{sgn}(a)$$

Which is all we were asked to show!
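Numerically this is easy to confirm (a NumPy sketch; the distribution of X, the values of a and b, and the `corr` helper are arbitrary choices for illustration): with sample moments, the correlation of X with any linear function of X comes out as ±1 up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # X: an arbitrary non-degenerate sample

def corr(u, v):
    """Pearson correlation Cov(U, V) / sqrt(Var(U) Var(V)) from sample moments."""
    du, dv = u - u.mean(), v - v.mean()
    return np.mean(du * dv) / np.sqrt(np.mean(du**2) * np.mean(dv**2))

rho_pos = corr(x, 2.5 * x + 7.0)   # a = 2.5 > 0, expect +1
rho_neg = corr(x, -0.7 * x + 1.0)  # a = -0.7 < 0, expect -1
print(rho_pos, rho_neg)
```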

4.5 Normalization constant for MV Gaussian

Prove that: $$(2 \pi)^{d/2} | \mathbf{\Sigma}|^{1/2} = \int \exp\left(-\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right) d \mathbf{x}$$

We are told to diagonalize the covariance matrix, which can always be done since it is symmetric. That is, we can write:

$$D = P^{-1} \Sigma P$$

where D is a diagonal matrix whose entries are the eigenvalues $$\lambda_1, \dots, \lambda_d$$ of $$\Sigma$$, and the columns of P are the corresponding eigenvectors. In fact, since $$\Sigma$$ is symmetric its eigenvectors can be chosen to form an orthonormal basis, which makes P an orthogonal matrix, so that $$P^{-1} = P^T$$. This allows us to say:

$$D^{-1} = P^T \Sigma^{-1} P \implies \Sigma^{-1} = P D^{-1} P^T$$
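These identities are easy to check numerically (a sketch assuming NumPy; the matrix is an arbitrary symmetric positive-definite example). `np.linalg.eigh` is NumPy's eigensolver for symmetric matrices and returns orthonormal eigenvectors, so P below should satisfy $$P^T P = I$$, $$P^T \Sigma P = D$$, $$\Sigma^{-1} = P D^{-1} P^T$$ and $$|\det P| = 1$$.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4
A = rng.standard_normal((d, d))
sigma = A @ A.T + d * np.eye(d)  # an arbitrary symmetric positive-definite "covariance"

# eigh returns the eigenvalues (ascending) and orthonormal eigenvectors as the columns of P
lam, P = np.linalg.eigh(sigma)
D = np.diag(lam)

orthogonal = np.allclose(P.T @ P, np.eye(d))                                  # P^{-1} = P^T
diagonalised = np.allclose(P.T @ sigma @ P, D)                                # D = P^T Sigma P
inverse_ok = np.allclose(np.linalg.inv(sigma), P @ np.diag(1.0 / lam) @ P.T)  # Sigma^{-1} = P D^{-1} P^T
unit_jacobian = np.isclose(abs(np.linalg.det(P)), 1.0)                        # |det P| = 1
print(orthogonal, diagonalised, inverse_ok, unit_jacobian)
```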

As such, we can write the integral as:

$$\int \exp(-\frac{1}{2}(x-\mu)^T P D^{-1} P^T(x-\mu)) dx = \int \exp(-\frac{1}{2} (P^T(x-\mu))^T \begin{bmatrix} \frac{1}{\lambda_1} & & \\ & \ddots & \\ & & \frac{1}{\lambda_d} \end{bmatrix} (P^T(x-\mu))) dx$$

Now let us define $$y = P^T(x-\mu)$$. Because $$P^T$$ is an orthogonal matrix, its determinant is $$\pm 1$$, so the Jacobian of the change of variables has absolute value 1 and we can replace $$dx$$ with $$dy$$. The quadratic form inside the exponential (without the factor of $$-\frac{1}{2}$$) is then:

$$\sum_{ij} y_i \delta_{ij} \frac{1}{\lambda_i} y_j = \sum_i \frac{y_i^2}{\lambda_i}$$

Effectively, by transforming to the eigenbasis we have decoupled the components of y, so the integral factorises:

$$\int \exp\left(-\frac{1}{2}\sum_{i=1}^d \frac{y_i^2}{\lambda_i}\right) dy = \int_{-\infty}^{\infty} dy_1 e^{-\frac{y_1^2}{2 \lambda_1}} \dots \int_{-\infty}^{\infty} dy_d e^{-\frac{y_d^2}{2 \lambda_d}}$$

i.e. just the product of d one-dimensional Gaussian integrals, each of which evaluates to $$\sqrt{2 \pi \lambda_i}$$. The whole integral is therefore:

$$\sqrt{2 \pi \lambda_1} \sqrt{2 \pi \lambda_2} \dots \sqrt{2 \pi \lambda_d} = (2 \pi)^{d/2} \sqrt{\lambda_1 \dots \lambda_d}$$
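Each factor can be checked with a crude numerical integral (a sketch assuming NumPy; `gaussian_integral_1d`, the truncation at 12 standard deviations, the grid size and the eigenvalues are all arbitrary choices for illustration).

```python
import math
import numpy as np

def gaussian_integral_1d(lam, n=20_001):
    """Riemann-sum approximation of the integral of exp(-y^2 / (2 lam)) over the real line.

    The integrand is negligible beyond 12 standard deviations, so the domain
    is truncated to [-12 sqrt(lam), 12 sqrt(lam)].
    """
    half_width = 12.0 * math.sqrt(lam)
    y = np.linspace(-half_width, half_width, n)
    dy = y[1] - y[0]
    return float(np.sum(np.exp(-y**2 / (2.0 * lam))) * dy)

lams = [0.5, 1.0, 3.0]  # hypothetical eigenvalues lambda_1, ..., lambda_d
numeric = math.prod(gaussian_integral_1d(l) for l in lams)
closed_form = (2 * math.pi) ** (len(lams) / 2) * math.sqrt(math.prod(lams))
print(numeric, closed_form)
```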

We then use that $$\det(\Sigma) = \prod_{i=1}^d \lambda_i$$, so $$\sqrt{\lambda_1 \dots \lambda_d} = |\Sigma|^{1/2}$$, which gives us the final answer we want!
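Finally, an end-to-end check in d = 2 (a NumPy sketch; the mean, covariance and grid are arbitrary choices, and a plain Riemann sum stands in for the integral): summing the unnormalised density over a wide grid should match $$(2\pi)^{d/2}|\Sigma|^{1/2}$$, and the product of the eigenvalues should match the determinant.

```python
import math
import numpy as np

# An arbitrary 2-D example: mean and symmetric positive-definite covariance
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
sigma_inv = np.linalg.inv(sigma)

# Riemann sum of exp(-1/2 (x - mu)^T Sigma^{-1} (x - mu)) over a grid wide enough
# that the truncated tails are negligible
n = 801
g1 = np.linspace(mu[0] - 15, mu[0] + 15, n)
g2 = np.linspace(mu[1] - 15, mu[1] + 15, n)
X1, X2 = np.meshgrid(g1, g2, indexing="ij")
diff = np.stack([X1 - mu[0], X2 - mu[1]], axis=-1)            # shape (n, n, 2)
quad = np.einsum("...i,ij,...j->...", diff, sigma_inv, diff)  # quadratic form at each grid point
numeric = np.sum(np.exp(-0.5 * quad)) * (g1[1] - g1[0]) * (g2[1] - g2[0])

# Closed form from the derivation: (2 pi)^{d/2} |Sigma|^{1/2}
d = len(mu)
closed_form = (2 * math.pi) ** (d / 2) * math.sqrt(np.linalg.det(sigma))

# det(Sigma) as the product of the eigenvalues, as in the last step
lam = np.linalg.eigvalsh(sigma)
print(numeric, closed_form, np.prod(lam), np.linalg.det(sigma))
```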