# Machine Learning - A Probabilistic Perspective Exercises - Chapter 4

This is a continuation of the exercises in "Machine learning - a probabilistic perspective" by Kevin Murphy. Chapter 4 is on "Gaussian Models". Let's get started!

**4.1 Uncorrelated does not imply independent**

Let \( X \sim U(-1,1) \) and \(Y = X^2\). Clearly Y is dependent on X, show \(\rho(X,Y)=0\).

\(\rho(X,Y)\) is just a normalised version of the covariance, so we just need to show the covariance is zero, i.e.:

\(\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]\)

Clearly \( \mathbb{E}[X] = 0\) and so we just need to calculate \(\mathbb{E}[XY]\) and show this is zero. We can write:

\( \mathbb{E}[XY] = \int_{-1}^1 dx \int_0^1 dy \ xy p(x,y) \)

Then we say \(p(x,y) = p(y|x) p(x)\), but \(p(y|x) = \delta(y - x^2)\), i.e. a dirac-delta function, and \(p(x)=1/2\), i.e. just a constant. This means we can evaluate the integral over y to get:

\( \mathbb{E}[XY] = 1/2 \int_{-1}^1 x^3\)

This is the integral of an odd function and so is clearly equal to zero.

**4.2 Uncorrelated and Gaussian does not imply independent, unless jointly Gaussian**

Let \(X \sim \mathcal{N}(0,1)\) and \(Y=WX\), where W takes values \( \pm 1\) with equal probability. Clearly X and Y are not independent, as Y is a function of X.

**(a) Show**\(Y \sim \mathcal{N}(0,1)\)

This is kind of obvious from symmetry because \(\mathcal{N}(0,1)\) is symmetric, i.e. \( \mathcal{N}(x|0,1) = \mathcal{N}(-x|0,1) \). This means we can write:

\(P(Y=y) = P(W=1)P(X=y) + P(W=-1)P(X=-y) = P(X=y) = \mathcal{N}(0,1) \)

**(b) Show covariance between X and Y is zero**

We know that \(\mathbb{E}[X] = \mathbb{E}[Y] = 0\), so we just need to evaluate \( \mathbb{E}[XY]\):

\( \mathbb{E}[XY] = \int \int dx \ dy \ xy p(x,y)\)

But again \(p(x,y) = p(y|x)p(x)\), and we can write \(p(y|x) = 0.5 \delta(y-x) + 0.5 \delta(y+x)\). This means we are left with:

\(\mathbb{E}[XY] = \int_{-\infty}^{\infty} x \mathcal{N}(x|0,1)(0.5(x-x)) dx = 0\)

which proves the result.

**4.3 Prove **\(-1 \le \rho(X,Y) \le 1\)

Let us start with the definitions:

\(\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}}\)

\(\text{Cov}(X,Y) = \mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])]\)

\(\text{Var}(X) = \mathbb{E}[(X-\mathbb{E}[X])^2] \)

Let us write \(\mu_X = \mathbb{E}[X]\) and \(\mu_Y = \mathbb{E}[Y]\), for notational convenience. If now for any constants a and b we consider:

\( \mathbb{E}[(a(X-\mu_X) + b(Y-\mu_Y))^2] \)

which is clearly greater than or equal to zero. Multiplying out, this inequality gives:

\(a^2 \mathbb{E}[(X-\mu_X)^2] + b^2 \mathbb{E}[(Y-\mu_Y)^2] + 2ab \mathbb{E}[(X-\mu_X)(Y-\mu_Y)] \ge 0 \)

Which we can re-write as:

\(2ab \text{Cov}(X,Y) \ge -a^2 \text{Var}(X) - b^2 \text{Var}(Y)\)

Now let us substitute in \(a^2 = \text{Var}(Y)\) and \(b^2 = \text{Var}(X)\):

\(2 \sqrt{\text{Var}(X) \text{Var}(Y)} \text{Cov}(X,Y) \ge -2 \text{Var}(X) \text{Var}(Y) \)

\( \implies \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}} = \rho(X,Y) \ge -1\)

If we do the same thing, but instead now consider \(\mathbb{E}[(a(X-\mu_X) - b(Y-\mu_Y))^2]\), with the same definitions of a and b, it's easy to show that \( \rho(X,Y) \le 1\) as well.

**4.4 Correlation coefficient for linearly related variables**

If \(Y=aX + b\), then if \( a > 0 \) show that \( \rho(X,Y)=1\), and if \(a < 0\) that \( \rho(X,Y) = -1\).

Let's say \(\mathbb{E}[X] = \mu_X\) and \(\text{Var}(X) = \sigma_X^2\). It follows that:

\( \mathbb{E}[Y] = a \mu_X + b\) and \( \text{Var}(Y) = a^2 \sigma_X^2\).

Now, to evaluate the correlation we need \(\mathbb{E}[XY] = \mathbb{E}[aX^2 + bX] = a \mathbb{E}[X^2] + b \mu_X\)

This means that the covariance is:

\( \text{Cov}(X,Y) = a \mathbb{E}[X^2] + b \mu_X - \mu_X(a \mu_X + b) = a \sigma_X^2\)

This allows us to get the correlation:

\( \rho(X,Y) = \frac{ \text{Cov}(X,Y)}{ \sqrt{\sigma_X^2 \sigma_Y^2}} = \frac{a \sigma_X^2}{\sqrt{a^2 \sigma_X^4}} = \frac{a \sigma_X^2}{|a| \sigma_X^2} = sgn(a)\)

Which is all we were asked to show!

**4.5 Normalization constant for MV Gaussian**

Prove that: \( (2 \pi)^{d/2} | \mathbf{\Sigma}|^{1/2} = \int \exp(-\frac{1}{2} (\mathbf{x-\mu}^T \mathbf{\Sigma}^{-1} (\mathbf{x-\mu})) d \mathbf{x} \)

We are told to diagonalize the covariance matrix, which can always be done since it is symmetric. That is, we can write:

\(D = P^{-1} \Sigma P\)

Where D is a diagonal matrix where the entries are the eigenvalues of \(\Sigma\) and the columns of P are the eigenvectors. In fact, since \(\Sigma\) is symmetric the eigenvectors can form an orthogonal basis, and it is possible to make P an orthogonal matrix, such that \(P^{-1} = P^T\). This allows us to say:

\(D^{-1} = P^T \Sigma^{-1} P \implies \Sigma^{-1} = P D^{-1} P^T\)

As such, we can write the integral as:

\( \int \exp(-\frac{1}{2}(x-\mu)^T P D^{-1} P^T(x-\mu)) dx = \int \exp(-\frac{1}{2} (P(x-\mu))^T \begin{bmatrix} \frac{1}{\lambda_1} & & \\ & \ddots & \\ & & \frac{1}{\lambda_d} \end{bmatrix} (P(x-\mu))) dx \)

Now let us define \(y = P(x-\mu)\). Because P is an orthogonal matrix (which has determinant 1), the Jacobian is 1 and we can replace \(dx\) with \(dy\). The term inside the exponential is then:

\( \sum_{ij} y_i \delta_{ij} \frac{1}{\lambda_i} y_j = \sum_i \frac{y_i^2}{\lambda_i}\). Effectively by transforming to the eigenbasis we have decoupled the components of y, so we can write:

\( = \int_{-\infty}^{\infty} dy_1 e^{-\frac{y_1^2}{2 \lambda_1}} \dots \int_{-\infty}^{\infty} dy_d e^{-\frac{y_d^2}{2 \lambda_d}}\)

i.e. just the product of many one-dimensional Gaussians. This is equal to:

\( \sqrt{2 \pi \lambda_1} \sqrt{2 \pi \lambda_2} \dots \sqrt{2 \pi \lambda_d} = (2 \pi)^{d/2} \sqrt{\lambda_1 \dots \lambda_d}\)

We then use that \(det(\Sigma) = \prod_{i=1}^d \lambda_i\), which gives us the final answer we want!