
I'm trying to understand applying the EM algorithm to compute the MLE in a missing data problem.

Specifically, suppose $(x_1,y_1),\ldots,(x_n,y_n)$ is a random sample from the bivariate normal distribution with mean $(0,0)$ and unknown covariance matrix $$ \Sigma=\begin{bmatrix} \sigma_x^2&\rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y&\sigma_y^2 \end{bmatrix}. $$ I want to find the MLE of $\Sigma$ given that the first $\nu< n$ of the $y$-coordinates are missing.

For the E-step of the EM algorithm, I need to compute

$$ Q_i(\Sigma,\tilde\Sigma):= \begin{cases} \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[\log f(x_i,y\mid\Sigma)]& \text{if }1\leq i\leq \nu,\\ \log f(x_i,y_i\mid\Sigma)&\text{if }\nu< i\leq n, \end{cases} $$ where $$ \begin{aligned} -\log f(x,y\mid\Sigma) = \log2\pi &+ \frac12\log\sigma_x^2 + \frac12\log\sigma_y^2 + \frac12\log(1-\rho^2)\\ &+ \frac{1}{2(1-\rho^2)} \left\{\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right\} \end{aligned} $$ and $$ \tilde\Sigma=\begin{bmatrix} \tilde\sigma_x^2&\tilde\rho\tilde\sigma_x\tilde\sigma_y\\ \tilde\rho\tilde\sigma_x\tilde\sigma_y&\tilde\sigma_y^2 \end{bmatrix}. $$

Since $$ \begin{aligned} \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)} [y] &= \frac{\tilde\rho \tilde\sigma_yx_i}{\tilde\sigma_x},\\ \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y^2] &=\tilde\sigma_y^2(1-\tilde\rho^2) + \frac{\tilde\rho^2 \tilde\sigma^2_yx_i^2}{\tilde\sigma_x^2}, \end{aligned} $$

I get that

$$ \begin{aligned} -Q_i(\Sigma,\tilde\Sigma) = \log2\pi &+ \frac12\log\sigma_x^2 + \frac12\log\sigma_y^2 + \frac12\log(1-\rho^2)\\ &+ \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho\tilde\rho\tilde\sigma_y x_i^2}{\sigma_x\tilde\sigma_x\sigma_y} + \frac{\tilde\sigma_x^2\tilde\sigma_y^2(1-\tilde\rho^2)+ \tilde\rho^2 \tilde\sigma^2_yx_i^2}{\tilde\sigma_x^2\sigma_y^2} \right\} \end{aligned} $$

for 1ドル\leq i\leq \nu$.
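
(As a sanity check, here is a quick Monte Carlo verification of this expression; the NumPy sketch and all parameter values are made up for illustration, not part of the derivation.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameter values: Sigma-tilde (current) and Sigma (free argument).
tsx2, tsy2, trho = 1.3, 0.7, 0.4    # tilde sigma_x^2, tilde sigma_y^2, tilde rho
sx2, sy2, rho = 1.0, 2.0, -0.3
xi = 0.8

def neg_log_f(x, y):
    """-log f(x, y | Sigma) for the zero-mean bivariate normal."""
    quad = x**2 / sx2 - 2 * rho * x * y / np.sqrt(sx2 * sy2) + y**2 / sy2
    return (np.log(2 * np.pi) + 0.5 * np.log(sx2) + 0.5 * np.log(sy2)
            + 0.5 * np.log(1 - rho**2) + quad / (2 * (1 - rho**2)))

# Sample y ~ f(y | x_i, Sigma-tilde) using the conditional moments above.
cond_mean = trho * np.sqrt(tsy2 / tsx2) * xi
cond_var = tsy2 * (1 - trho**2)
ys = rng.normal(cond_mean, np.sqrt(cond_var), size=1_000_000)

mc = neg_log_f(xi, ys).mean()       # Monte Carlo estimate of -Q_i

closed_form = (
    np.log(2 * np.pi) + 0.5 * np.log(sx2) + 0.5 * np.log(sy2)
    + 0.5 * np.log(1 - rho**2)
    + (xi**2 / sx2
       - 2 * rho * trho * np.sqrt(tsy2) * xi**2 / np.sqrt(sx2 * tsx2 * sy2)
       + (tsx2 * tsy2 * (1 - trho**2) + trho**2 * tsy2 * xi**2) / (tsx2 * sy2))
    / (2 * (1 - rho**2)))

print(mc, closed_form)              # should agree to Monte Carlo accuracy
```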

Now, for the M-step, I need to compute $$ \operatorname*{argmax}_{\Sigma} \sum_{i=1}^n Q_i(\Sigma,\tilde\Sigma). $$

And here I'm stuck. Is there a nice form for the optimal $\Sigma$?

asked Jan 14, 2021 at 17:42
  • Added the missing 0.5*log(1-rho^2) term. Thanks! – Commented Jan 14, 2021 at 18:13
  • Added the $-$ sign as suggested. Thanks again, Xi'an. – Commented Jan 14, 2021 at 18:39
  • Fixed that, too. I also added a commentary on your hint; I included it as an answer (even though your answer is the answer, @xian) since it was too long to fit in a comment. – Commented Jan 14, 2021 at 22:15

2 Answers


Hint: When considering $$ \begin{aligned} \log f(x_i,y_i\mid\Sigma) = -\log2\pi &- \frac12\log\sigma_x^2 - \frac12\log\sigma_y^2 - \frac12\log(1-\rho^2)\\ &- \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho x_iy_i}{\sigma_x\sigma_y} + \frac{y_i^2}{\sigma_y^2}\right\} \end{aligned} $$ and $$ \begin{aligned} Q_i(\Sigma,\tilde\Sigma) = &-\log2\pi - \frac12\log\sigma_x^2 - \frac12\log\sigma_y^2 - \frac12\log(1-\rho^2)\\ &- \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho\tilde\rho\tilde\sigma_y x_i^2}{\sigma_x\tilde\sigma_x\sigma_y} + \frac{\tilde\sigma_y^2(1-\tilde\rho^2)+\tilde\rho^2\tilde\sigma_y^2 x_i^2/\tilde\sigma_x^2}{\sigma_y^2} \right\} \end{aligned} $$ both expressions have essentially the same shape as functions of $\Sigma$. This means that the objective function to optimize writes as $$ \begin{aligned} \sum_{i=1}^\nu,円 &\log f(x_i,\mathbb E_{y\sim f(y\mid x_i,\tilde\Sigma)}[y]\mid\Sigma)+{}\\ \sum_{i=1}^\nu,円 &\left\{\log f(0,\tilde\sigma_y(1-\tilde\rho^2)^{1/2}\mid\Sigma)+\log2\pi + \frac12\log[\sigma_x^2 \sigma_y^2(1-\rho^2)]\right\}+{}\\ \sum_{i=\nu+1}^n,円 &\log f(x_i,y_i\mid\Sigma), \end{aligned} $$ i.e., up to an additive constant, as a regular Normal log-likelihood $$ -\frac n2 \log|\Sigma|-\frac12\text{trace}(\mathbf Z \Sigma^{-1}) $$ where $\mathbf Z$ is the sum-of-squares-and-cross-products matrix of a modified Normal sample (depending on the current $\tilde\Sigma$). The estimator of $\Sigma$ can thus be derived as in the Normal case.
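
For concreteness, a NumPy sketch of this modified-sample construction (variable names are illustrative, not from the original answer): each missing pair contributes the imputed pair $(x_i,\mathbb E[y\mid x_i,\tilde\Sigma])$ plus a pseudo-pair $(0,\tilde\sigma_y(1-\tilde\rho^2)^{1/2})$ carrying the conditional variance.

```python
import numpy as np

def m_step(x, y, nu, tSigma):
    """One M-step: y[:nu] are missing; tSigma is the current estimate."""
    n = len(x)
    tsx, tsy = np.sqrt(tSigma[0, 0]), np.sqrt(tSigma[1, 1])
    trho = tSigma[0, 1] / (tsx * tsy)
    y_tilde = trho * tsy / tsx * x[:nu]      # E[y | x_i, tSigma]
    z_tilde = tsy * np.sqrt(1 - trho**2)     # sd of y given x_i under tSigma
    # Modified sample: imputed pairs, variance pseudo-pairs, observed pairs.
    pairs = np.vstack([
        np.column_stack([x[:nu], y_tilde]),
        np.column_stack([np.zeros(nu), np.full(nu, z_tilde)]),
        np.column_stack([x[nu:], y[nu:]]),
    ])
    Z = pairs.T @ pairs                      # sums of squares/cross-products
    return Z / n                             # maximizer of the Q function
```

The returned `Z / n` reproduces the explicit 2ドル\times2$ matrix worked out in the other answer below.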

answered Jan 14, 2021 at 18:45

To flesh out @xian's answer (and to make sure I understand it!):

For 1ドル\leq i\leq \nu$, let $$ \begin{aligned} \tilde y_i & = \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y],\\ \tilde z_i^2 & = \operatorname{Var}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y]. \end{aligned} $$

We have $$ \tilde z_i^2 = \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y^2] - \tilde y_i^2 = \tilde\sigma_y^2(1-\tilde\rho^2), $$ which does not actually depend on $i$.

If 1ドル\leq i\leq \nu$, $$ \begin{aligned} Q_i(\Sigma,\tilde\Sigma) &= \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[\log f(x_i,y\mid\Sigma)]\\ &= \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}\left[-\log2\pi - \frac12\log\det\Sigma - \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho x_iy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right\}\right]\\ &= -\log2\pi - \frac12\log\det\Sigma - \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho x_i\tilde y_i}{\sigma_x\sigma_y} + \frac{\mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y^2]}{\sigma_y^2}\right\}\\ &= -\log2\pi - \frac12\log\det\Sigma - \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho x_i\tilde y_i}{\sigma_x\sigma_y} + \frac{\tilde y_i^2}{\sigma_y^2}\right\} - \frac{\tilde z_i^2}{2\sigma_y^2(1-\rho^2)}\\ &= \log f(x_i,\tilde y_i\mid\Sigma) - \frac{\tilde z_i^2}{2\sigma_y^2(1-\rho^2)}. \end{aligned} $$

The term $\log f(x_i,\tilde y_i\mid\Sigma)$ is intuitively appealing; we've imputed the unknown $y$-values with their conditional expectations.

Let's consider the correction term:

$$ - \frac{\tilde z_i^2}{2\sigma_y^2(1-\rho^2)} = \log f(0,\tilde z_i\mid\Sigma) + \log 2\pi + \frac12\log\det\Sigma, $$ since $\det\Sigma = \sigma_x^2\sigma_y^2(1-\rho^2)$.

Thus, if 1ドル\leq i\leq \nu$, $$ Q_i(\Sigma,\tilde\Sigma) = \log f(x_i,\tilde y_i\mid\Sigma) + \log f(0,\tilde z_i\mid\Sigma) + \log 2\pi + \frac12\log\det\Sigma. $$
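
(A quick numeric spot check of the correction-term identity above, using SciPy's `logpdf` as an independent computation of $\log f$; the parameter values are arbitrary.)

```python
import numpy as np
from scipy.stats import multivariate_normal

sx2, sy2, rho, z = 1.5, 0.8, 0.3, 0.9       # arbitrary made-up values
Sigma = np.array([[sx2, rho * np.sqrt(sx2 * sy2)],
                  [rho * np.sqrt(sx2 * sy2), sy2]])

log_f = multivariate_normal(mean=[0, 0], cov=Sigma).logpdf([0.0, z])
lhs = -z**2 / (2 * sy2 * (1 - rho**2))      # the correction term
rhs = log_f + np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma))
print(np.isclose(lhs, rhs))                 # True
```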

The M-step involves maximizing the function $$ Q(\Sigma,\tilde\Sigma) := \sum_{i=1}^n Q_i(\Sigma,\tilde\Sigma). $$

Since $Q(\Sigma,\tilde\Sigma)$ is essentially a sum of Gaussian log-likelihoods plus "simple" correction terms, we can maximize it using standard techniques. Unless I messed something up (likely), the resulting maximizer is

$$ \Sigma = \frac1n\begin{bmatrix} \sum_{i=1}^n x_i^2& \sum_{i=1}^\nu x_i \tilde y_i + \sum_{i=\nu+1}^n x_i y_i\\ \sum_{i=1}^\nu x_i \tilde y_i + \sum_{i=\nu+1}^n x_i y_i& \sum_{i=1}^\nu (\tilde y_i^2 + \tilde{z}_i^2) + \sum_{i=\nu+1}^n y_i^2 \end{bmatrix}. $$
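
Putting the pieces together, here is a minimal self-contained EM loop implementing this update (my own sketch: the function name, starting value, and simulated-data check are all illustrative assumptions, not from the thread):

```python
import numpy as np

def em_bivariate(x, y, nu, n_iter=200):
    """EM for Sigma when y[:nu] are missing; x, y are length-n arrays."""
    n = len(x)
    Sigma = np.eye(2)                        # any positive-definite start
    for _ in range(n_iter):
        sx, sy = np.sqrt(Sigma[0, 0]), np.sqrt(Sigma[1, 1])
        rho = Sigma[0, 1] / (sx * sy)
        y_t = rho * sy / sx * x[:nu]         # conditional means   (E-step)
        z2 = sy**2 * (1 - rho**2)            # conditional variance
        s_xy = x[:nu] @ y_t + x[nu:] @ y[nu:]
        s_yy = y_t @ y_t + nu * z2 + y[nu:] @ y[nu:]
        Sigma = np.array([[x @ x, s_xy],     # closed-form M-step above
                          [s_xy, s_yy]]) / n
    return Sigma

# Quick check on simulated data, treating the first nu y-values as missing.
rng = np.random.default_rng(1)
true_Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
xy = rng.multivariate_normal([0.0, 0.0], true_Sigma, size=500)
print(em_bivariate(xy[:, 0], xy[:, 1], nu=100))   # close to true_Sigma
```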

answered Jan 14, 2021 at 22:08
  • This sounds correct. – Commented Jan 15, 2021 at 9:54
