
I'm trying to understand applying the EM algorithm to compute the MLE in a missing data problem.

Specifically, suppose $(x_1,y_1),\ldots,(x_n,y_n)$ is a random sample from the bivariate normal distribution with mean $(0,0)$ and unknown covariance matrix $$ \Sigma=\begin{bmatrix} \sigma_x^2&\rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y&\sigma_y^2 \end{bmatrix}. $$ I want to find the MLE of $\Sigma$ given that the first $\nu< n$ of the $y$-coordinates are missing.

For the E-step of the EM algorithm, I need to compute

$$ Q_i(\Sigma,\tilde\Sigma):= \begin{cases} \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[\log f(x_i,y\mid\Sigma)]& \text{if }1\leq i\leq \nu,\\ \log f(x_i,y_i\mid\Sigma)&\text{if }\nu< i\leq n, \end{cases} $$ where $$ \begin{aligned} -\log f(x,y\mid\Sigma) = \log2\pi &+ \frac12\log\sigma_x^2 + \frac12\log\sigma_y^2 + \frac12\log(1-\rho^2)\\ &+ \frac{1}{2(1-\rho^2)} \left\{\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right\} \end{aligned} $$ and $$ \tilde\Sigma=\begin{bmatrix} \tilde\sigma_x^2&\tilde\rho\tilde\sigma_x\tilde\sigma_y\\ \tilde\rho\tilde\sigma_x\tilde\sigma_y&\tilde\sigma_y^2 \end{bmatrix}. $$

Since $$ \begin{aligned} \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)} [y] &= \frac{\tilde\rho \tilde\sigma_yx_i}{\tilde\sigma_x},\\ \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y^2] &=\tilde\sigma_y^2(1-\tilde\rho^2) + \frac{\tilde\rho^2 \tilde\sigma^2_yx_i^2}{\tilde\sigma_x^2}, \end{aligned} $$

I get that

$$ \begin{aligned} -Q_i(\Sigma,\tilde\Sigma) = \log2\pi &+ \frac12\log\sigma_x^2 + \frac12\log\sigma_y^2 + \frac12\log(1-\rho^2)\\ &+ \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho\tilde\rho\tilde\sigma_y x_i^2}{\sigma_x\tilde\sigma_x\sigma_y} + \frac{\tilde\sigma_x^2\tilde\sigma_y^2(1-\tilde\rho^2)+ \tilde\rho^2 \tilde\sigma^2_yx_i^2}{\tilde\sigma_x^2\sigma_y^2} \right\} \end{aligned} $$

for 1ドル\leq i\leq \nu$.
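
(As a sanity check, here is a quick Monte Carlo verification of this expression; the NumPy sketch and all parameter values are made up for illustration, not part of the derivation.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameter values: Sigma-tilde (current) and Sigma (free argument).
tsx2, tsy2, trho = 1.3, 0.7, 0.4    # tilde sigma_x^2, tilde sigma_y^2, tilde rho
sx2, sy2, rho = 1.0, 2.0, -0.3
xi = 0.8

def neg_log_f(x, y):
    """-log f(x, y | Sigma) for the zero-mean bivariate normal."""
    quad = x**2 / sx2 - 2 * rho * x * y / np.sqrt(sx2 * sy2) + y**2 / sy2
    return (np.log(2 * np.pi) + 0.5 * np.log(sx2) + 0.5 * np.log(sy2)
            + 0.5 * np.log(1 - rho**2) + quad / (2 * (1 - rho**2)))

# Sample y ~ f(y | x_i, Sigma-tilde) using the conditional moments above.
cond_mean = trho * np.sqrt(tsy2 / tsx2) * xi
cond_var = tsy2 * (1 - trho**2)
ys = rng.normal(cond_mean, np.sqrt(cond_var), size=1_000_000)

mc = neg_log_f(xi, ys).mean()       # Monte Carlo estimate of -Q_i

closed_form = (
    np.log(2 * np.pi) + 0.5 * np.log(sx2) + 0.5 * np.log(sy2)
    + 0.5 * np.log(1 - rho**2)
    + (xi**2 / sx2
       - 2 * rho * trho * np.sqrt(tsy2) * xi**2 / np.sqrt(sx2 * tsx2 * sy2)
       + (tsx2 * tsy2 * (1 - trho**2) + trho**2 * tsy2 * xi**2) / (tsx2 * sy2))
    / (2 * (1 - rho**2)))

print(mc, closed_form)              # should agree to Monte Carlo accuracy
```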

Now, for the M-step, I need to compute $$ \operatorname*{argmax}_{\Sigma} \sum_{i=1}^n Q_i(\Sigma,\tilde\Sigma). $$

And here I'm stuck. Is there a nice form for the optimal $\Sigma$?

asked Jan 14, 2021 at 17:42
  • Added the missing 0.5*log(1-rho^2) term. Thanks! – Commented Jan 14, 2021 at 18:13
  • Added the $-$ sign as suggested. Thanks again, Xi'an. – Commented Jan 14, 2021 at 18:39
  • Fixed that, too. I also added a commentary on your hint; I included it as an answer (even though your answer is the answer, @xian) since it was too long to fit in a comment. – Commented Jan 14, 2021 at 22:15

2 Answers


Hint: When considering $$ \begin{aligned} \log f(x_i,y_i\mid\Sigma) = -\log2\pi &- \frac12\log\sigma_x^2 - \frac12\log\sigma_y^2 - \frac12\log(1-\rho^2)\\ &- \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho x_iy_i}{\sigma_x\sigma_y} + \frac{y_i^2}{\sigma_y^2}\right\} \end{aligned} $$ and $$ \begin{aligned} Q_i(\Sigma,\tilde\Sigma) = &-\log2\pi - \frac12\log\sigma_x^2 - \frac12\log\sigma_y^2 - \frac12\log(1-\rho^2)\\ &- \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho\tilde\rho\tilde\sigma_y x_i^2}{\sigma_x\tilde\sigma_x\sigma_y} + \frac{\tilde\sigma_y^2(1-\tilde\rho^2)+\tilde\rho^2\tilde\sigma_y^2 x_i^2/\tilde\sigma_x^2}{\sigma_y^2} \right\} \end{aligned} $$ both expressions have essentially the same shape as functions of $\Sigma$. This means that the objective function to optimize writes as $$ \begin{aligned} \sum_{i=1}^\nu,円 &\log f(x_i,\mathbb E_{y\sim f(y\mid x_i,\tilde\Sigma)}[y]\mid\Sigma)+{}\\ \sum_{i=1}^\nu,円 &\left\{\log f(0,\tilde\sigma_y(1-\tilde\rho^2)^{1/2}\mid\Sigma)+\log2\pi + \frac12\log[\sigma_x^2 \sigma_y^2(1-\rho^2)]\right\}+{}\\ \sum_{i=\nu+1}^n,円 &\log f(x_i,y_i\mid\Sigma), \end{aligned} $$ i.e., up to an additive constant, as a regular Normal log-likelihood $$ -\frac n2 \log|\Sigma|-\frac12\text{trace}(\mathbf Z \Sigma^{-1}) $$ where $\mathbf Z$ is the sum-of-squares-and-cross-products matrix of a modified Normal sample (depending on the current $\tilde\Sigma$). The estimator of $\Sigma$ can thus be derived as in the Normal case.
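
For concreteness, a NumPy sketch of this modified-sample construction (variable names are illustrative, not from the original answer): each missing pair contributes the imputed pair $(x_i,\mathbb E[y\mid x_i,\tilde\Sigma])$ plus a pseudo-pair $(0,\tilde\sigma_y(1-\tilde\rho^2)^{1/2})$ carrying the conditional variance.

```python
import numpy as np

def m_step(x, y, nu, tSigma):
    """One M-step: y[:nu] are missing; tSigma is the current estimate."""
    n = len(x)
    tsx, tsy = np.sqrt(tSigma[0, 0]), np.sqrt(tSigma[1, 1])
    trho = tSigma[0, 1] / (tsx * tsy)
    y_tilde = trho * tsy / tsx * x[:nu]      # E[y | x_i, tSigma]
    z_tilde = tsy * np.sqrt(1 - trho**2)     # sd of y given x_i under tSigma
    # Modified sample: imputed pairs, variance pseudo-pairs, observed pairs.
    pairs = np.vstack([
        np.column_stack([x[:nu], y_tilde]),
        np.column_stack([np.zeros(nu), np.full(nu, z_tilde)]),
        np.column_stack([x[nu:], y[nu:]]),
    ])
    Z = pairs.T @ pairs                      # sums of squares/cross-products
    return Z / n                             # maximizer of the Q function
```

The returned `Z / n` reproduces the explicit 2ドル\times2$ matrix worked out in the other answer below.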

answered Jan 14, 2021 at 18:45

To flesh out @xian's answer (and to make sure I understand it!):

For 1ドル\leq i\leq \nu$, let $$ \begin{aligned} \tilde y_i & = \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y],\\ \tilde z_i^2 & = \operatorname{Var}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y]. \end{aligned} $$

We have $$ \tilde z_i^2 = \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y^2] - \tilde y_i^2 = \tilde\sigma_y^2(1-\tilde\rho^2), $$ which does not actually depend on $i$.

If 1ドル\leq i\leq \nu$, $$ \begin{aligned} Q_i(\Sigma,\tilde\Sigma) &= \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[\log f(x_i,y\mid\Sigma)]\\ &= \mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}\left[-\log2\pi - \frac12\log\det\Sigma - \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho x_iy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right\}\right]\\ &= -\log2\pi - \frac12\log\det\Sigma - \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho x_i\tilde y_i}{\sigma_x\sigma_y} + \frac{\mathbb{E}_{y\sim f(y\mid x_i,\tilde\Sigma)}[y^2]}{\sigma_y^2}\right\}\\ &= -\log2\pi - \frac12\log\det\Sigma - \frac{1}{2(1-\rho^2)} \left\{\frac{x_i^2}{\sigma_x^2} - \frac{2\rho x_i\tilde y_i}{\sigma_x\sigma_y} + \frac{\tilde y_i^2}{\sigma_y^2}\right\} - \frac{\tilde z_i^2}{2\sigma_y^2(1-\rho^2)}\\ &= \log f(x_i,\tilde y_i\mid\Sigma) - \frac{\tilde z_i^2}{2\sigma_y^2(1-\rho^2)}. \end{aligned} $$

The term $\log f(x_i,\tilde y_i\mid\Sigma)$ is intuitively appealing; we've imputed the unknown $y$-values with their conditional expectations.

Let's consider the correction term:

$$ - \frac{\tilde z_i^2}{2\sigma_y^2(1-\rho^2)} = \log f(0,\tilde z_i\mid\Sigma) + \log 2\pi + \frac12\log\det\Sigma, $$ since $\det\Sigma = \sigma_x^2\sigma_y^2(1-\rho^2)$.

Thus, if 1ドル\leq i\leq \nu$, $$ Q_i(\Sigma,\tilde\Sigma) = \log f(x_i,\tilde y_i\mid\Sigma) + \log f(0,\tilde z_i\mid\Sigma) + \log 2\pi + \frac12\log\det\Sigma. $$
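
(A quick numeric spot check of the correction-term identity above, using SciPy's `logpdf` as an independent computation of $\log f$; the parameter values are arbitrary.)

```python
import numpy as np
from scipy.stats import multivariate_normal

sx2, sy2, rho, z = 1.5, 0.8, 0.3, 0.9       # arbitrary made-up values
Sigma = np.array([[sx2, rho * np.sqrt(sx2 * sy2)],
                  [rho * np.sqrt(sx2 * sy2), sy2]])

log_f = multivariate_normal(mean=[0, 0], cov=Sigma).logpdf([0.0, z])
lhs = -z**2 / (2 * sy2 * (1 - rho**2))      # the correction term
rhs = log_f + np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma))
print(np.isclose(lhs, rhs))                 # True
```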

The M-step involves maximizing the function $$ Q(\Sigma,\tilde\Sigma) := \sum_{i=1}^n Q_i(\Sigma,\tilde\Sigma). $$

Since $Q(\Sigma,\tilde\Sigma)$ is essentially a sum of Gaussian log-likelihoods plus "simple" correction terms, we can maximize it using standard techniques. Unless I messed something up (likely), the resulting maximizer is

$$ \Sigma = \frac1n\begin{bmatrix} \sum_{i=1}^n x_i^2& \sum_{i=1}^\nu x_i \tilde y_i + \sum_{i=\nu+1}^n x_i y_i\\ \sum_{i=1}^\nu x_i \tilde y_i + \sum_{i=\nu+1}^n x_i y_i& \sum_{i=1}^\nu (\tilde y_i^2 + \tilde{z}_i^2) + \sum_{i=\nu+1}^n y_i^2 \end{bmatrix}. $$
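
Putting the pieces together, here is a minimal self-contained EM loop implementing this update (my own sketch: the function name, starting value, and simulated-data check are all illustrative assumptions, not from the thread):

```python
import numpy as np

def em_bivariate(x, y, nu, n_iter=200):
    """EM for Sigma when y[:nu] are missing; x, y are length-n arrays."""
    n = len(x)
    Sigma = np.eye(2)                        # any positive-definite start
    for _ in range(n_iter):
        sx, sy = np.sqrt(Sigma[0, 0]), np.sqrt(Sigma[1, 1])
        rho = Sigma[0, 1] / (sx * sy)
        y_t = rho * sy / sx * x[:nu]         # conditional means   (E-step)
        z2 = sy**2 * (1 - rho**2)            # conditional variance
        s_xy = x[:nu] @ y_t + x[nu:] @ y[nu:]
        s_yy = y_t @ y_t + nu * z2 + y[nu:] @ y[nu:]
        Sigma = np.array([[x @ x, s_xy],     # closed-form M-step above
                          [s_xy, s_yy]]) / n
    return Sigma

# Quick check on simulated data, treating the first nu y-values as missing.
rng = np.random.default_rng(1)
true_Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
xy = rng.multivariate_normal([0.0, 0.0], true_Sigma, size=500)
print(em_bivariate(xy[:, 0], xy[:, 1], nu=100))   # close to true_Sigma
```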

answered Jan 14, 2021 at 22:08
  • This sounds correct. – Commented Jan 15, 2021 at 9:54
