Suppose we would like to maximize a likelihood function $p(\mathbf x, \mathbf z \mid \theta)$, where $\mathbf x$ is observed, $\mathbf z$ is a latent variable, and $\theta$ is the collection of model parameters. We would like to use expectation maximization (EM) for this.
If I understand it correctly, we optimize the marginal likelihood $p(\mathbf x|\theta)$ as $\mathbf z$ is unobserved. However, this is counterintuitive to me.
If $\mathbf z$ is unobserved, I think of it as another model parameter. Therefore, for maximum likelihood estimation, we should find $\mathbf z, \theta$ such that $p(\mathbf x|\mathbf z, \theta)$ is maximized.
So, my question is why is it standard to optimize $p(\mathbf x|\theta)$ instead of $p(\mathbf x|\mathbf z, \theta)$?
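Written out, the contrast is between maximizing the marginal likelihood (summing $\mathbf z$ out, or integrating if it is continuous) and jointly maximizing over $\mathbf z$ as if it were a parameter:
$$\max_{\theta}\; p(\mathbf x \mid \theta) \;=\; \max_{\theta}\; \sum_{\mathbf z} p(\mathbf x, \mathbf z \mid \theta) \qquad\text{versus}\qquad \max_{\theta,\,\mathbf z}\; p(\mathbf x \mid \mathbf z, \theta).$$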
I have searched through several explanations of EM, but could not find an answer to this question.
1 Answer
If you don't know $z$, you cannot condition on it via $p(x \mid z, \theta)$. What EM does instead is "hallucinate" $z$ for a lower-bound function, using the parameters obtained in the previous step.
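In symbols, the lower bound referred to here is the standard EM decomposition: for any distribution $q(z)$,
$$\log p(x \mid \theta) \;=\; \underbrace{\mathbb E_{q(z)}\!\left[\log \frac{p(x, z \mid \theta)}{q(z)}\right]}_{\mathcal L(q,\,\theta)} \;+\; \operatorname{KL}\!\big(q(z)\,\|\,p(z \mid x, \theta)\big).$$
The E-step "hallucinates" $z$ by setting $q(z) = p(z \mid x, \theta^{\text{old}})$, which makes the KL term zero so the bound is tight at the current parameters; the M-step then maximizes $\mathcal L(q, \theta)$ over $\theta$.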
> So, my question is why is it standard to optimize $p(x \mid \theta)$ instead of $p(x \mid z, \theta)$?
Because of the missing-data problem: $z$ is not observed, so it is missing from our training data.
Ultimately we are still optimizing $p(x \mid \theta)$, but that objective can have multiple local maxima and no closed-form solution. By introducing $q$, we can turn it into a sequence of subproblems that can each be optimized at every step, and the procedure is guaranteed to converge to a local optimum (which may or may not be the global optimum).
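As a concrete sketch (not from the original answer), assuming a two-component 1-D Gaussian mixture where $z$ is the unobserved component label, the alternation looks like this: the E-step fills in $q(z)$ from the previous parameters, and the M-step re-estimates $\theta$ in closed form.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50, seed=0):
    """Illustrative EM sketch for a two-component 1-D Gaussian mixture.

    x : (n,) array of observations; the component label z is latent.
    Returns the estimated parameters theta = (pi, mu, sigma).
    """
    rng = np.random.default_rng(seed)
    # Initialize theta = (mixing weights, means, standard deviations).
    pi = np.array([0.5, 0.5])
    mu = rng.choice(x, size=2, replace=False)
    sigma = np.array([x.std(), x.std()])

    for _ in range(n_iter):
        # E-step: "hallucinate" z via q(z) = p(z | x, theta_old),
        # i.e. each component's responsibility for each data point.
        dens = np.stack([
            pi[k] * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
                  / (sigma[k] * np.sqrt(2 * np.pi))
            for k in range(2)
        ], axis=1)                                  # shape (n, 2)
        q = dens / dens.sum(axis=1, keepdims=True)

        # M-step: maximize the lower bound over theta in closed form.
        nk = q.sum(axis=0)
        pi = nk / len(x)
        mu = (q * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((q * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

    return pi, mu, sigma

# Usage: data drawn from two Gaussians; EM estimates theta without ever observing z.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
    print(em_gmm_1d(x))
```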
References:
1. Do, C. B., & Batzoglou, S. (2008). What is the expectation maximization algorithm? Nature Biotechnology, 26(8), 897–899.