
Christopher Bishop defines the expected value of the complete-data log-likelihood function (i.e., assuming we are given both the observed data $\textbf{X}$ and the latent variables $\textbf{Z}$) as follows:

$$ \mathbb{E}_\textbf{Z}[\ln p(\textbf{X},\textbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi})] = \sum_{n=1}^N \sum_{k=1}^K \gamma(z_{nk})\{\ln \pi_k + \ln \mathcal{N}(\textbf{x}_n \mid \ \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\} \tag 1 $$

where $\gamma(z_{nk})$ is defined as:

$$ \frac{\pi_k \mathcal{N}(\textbf{x}_n \mid \ \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(\textbf{x}_n \mid \ \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)} \tag 2 $$
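
For concreteness, here is a minimal Python sketch of eqns $(1)$ and $(2)$; the data, mixing coefficients, means, and covariances below are invented purely for illustration:

```python
# Hypothetical E-step sketch: responsibilities of eqn (2) and the expected
# complete-data log-likelihood of eqn (1) for a toy 3-component mixture.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                        # N=5 points, M=2 dimensions
pi = np.array([0.5, 0.3, 0.2])                     # mixing coefficients pi_k
mus = np.array([[0.0, 0.0], [2.0, 2.0], [-2.0, 1.0]])
Sigmas = [np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)]

# Numerator of eqn (2): pi_k * N(x_n | mu_k, Sigma_k), for every (n, k)
num = np.column_stack([
    pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
    for k in range(len(pi))
])
gamma = num / num.sum(axis=1, keepdims=True)       # eqn (2); each row sums to 1

# Eqn (1): note ln pi_k + ln N(...) = ln(pi_k * N(...)) = ln(num)
expected_ll = (gamma * np.log(num)).sum()
print(gamma, expected_ll)
```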

The idea, as described, is to consider a Gaussian Mixture Model in which the covariance matrices of the mixture components are given by $\epsilon \textbf{I}$, where $\epsilon$ is a variance parameter that is shared by all of the components, such that:

$$ p(\textbf x \mid \boldsymbol \mu_k, \boldsymbol \Sigma_k) = \frac{1}{(2 \pi \epsilon)^\frac{M}{2}} \exp\big\{{-\frac{1}{2 \epsilon} \|\textbf x - \boldsymbol \mu_k\|^2}\big\} \tag 3 $$

and so, $\gamma(z_{nk})$ is now defined as:

$$ \frac{\pi_k \exp\{ - \| \textbf x_n - \boldsymbol \mu_k\|^2 / 2 \epsilon\}}{\sum_{j=1}^K \pi_j \exp\{ - \| \textbf x_n - \boldsymbol \mu_j\|^2 / 2 \epsilon\}} \tag 4 $$
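
As a hedged numerical illustration of eqn $(4)$ (again with made-up points and means), note how the rows of responsibilities sharpen as $\epsilon$ shrinks:

```python
# Sketch of eqn (4) with shared isotropic variance eps; the points and means
# are hypothetical. The normalising constant (2*pi*eps)^{-M/2} is common to
# the numerator and denominator of eqn (4), so it cancels and is omitted.
import numpy as np

X = np.array([[0.1, 0.2], [1.9, 2.1]])
pi = np.array([0.5, 0.3, 0.2])
mus = np.array([[0.0, 0.0], [2.0, 2.0], [-2.0, 1.0]])

d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=-1)  # ||x_n - mu_k||^2

for eps in [1.0, 0.1, 0.01]:
    num = pi * np.exp(-d2 / (2 * eps))
    gamma = num / num.sum(axis=1, keepdims=True)
    print(eps, np.round(gamma, 4))   # each row tends towards a one-hot vector
# For much smaller eps this naive form underflows to 0/0; subtracting the
# smallest squared distance before exponentiating keeps it stable.
```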

The argument now is the following:

If we consider the limit $\epsilon \to 0$, we see that in the denominator the term for which $\| \textbf x_n - \boldsymbol \mu_j\|^2$ is smallest will go to zero most slowly, and hence the responsibilities $\gamma(z_{nk})$ for the data point $\textbf x_n$ all go to zero except for that term $j$, for which the responsibility $\gamma(z_{nj})$ will go to unity. Thus, in this limit, we obtain a hard assignment of data points to clusters, just as in the $K$-means algorithm, so that $\gamma(z_{nk}) \to r_{nk}$,

where $r_{nk}$ is defined as:

\begin{equation*} r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|\textbf x_n - \boldsymbol \mu_j\|^2\\ 0 & \text{otherwise} \end{cases} \tag 5 \end{equation*}
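
As a quick illustrative sketch, eqn $(5)$ is just a one-hot encoding of an argmin (the squared distances below are the hypothetical values from the previous snippet):

```python
# Hard K-means assignment of eqn (5): all mass goes to the closest mean.
import numpy as np

d2 = np.array([[0.05, 6.85, 5.05],
               [8.02, 0.02, 16.42]])          # ||x_n - mu_k||^2, 2 points x 3 means
r = np.eye(d2.shape[1])[d2.argmin(axis=1)]    # one-hot row per data point
print(r)                                      # [[1. 0. 0.], [0. 1. 0.]]
```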

My question is: how does the above argument hold? Namely, what does it mean for a term to go to zero **most slowly**? And how does taking the limit $\epsilon \to 0$ in eqn $(4)$ result in a binary responsibility?

asked Jan 11, 2015 at 12:40
3 Comments

  • When $\epsilon$ goes to zero, $\exp\{ - \| \textbf x_n - \boldsymbol \mu_k\|^2 / 2 \epsilon\}=\exp\{-\delta_k/2\epsilon\}$ goes to zero for every $k$, but at different speeds depending on $\delta_k$; the smallest $\delta_k$ then gathers the whole weight in the limit. Commented Jan 11, 2015 at 12:53
  • (further explanation) If you take $\delta^*$ as the smallest $\delta_k$, you can rewrite all the terms as $\exp\{(\delta^*-\delta_k)/2\epsilon\}$, which means all terms go to zero with $\epsilon$ except one, the one for which $\delta^*-\delta_k=0$. Commented Jan 11, 2015 at 12:55
  • @Xi'an Could you elaborate? What do you mean by "the smallest $\delta_k$ then gathers the whole weight in the limit"? And how does the term for which $\delta^*-\delta_k=0$ evaluate to unity? I mean, the numerator is 0, right? Commented Jan 11, 2015 at 13:40

1 Answer

Let us write, for a fixed data point $\textbf x_n$, $$\|\textbf x_n - \boldsymbol \mu_k\|^2=\delta_k\,.$$ Then $$\frac{\pi_k \exp\{ - \| \textbf x_n - \boldsymbol \mu_k\|^2 / 2 \epsilon\}}{\sum_{j=1}^K \pi_j \exp\{ - \| \textbf x_n - \boldsymbol \mu_j\|^2 / 2 \epsilon\}}=\frac{\pi_k \exp\{ - \delta_k/ 2 \epsilon\}}{\sum_{j=1}^K \pi_j \exp\{ - \delta_j/ 2 \epsilon\}}\,.$$

If we take $$\delta^*=\min_j\delta_j$$ and multiply both the numerator and the denominator by $\exp\{\delta^*/2\epsilon\}$, we have \begin{align*} \frac{\pi_k \exp\{ - \delta_k/ 2 \epsilon\}}{\sum_{j=1}^K \pi_j \exp\{ - \delta_j/ 2 \epsilon\}}&=\frac{\pi_k \exp\{(\delta^*- \delta_k)/ 2 \epsilon\}}{\sum_{j=1}^K \pi_j \exp\{(\delta^* - \delta_j)/ 2 \epsilon\}}\,, \end{align*} where $\delta^*-\delta_k<0$ except for $k=k^*$, the index of the (assumed unique) smallest distance, for which $\delta^*-\delta_{k^*}=0$.

So, for all $k\ne k^*$, $$\lim_{\epsilon\to 0} \frac{\pi_k \exp\{(\delta^*- \delta_k)/ 2 \epsilon\}}{\sum_{j=1}^K \pi_j \exp\{(\delta^* - \delta_j)/ 2 \epsilon\}}=\lim_{\epsilon\to 0} \frac{\pi_k \exp\{(\delta^*- \delta_k)/ 2 \epsilon\}}{\pi_{k^*}+\sum_{j\ne k^*} \pi_j \exp\{(\delta^* - \delta_j)/ 2 \epsilon\}}=0$$ since, for $a>0$, $$\lim_{\epsilon\to 0}\exp\{-a/\epsilon \}=0\,,$$ while $$\lim_{\epsilon\to 0} \frac{\pi_{k^*} \exp\{(\delta^*- \delta_{k^*})/ 2 \epsilon\}}{\sum_{j=1}^K \pi_j \exp\{(\delta^* - \delta_j)/ 2 \epsilon\}}=\lim_{\epsilon\to 0} \frac{\pi_{k^*} \times 1}{\pi_{k^*}+\sum_{j\ne k^*} \pi_j \exp\{(\delta^* - \delta_j)/ 2 \epsilon\}}=\frac{\pi_{k^*}}{\pi_{k^*}}=1\,.$$
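
A quick numerical check of this argument, as a hedged sketch: multiplying numerator and denominator by $\exp\{\delta^*/2\epsilon\}$ is exactly the rescaling used below, which stays stable even for tiny $\epsilon$ (the weights and distances are hypothetical):

```python
# Numerical check of the limit: subtract delta* before exponentiating,
# mirroring the rewriting above. Weights and distances are made up.
import numpy as np

pi = np.array([0.5, 0.3, 0.2])                   # mixing coefficients
delta = np.array([0.05, 6.85, 5.05])             # ||x_n - mu_k||^2 for one point

for eps in [1.0, 0.1, 1e-3, 1e-6]:
    shifted = (delta.min() - delta) / (2 * eps)  # delta* - delta_k <= 0, zero at k*
    gamma = pi * np.exp(shifted)
    gamma /= gamma.sum()                         # denominator always >= pi_{k*} > 0
    print(eps, np.round(gamma, 6))               # tends to (1, 0, 0)
```

This rescaling is the same device as the log-sum-exp stabilisation of a softmax, with 1ドル/2\epsilon$ playing the role of an inverse temperature.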

answered Jan 11, 2015 at 19:15
