

Relation to multinomial unigram language model

The multinomial NB model is formally identical to the multinomial unigram language model (Section 12.2.1). In particular, Equation 113 is a special case of Equation 104, which we repeat here for $\lambda=1$:

$$P(d\vert q) \propto P(d) \prod_{1 \le k \le \vert q \vert} P(t_k \vert M_d) \qquad (120)$$
The document $d$ in text classification (Equation 113) takes the role of the query in language modeling (Equation 120) and the classes $c$ in text classification take the role of the documents $d$ in language modeling. We used Equation 120 to rank documents according to the probability that they are relevant to the query $q$. In NB classification, we are usually only interested in the top-ranked class.
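This correspondence can be made concrete in a short sketch: each class is treated as a unigram language model, and a document is scored against each class exactly as a query is scored against a document model, i.e. $\log P(c) + \sum_k \log P(t_k \vert c)$. The training data, class names, and the use of add-one smoothing below are illustrative assumptions, not data from the text.

```python
import math
from collections import Counter

# Hypothetical toy training set: (class label, tokenized document).
training = [
    ("china", ["chinese", "beijing", "chinese"]),
    ("china", ["chinese", "chinese", "shanghai"]),
    ("uk", ["london", "chinese", "uk"]),
]

# Build one unigram language model per class (the class plays the
# role of the "document" in the language-modeling view).
counts = {}          # class -> Counter of term frequencies
priors = Counter()   # class -> number of training documents
for c, doc in training:
    priors[c] += 1
    counts.setdefault(c, Counter()).update(doc)

vocab = {t for cnt in counts.values() for t in cnt}
n_docs = sum(priors.values())

def log_score(c, doc):
    """Query-likelihood-style score of `doc` under class c's model:
    log P(c) + sum_k log P(t_k | c), with add-one smoothing."""
    cnt = counts[c]
    total = sum(cnt.values())
    s = math.log(priors[c] / n_docs)
    for t in doc:
        s += math.log((cnt[t] + 1) / (total + len(vocab)))
    return s

# As in NB classification, we only keep the top-ranked "document"
# (i.e. class), rather than the full ranking used in retrieval.
doc = ["chinese", "chinese", "london"]
best = max(counts, key=lambda c: log_score(c, doc))
```

Ranking every class by `log_score` and keeping the maximum is exactly the retrieval ranking of Equation 120 restricted to its top result.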

We also used MLE estimates in Section 12.2.2 and encountered the problem of zero estimates owing to sparse data; but instead of add-one smoothing, we used a mixture of two distributions to address the problem there. Add-one smoothing is closely related to add-$\frac{1}{2}$ smoothing in Section 11.3.4.
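The two smoothing strategies can be contrasted in a few lines. The counts, vocabulary size, $\lambda$, and collection probabilities below are assumed toy values for illustration; both estimators assign nonzero probability to an unseen term where the raw MLE would give zero.

```python
from collections import Counter

# Hypothetical term counts for one model's text.
tf = Counter({"chinese": 4, "beijing": 1})
total = sum(tf.values())   # 5 tokens observed
vocab_size = 6             # assumed |V| for add-one smoothing

def p_add_one(t):
    """Add-one (Laplace) smoothing, as used for multinomial NB."""
    return (tf[t] + 1) / (total + vocab_size)

# Mixture of the model's MLE with a background collection model,
# the approach of Section 12.2.2 (Jelinek-Mercer style).
lam = 0.5                                            # assumed weight
p_collection = {"chinese": 0.3, "beijing": 0.1, "tokyo": 0.05}  # assumed

def p_mixture(t):
    return lam * (tf[t] / total) + (1 - lam) * p_collection.get(t, 0.0)
```

For the unseen term "tokyo", the raw MLE is 0, add-one gives $1/11$, and the mixture gives $0.5 \times 0 + 0.5 \times 0.05 = 0.025$.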



© 2008 Cambridge University Press