Distribution of Mutual Information
Author: Marcus Hutter (2001)
Comments: 8 LaTeX pages
Subj-class: Artificial Intelligence
ACM-class: I.2
Reference: Advances in Neural Information Processing Systems, 14 (NIPS-2001) 399-406
Report-no: IDSIA-13-01 and cs.AI/0112019
Slides: PostScript - PDF
Keywords: Mutual Information, Cross Entropy, Dirichlet distribution, Second order distribution, expectation and variance of mutual information.
Abstract: The mutual information of two random variables i and j with joint probabilities t_ij is commonly used in learning Bayesian nets as well as in many other fields. The chances t_ij are usually estimated by the empirical sampling frequency n_ij/n, leading to a point estimate I(n_ij/n) for the mutual information. To answer questions like "is I(n_ij/n) consistent with zero?" or "what is the probability that the true mutual information is much larger than the point estimate?" one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution p(t) comprising prior information about t. From the prior p(t) one can compute the posterior p(t|n), from which the distribution p(I|n) of the mutual information can be calculated. We derive reliable and quickly computable approximations for p(I|n). We concentrate on the mean, variance, skewness, and kurtosis, and on non-informative priors. For the mean we also give an exact expression. Numerical issues and the range of validity are discussed.
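The abstract contrasts the plug-in value I(n_ij/n) with the full posterior distribution p(I|n). The sketch below is not the paper's method: where the paper derives analytic approximations, this simply approximates p(I|n) by Monte Carlo. It assumes a symmetric Dirichlet prior with a hypothetical pseudo-count `prior` per cell, so the posterior over the joint table is Dirichlet(n_ij + prior), and draws from it by normalizing independent Gamma variates. All function names are illustrative.

```python
import math
import random


def mutual_information(t):
    """Mutual information I(t), in nats, of a joint probability table t[i][j]."""
    rows = [sum(r) for r in t]
    cols = [sum(c) for c in zip(*t)]
    mi = 0.0
    for i, r in enumerate(t):
        for j, tij in enumerate(r):
            if tij > 0:  # convention: 0 * log 0 = 0
                mi += tij * math.log(tij / (rows[i] * cols[j]))
    return mi


def point_estimate(n):
    """Plug-in estimate I(n_ij/n) from a table of counts n[i][j]."""
    total = sum(map(sum, n))
    return mutual_information([[nij / total for nij in r] for r in n])


def sample_dirichlet(alpha, rng):
    """One draw from Dirichlet(alpha) via normalized Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def posterior_mi_moments(n, prior=1.0, samples=20000, seed=0):
    """Monte Carlo mean and variance of I under the Dirichlet posterior.

    Assumption: symmetric Dirichlet prior with pseudo-count `prior` (> 0)
    per cell, hence posterior Dirichlet(n_ij + prior).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(n), len(n[0])
    alpha = [nij + prior for r in n for nij in r]
    draws = []
    for _ in range(samples):
        flat = sample_dirichlet(alpha, rng)
        t = [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
        draws.append(mutual_information(t))
    mean = sum(draws) / samples
    var = sum((x - mean) ** 2 for x in draws) / (samples - 1)
    return mean, var
```

For example, `point_estimate([[10, 2], [3, 9]])` gives the plug-in value, and `posterior_mi_moments([[10, 2], [3, 9]])` returns a sampled posterior mean and variance to set beside it. Sampling is the brute-force alternative to the analytic approximations the paper targets: its error shrinks only like 1/sqrt(samples).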
Table of Contents
- Introduction
- Mutual Information Distribution
- Results for I under the Dirichlet P(oste)rior
- Approximation of Expectation and Variance of I
- The Second Order Dirichlet Distribution
- Exact Value for E[I]
- Generalizations
- Numerics
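The outline lists an exact value for E[I] and a section on numerics. A minimal sketch, assuming the posterior is Dirichlet with parameters n_ij (counts plus any prior pseudo-counts), n_i+ and n_+j the row and column sums, and n the total: combining the Dirichlet identity E[t_k log t_k] = (a_k/a)(psi(a_k+1) - psi(a+1)) with the fact that Dirichlet marginals are again Dirichlet gives E[I] = (1/n) Σ_ij n_ij [psi(n_ij+1) - psi(n_i+ +1) - psi(n_+j +1) + psi(n+1)], where psi is the digamma function. This is offered as an illustration derived from those identities, not as a quotation of the paper's formula. The `digamma` helper is hand-rolled (recurrence plus asymptotic series) to keep the sketch dependency-free; `scipy.special.digamma` could replace it.

```python
import math


def digamma(x):
    """Digamma psi(x) for x > 0.

    Shifts x upward with psi(x) = psi(x + 1) - 1/x until x >= 10,
    then applies the asymptotic series
    ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    if x <= 0:
        raise ValueError("this helper only handles x > 0")
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def exact_mean_mi(n):
    """Exact posterior mean E[I], in nats, for Dirichlet parameters n[i][j].

    Entries must be non-negative with a positive total.
    """
    rows = [sum(r) for r in n]
    cols = [sum(c) for c in zip(*n)]
    total = sum(rows)
    acc = 0.0
    for i, r in enumerate(n):
        for j, nij in enumerate(r):
            acc += nij * (digamma(nij + 1) - digamma(rows[i] + 1)
                          - digamma(cols[j] + 1) + digamma(total + 1))
    return acc / total
```

As a sanity check, n_ij = 1 in every cell of a 2x2 table (a uniform prior and no data) gives psi(2) - 2 psi(3) + psi(5) = 1/12, so `exact_mean_mi([[1, 1], [1, 1]])` is 1/12 nats. For integer arguments psi reduces to harmonic numbers minus Euler's constant, which is a cheaper route when all n_ij are integers.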
@InProceedings{Hutter:01xentropy,
author = "Marcus Hutter",
title = "Distribution of Mutual Information",
_number = "IDSIA-13-01",
booktitle = "Advances in Neural Information Processing Systems 14",
editor = "T. G. Dietterich and S. Becker and Z. Ghahramani",
publisher = "MIT Press",
address = "Cambridge, MA",
pages = "399--406",
year = "2002",
url = "http://www.hutter1.net/ai/xentropy.htm",
url2 = "http://arxiv.org/abs/cs.AI/0112019",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-13-01.ps.gz",
categories = "I.2. [Artificial Intelligence]",
keywords = "Mutual Information, Cross Entropy, Dirichlet distribution, Second
order distribution, expectation and variance of mutual
information.",
abstract = "The mutual information of two random variables i and j with joint
probabilities t_ij is commonly used in learning Bayesian nets as
well as in many other fields. The chances t_ij are usually
estimated by the empirical sampling frequency n_ij/n leading to a
point estimate I(n_ij/n) for the mutual information. To answer
questions like ``is I(n_ij/n) consistent with zero?'' or ``what is
the probability that the true mutual information is much larger
than the point estimate?'' one has to go beyond the point estimate.
In the Bayesian framework one can answer these questions by
utilizing a (second order) prior distribution p(t) comprising
prior information about t. From the prior p(t) one can compute the
posterior p(t|n), from which the distribution p(I|n) of the mutual
information can be calculated. We derive reliable and quickly
computable approximations for p(I|n). We concentrate on the mean,
variance, skewness, and kurtosis, and on non-informative priors. For
the mean we also give an exact expression. Numerical issues and
the range of validity are discussed.",
}