Normal-Wishart distribution
| Normal-Wishart | |
|---|---|
| Notation | {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )} |
| Parameters | {\displaystyle {\boldsymbol {\mu }}_{0}\in \mathbb {R} ^{D}\,} location (vector of reals); {\displaystyle \lambda >0\,} (real); {\displaystyle \mathbf {W} \in \mathbb {R} ^{D\times D}} scale matrix (pos. def.); {\displaystyle \nu >D-1\,} (real) |
| Support | {\displaystyle {\boldsymbol {\mu }}\in \mathbb {R} ^{D};{\boldsymbol {\Lambda }}\in \mathbb {R} ^{D\times D}} precision matrix (pos. def.) |
| PDF | {\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Lambda }}|{\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})\ {\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )} |
In probability theory and statistics, the normal-Wishart distribution (or Gaussian-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and precision matrix (the inverse of the covariance matrix).[1]
Definition
Suppose
- {\displaystyle {\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Lambda }}\sim {\mathcal {N}}({\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})}
has a multivariate normal distribution with mean {\displaystyle {\boldsymbol {\mu }}_{0}} and covariance matrix {\displaystyle (\lambda {\boldsymbol {\Lambda }})^{-1}}, where
- {\displaystyle {\boldsymbol {\Lambda }}|\mathbf {W} ,\nu \sim {\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )}
has a Wishart distribution. Then {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})} has a normal-Wishart distribution, denoted as
- {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu ).}
Characterization
Probability density function
- {\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Lambda }}|{\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})\ {\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )}
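Because the density factorizes into a normal term and a Wishart term, its logarithm is the sum of the two component log-densities. The following is a minimal sketch of that evaluation, assuming NumPy and SciPy are available; the function name `normal_wishart_logpdf` and its argument order are illustrative choices, not part of any library.

```python
import numpy as np
from scipy.stats import multivariate_normal, wishart


def normal_wishart_logpdf(mu, Lambda, mu0, lam, W, nu):
    """Log-density of NW(mu0, lam, W, nu) at the point (mu, Lambda).

    Hypothetical helper: evaluates log N(mu | mu0, (lam * Lambda)^-1)
    plus log W(Lambda | W, nu) using SciPy's component distributions.
    """
    Lambda = np.atleast_2d(np.asarray(Lambda, dtype=float))
    # Normal factor: covariance is the inverse of the scaled precision matrix.
    cov = np.linalg.inv(lam * Lambda)
    log_normal = multivariate_normal.logpdf(mu, mean=mu0, cov=cov)
    # Wishart factor on the precision matrix itself.
    log_wishart = wishart.logpdf(Lambda, df=nu, scale=W)
    return log_normal + log_wishart
```

Working on the log scale avoids underflow when the dimension {\displaystyle D} is large; exponentiate the result only if the density value itself is needed.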
Properties
Scaling
Marginal distributions
By construction, the marginal distribution over {\displaystyle {\boldsymbol {\Lambda }}} is a Wishart distribution, and the conditional distribution over {\displaystyle {\boldsymbol {\mu }}} given {\displaystyle {\boldsymbol {\Lambda }}} is a multivariate normal distribution. The marginal distribution over {\displaystyle {\boldsymbol {\mu }}} is a multivariate t-distribution.
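The parameters of the marginal t-distribution can be obtained by integrating {\displaystyle {\boldsymbol {\Lambda }}} out of the joint density. The sketch below shows that calculation, writing {\displaystyle t_{k}({\boldsymbol {\mu }}|{\boldsymbol {m}},{\boldsymbol {\Sigma }})} for a multivariate t density with {\displaystyle k} degrees of freedom, location {\displaystyle {\boldsymbol {m}}} and scale matrix {\displaystyle {\boldsymbol {\Sigma }}}; this notation is an assumption introduced here, not taken from the cited sources.

```latex
\begin{aligned}
p(\boldsymbol{\mu})
  &= \int \mathcal{N}\!\left(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, (\lambda\boldsymbol{\Lambda})^{-1}\right)
          \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)\, d\boldsymbol{\Lambda} \\
  &\propto \int |\boldsymbol{\Lambda}|^{(\nu + 1 - D - 1)/2}
          \exp\!\left(-\tfrac{1}{2}\operatorname{tr}\!\left[\left(\mathbf{W}^{-1}
          + \lambda(\boldsymbol{\mu}-\boldsymbol{\mu}_0)(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^{T}\right)\boldsymbol{\Lambda}\right]\right) d\boldsymbol{\Lambda} \\
  &\propto \left|\mathbf{W}^{-1} + \lambda(\boldsymbol{\mu}-\boldsymbol{\mu}_0)(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^{T}\right|^{-(\nu+1)/2} \\
  &\propto \left(1 + \lambda(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^{T}\mathbf{W}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)\right)^{-(\nu+1)/2},
\end{aligned}
\qquad\text{so}\qquad
\boldsymbol{\mu} \sim t_{\nu-D+1}\!\left(\boldsymbol{\mu}_0,\ \left[\lambda(\nu-D+1)\mathbf{W}\right]^{-1}\right).
```

The second line is an unnormalized Wishart density with {\displaystyle \nu +1} degrees of freedom, the third line is its normalizing constant, and the fourth follows from the matrix determinant lemma. The condition {\displaystyle \nu >D-1} guarantees that the degrees of freedom {\displaystyle \nu -D+1} are positive.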
Posterior distribution of the parameters
After making {\displaystyle n} observations {\displaystyle {\boldsymbol {x}}_{1},\dots ,{\boldsymbol {x}}_{n}}, the posterior distribution of the parameters is
- {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{n},\lambda _{n},\mathbf {W} _{n},\nu _{n}),}
where
- {\displaystyle \lambda _{n}=\lambda +n,}
- {\displaystyle {\boldsymbol {\mu }}_{n}={\frac {\lambda {\boldsymbol {\mu }}_{0}+n{\boldsymbol {\bar {x}}}}{\lambda +n}},}
- {\displaystyle \nu _{n}=\nu +n,}
- {\displaystyle \mathbf {W} _{n}^{-1}=\mathbf {W} ^{-1}+\sum _{i=1}^{n}({\boldsymbol {x}}_{i}-{\boldsymbol {\bar {x}}})({\boldsymbol {x}}_{i}-{\boldsymbol {\bar {x}}})^{T}+{\frac {n\lambda }{n+\lambda }}({\boldsymbol {\bar {x}}}-{\boldsymbol {\mu }}_{0})({\boldsymbol {\bar {x}}}-{\boldsymbol {\mu }}_{0})^{T}.}[2]
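The update equations above translate directly into a few lines of array code. The following is a minimal sketch assuming NumPy, with observations stored as the rows of an array `X`; the function name `normal_wishart_posterior` is a hypothetical helper chosen for illustration.

```python
import numpy as np


def normal_wishart_posterior(X, mu0, lam, W, nu):
    """Posterior parameters (mu_n, lam_n, W_n, nu_n) after observing the rows of X.

    Hypothetical helper implementing the conjugate update term by term.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mu0 = np.asarray(mu0, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)

    # Scatter matrix: sum of (x_i - xbar)(x_i - xbar)^T over all observations.
    centered = X - xbar
    S = centered.T @ centered

    lam_n = lam + n
    mu_n = (lam * mu0 + n * xbar) / lam_n
    nu_n = nu + n

    # The update is stated for the inverse scale matrix, so invert W first
    # and invert the result again at the end.
    d = xbar - mu0
    W_n_inv = np.linalg.inv(W) + S + (n * lam / (n + lam)) * np.outer(d, d)
    W_n = np.linalg.inv(W_n_inv)
    return mu_n, lam_n, W_n, nu_n
```

For example, with {\displaystyle {\boldsymbol {\mu }}_{0}=\mathbf {0} }, {\displaystyle \lambda =1}, {\displaystyle \mathbf {W} =\mathbf {I} } and the two observations (0, 0) and (2, 2), the posterior mean is (2/3, 2/3) and {\displaystyle \lambda _{n}=3}.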
Generating normal-Wishart random variates
Generation of random variates is straightforward:
- Sample {\displaystyle {\boldsymbol {\Lambda }}} from a Wishart distribution with parameters {\displaystyle \mathbf {W} } and {\displaystyle \nu }
- Sample {\displaystyle {\boldsymbol {\mu }}} from a multivariate normal distribution with mean {\displaystyle {\boldsymbol {\mu }}_{0}} and covariance matrix {\displaystyle (\lambda {\boldsymbol {\Lambda }})^{-1}}
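The two steps above can be sketched as follows, assuming NumPy and SciPy. The function name `sample_normal_wishart` and the `seed` argument are illustrative assumptions. Rather than inverting {\displaystyle \lambda {\boldsymbol {\Lambda }}} explicitly, this sketch draws the normal variate through a Cholesky factor of the precision matrix, which is one of several equivalent ways to carry out the second step.

```python
import numpy as np
from scipy.stats import wishart


def sample_normal_wishart(mu0, lam, W, nu, seed=None):
    """Draw one (mu, Lambda) pair from NW(mu0, lam, W, nu).

    Hypothetical helper following the two-step recipe: Wishart first,
    then a normal draw conditioned on the sampled precision matrix.
    """
    rng = np.random.default_rng(seed)
    mu0 = np.asarray(mu0, dtype=float)

    # Step 1: precision matrix Lambda ~ Wishart(W, nu).
    Lambda = np.atleast_2d(wishart.rvs(df=nu, scale=W, random_state=rng))

    # Step 2: mu ~ N(mu0, (lam * Lambda)^-1).
    # With Lambda = L L^T, the vector L^-T z has covariance Lambda^-1
    # when z is standard normal, so no explicit inverse is needed.
    L = np.linalg.cholesky(Lambda)
    z = rng.standard_normal(mu0.shape[0])
    mu = mu0 + np.linalg.solve(L.T, z) / np.sqrt(lam)
    return mu, Lambda
```

Calling `sample_normal_wishart(np.zeros(3), 2.0, np.eye(3), 5.0, seed=0)` returns a length-3 mean vector and a 3×3 symmetric positive-definite precision matrix.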
Related distributions
- The normal-inverse Wishart distribution is essentially the same distribution parameterized by the covariance matrix rather than the precision matrix.
- The normal-gamma distribution is the one-dimensional equivalent.
- The multivariate normal distribution and Wishart distribution are the component distributions out of which this distribution is made.
Notes
- ^ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media. Page 690.
- ^ Cross Validated, https://stats.stackexchange.com/q/324925
References
- Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.