Negative multinomial distribution
| Notation | $\mathrm{NM}(x_0,\,\mathbf{p})$ |
|---|---|
| Parameters | $x_0 > 0$ — the number of failures before the experiment is stopped; $\mathbf{p} \in \mathbb{R}^m$ — $m$-vector of "success" probabilities; $p_0 = 1 - (p_1 + \cdots + p_m)$ — the probability of a "failure" |
| Support | $x_i \in \{0, 1, 2, \ldots\},\ 1 \le i \le m$ |
| PMF | $\Gamma\!\left(\sum_{i=0}^{m} x_i\right) \dfrac{p_0^{x_0}}{\Gamma(x_0)} \prod_{i=1}^{m} \dfrac{p_i^{x_i}}{x_i!},$ where $\Gamma(x)$ is the gamma function |
| Mean | $\dfrac{x_0}{p_0}\,\mathbf{p}$ |
| Variance | $\dfrac{x_0}{p_0^2}\,\mathbf{p}\mathbf{p}' + \dfrac{x_0}{p_0}\,\operatorname{diag}(\mathbf{p})$ |
| MGF | $\left(\dfrac{p_0}{1 - \sum_{j=1}^{m} p_j e^{t_j}}\right)^{\!x_0}$ |
| CF | $\left(\dfrac{p_0}{1 - \sum_{j=1}^{m} p_j e^{i t_j}}\right)^{\!x_0}$ |
In probability theory and statistics, the negative multinomial distribution is a generalization of the negative binomial distribution (NB(x0, p)) to more than two outcomes.[1]
As with the univariate negative binomial distribution, if the parameter $x_0$ is a positive integer, the negative multinomial distribution has an urn model interpretation. Suppose we have an experiment that generates $m + 1 \ge 2$ possible outcomes, $\{X_0, \ldots, X_m\}$, each occurring with non-negative probabilities $\{p_0, \ldots, p_m\}$ respectively. If sampling proceeded until $n$ observations were made, then $\{X_0, \ldots, X_m\}$ would have been multinomially distributed. However, if the experiment is stopped once $X_0$ reaches the predetermined value $x_0$ (assuming $x_0$ is a positive integer), then the distribution of the $m$-tuple $\{X_1, \ldots, X_m\}$ is negative multinomial. These variables are not multinomially distributed because their sum $X_1 + \cdots + X_m$ is not fixed, being a draw from a negative binomial distribution.
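For intuition, here is a minimal simulation sketch of this stopping rule in Python. The function name `sample_negative_multinomial` and the parameter values are illustrative choices, not part of the article or any standard library: the sketch simply runs the experiment one trial at a time until the $x_0$-th failure and returns the counts of the other outcomes.

```python
import numpy as np

def sample_negative_multinomial(x0, p, rng):
    # Run the experiment trial by trial: outcome 0 is the "failure" that
    # stops the process after its x0-th occurrence; return the counts of
    # the other m outcomes.
    p = np.asarray(p, dtype=float)
    p_full = np.concatenate(([1.0 - p.sum()], p))   # (p0, p1, ..., pm)
    counts = np.zeros(len(p), dtype=int)
    failures = 0
    while failures < x0:
        outcome = rng.choice(len(p_full), p=p_full)
        if outcome == 0:
            failures += 1
        else:
            counts[outcome - 1] += 1
    return counts

rng = np.random.default_rng(0)
x0, p = 5, [0.2, 0.3]                               # p0 = 0.5
draws = np.array([sample_negative_multinomial(x0, p, rng) for _ in range(5000)])
print(draws.mean(axis=0))    # near the theoretical mean (x0/p0)*p = (2.0, 3.0)
```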
Properties
Marginal distributions
If the $m$-dimensional random vector $\mathbf{X}$ is partitioned as
$$\mathbf{X} = \begin{bmatrix}\mathbf{X}^{(1)}\\ \mathbf{X}^{(2)}\end{bmatrix} \text{ with sizes } \begin{bmatrix}n\times 1\\ (m-n)\times 1\end{bmatrix},$$
$\mathbf{p}$ is partitioned accordingly as
$$\mathbf{p} = \begin{bmatrix}\mathbf{p}^{(1)}\\ \mathbf{p}^{(2)}\end{bmatrix} \text{ with sizes } \begin{bmatrix}n\times 1\\ (m-n)\times 1\end{bmatrix},$$
and we let $q = 1 - \sum_i p_i^{(2)} = p_0 + \sum_i p_i^{(1)}$, then:
The marginal distribution of $\mathbf{X}^{(1)}$ is $\mathrm{NM}\!\left(x_0,\, \mathbf{p}^{(1)}/q\right)$, with "failure" probability $p_0/q$. That is, the marginal distribution is also negative multinomial, with $\mathbf{p}^{(2)}$ removed and the remaining probabilities rescaled so that they sum to one.
The univariate marginal, obtained by retaining a single component $X_i$, is a negative binomial distribution: $X_i \sim \mathrm{NB}\!\left(x_0,\, p_i/(p_0 + p_i)\right)$.
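As a quick numeric illustration of the univariate marginal, the sketch below maps it onto SciPy's `nbinom(n, q)` convention, which counts "failures" before the $n$-th "success"; here the stopping outcome plays the role of SciPy's success, with probability $p_0/(p_0 + p_i)$. The parameter values are arbitrary.

```python
import numpy as np
from scipy.stats import nbinom

x0 = 4
p = np.array([0.2, 0.3, 0.1])       # p0 = 0.4
p0 = 1.0 - p.sum()
i = 1                                # look at the single component X_2

# nbinom counts the non-stopping outcomes observed before the x0-th stop
marginal = nbinom(x0, p0 / (p0 + p[i]))
print(marginal.mean(), x0 * p[i] / p0)   # both 3.0 (up to rounding), matching x0*p_i/p0
```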
Conditional distributions
The conditional distribution of $\mathbf{X}^{(1)}$ given $\mathbf{X}^{(2)} = \mathbf{x}^{(2)}$ is $\mathrm{NM}\!\left(x_0 + \sum_i x_i^{(2)},\, \mathbf{p}^{(1)}\right)$. That is,
$$\Pr\!\left(\mathbf{x}^{(1)} \mid \mathbf{x}^{(2)}, x_0, \mathbf{p}\right) = \Gamma\!\left(\sum_{i=0}^{m} x_i\right) \frac{\left(1 - \sum_{i=1}^{n} p_i^{(1)}\right)^{x_0 + \sum_{i=1}^{m-n} x_i^{(2)}}}{\Gamma\!\left(x_0 + \sum_{i=1}^{m-n} x_i^{(2)}\right)} \prod_{i=1}^{n} \frac{\left(p_i^{(1)}\right)^{x_i^{(1)}}}{x_i^{(1)}!}.$$
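A small numeric check of this claim, using an assumed helper `nm_logpmf` (our name, not the article's) that evaluates the log of the PMF from the infobox: the log-ratio of the joint PMF to the marginal PMF of $\mathbf{x}^{(2)}$ should equal the log-PMF of $\mathrm{NM}\!\left(x_0 + \sum_i x_i^{(2)}, \mathbf{p}^{(1)}\right)$. The test point and probabilities are arbitrary.

```python
import numpy as np
from scipy.special import gammaln

def nm_logpmf(x, x0, p):
    # log of the negative multinomial PMF from the infobox
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    p0 = 1.0 - p.sum()
    return (gammaln(x0 + x.sum()) - gammaln(x0) + x0 * np.log(p0)
            + np.sum(x * np.log(p) - gammaln(x + 1)))

x0 = 3
p1, p2 = np.array([0.2]), np.array([0.3, 0.1])    # p^(1), p^(2); p0 = 0.4
xa, xb = np.array([4]), np.array([2, 1])           # test values for x^(1), x^(2)

joint = nm_logpmf(np.concatenate([xa, xb]), x0, np.concatenate([p1, p2]))
marg = nm_logpmf(xb, x0, p2 / (1 - p1.sum()))      # marginal of X^(2), rescaled p
cond = nm_logpmf(xa, x0 + xb.sum(), p1)            # claimed conditional distribution
print(np.isclose(joint - marg, cond))              # True
```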
Independent sums
If $\mathbf{X}_1 \sim \mathrm{NM}(r_1, \mathbf{p})$ and $\mathbf{X}_2 \sim \mathrm{NM}(r_2, \mathbf{p})$ are independent, then $\mathbf{X}_1 + \mathbf{X}_2 \sim \mathrm{NM}(r_1 + r_2, \mathbf{p})$. Similarly, it is easy to see from the characteristic function that the negative multinomial distribution is infinitely divisible: for any positive integer $n$ it can be expressed as the sum of $n$ independent, identically distributed negative multinomial variables.
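This additivity can be checked numerically from the characteristic function in the infobox. The sketch below (test points, probabilities, and $r_1, r_2$ are arbitrary) multiplies the CFs of the two independent summands and compares the result with the CF of $\mathrm{NM}(r_1 + r_2, \mathbf{p})$.

```python
import numpy as np

def nm_cf(t, x0, p):
    # characteristic function from the infobox
    p = np.asarray(p, dtype=float)
    p0 = 1.0 - p.sum()
    return (p0 / (1.0 - np.sum(p * np.exp(1j * np.asarray(t))))) ** x0

t = np.array([0.3, -1.2])                 # arbitrary evaluation point
p = np.array([0.2, 0.3])
r1, r2 = 2.5, 4.0

lhs = nm_cf(t, r1, p) * nm_cf(t, r2, p)   # CF of X1 + X2 under independence
rhs = nm_cf(t, r1 + r2, p)                # CF of NM(r1 + r2, p)
print(np.isclose(lhs, rhs))               # True
```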
Aggregation
If $\mathbf{X} = (X_1, \ldots, X_m) \sim \operatorname{NM}\!\left(x_0, (p_1, \ldots, p_m)\right)$ and the random variables with subscripts $i$ and $j$ are dropped from the vector and replaced by their sum, then
$$\mathbf{X}' = (X_1, \ldots, X_i + X_j, \ldots, X_m) \sim \operatorname{NM}\!\left(x_0, (p_1, \ldots, p_i + p_j, \ldots, p_m)\right).$$
This aggregation property may be used to derive the marginal distribution of $X_i$ mentioned above.
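The sketch below illustrates the aggregation property by simulation. It relies on an assumption not stated in the article: an $\mathrm{NM}(x_0, \mathbf{p})$ draw can be generated compositionally by first drawing the total number of non-failure outcomes from a negative binomial and then splitting that total multinomially. The helper name `sample_nm` and the parameter values are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_nm(x0, p, size, rng):
    # Assumed shortcut: the total non-failure count is negative binomial,
    # and is then split multinomially among the m outcome types.
    p = np.asarray(p, dtype=float)
    p0 = 1.0 - p.sum()
    totals = rng.negative_binomial(x0, p0, size=size)
    return np.array([rng.multinomial(n, p / (1.0 - p0)) for n in totals])

x0, p = 4, np.array([0.1, 0.2, 0.3])                 # p0 = 0.4
draws = sample_nm(x0, p, 20000, rng)

# merge components 2 and 3, then compare with direct NM(x0, (p1, p2 + p3)) draws
merged = np.column_stack([draws[:, 0], draws[:, 1] + draws[:, 2]])
direct = sample_nm(x0, np.array([0.1, 0.5]), 20000, rng)
print(merged.mean(axis=0), direct.mean(axis=0))      # both near (1.0, 5.0)
```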
Correlation matrix
The entries of the correlation matrix are
$$\rho(X_i, X_i) = 1, \qquad \rho(X_i, X_j) = \frac{\operatorname{cov}(X_i, X_j)}{\sqrt{\operatorname{var}(X_i)\operatorname{var}(X_j)}} = \sqrt{\frac{p_i p_j}{(p_0 + p_i)(p_0 + p_j)}} \quad (i \neq j).$$
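A short numeric check of this formula, with arbitrary parameter values: build the covariance matrix from the infobox, normalize it to a correlation matrix, and compare an off-diagonal entry with the closed form above.

```python
import numpy as np

x0 = 3.0
p = np.array([0.15, 0.25, 0.2])
p0 = 1.0 - p.sum()                                         # 0.4

# covariance matrix from the infobox, normalized to a correlation matrix
cov = (x0 / p0**2) * np.outer(p, p) + (x0 / p0) * np.diag(p)
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

i, j = 0, 2
print(corr[i, j])                                          # from the covariance matrix
print(np.sqrt(p[i] * p[j] / ((p0 + p[i]) * (p0 + p[j]))))  # closed form; they agree
```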
Parameter estimation
Method of moments
If we let the mean vector of the negative multinomial be
$$\boldsymbol{\mu} = \frac{x_0}{p_0}\,\mathbf{p}$$
and its covariance matrix be
$$\boldsymbol{\Sigma} = \frac{x_0}{p_0^2}\,\mathbf{p}\mathbf{p}' + \frac{x_0}{p_0}\,\operatorname{diag}(\mathbf{p}),$$
then it is easy to show through properties of determinants that $|\boldsymbol{\Sigma}| = \frac{1}{p_0}\prod_{i=1}^{m} \mu_i$. From this, it can be shown that
$$x_0 = \frac{\left(\sum \mu_i\right)\prod \mu_i}{|\boldsymbol{\Sigma}| - \prod \mu_i} \qquad\text{and}\qquad \mathbf{p} = \frac{|\boldsymbol{\Sigma}| - \prod \mu_i}{|\boldsymbol{\Sigma}| \sum \mu_i}\,\boldsymbol{\mu}.$$
Substituting sample moments yields the method-of-moments estimates
$$\hat{x}_0 = \frac{\left(\sum_{i=1}^{m} \bar{x}_i\right)\prod_{i=1}^{m} \bar{x}_i}{|\mathbf{S}| - \prod_{i=1}^{m} \bar{x}_i} \qquad\text{and}\qquad \hat{\mathbf{p}} = \left(\frac{|\mathbf{S}| - \prod_{i=1}^{m} \bar{x}_i}{|\mathbf{S}| \sum_{i=1}^{m} \bar{x}_i}\right) \bar{\mathbf{x}},$$
where $\bar{\mathbf{x}}$ is the sample mean vector and $\mathbf{S}$ is the sample covariance matrix.
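A sketch of these estimators on simulated data. The simulation again uses the negative-binomial/multinomial composition purely as an assumed shortcut for generating test data; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

# simulate a sample from NM(6, (0.2, 0.3)), i.e. p0 = 0.5
x0_true, p_true = 6, np.array([0.2, 0.3])
p0_true = 1.0 - p_true.sum()
totals = rng.negative_binomial(x0_true, p0_true, size=50000)
data = np.array([rng.multinomial(n, p_true / (1.0 - p0_true)) for n in totals])

# plug the sample moments into the estimators above
xbar = data.mean(axis=0)                   # sample mean vector
S = np.cov(data, rowvar=False)             # sample covariance matrix
detS, prod_xbar = np.linalg.det(S), np.prod(xbar)

x0_hat = xbar.sum() * prod_xbar / (detS - prod_xbar)
p_hat = (detS - prod_xbar) / (detS * xbar.sum()) * xbar
print(x0_hat, p_hat)                       # roughly 6 and (0.2, 0.3)
```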
Related distributions
- Negative binomial distribution
- Multinomial distribution
- Inverted Dirichlet distribution, a conjugate prior for the negative multinomial
- Dirichlet negative multinomial distribution
References
- Le Gall, F. (2006). "The modes of a negative multinomial distribution". Statistics & Probability Letters 76 (6): 619–624. doi:10.1016/j.spl.2005.09.009.
- Waller, L. A.; Zelterman, D. (1997). "Log-linear modeling with the negative multinomial distribution". Biometrics 53: 971–982.
Further reading
Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1997). "Chapter 36: Negative Multinomial and Other Multinomial-Related Distributions". Discrete Multivariate Distributions. Wiley. ISBN 978-0-471-12844-1.