
Continuous Bernoulli distribution

From Wikipedia, the free encyclopedia
Not to be confused with Bernoulli distribution.
[Plot: probability density function of the continuous Bernoulli distribution]

Notation: $\mathcal{CB}(\lambda)$
Parameters: $\lambda \in (0, 1)$
Support: $x \in [0, 1]$
PDF: $C(\lambda)\,\lambda^{x}(1-\lambda)^{1-x}$, where $C(\lambda) = \begin{cases} 2 & \text{if } \lambda = \frac{1}{2} \\ \frac{2\tanh^{-1}(1-2\lambda)}{1-2\lambda} & \text{otherwise} \end{cases}$
CDF: $\begin{cases} x & \text{if } \lambda = \frac{1}{2} \\ \frac{\lambda^{x}(1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1} & \text{otherwise} \end{cases}$
Mean: $\operatorname{E}[X] = \begin{cases} \frac{1}{2} & \text{if } \lambda = \frac{1}{2} \\ \frac{\lambda}{2\lambda - 1} + \frac{1}{2\tanh^{-1}(1-2\lambda)} & \text{otherwise} \end{cases}$
Variance: $\operatorname{var}[X] = \begin{cases} \frac{1}{12} & \text{if } \lambda = \frac{1}{2} \\ -\frac{(1-\lambda)\lambda}{(1-2\lambda)^{2}} + \frac{1}{(2\tanh^{-1}(1-2\lambda))^{2}} & \text{otherwise} \end{cases}$
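
Because the CDF above has a closed-form inverse, the distribution can be sampled by inverse transform. A minimal NumPy sketch (the function name and the tolerance around $\lambda = \frac{1}{2}$ are illustrative choices, not from the source):

    import numpy as np

    def sample_continuous_bernoulli(lam, size, rng=None):
        """Draw samples by inverting the closed-form CDF."""
        rng = np.random.default_rng() if rng is None else rng
        u = rng.random(size)
        if abs(lam - 0.5) < 1e-9:   # at lam = 1/2 the CDF is the identity, so u itself is a sample
            return u
        # Solve (lam^x * (1-lam)^(1-x) + lam - 1) / (2*lam - 1) = u for x.
        return np.log1p(u * (2 * lam - 1) / (1 - lam)) / np.log(lam / (1 - lam))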

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution[1][2][3] is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0, 1)$, defined on the unit interval $x \in [0, 1]$, by:

$p(x \mid \lambda) \propto \lambda^{x}(1 - \lambda)^{1 - x}.$

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,[4][5] for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $[0, 1]$-valued data.[6][7][8][9] This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $\{0, 1\}$-valued data.
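
The correction is explicit in code. A minimal sketch using the ContinuousBernoulli distribution shipped in PyTorch (cited below;[2] the tensor values here are arbitrary): the proper log-likelihood equals the negative binary cross entropy plus the log normalizing constant $\log C(\lambda)$.

    import torch
    from torch.distributions import ContinuousBernoulli

    x = torch.rand(4)                         # continuous observations in [0, 1]
    lam = torch.tensor([0.1, 0.3, 0.7, 0.9])  # arbitrary parameters in (0, 1), away from 1/2

    # Negative binary cross entropy: the unnormalized log-density x*log(lam) + (1-x)*log(1-lam).
    neg_bce = -torch.nn.functional.binary_cross_entropy(lam, x, reduction="none")

    # Proper log-likelihood under the continuous Bernoulli.
    log_lik = ContinuousBernoulli(probs=lam).log_prob(x)

    # Their difference is the log normalizing constant log C(lam)
    # (closed form valid away from lam = 1/2).
    log_C = torch.log(2 * torch.atanh(1 - 2 * lam) / (1 - 2 * lam))
    print(torch.allclose(log_lik - neg_bce, log_C))  # True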

The continuous Bernoulli also defines an exponential family of distributions. Writing $\eta = \log\left(\lambda/(1-\lambda)\right)$ for the natural parameter, the density can be rewritten in canonical form: $p(x \mid \eta) \propto \exp(\eta x)$.
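
Concretely, integrating $e^{\eta x}$ over the unit interval gives the log-partition function (standard exponential-family notation; the symbol $A(\eta)$ is not used in the source but follows directly):

$p(x \mid \eta) = \exp\left(\eta x - A(\eta)\right), \qquad A(\eta) = \log \int_{0}^{1} e^{\eta t}\, dt = \log \frac{e^{\eta} - 1}{\eta},$

with $A(0) = 0$ taken as the limiting value at $\lambda = \frac{1}{2}$. The mean and variance in the table above are recovered as $A'(\eta)$ and $A''(\eta)$, respectively.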

Statistical inference


Given a sample of $N$ points $x_1, \dots, x_N$ with $x_i \in [0, 1]$ for all $i$, the maximum likelihood estimator of $\lambda$ is obtained, as in any exponential family, by matching the distribution's mean to the empirical mean:

$\operatorname{E}_{\hat{\lambda}}[X] = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_{i}.$

Because the mean of the continuous Bernoulli is not $\lambda$ itself (see the mean formula above), this equation must be solved numerically; in particular, unlike for the Bernoulli distribution, $\hat{\lambda} \neq \bar{x}$ in general. The corresponding estimator of the natural parameter is $\hat{\eta} = \log\left(\hat{\lambda}/(1-\hat{\lambda})\right)$.
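
A minimal numerical sketch, assuming NumPy and SciPy (the sample values and bracketing interval are arbitrary): solve the moment-matching equation for $\hat{\eta}$ by root finding, then map back to $\hat{\lambda}$.

    import numpy as np
    from scipy.optimize import brentq

    def cb_mean(eta):
        """Mean of a continuous Bernoulli with natural parameter eta."""
        if abs(eta) < 1e-8:                       # limiting value at eta = 0 (lambda = 1/2)
            return 0.5
        return np.exp(eta) / (np.exp(eta) - 1.0) - 1.0 / eta

    x = np.array([0.12, 0.47, 0.58, 0.83, 0.95])  # hypothetical sample in [0, 1]
    xbar = x.mean()

    # Solve E[X] = xbar for the natural parameter, then invert the logit.
    eta_hat = brentq(lambda eta: cb_mean(eta) - xbar, -50.0, 50.0)
    lam_hat = 1.0 / (1.0 + np.exp(-eta_hat))
    print(lam_hat, xbar)                          # note lam_hat != xbar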
Related distributions

Bernoulli distribution


The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0, 1\}$ by the probability mass function:

$p(x) = p^{x}(1 - p)^{1 - x},$

where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0, 1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.
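
As a quick numerical check (a sketch assuming NumPy and SciPy; the value $\lambda = 0.3$ is arbitrary), the constant $C(\lambda)$ from the table above is exactly what renormalizes this functional form to integrate to one over $[0, 1]$:

    import numpy as np
    from scipy.integrate import quad

    lam = 0.3
    # C(lam) = 2*atanh(1 - 2*lam) / (1 - 2*lam), with limiting value 2 at lam = 1/2.
    C = 2 * np.arctanh(1 - 2 * lam) / (1 - 2 * lam)

    total, _ = quad(lambda x: C * lam**x * (1 - lam)**(1 - x), 0, 1)
    print(total)  # ~1.0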

Beta distribution


The Beta distribution has the density function:

$p(x) \propto x^{\alpha - 1}(1 - x)^{\beta - 1},$

which can be rewritten as:

$p(x) \propto x_{1}^{\alpha_{1} - 1} x_{2}^{\alpha_{2} - 1},$

where $\alpha_{1}, \alpha_{2}$ are positive scalar parameters, and $(x_{1}, x_{2})$ represents an arbitrary point inside the 1-simplex, $\Delta^{1} = \{(x_{1}, x_{2}) : x_{1} > 0, x_{2} > 0, x_{1} + x_{2} = 1\}$. Switching the role of the parameter and the argument in this density function, we obtain:

$p(x) \propto \alpha_{1}^{x_{1}} \alpha_{2}^{x_{2}}.$

This family is only identifiable up to the linear constraint $\alpha_{1} + \alpha_{2} = 1$, whence we obtain:

$p(x) \propto \lambda^{x_{1}}(1 - \lambda)^{x_{2}},$

corresponding exactly to the continuous Bernoulli density, with $x_{1} = x$ and $x_{2} = 1 - x$.

Exponential distribution


An exponential distribution restricted to the unit interval is equivalent to a continuous Bernoulli distribution: truncating an exponential with rate $\theta$ to $[0, 1]$ yields a density proportional to $e^{-\theta x}$, which is the canonical form above with natural parameter $\eta = -\theta$, i.e., a continuous Bernoulli with parameter $\lambda = 1/(1 + e^{\theta})$.
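
This equivalence is easy to verify numerically (a sketch; the rate $\theta = 2$ and the evaluation grid are arbitrary):

    import numpy as np

    theta = 2.0                          # rate of the exponential
    lam = 1.0 / (1.0 + np.exp(theta))    # matching continuous Bernoulli parameter

    xs = np.linspace(0.01, 0.99, 5)
    # Density of an exponential with rate theta truncated to [0, 1].
    trunc_exp = theta * np.exp(-theta * xs) / (1.0 - np.exp(-theta))
    # Continuous Bernoulli density, using C(lam) = 2*atanh(1 - 2*lam) / (1 - 2*lam).
    C = 2 * np.arctanh(1 - 2 * lam) / (1 - 2 * lam)
    cb = C * lam**xs * (1.0 - lam)**(1.0 - xs)
    print(np.allclose(trunc_exp, cb))    # True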

Continuous categorical distribution


The multivariate generalization of the continuous Bernoulli is called the continuous-categorical.[10]

References

  1. ^ Loaiza-Ganem, G., & Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems (pp. 13266-13276).
  2. ^ PyTorch Distributions. https://pytorch.org/docs/stable/distributions.html#continuousbernoulli
  3. ^ Tensorflow Probability. https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli Archived 25 November 2020 at the Wayback Machine.
  4. ^ Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  5. ^ Kingma, D. P., & Welling, M. (2014, April). Stochastic gradient VB and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (Vol. 19).
  6. ^ Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016, June). Autoencoding beyond pixels using a learned similarity metric. In International conference on machine learning (pp. 1558-1566).
  7. ^ Jiang, Z., Zheng, Y., Tan, H., Tang, B., & Zhou, H. (2017, August). Variational deep embedding: an unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1965-1972).
  8. ^ PyTorch VAE tutorial: https://github.com/pytorch/examples/tree/master/vae.
  9. ^ Keras VAE tutorial: https://blog.keras.io/building-autoencoders-in-keras.html.
  10. ^ Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 36th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).