
Continuous Bernoulli distribution

From Wikipedia, the free encyclopedia
Not to be confused with Bernoulli distribution.
[Plot: probability density function of the continuous Bernoulli distribution]

Notation: $\mathcal{CB}(\lambda)$
Parameters: $\lambda \in (0, 1)$
Support: $x \in [0, 1]$
PDF: $C(\lambda)\,\lambda^{x}(1-\lambda)^{1-x}$, where $C(\lambda) = \begin{cases} 2 & \text{if } \lambda = \frac{1}{2} \\ \frac{2\tanh^{-1}(1-2\lambda)}{1-2\lambda} & \text{otherwise} \end{cases}$
CDF: $\begin{cases} x & \text{if } \lambda = \frac{1}{2} \\ \frac{\lambda^{x}(1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1} & \text{otherwise} \end{cases}$
Mean: $\operatorname{E}[X] = \begin{cases} \frac{1}{2} & \text{if } \lambda = \frac{1}{2} \\ \frac{\lambda}{2\lambda - 1} + \frac{1}{2\tanh^{-1}(1-2\lambda)} & \text{otherwise} \end{cases}$
Variance: $\operatorname{var}[X] = \begin{cases} \frac{1}{12} & \text{if } \lambda = \frac{1}{2} \\ -\frac{(1-\lambda)\lambda}{(1-2\lambda)^{2}} + \frac{1}{(2\tanh^{-1}(1-2\lambda))^{2}} & \text{otherwise} \end{cases}$
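
Because the CDF above has a closed-form inverse, the distribution can be sampled by inverse transform. A minimal NumPy sketch (the function name and the tolerance around $\lambda = \frac{1}{2}$ are illustrative choices, not from the source):

    import numpy as np

    def sample_continuous_bernoulli(lam, size, rng=None):
        """Draw samples by inverting the closed-form CDF."""
        rng = np.random.default_rng() if rng is None else rng
        u = rng.random(size)
        if abs(lam - 0.5) < 1e-9:   # at lam = 1/2 the CDF is the identity, so u itself is a sample
            return u
        # Solve (lam^x * (1-lam)^(1-x) + lam - 1) / (2*lam - 1) = u for x.
        return np.log1p(u * (2 * lam - 1) / (1 - lam)) / np.log(lam / (1 - lam))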

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution[1][2][3] is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0, 1)$, defined on the unit interval $x \in [0, 1]$, by:

$p(x \mid \lambda) \propto \lambda^{x}(1 - \lambda)^{1 - x}.$

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,[4][5] for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $[0, 1]$-valued data.[6][7][8][9] This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $\{0, 1\}$-valued data.
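
The correction is explicit in code. A minimal sketch using the ContinuousBernoulli distribution shipped in PyTorch (cited below;[2] the tensor values here are arbitrary): the proper log-likelihood equals the negative binary cross entropy plus the log normalizing constant $\log C(\lambda)$.

    import torch
    from torch.distributions import ContinuousBernoulli

    x = torch.rand(4)                         # continuous observations in [0, 1]
    lam = torch.tensor([0.1, 0.3, 0.7, 0.9])  # arbitrary parameters in (0, 1), away from 1/2

    # Negative binary cross entropy: the unnormalized log-density x*log(lam) + (1-x)*log(1-lam).
    neg_bce = -torch.nn.functional.binary_cross_entropy(lam, x, reduction="none")

    # Proper log-likelihood under the continuous Bernoulli.
    log_lik = ContinuousBernoulli(probs=lam).log_prob(x)

    # Their difference is the log normalizing constant log C(lam)
    # (closed form valid away from lam = 1/2).
    log_C = torch.log(2 * torch.atanh(1 - 2 * lam) / (1 - 2 * lam))
    print(torch.allclose(log_lik - neg_bce, log_C))  # True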

The continuous Bernoulli also defines an exponential family of distributions. Writing $\eta = \log\left(\lambda/(1-\lambda)\right)$ for the natural parameter, the density can be rewritten in canonical form: $p(x \mid \eta) \propto \exp(\eta x)$.
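
Concretely, integrating $e^{\eta x}$ over the unit interval gives the log-partition function (standard exponential-family notation; the symbol $A(\eta)$ is not used in the source but follows directly):

$p(x \mid \eta) = \exp\left(\eta x - A(\eta)\right), \qquad A(\eta) = \log \int_{0}^{1} e^{\eta t}\, dt = \log \frac{e^{\eta} - 1}{\eta},$

with $A(0) = 0$ taken as the limiting value at $\lambda = \frac{1}{2}$. The mean and variance in the table above are recovered as $A'(\eta)$ and $A''(\eta)$, respectively.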

Statistical inference


Given a sample of $N$ points $x_1, \dots, x_N$ with $x_i \in [0, 1]$ for all $i$, the maximum likelihood estimator of $\lambda$ is obtained, as in any exponential family, by matching the distribution's mean to the empirical mean:

$\operatorname{E}_{\hat{\lambda}}[X] = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_{i}.$

Because the mean of the continuous Bernoulli is not $\lambda$ itself (see the mean formula above), this equation must be solved numerically; in particular, unlike for the Bernoulli distribution, $\hat{\lambda} \neq \bar{x}$ in general. The corresponding estimator of the natural parameter is $\hat{\eta} = \log\left(\hat{\lambda}/(1-\hat{\lambda})\right)$.
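
A minimal numerical sketch, assuming NumPy and SciPy (the sample values and bracketing interval are arbitrary): solve the moment-matching equation for $\hat{\eta}$ by root finding, then map back to $\hat{\lambda}$.

    import numpy as np
    from scipy.optimize import brentq

    def cb_mean(eta):
        """Mean of a continuous Bernoulli with natural parameter eta."""
        if abs(eta) < 1e-8:                       # limiting value at eta = 0 (lambda = 1/2)
            return 0.5
        return np.exp(eta) / (np.exp(eta) - 1.0) - 1.0 / eta

    x = np.array([0.12, 0.47, 0.58, 0.83, 0.95])  # hypothetical sample in [0, 1]
    xbar = x.mean()

    # Solve E[X] = xbar for the natural parameter, then invert the logit.
    eta_hat = brentq(lambda eta: cb_mean(eta) - xbar, -50.0, 50.0)
    lam_hat = 1.0 / (1.0 + np.exp(-eta_hat))
    print(lam_hat, xbar)                          # note lam_hat != xbar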
Related distributions

Bernoulli distribution


The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0, 1\}$ by the probability mass function:

$p(x) = p^{x}(1 - p)^{1 - x},$

where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0, 1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.
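
As a quick numerical check (a sketch assuming NumPy and SciPy; the value $\lambda = 0.3$ is arbitrary), the constant $C(\lambda)$ from the table above is exactly what renormalizes this functional form to integrate to one over $[0, 1]$:

    import numpy as np
    from scipy.integrate import quad

    lam = 0.3
    # C(lam) = 2*atanh(1 - 2*lam) / (1 - 2*lam), with limiting value 2 at lam = 1/2.
    C = 2 * np.arctanh(1 - 2 * lam) / (1 - 2 * lam)

    total, _ = quad(lambda x: C * lam**x * (1 - lam)**(1 - x), 0, 1)
    print(total)  # ~1.0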

Beta distribution


The Beta distribution has the density function:

$p(x) \propto x^{\alpha - 1}(1 - x)^{\beta - 1},$

which can be rewritten as:

$p(x) \propto x_{1}^{\alpha_{1} - 1} x_{2}^{\alpha_{2} - 1},$

where $\alpha_{1}, \alpha_{2}$ are positive scalar parameters, and $(x_{1}, x_{2})$ represents an arbitrary point inside the 1-simplex, $\Delta^{1} = \{(x_{1}, x_{2}) : x_{1} > 0, x_{2} > 0, x_{1} + x_{2} = 1\}$. Switching the role of the parameter and the argument in this density function, we obtain:

$p(x) \propto \alpha_{1}^{x_{1}} \alpha_{2}^{x_{2}}.$

This family is only identifiable up to the linear constraint $\alpha_{1} + \alpha_{2} = 1$, whence we obtain:

$p(x) \propto \lambda^{x_{1}}(1 - \lambda)^{x_{2}},$

corresponding exactly to the continuous Bernoulli density, with $x_{1} = x$ and $x_{2} = 1 - x$.

Exponential distribution


An exponential distribution restricted to the unit interval is equivalent to a continuous Bernoulli distribution: truncating an exponential with rate $\theta$ to $[0, 1]$ yields a density proportional to $e^{-\theta x}$, which is the canonical form above with natural parameter $\eta = -\theta$, i.e., a continuous Bernoulli with parameter $\lambda = 1/(1 + e^{\theta})$.
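
This equivalence is easy to verify numerically (a sketch; the rate $\theta = 2$ and the evaluation grid are arbitrary):

    import numpy as np

    theta = 2.0                          # rate of the exponential
    lam = 1.0 / (1.0 + np.exp(theta))    # matching continuous Bernoulli parameter

    xs = np.linspace(0.01, 0.99, 5)
    # Density of an exponential with rate theta truncated to [0, 1].
    trunc_exp = theta * np.exp(-theta * xs) / (1.0 - np.exp(-theta))
    # Continuous Bernoulli density, using C(lam) = 2*atanh(1 - 2*lam) / (1 - 2*lam).
    C = 2 * np.arctanh(1 - 2 * lam) / (1 - 2 * lam)
    cb = C * lam**xs * (1.0 - lam)**(1.0 - xs)
    print(np.allclose(trunc_exp, cb))    # True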

Continuous categorical distribution


The multivariate generalization of the continuous Bernoulli is called the continuous-categorical.[10]

References

  1. ^ Loaiza-Ganem, G., & Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems (pp. 13266-13276).
  2. ^ PyTorch Distributions. https://pytorch.org/docs/stable/distributions.html#continuousbernoulli
  3. ^ Tensorflow Probability. https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli Archived 25 November 2020 at the Wayback Machine.
  4. ^ Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  5. ^ Kingma, D. P., & Welling, M. (2014, April). Stochastic gradient VB and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (Vol. 19).
  6. ^ Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016, June). Autoencoding beyond pixels using a learned similarity metric. In International conference on machine learning (pp. 1558-1566).
  7. ^ Jiang, Z., Zheng, Y., Tan, H., Tang, B., & Zhou, H. (2017, August). Variational deep embedding: an unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1965-1972).
  8. ^ PyTorch VAE tutorial: https://github.com/pytorch/examples/tree/master/vae.
  9. ^ Keras VAE tutorial: https://blog.keras.io/building-autoencoders-in-keras.html.
  10. ^ Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 36th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).