Disintegration theorem
In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure-zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.
Motivation
Consider the unit square {\displaystyle S=[0,1]\times [0,1]} in the Euclidean plane {\displaystyle \mathbb {R} ^{2}}. Consider the probability measure {\displaystyle \mu } defined on {\displaystyle S} by the restriction of two-dimensional Lebesgue measure {\displaystyle \lambda ^{2}} to {\displaystyle S}. That is, the probability of an event {\displaystyle E\subseteq S} is simply the area of {\displaystyle E}. We assume {\displaystyle E} is a measurable subset of {\displaystyle S}.
Consider a one-dimensional subset of {\displaystyle S} such as the line segment {\displaystyle L_{x}=\{x\}\times [0,1]}. {\displaystyle L_{x}} has {\displaystyle \mu }-measure zero, and since the Lebesgue measure space is a complete measure space, every subset of {\displaystyle L_{x}} is a {\displaystyle \mu }-null set: {\displaystyle E\subseteq L_{x}\implies \mu (E)=0.}
While true, this is somewhat unsatisfying. It would be nice to say that {\displaystyle \mu } "restricted to" {\displaystyle L_{x}} is the one-dimensional Lebesgue measure {\displaystyle \lambda ^{1}}, rather than the zero measure. The probability of a "two-dimensional" event {\displaystyle E} could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" {\displaystyle E\cap L_{x}}: more formally, if {\displaystyle \mu _{x}} denotes one-dimensional Lebesgue measure on {\displaystyle L_{x}}, then {\displaystyle \mu (E)=\int _{[0,1]}\mu _{x}(E\cap L_{x})\,\mathrm {d} x} for any "nice" {\displaystyle E\subseteq S}. The disintegration theorem makes this argument rigorous in the context of measures on metric spaces.
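The slicing formula can be checked numerically for a concrete set. The following is a minimal sketch, assuming the test set {\displaystyle E=\{(x,y)\in S:y\leq x^{2}\}} (chosen here for illustration, not taken from the article), whose vertical slice over {\displaystyle x} has length {\displaystyle x^{2}}. The function names are hypothetical.

```python
# Sketch: compare the integral of the slice lengths with a direct area estimate.
# Assumption (not from the article): E = {(x, y) in S : y <= x**2}.


def slice_measure(x: float) -> float:
    """One-dimensional Lebesgue measure of the vertical slice E ∩ L_x."""
    # For x in [0, 1] the slice is {x} × [0, x**2], an interval of length x**2.
    return x * x


def integrate_slices(n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of slice_measure over [0, 1]."""
    h = 1.0 / n
    return sum(slice_measure((i + 0.5) * h) for i in range(n)) * h


def area_by_grid(n: int = 400) -> float:
    """Approximate the area of E by counting cell midpoints of an n × n grid."""
    h = 1.0 / n
    hits = 0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if y <= x * x:
                hits += 1
    return hits * h * h


via_slices = integrate_slices()  # integral of the one-dimensional slice measures
via_area = area_by_grid()        # two-dimensional measure estimated directly
print(via_slices, via_area)
```

Both numbers approximate the exact area {\displaystyle 1/3}; the grid estimate is the coarser of the two, with an error of at most the grid spacing.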
Statement of the theorem
(Hereafter, {\displaystyle {\mathcal {P}}(X)} will denote the collection of Borel probability measures on a topological space {\displaystyle (X,T)}.) The assumptions of the theorem are as follows:
- Let {\displaystyle Y} and {\displaystyle X} be two Radon spaces (i.e. topological spaces on which every Borel probability measure is inner regular, e.g. separably metrizable spaces; in particular, every Borel probability measure on such a space is outright a Radon measure).
- Let {\displaystyle \mu \in {\mathcal {P}}(Y)}.
- Let {\displaystyle \pi :Y\to X} be a Borel-measurable function. Here one should think of {\displaystyle \pi } as a function to "disintegrate" {\displaystyle Y}, in the sense of partitioning {\displaystyle Y} into {\displaystyle \{\pi ^{-1}(x)\ |\ x\in X\}}. For example, for the motivating example above, one can define {\displaystyle \pi ((a,b))=a} for {\displaystyle (a,b)\in [0,1]\times [0,1]}, which gives {\displaystyle \pi ^{-1}(a)=\{a\}\times [0,1]}, a slice we want to capture.
- Let {\displaystyle \nu \in {\mathcal {P}}(X)} be the pushforward measure {\displaystyle \nu =\pi _{*}(\mu )=\mu \circ \pi ^{-1}}. This measure provides the distribution of {\displaystyle x} (which corresponds to the events {\displaystyle \pi ^{-1}(x)}).
The conclusion of the theorem: There exists a {\displaystyle \nu }-almost everywhere uniquely determined family of probability measures {\displaystyle \{\mu _{x}\}_{x\in X}\subseteq {\mathcal {P}}(Y)}, which provides a "disintegration" of {\displaystyle \mu } into {\displaystyle \{\mu _{x}\}_{x\in X}}, such that:
- the function {\displaystyle x\mapsto \mu _{x}} is Borel measurable, in the sense that {\displaystyle x\mapsto \mu _{x}(B)} is a Borel-measurable function for each Borel-measurable set {\displaystyle B\subseteq Y};
- {\displaystyle \mu _{x}} "lives on" the fiber {\displaystyle \pi ^{-1}(x)}: for {\displaystyle \nu }-almost all {\displaystyle x\in X}, {\displaystyle \mu _{x}\left(Y\setminus \pi ^{-1}(x)\right)=0,} and so {\displaystyle \mu _{x}(E)=\mu _{x}(E\cap \pi ^{-1}(x))};
- for every Borel-measurable function {\displaystyle f:Y\to [0,\infty ]}, {\displaystyle \int _{Y}f(y)\,\mathrm {d} \mu (y)=\int _{X}\int _{\pi ^{-1}(x)}f(y)\,\mathrm {d} \mu _{x}(y)\,\mathrm {d} \nu (x).} In particular, for any event {\displaystyle E\subseteq Y}, taking {\displaystyle f} to be the indicator function of {\displaystyle E},[1] {\displaystyle \mu (E)=\int _{X}\mu _{x}(E)\,\mathrm {d} \nu (x).}
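When {\displaystyle Y} is a finite set, the three conclusions can be verified by hand: {\displaystyle \mu _{x}} is simply {\displaystyle \mu } restricted to the fiber {\displaystyle \pi ^{-1}(x)} and renormalised. The sketch below illustrates this under that finiteness assumption. The data `mu`, `pi` and `f` and the function names are hypothetical, and exact rational arithmetic is used so that the identities hold without rounding error.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite data (not from the article): four points of Y with their
# mu-masses, and a map pi sending each point of Y to a point of X = {"a", "b"}.
mu = {"y1": Fraction(1, 8), "y2": Fraction(3, 8), "y3": Fraction(1, 4), "y4": Fraction(1, 4)}
pi = {"y1": "a", "y2": "a", "y3": "b", "y4": "b"}


def pushforward(mu, pi):
    """Return nu = pi_*(mu): nu({x}) is the total mu-mass of the fiber over x."""
    nu = defaultdict(Fraction)
    for y, mass in mu.items():
        nu[pi[y]] += mass
    return dict(nu)


def disintegrate(mu, pi):
    """Return (nu, family), where family[x] is mu normalised on the fiber over x."""
    nu = pushforward(mu, pi)
    family = {}
    for x, total in nu.items():
        if total == 0:
            continue  # mu_x is only determined nu-almost everywhere
        family[x] = {
            y: (mass / total if pi[y] == x else Fraction(0))
            for y, mass in mu.items()
        }
    return nu, family


nu, family = disintegrate(mu, pi)

# Each mu_x is a probability measure that gives no mass outside its fiber.
for x, mu_x in family.items():
    assert sum(mu_x.values()) == 1
    assert all(mass == 0 for y, mass in mu_x.items() if pi[y] != x)

# Integral identity for a non-negative test function f on Y.
f = {"y1": 2, "y2": 0, "y3": 5, "y4": 1}
lhs = sum(f[y] * mu[y] for y in mu)
rhs = sum(nu[x] * sum(f[y] * family[x][y] for y in mu) for x in family)
assert lhs == rhs
```

Points {\displaystyle x} with {\displaystyle \nu (\{x\})=0} are skipped, which reflects the fact that the family is only determined {\displaystyle \nu }-almost everywhere.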
Applications
Product spaces
The original example was a special case of the problem of product spaces, to which the disintegration theorem applies.
When {\displaystyle Y} is written as a Cartesian product {\displaystyle Y=X_{1}\times X_{2}} and {\displaystyle \pi _{i}:Y\to X_{i}} is the natural projection, then each fiber {\displaystyle \pi _{1}^{-1}(x_{1})} can be canonically identified with {\displaystyle X_{2}} and there exists a Borel family of probability measures {\displaystyle \{\mu _{x_{1}}\}_{x_{1}\in X_{1}}} in {\displaystyle {\mathcal {P}}(X_{2})} (which is {\displaystyle (\pi _{1})_{*}(\mu )}-almost everywhere uniquely determined) such that {\displaystyle \mu =\int _{X_{1}}\mu _{x_{1}}\,\mu \left(\pi _{1}^{-1}(\mathrm {d} x_{1})\right)=\int _{X_{1}}\mu _{x_{1}}\,\mathrm {d} (\pi _{1})_{*}(\mu )(x_{1}).} Writing {\displaystyle \mu (\cdot \mid x_{1})} for {\displaystyle \mu _{x_{1}}}, this gives in particular {\displaystyle \int _{X_{1}\times X_{2}}f(x_{1},x_{2})\,\mu (\mathrm {d} x_{1},\mathrm {d} x_{2})=\int _{X_{1}}\left(\int _{X_{2}}f(x_{1},x_{2})\mu (\mathrm {d} x_{2}\mid x_{1})\right)\mu \left(\pi _{1}^{-1}(\mathrm {d} x_{1})\right)} and {\displaystyle \mu (A\times B)=\int _{A}\mu \left(B\mid x_{1}\right)\,\mu \left(\pi _{1}^{-1}(\mathrm {d} x_{1})\right).}
The relation to conditional expectation is given by the identities {\displaystyle \operatorname {E} (f\mid \pi _{1})(x_{1})=\int _{X_{2}}f(x_{1},x_{2})\mu (\mathrm {d} x_{2}\mid x_{1}),} {\displaystyle \mu (A\times B\mid \pi _{1})(x_{1})=1_{A}(x_{1})\cdot \mu (B\mid x_{1}).}
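For a discrete product space these identities reduce to familiar manipulations of a joint probability table. The following sketch assumes a hypothetical joint distribution on {\displaystyle X_{1}\times X_{2}} with {\displaystyle X_{1}=\{0,1\}} and {\displaystyle X_{2}=\{0,1,2\}}, and a hypothetical test function {\displaystyle f}. None of these values come from the article.

```python
from fractions import Fraction

# Hypothetical joint distribution mu on X1 x X2 (values chosen for illustration).
tenth = Fraction(1, 10)
joint = {
    (0, 0): 1 * tenth, (0, 1): 2 * tenth, (0, 2): 1 * tenth,
    (1, 0): 3 * tenth, (1, 1): 1 * tenth, (1, 2): 2 * tenth,
}
X1 = sorted({x1 for x1, _ in joint})
X2 = sorted({x2 for _, x2 in joint})

# Marginal of x1, i.e. the pushforward of mu under the projection pi_1.
marginal = {x1: sum(joint[(x1, x2)] for x2 in X2) for x1 in X1}

# kernel[x1][x2] plays the role of mu(dx2 | x1); it is defined where marginal[x1] > 0.
kernel = {
    x1: {x2: joint[(x1, x2)] / marginal[x1] for x2 in X2}
    for x1 in X1
    if marginal[x1] > 0
}


def f(x1, x2):
    """Hypothetical test function on X1 x X2."""
    return x1 + x2 ** 2


def cond_exp(x1):
    """E(f | pi_1)(x1): integrate f(x1, .) against the kernel mu(. | x1)."""
    return sum(f(x1, x2) * kernel[x1][x2] for x2 in X2)


# Integral of f against mu, computed directly and as an iterated integral.
integral_direct = sum(f(x1, x2) * p for (x1, x2), p in joint.items())
integral_iterated = sum(cond_exp(x1) * marginal[x1] for x1 in kernel)

# Measure of a rectangle A x B, computed directly and through the kernel.
A, B = {1}, {0, 2}
rect_direct = sum(p for (x1, x2), p in joint.items() if x1 in A and x2 in B)
rect_kernel = sum(sum(kernel[x1][x2] for x2 in B) * marginal[x1] for x1 in A)

assert integral_direct == integral_iterated
assert rect_direct == rect_kernel
```

In this finite setting the kernel is the usual table of conditional probabilities, obtained by dividing each row of the joint table by its row sum.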
Vector calculus
The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface {\displaystyle \Sigma \subset \mathbb {R} ^{3}}, it is implicit that the "correct" measure on {\displaystyle \Sigma } is the disintegration of three-dimensional Lebesgue measure {\displaystyle \lambda ^{3}} on {\displaystyle \Sigma }, and that the disintegration of this measure on {\displaystyle \partial \Sigma } is the same as the disintegration of {\displaystyle \lambda ^{3}} on {\displaystyle \partial \Sigma }.[2]
Conditional distributions
The disintegration theorem can be applied to give a rigorous treatment of conditional probability distributions in statistics, while avoiding purely abstract formulations of conditional probability.[3] The theorem is related to the Borel–Kolmogorov paradox, for example.
See also
- Ionescu-Tulcea theorem – Probability theorem
- Joint probability distribution – Type of probability distribution
- Copula (statistics) – Statistical distribution for dependence between random variables
- Conditional expectation – Expected value of a random variable given that certain conditions are known to occur
- Borel–Kolmogorov paradox – Conditional probability paradox
- Regular conditional probability
- Lifting theory
References
- [1] Dellacherie, C.; Meyer, P.-A. (1978). Probabilities and Potential. North-Holland Mathematics Studies. Amsterdam: North-Holland. ISBN 0-7204-0701-X.
- [2] Ambrosio, L.; Gigli, N.; Savaré, G. (2005). Gradient Flows in Metric Spaces and in the Space of Probability Measures. ETH Zürich, Birkhäuser Verlag, Basel. ISBN 978-3-7643-2428-5.
- [3] Chang, J.T.; Pollard, D. (1997). "Conditioning as disintegration" (PDF). Statistica Neerlandica. 51 (3): 287. CiteSeerX 10.1.1.55.7544. doi:10.1111/1467-9574.00056. S2CID 16749932.