Continuous mapping theorem
In probability theory, the continuous mapping theorem states that continuous functions preserve limits even if their arguments are sequences of random variables. A continuous function, in Heine's definition, is a function that maps convergent sequences into convergent sequences: if xn → x then g(xn) → g(x). The continuous mapping theorem states that this remains true if the deterministic sequence {xn} is replaced with a sequence of random variables {Xn}, and the standard notion of convergence of real numbers "→" is replaced with one of the types of convergence of random variables.
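For instance, the deterministic sequence xn = 1/n converges to 0, so for the continuous function g(x) = e^x Heine's definition gives g(xn) → g(0) = 1. A minimal Python sketch of this deterministic case (the choice of sequence and of g = exp are illustrative assumptions, not part of the theorem):

```python
import math

# Heine's definition of continuity: if x_n -> x, then g(x_n) -> g(x).
# Illustrative assumptions for this sketch: x_n = 1/n -> 0 and g = exp.
g = math.exp

for n in [1, 10, 100, 1000, 10000]:
    x_n = 1.0 / n
    print(f"n={n:>6}: g(x_n) = {g(x_n):.6f}")  # approaches g(0) = 1

print(f"limit value: g(0) = {g(0.0):.6f}")
```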
This theorem was first proved by Henry Mann and Abraham Wald in 1943,[1] and it is therefore sometimes called the Mann–Wald theorem.[2] Meanwhile, Denis Sargan refers to it as the general transformation theorem.[3]
Statement
Let {Xn}, X be random elements taking values in a metric space S. Suppose a function g: S → S′ (where S′ is another metric space) has the set of discontinuity points Dg such that Pr[X ∈ Dg] = 0. Then[4][5]
- $$\begin{aligned}X_n\ \xrightarrow{\text{d}}\ X\quad &\Rightarrow\quad g(X_n)\ \xrightarrow{\text{d}}\ g(X);\\[6pt]X_n\ \xrightarrow{\text{p}}\ X\quad &\Rightarrow\quad g(X_n)\ \xrightarrow{\text{p}}\ g(X);\\[6pt]X_n\ \xrightarrow{\text{a.s.}}\ X\quad &\Rightarrow\quad g(X_n)\ \xrightarrow{\text{a.s.}}\ g(X).\end{aligned}$$
where the labels "d", "p", and "a.s." above the arrows denote convergence in distribution, convergence in probability, and almost sure convergence, respectively.
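As a concrete illustration (a simulation sketch, not part of the formal statement): by the central limit theorem, the standardized sample mean Zn of Uniform(0, 1) draws converges in distribution to N(0, 1), and since g(x) = x² is continuous everywhere, the theorem gives Zn² →d χ²(1). The population, sample size, and replication count below are illustrative assumptions:

```python
import numpy as np
from math import erf, sqrt

# By the CLT, Z_n = sqrt(n) * (sample mean - 1/2) / sqrt(1/12) for
# Uniform(0,1) data converges in distribution to N(0,1). The continuous map
# g(x) = x^2 then gives g(Z_n) ->d chi-squared(1) by the continuous mapping
# theorem. The population, n, and reps are illustrative assumptions.
rng = np.random.default_rng(0)
n, reps = 500, 20_000

samples = rng.uniform(0.0, 1.0, size=(reps, n))
z_n = np.sqrt(n) * (samples.mean(axis=1) - 0.5) / np.sqrt(1.0 / 12.0)
g_z = z_n ** 2  # g(x) = x^2 is continuous everywhere

# Compare the empirical CDF of g(Z_n) at t = 1 with the chi-squared(1) CDF,
# P(chi2_1 <= t) = erf(sqrt(t / 2)).
t = 1.0
print("empirical P(Z_n^2 <= 1):", np.mean(g_z <= t))
print("chi2(1)   P(W     <= 1):", erf(sqrt(t / 2.0)))
```

The two printed probabilities should agree to roughly two decimal places, reflecting both the CLT approximation and Monte Carlo error.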
Proof
Spaces S and S′ are equipped with certain metrics. For simplicity we will denote both of these metrics using the |x − y| notation, even though the metrics may be arbitrary and not necessarily Euclidean.
Convergence in distribution
We will need a particular statement from the portmanteau theorem: convergence in distribution $X_n \xrightarrow{\text{d}} X$ is equivalent to

- $$\mathbb{E}f(X_n)\to \mathbb{E}f(X)$$ for every bounded continuous function f.

So it suffices to prove that $\mathbb{E}f(g(X_n))\to \mathbb{E}f(g(X))$ for every bounded continuous function f. For simplicity, assume that g is continuous everywhere. Then $F = f \circ g$ is itself a bounded continuous function, so the claim follows from the statement above. The general case, where g may be discontinuous on a set Dg with Pr[X ∈ Dg] = 0, is slightly more technical.
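The criterion can be checked numerically. The following Monte Carlo sketch assumes, purely for illustration, X ~ N(0, 1), Xn = X + 1/n, the continuous map g(x) = |x|, and the bounded continuous function f(x) = 1/(1 + x²):

```python
import numpy as np

# Numerical sketch of the portmanteau criterion: for bounded continuous f and
# continuous g, E f(g(X_n)) should approach E f(g(X)). Illustrative
# assumptions: X ~ N(0,1), X_n = X + 1/n, g(x) = |x|, f(x) = 1/(1 + x^2).
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)

f = lambda t: 1.0 / (1.0 + t ** 2)  # bounded by 1 and continuous
g = np.abs                          # continuous on all of R

target = f(g(x)).mean()  # Monte Carlo estimate of E f(g(X))
for n in [1, 10, 100, 1000]:
    x_n = x + 1.0 / n    # X_n -> X (deterministic shift shrinking to zero)
    print(f"n={n:>5}: E f(g(X_n)) ~ {f(g(x_n)).mean():.6f}")
print(f"target E f(g(X)) ~ {target:.6f}")
```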
Convergence in probability
Fix an arbitrary ε > 0. Then for any δ > 0 consider the set Bδ defined as
- $$B_\delta = \big\{x \in S \mid x \notin D_g,\ \exists y \in S:\ |x - y| < \delta,\ |g(x) - g(y)| > \varepsilon\big\}.$$
This is the set of continuity points x of the function g(·) for which it is possible to find, within the δ-neighborhood of x, a point that maps outside the ε-neighborhood of g(x). By definition of continuity, this set shrinks as δ goes to zero, so that $\lim_{\delta \to 0} B_\delta = \emptyset$.
Now suppose that |g(X) − g(Xn)| > ε. Then at least one of the following must hold: either |X − Xn| ≥ δ, or X ∈ Dg, or X ∈ Bδ. Indeed, if |X − Xn| < δ and X ∉ Dg, then y = Xn satisfies the defining condition of Bδ, so X ∈ Bδ. In terms of probabilities this can be written as
- $$\Pr\big(|g(X_n) - g(X)| > \varepsilon\big) \leq \Pr\big(|X_n - X| \geq \delta\big) + \Pr(X \in B_\delta) + \Pr(X \in D_g).$$
On the right-hand side, the first term converges to zero as n → ∞ for any fixed δ, by the definition of convergence in probability of the sequence {Xn}. The second term converges to zero as δ → 0, since the set Bδ shrinks to the empty set and probability measures are continuous from above. The last term is identically equal to zero by the assumption of the theorem. Therefore, the conclusion is that
- $$\lim_{n\to\infty} \Pr\big(|g(X_n) - g(X)| > \varepsilon\big) = 0,$$
which means that g(Xn) converges to g(X) in probability.
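A small simulation illustrates this conclusion. Here Xn = X plus N(0, 1/n) noise, so that Xn →p X, and g(x) = x²; the choices of X ~ N(0, 1) and ε = 0.1 are illustrative assumptions:

```python
import numpy as np

# Simulation sketch of convergence in probability: X_n = X + N(0, 1/n) noise,
# so X_n ->p X; with the continuous g(x) = x^2 the theorem predicts
# Pr(|g(X_n) - g(X)| > eps) -> 0. X ~ N(0,1) and eps = 0.1 are illustrative.
rng = np.random.default_rng(2)
reps, eps = 200_000, 0.1
x = rng.standard_normal(reps)
g = np.square

for n in [1, 10, 100, 1_000, 10_000]:
    x_n = x + rng.normal(0.0, 1.0 / np.sqrt(n), size=reps)  # std = 1/sqrt(n)
    prob = np.mean(np.abs(g(x_n) - g(x)) > eps)
    print(f"n={n:>6}: Pr(|g(X_n) - g(X)| > {eps}) ~ {prob:.4f}")
```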
Almost sure convergence
By definition of the continuity of the function g(·),
- $$\lim_{n\to\infty} X_n(\omega) = X(\omega) \quad\Rightarrow\quad \lim_{n\to\infty} g\big(X_n(\omega)\big) = g\big(X(\omega)\big)$$
at each point X(ω) where g(·) is continuous. Therefore,
- $$\begin{aligned}\Pr\left(\lim_{n\to\infty} g(X_n) = g(X)\right) &\geq \Pr\left(\lim_{n\to\infty} g(X_n) = g(X),\ X \notin D_g\right)\\ &\geq \Pr\left(\lim_{n\to\infty} X_n = X,\ X \notin D_g\right) = 1,\end{aligned}$$
because the intersection of two almost sure events is almost sure.
By definition, we conclude that g(Xn) converges to g(X) almost surely.
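Since almost sure convergence is pointwise in ω, the mechanism can be illustrated by fixing a few sample points. In the sketch below, Xn(ω) = X(ω) + 1/n (so convergence holds for every ω), with the illustrative choices X ~ Uniform(0, 1) and g(x) = sin(πx):

```python
import numpy as np

# Almost sure convergence is pointwise in omega: whenever
# X_n(omega) -> X(omega), continuity of g carries the limit through.
# Illustrative assumptions: X_n(omega) = X(omega) + 1/n (convergence holds
# for every omega here), X ~ Uniform(0,1), and g(x) = sin(pi * x).
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=5)  # X(omega) at five fixed sample points
g = lambda t: np.sin(np.pi * t)

for n in [1, 10, 100, 10_000]:
    x_n = x + 1.0 / n
    gap = np.max(np.abs(g(x_n) - g(x)))
    print(f"n={n:>6}: max |g(X_n) - g(X)| = {gap:.2e}")
```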
References
[edit ]- ^ Mann, H. B.; Wald, A. (1943). "On Stochastic Limit and Order Relationships". Annals of Mathematical Statistics . 14 (3): 217–226. doi:10.1214/aoms/1177731415 . JSTOR 2235800.
- ^ Amemiya, Takeshi (1985). Advanced Econometrics. Cambridge, MA: Harvard University Press. p. 88. ISBN 0-674-00560-0.
- ^ Sargan, Denis (1988). Lectures on Advanced Econometric Theory. Oxford: Basil Blackwell. pp. 4–8. ISBN 0-631-14956-2.
- ^ Billingsley, Patrick (1969). Convergence of Probability Measures. John Wiley & Sons. p. 31 (Corollary 1). ISBN 0-471-07242-7.
- ^ van der Vaart, A. W. (1998). Asymptotic Statistics. New York: Cambridge University Press. p. 7 (Theorem 2.3). ISBN 0-521-49603-9.