Proofs of convergence of random variables
This article is supplemental to “Convergence of random variables” and provides proofs for selected results.
Several results will be established using the portmanteau lemma: a sequence {Xn} converges in distribution to X if and only if any of the following conditions is met:

(A) $\operatorname{E}[f(X_n)] \to \operatorname{E}[f(X)]$ for all bounded, continuous functions $f$;
(B) $\operatorname{E}[f(X_n)] \to \operatorname{E}[f(X)]$ for all bounded, Lipschitz functions $f$;
(C) $\limsup \operatorname{Pr}(X_n \in C) \leq \operatorname{Pr}(X \in C)$ for all closed sets $C$.
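Condition (A) is easy to illustrate numerically. The following Monte Carlo sketch uses a hypothetical example sequence, Xn = X + Zn/n with X and Zn standard normal (so Xn converges in distribution to X), and compares the two expectations for a bounded, continuous test function:

```python
import numpy as np

# Monte Carlo sketch of portmanteau condition (A):
# E[f(X_n)] -> E[f(X)] for a bounded, continuous test function f.
# Hypothetical sequence: X_n = X + Z_n / n, which converges in
# distribution to X (standard normal here).

rng = np.random.default_rng(0)
f = lambda x: np.tanh(x - 0.5)   # bounded and continuous
samples = 500_000

X = rng.standard_normal(samples)
for n in [1, 10, 100, 1000]:
    X_n = X + rng.standard_normal(samples) / n
    print(f"n={n:5d}  E[f(X_n)] ~ {f(X_n).mean():+.5f}   "
          f"E[f(X)] ~ {f(X).mean():+.5f}")
```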
Convergence almost surely implies convergence in probability
$$X_n\ \xrightarrow{\mathrm{a.s.}}\ X \quad\Rightarrow\quad X_n\ \xrightarrow{p}\ X$$
Proof: If {Xn} converges to X almost surely, it means that the set of points $O = \{\omega \mid \lim X_n(\omega) \neq X(\omega)\}$ has measure zero. Now fix $\varepsilon > 0$ and consider the sequence of sets

$$A_n = \bigcup_{m \geq n} \left\{ |X_m - X| > \varepsilon \right\}.$$

This sequence of sets is decreasing ($A_n \supseteq A_{n+1} \supseteq \ldots$) towards the set

$$A_\infty = \bigcap_{n \geq 1} A_n.$$

By continuity from above, the probabilities of this decreasing sequence converge to the probability of the limit set: $\lim \operatorname{Pr}(A_n) = \operatorname{Pr}(A_\infty)$. We shall now show that this number is equal to zero. For any point $\omega$ outside of $O$ we have $\lim X_n(\omega) = X(\omega)$, which implies that $|X_n(\omega) - X(\omega)| < \varepsilon$ for all $n \geq N$, for some $N$. In particular, for such $n$ the point $\omega$ will not lie in $A_n$, and hence will not lie in $A_\infty$. Therefore $A_\infty \subseteq O$, and so $\operatorname{Pr}(A_\infty) = 0$.

Finally, since $\{|X_n - X| > \varepsilon\} \subseteq A_n$,

$$\operatorname{Pr}\left(|X_n - X| > \varepsilon\right) \leq \operatorname{Pr}(A_n) \xrightarrow[n \to \infty]{} 0,$$

which by definition means that Xn converges in probability to X.
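The sets An in this proof can be made concrete with a small simulation sketch. The sequence below is a hypothetical example: Xn = X + Y/n with Y drawn once per sample path, so convergence is almost sure and An reduces to {|Y| > nε}; the estimated Pr(An) decreases to zero exactly as the argument requires.

```python
import numpy as np

# Sketch of the sets A_n = union over m >= n of {|X_m - X| > eps}.
# Hypothetical a.s.-convergent sequence: X_n = X + Y/n with Y fixed
# on each path, so sup_{m >= n} |X_m - X| = |Y|/n and
# A_n = {|Y| > n * eps}.

rng = np.random.default_rng(1)
eps = 0.05
paths = 500_000
Y = rng.standard_normal(paths)

for n in [1, 10, 100, 1000]:
    pr_A_n = np.mean(np.abs(Y) > n * eps)   # estimate of Pr(A_n)
    print(f"n={n:5d}  Pr(A_n) ~ {pr_A_n:.5f}")
# Since {|X_n - X| > eps} is contained in A_n, Pr(|X_n - X| > eps) -> 0.
```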
Convergence in probability does not imply almost sure convergence in the discrete case
If Xn are independent random variables assuming value one with probability 1/n and zero otherwise, then Xn converges to zero in probability but does not converge to zero almost surely. This can be verified using the Borel–Cantelli lemmas: since the events {Xn = 1} are independent and $\sum_n 1/n = \infty$, the second Borel–Cantelli lemma gives Pr(Xn = 1 infinitely often) = 1, so almost no sample path converges.
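A simulation sketch of this counterexample (the number of paths and the observation window are arbitrary choices): the pointwise probability Pr(Xn = 1) = 1/n vanishes, yet nearly every simulated path still produces a one far out in the sequence, so the paths do not settle at zero.

```python
import numpy as np

# Simulation sketch: independent X_n with Pr(X_n = 1) = 1/n.
# Pr(|X_n| > eps) = 1/n -> 0, so X_n -> 0 in probability; but since
# sum 1/n diverges, the second Borel-Cantelli lemma says X_n = 1
# infinitely often on almost every path, so there is no a.s. limit.

rng = np.random.default_rng(2)
paths, n0, horizon, chunk = 2_000, 1_000, 100_000, 5_000

saw_late_one = np.zeros(paths, dtype=bool)
for start in range(n0, horizon, chunk):        # tail indices n0..horizon
    n = np.arange(start, min(start + chunk, horizon))
    saw_late_one |= (rng.random((paths, n.size)) < 1.0 / n).any(axis=1)

print("Pr(X_n = 1) at n = 100_000:", 1.0 / horizon)         # tiny
print("fraction of paths with some X_n = 1 for n >= 1_000:",
      saw_late_one.mean())                                   # ~ 0.99
```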
Convergence in probability implies convergence in distribution
$$X_n\ \xrightarrow{p}\ X \quad\Rightarrow\quad X_n\ \xrightarrow{d}\ X,$$
Proof for the case of scalar random variables

Lemma. Let X, Y be random variables, let a be a real number, and let ε > 0. Then
$$\operatorname{Pr}(Y \leq a) \leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(|Y - X| > \varepsilon).$$
Proof of lemma:

$$\begin{aligned}
\operatorname{Pr}(Y \leq a) &= \operatorname{Pr}(Y \leq a,\ X \leq a + \varepsilon) + \operatorname{Pr}(Y \leq a,\ X > a + \varepsilon) \\
&\leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(Y - X \leq a - X,\ a - X < -\varepsilon) \\
&\leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(Y - X < -\varepsilon) \\
&\leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(Y - X < -\varepsilon) + \operatorname{Pr}(Y - X > \varepsilon) \\
&= \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(|Y - X| > \varepsilon)
\end{aligned}$$
Shorter proof of the lemma: We have

$$\{Y \leq a\} \subset \{X \leq a + \varepsilon\} \cup \{|Y - X| > \varepsilon\},$$

for if $Y \leq a$ and $|Y - X| \leq \varepsilon$, then $X \leq a + \varepsilon$. Hence by the union bound,

$$\operatorname{Pr}(Y \leq a) \leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(|Y - X| > \varepsilon).$$
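The lemma is also easy to sanity-check numerically. The sketch below, with arbitrary illustrative choices of X, Y, a, and ε, estimates both sides of the inequality by Monte Carlo:

```python
import numpy as np

# Monte Carlo check of the lemma:
# Pr(Y <= a) <= Pr(X <= a + eps) + Pr(|Y - X| > eps).
# X, Y, a and eps are arbitrary illustrative choices.

rng = np.random.default_rng(3)
samples = 1_000_000
X = rng.standard_normal(samples)
Y = X + 0.3 * rng.standard_normal(samples)    # Y close to X
a, eps = 0.5, 0.2

lhs = np.mean(Y <= a)
rhs = np.mean(X <= a + eps) + np.mean(np.abs(Y - X) > eps)
print(f"Pr(Y <= a) ~ {lhs:.4f}   bound ~ {rhs:.4f}")   # lhs <= rhs
```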
Proof of the theorem: Recall that in order to prove convergence in distribution, one must show that the sequence of cumulative distribution functions converges to FX at every point where FX is continuous. Let a be such a point. For every ε > 0, due to the preceding lemma, we have:
$$\begin{aligned}
\operatorname{Pr}(X_n \leq a) &\leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(|X_n - X| > \varepsilon) \\
\operatorname{Pr}(X \leq a - \varepsilon) &\leq \operatorname{Pr}(X_n \leq a) + \operatorname{Pr}(|X_n - X| > \varepsilon)
\end{aligned}$$
So, we have

$$\operatorname{Pr}(X \leq a - \varepsilon) - \operatorname{Pr}\left(|X_n - X| > \varepsilon\right) \leq \operatorname{Pr}(X_n \leq a) \leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}\left(|X_n - X| > \varepsilon\right).$$
Taking the limit as n → ∞, we obtain:

$$F_X(a - \varepsilon) \leq \lim_{n \to \infty} \operatorname{Pr}(X_n \leq a) \leq F_X(a + \varepsilon),$$
where FX(a) = Pr(X ≤ a) is the cumulative distribution function of X. This function is continuous at a by assumption, and therefore both FX(a−ε) and FX(a+ε) converge to FX(a) as ε → 0+. Taking this limit, we obtain
$$\lim_{n \to \infty} \operatorname{Pr}(X_n \leq a) = \operatorname{Pr}(X \leq a),$$
which means that {Xn} converges to X in distribution.
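As a numerical illustration of the theorem, the sketch below uses the hypothetical sequence Xn = X + Zn/√n, which converges to X in probability, and tracks the empirical CDF value Pr(Xn ≤ a) against FX(a) at a continuity point a:

```python
import math
import numpy as np

# Numerical illustration: X_n = X + Z_n / sqrt(n) converges in
# probability to X ~ N(0,1), hence Pr(X_n <= a) -> F_X(a).
# The sequence is a hypothetical example chosen for this sketch.

rng = np.random.default_rng(4)
samples, a = 500_000, 0.5
X = rng.standard_normal(samples)
F_X_a = 0.5 * (1 + math.erf(a / math.sqrt(2)))   # exact N(0,1) CDF at a

for n in [1, 10, 100, 10_000]:
    X_n = X + rng.standard_normal(samples) / math.sqrt(n)
    print(f"n={n:6d}  Pr(X_n <= a) ~ {np.mean(X_n <= a):.4f}   "
          f"F_X(a) = {F_X_a:.4f}")
```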
Proof for the generic case

The implication follows for the case when Xn is a random vector, by using the property proved later on this page (convergence in probability to a sequence converging in distribution implies convergence to the same distribution) and by taking Xn = X in the statement of that property.
Convergence in distribution to a constant implies convergence in probability
$$X_n\ \xrightarrow{d}\ c \quad\Rightarrow\quad X_n\ \xrightarrow{p}\ c,$$
provided c is a constant.
Proof: Fix ε > 0. Let Bε(c) be the open ball of radius ε around point c, and Bε(c)c its complement. Then
$$\operatorname{Pr}\left(|X_n - c| \geq \varepsilon\right) = \operatorname{Pr}\left(X_n \in B_\varepsilon(c)^c\right).$$
By the portmanteau lemma (part C), which applies because the complement Bε(c)c of the open ball is a closed set, if Xn converges in distribution to c then the limsup of the latter probability must be less than or equal to Pr(c ∈ Bε(c)c), which is obviously equal to zero. Therefore,
$$\begin{aligned}
\lim_{n \to \infty} \operatorname{Pr}\left(|X_n - c| \geq \varepsilon\right) &\leq \limsup_{n \to \infty} \operatorname{Pr}\left(|X_n - c| \geq \varepsilon\right) \\
&= \limsup_{n \to \infty} \operatorname{Pr}\left(X_n \in B_\varepsilon(c)^c\right) \\
&\leq \operatorname{Pr}\left(c \in B_\varepsilon(c)^c\right) = 0
\end{aligned}$$
which by definition means that Xn converges to c in probability.
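A short sketch of this result, using the hypothetical sequence Xn = c + Zn/√n (which converges in distribution to the constant c):

```python
import math
import numpy as np

# Sketch: X_n = c + Z_n / sqrt(n) converges in distribution to the
# constant c, so by the result above Pr(|X_n - c| >= eps) -> 0.
# The sequence is a hypothetical example for illustration.

rng = np.random.default_rng(5)
c, eps, samples = 2.0, 0.1, 500_000

for n in [1, 10, 100, 10_000]:
    X_n = c + rng.standard_normal(samples) / math.sqrt(n)
    print(f"n={n:6d}  Pr(|X_n - c| >= eps) ~ "
          f"{np.mean(np.abs(X_n - c) >= eps):.4f}")    # -> 0
```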
Convergence in probability to a sequence converging in distribution implies convergence to the same distribution
$$|Y_n - X_n|\ \xrightarrow{p}\ 0, \qquad X_n\ \xrightarrow{d}\ X \quad\Rightarrow\quad Y_n\ \xrightarrow{d}\ X$$
Proof: We will prove this theorem using the portmanteau lemma, part B. As required in that lemma, consider any function f which is bounded (i.e. |f(x)| ≤ M for some constant M) and also Lipschitz:
$$\exists K > 0,\ \forall x, y: \quad |f(x) - f(y)| \leq K|x - y|.$$
Take some ε > 0 and majorize the expression |E[f(Yn)] − E[f(Xn)]| as
$$\begin{aligned}
\left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left[f(X_n)\right]\right| &\leq \operatorname{E}\left[\left|f(Y_n) - f(X_n)\right|\right] \\
&= \operatorname{E}\left[\left|f(Y_n) - f(X_n)\right| \mathbf{1}_{\{|Y_n - X_n| < \varepsilon\}}\right] + \operatorname{E}\left[\left|f(Y_n) - f(X_n)\right| \mathbf{1}_{\{|Y_n - X_n| \geq \varepsilon\}}\right] \\
&\leq \operatorname{E}\left[K \left|Y_n - X_n\right| \mathbf{1}_{\{|Y_n - X_n| < \varepsilon\}}\right] + \operatorname{E}\left[2M \mathbf{1}_{\{|Y_n - X_n| \geq \varepsilon\}}\right] \\
&\leq K\varepsilon \operatorname{Pr}\left(|Y_n - X_n| < \varepsilon\right) + 2M \operatorname{Pr}\left(|Y_n - X_n| \geq \varepsilon\right) \\
&\leq K\varepsilon + 2M \operatorname{Pr}\left(|Y_n - X_n| \geq \varepsilon\right)
\end{aligned}$$
(here 1{...} denotes the indicator function; the expectation of an indicator function equals the probability of the corresponding event). Therefore,
$$\begin{aligned}
\left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left[f(X)\right]\right| &\leq \left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left[f(X_n)\right]\right| + \left|\operatorname{E}\left[f(X_n)\right] - \operatorname{E}\left[f(X)\right]\right| \\
&\leq K\varepsilon + 2M \operatorname{Pr}\left(|Y_n - X_n| \geq \varepsilon\right) + \left|\operatorname{E}\left[f(X_n)\right] - \operatorname{E}\left[f(X)\right]\right|.
\end{aligned}$$
If we take the limit in this expression as n → ∞, the second term will go to zero since {Yn−Xn} converges to zero in probability; and the third term will also converge to zero, by the portmanteau lemma and the fact that Xn converges to X in distribution. Thus
$$\lim_{n \to \infty} \left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left[f(X)\right]\right| \leq K\varepsilon.$$
Since ε was arbitrary, we conclude that the limit must in fact be equal to zero, and therefore E[f(Yn)] → E[f(X)], which again by the portmanteau lemma implies that {Yn} converges to X in distribution. QED.
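A simulation sketch of the theorem, checking the portmanteau criterion (B) directly. The choices below are hypothetical: Xn is Student-t with n degrees of freedom (which converges in distribution to a standard normal X), Yn = Xn + Un/n so that |Yn − Xn| converges to zero in probability, and f is bounded and Lipschitz:

```python
import numpy as np

# Sketch: |Y_n - X_n| ->p 0 and X_n ->d X imply E[f(Y_n)] -> E[f(X)]
# for bounded Lipschitz f (portmanteau, part B).  Hypothetical
# example: X_n ~ Student-t(n) ->d N(0,1); Y_n = X_n + U_n / n.

rng = np.random.default_rng(6)
f = lambda x: np.tanh(x - 0.5)   # bounded (M = 1), Lipschitz (K = 1)
samples = 500_000
X = rng.standard_normal(samples)            # the limit law

for n in [1, 5, 50, 500]:
    X_n = rng.standard_t(df=n, size=samples)
    Y_n = X_n + rng.uniform(-1, 1, size=samples) / n
    print(f"n={n:4d}  E[f(Y_n)] ~ {f(Y_n).mean():+.4f}   "
          f"E[f(X)] ~ {f(X).mean():+.4f}")
```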
Convergence of one sequence in distribution and another to a constant implies joint convergence in distribution
$$X_n\ \xrightarrow{d}\ X, \qquad Y_n\ \xrightarrow{p}\ c \quad\Rightarrow\quad (X_n, Y_n)\ \xrightarrow{d}\ (X, c)$$
provided c is a constant.
Proof: We will prove this statement using the portmanteau lemma, part A.
First we want to show that (Xn, c) converges in distribution to (X, c). By the portmanteau lemma this will be true if we can show that E[f(Xn, c)] → E[f(X, c)] for any bounded continuous function f(x, y). So let f be an arbitrary bounded continuous function. Now consider the function of a single variable g(x) := f(x, c). This is obviously also bounded and continuous, and therefore by the portmanteau lemma applied to the sequence {Xn} converging in distribution to X, we have E[g(Xn)] → E[g(X)]. However the latter expression is equivalent to “E[f(Xn, c)] → E[f(X, c)]”, and therefore we now know that (Xn, c) converges in distribution to (X, c).
Secondly, consider |(Xn, Yn) − (Xn, c)| = |Yn − c|. This expression converges in probability to zero because Yn converges in probability to c. Thus we have demonstrated two facts:
$$\begin{cases}
\left|(X_n, Y_n) - (X_n, c)\right|\ \xrightarrow{p}\ 0, \\
(X_n, c)\ \xrightarrow{d}\ (X, c).
\end{cases}$$
By the property proved earlier, these two facts imply that (Xn, Yn) converges in distribution to (X, c).
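A sketch of this statement, checked through portmanteau part A with one bounded, continuous test function of two variables. The sequences are hypothetical examples (Xn Student-t with n degrees of freedom, Yn = c + Zn/n):

```python
import numpy as np

# Sketch: X_n ->d X and Y_n ->p c imply (X_n, Y_n) ->d (X, c),
# checked through E[f(X_n, Y_n)] -> E[f(X, c)] for one bounded,
# continuous f.  The sequences are hypothetical examples.

rng = np.random.default_rng(7)
c, samples = 1.5, 500_000
f = lambda x, y: np.tanh(x + y)        # bounded, continuous on R^2
X = rng.standard_normal(samples)       # the limit law of X_n

for n in [1, 5, 50, 500]:
    X_n = rng.standard_t(df=n, size=samples)       # ->d N(0, 1)
    Y_n = c + rng.standard_normal(samples) / n     # ->p c
    print(f"n={n:4d}  E[f(X_n, Y_n)] ~ {f(X_n, Y_n).mean():+.4f}   "
          f"E[f(X, c)] ~ {f(X, c).mean():+.4f}")
```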
Convergence of two sequences in probability implies joint convergence in probability
$$X_n\ \xrightarrow{p}\ X, \qquad Y_n\ \xrightarrow{p}\ Y \quad\Rightarrow\quad (X_n, Y_n)\ \xrightarrow{p}\ (X, Y)$$
Proof:
$$\begin{aligned}
\operatorname{Pr}\left(\left|(X_n, Y_n) - (X, Y)\right| \geq \varepsilon\right) &\leq \operatorname{Pr}\left(|X_n - X| + |Y_n - Y| \geq \varepsilon\right) \\
&\leq \operatorname{Pr}\left(|X_n - X| \geq \varepsilon/2\right) + \operatorname{Pr}\left(|Y_n - Y| \geq \varepsilon/2\right)
\end{aligned}$$
where the last step follows by the pigeonhole principle and the sub-additivity of the probability measure. Each of the probabilities on the right-hand side converges to zero as n → ∞ by definition of the convergence of {Xn} and {Yn} in probability to X and Y respectively. Taking the limit we conclude that the left-hand side also converges to zero, and therefore the sequence {(Xn, Yn)} converges in probability to (X, Y).
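The displayed bound can be checked by Monte Carlo. In the sketch below, Xn = X + Z/n and Yn = Y + W/n are hypothetical example sequences, and both sides of the inequality are estimated:

```python
import numpy as np

# Monte Carlo check of the bound
# Pr(|(X_n,Y_n) - (X,Y)| >= eps)
#   <= Pr(|X_n - X| >= eps/2) + Pr(|Y_n - Y| >= eps/2).
# X_n = X + Z/n and Y_n = Y + W/n are hypothetical example sequences.

rng = np.random.default_rng(8)
eps, samples = 0.1, 500_000
X = rng.standard_normal(samples)
Y = rng.standard_normal(samples)

for n in [1, 10, 100]:
    X_n = X + rng.standard_normal(samples) / n
    Y_n = Y + rng.standard_normal(samples) / n
    dist = np.hypot(X_n - X, Y_n - Y)          # Euclidean distance
    lhs = np.mean(dist >= eps)
    rhs = (np.mean(np.abs(X_n - X) >= eps / 2)
           + np.mean(np.abs(Y_n - Y) >= eps / 2))
    print(f"n={n:3d}  lhs ~ {lhs:.4f}   rhs ~ {rhs:.4f}")   # lhs <= rhs
```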
See also
Convergence of random variables