- Source: Delta method
In statistics, the delta method is a method of deriving the asymptotic distribution of a random variable. It is applicable when the random variable being considered can be defined as a differentiable function of a random variable which is asymptotically Gaussian.
History
The delta method was derived from propagation of error, and the idea behind it was known in the early 20th century. Its statistical application can be traced back at least to 1928, when it was used by T. L. Kelley. A formal description of the method was presented by J. L. Doob in 1935. Robert Dorfman also described a version of it in 1938.
Univariate delta method
While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables X_n satisfying
\sqrt{n}\,[X_n - \theta] \ \xrightarrow{D}\ \mathcal{N}(0, \sigma^2),
where θ and σ² are finite-valued constants and →_D denotes convergence in distribution, then
\sqrt{n}\,[g(X_n) - g(\theta)] \ \xrightarrow{D}\ \mathcal{N}\!\left(0, \sigma^2 \cdot [g'(\theta)]^2\right)
for any function g whose first derivative g′(θ), evaluated at θ, exists and is non-zero.
The intuition behind the delta method is that any such function g, over a small enough range, can be approximated by a first-order Taylor series, which is essentially a linear function. If the random variable is roughly normal, then a linear transformation of it is also normal. A small range is obtained by approximating the function around the mean when the variance is small enough. When g is applied to a random variable such as the sample mean, the delta method tends to work better as the sample size increases, since a larger sample reduces the variance and the Taylor approximation is therefore applied over a smaller range of g around the point of interest.
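The following minimal Python sketch (assuming NumPy is available; the exponential data, the choice g(x) = x², and all variable names are illustrative, not part of the statement above) checks the univariate result by simulation: with X_n the mean of n Exp(1) draws, θ = σ² = 1 and the delta method predicts a limiting variance of σ²·[g′(θ)]² = 4.

```python
# Monte Carlo check of the univariate delta method (illustrative sketch).
# X_n is the mean of n Exp(1) draws, so theta = 1 and sigma^2 = 1; with
# g(x) = x^2 the delta method predicts
#   sqrt(n) * [g(X_n) - g(theta)]  ~  N(0, sigma^2 * g'(theta)^2) = N(0, 4).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000
theta, sigma2 = 1.0, 1.0
g = lambda x: x ** 2
g_prime = lambda x: 2 * x

x_bar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
scaled = np.sqrt(n) * (g(x_bar) - g(theta))

print("empirical variance of sqrt(n)[g(X_n) - g(theta)]:", scaled.var())
print("delta-method prediction sigma^2 * g'(theta)^2   :", sigma2 * g_prime(theta) ** 2)
```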
Proof in the univariate case
Demonstration of this result is fairly straightforward under the assumption that g is differentiable in a neighborhood of θ, g′ is continuous at θ, and g′(θ) ≠ 0. To begin, we use the mean value theorem (i.e., the first-order approximation of a Taylor series using Taylor's theorem):
g(X_n) = g(\theta) + g'(\tilde{\theta})\,(X_n - \theta),
where θ̃ lies between X_n and θ.
Note that since X_n →_P θ and |θ̃ − θ| < |X_n − θ|, it must be that θ̃ →_P θ, and since g′ is continuous at θ, applying the continuous mapping theorem yields
g'(\tilde{\theta}) \ \xrightarrow{P}\ g'(\theta),
where →_P denotes convergence in probability.
Rearranging the terms and multiplying by √n gives
\sqrt{n}\,[g(X_n) - g(\theta)] = g'(\tilde{\theta})\,\sqrt{n}\,[X_n - \theta].
Since √n[X_n − θ] →_D N(0, σ²) by assumption, applying Slutsky's theorem immediately yields
\sqrt{n}\,[g(X_n) - g(\theta)] \ \xrightarrow{D}\ \mathcal{N}\!\left(0, \sigma^2 [g'(\theta)]^2\right).
This concludes the proof.
Proof with an explicit order of approximation
Alternatively, one can add one more step at the end, to obtain the order of approximation:
\begin{aligned}
\sqrt{n}\,[g(X_n) - g(\theta)] &= g'(\tilde{\theta})\,\sqrt{n}\,[X_n - \theta] \\
&= \sqrt{n}\,[X_n - \theta]\left[g'(\tilde{\theta}) + g'(\theta) - g'(\theta)\right] \\
&= \sqrt{n}\,[X_n - \theta]\left[g'(\theta)\right] + \sqrt{n}\,[X_n - \theta]\left[g'(\tilde{\theta}) - g'(\theta)\right] \\
&= \sqrt{n}\,[X_n - \theta]\left[g'(\theta)\right] + O_p(1)\cdot o_p(1) \\
&= \sqrt{n}\,[X_n - \theta]\left[g'(\theta)\right] + o_p(1)
\end{aligned}
This suggests that the error in the approximation converges to 0 in probability.
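A small numerical check of this order-of-approximation statement is sketched below (again assuming NumPy, with the same illustrative exponential-mean setup and g(x) = x² as above): the remainder √n[g(X_n) − g(θ)] − √n[X_n − θ]g′(θ) shrinks as n grows.

```python
# Sketch: the remainder of the first-order delta approximation,
#   R_n = sqrt(n)[g(X_n) - g(theta)] - sqrt(n)[X_n - theta] * g'(theta),
# shrinks in probability as n grows (illustrative setup: X_n is the mean of
# n Exp(1) draws, theta = 1, g(x) = x^2).
import numpy as np

rng = np.random.default_rng(1)
theta, reps = 1.0, 5_000
g = lambda x: x ** 2
g_prime = lambda x: 2 * x

for n in (100, 10_000, 1_000_000):
    # The mean of n Exp(1) draws has the same distribution as Gamma(n, 1) / n,
    # which is much cheaper to simulate than averaging n draws directly.
    x_bar = rng.gamma(shape=n, scale=1.0, size=reps) / n
    remainder = np.sqrt(n) * (g(x_bar) - g(theta)) - np.sqrt(n) * (x_bar - theta) * g_prime(theta)
    print(n, np.quantile(np.abs(remainder), 0.95))  # 95th percentile shrinks toward 0
```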
Multivariate delta method
By definition, a consistent estimator B converges in probability to its true value β, and often a central limit theorem can be applied to obtain asymptotic normality:
\sqrt{n}\,(B - \beta) \ \xrightarrow{D}\ N(0, \Sigma),
where n is the number of observations and Σ is a (symmetric positive semi-definite) covariance matrix. Suppose we want to estimate the variance of a scalar-valued function h of the estimator B. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate h(B) as
h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B - \beta)
which implies the variance of h(B) is approximately
\begin{aligned}
\operatorname{Var}\left(h(B)\right) &\approx \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot (B - \beta)\right) \\
&= \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta\right) \\
&= \operatorname{Var}\left(\nabla h(\beta)^T \cdot B\right) \\
&= \nabla h(\beta)^T \cdot \operatorname{Cov}(B) \cdot \nabla h(\beta) \\
&= \nabla h(\beta)^T \cdot \frac{\Sigma}{n} \cdot \nabla h(\beta)
\end{aligned}
One can use the mean value theorem (for real-valued functions of many variables) to see that this does not rely on taking a first-order approximation.
The delta method therefore implies that
\sqrt{n}\,\left(h(B) - h(\beta)\right) \ \xrightarrow{D}\ N\!\left(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)\right)
or in univariate terms,
\sqrt{n}\,\left(h(B) - h(\beta)\right) \ \xrightarrow{D}\ N\!\left(0, \sigma^2 \cdot \left(h'(\beta)\right)^2\right).
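As an illustration, the sketch below (assuming NumPy; the bivariate normal data, the ratio h(β) = β₁/β₂, and all names are hypothetical choices) computes a delta-method standard error ∇h(B)ᵀ (Σ̂/n) ∇h(B) for a scalar function of a vector of sample means, with Σ estimated by the sample covariance.

```python
# Sketch of a multivariate delta-method standard error for h(B) = B[0] / B[1],
# where B is the vector of sample means of two correlated variables.
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
data = rng.multivariate_normal(mean=[2.0, 4.0],
                               cov=[[1.0, 0.3], [0.3, 2.0]], size=n)

B = data.mean(axis=0)                    # estimator of beta (vector of means)
Sigma_hat = np.cov(data, rowvar=False)   # estimates Sigma (per-observation covariance)

h = lambda b: b[0] / b[1]
grad_h = np.array([1.0 / B[1], -B[0] / B[1] ** 2])   # gradient of h, evaluated at B

var_h = grad_h @ (Sigma_hat / n) @ grad_h            # delta-method variance of h(B)
print("h(B) =", h(B), " delta-method se =", np.sqrt(var_h))
```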
Example: the binomial proportion
Suppose X_n is binomial with parameters p ∈ (0, 1] and n. Since
\sqrt{n}\left[\frac{X_n}{n} - p\right] \ \xrightarrow{D}\ N(0,\, p(1 - p)),
we can apply the delta method with g(θ) = log(θ) to see
\sqrt{n}\left[\log\!\left(\frac{X_n}{n}\right) - \log(p)\right] \ \xrightarrow{D}\ N\!\left(0,\, p(1 - p)\,[1/p]^{2}\right)
Hence, even though for any finite n the variance of log(X_n/n) does not actually exist (since X_n can be zero), the asymptotic variance of log(X_n/n) does exist and is equal to

\frac{1 - p}{np}.
Note that since p > 0, Pr(X_n/n > 0) → 1 as n → ∞, so with probability converging to one, log(X_n/n) is finite for large n.
Moreover, if p̂ and q̂ are estimates of different group rates from independent samples of sizes n and m respectively, then the logarithm of the estimated relative risk p̂/q̂ has asymptotic variance equal to

\frac{1 - p}{p\,n} + \frac{1 - q}{q\,m}.

This is useful to construct a hypothesis test or to make a confidence interval for the relative risk.
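For example, a minimal sketch of such a confidence interval (assuming NumPy; the counts, sample sizes, and variable names are made-up illustrative values) exponentiates a normal interval for the log relative risk built from the asymptotic variance above.

```python
# Sketch: 95% confidence interval for the relative risk p/q via the delta
# method, using the asymptotic variance of log(p_hat / q_hat) given above.
import numpy as np

x, n = 30, 200    # events and sample size in group 1 (illustrative numbers)
y, m = 18, 240    # events and sample size in group 2
p_hat, q_hat = x / n, y / m

log_rr = np.log(p_hat / q_hat)
se = np.sqrt((1 - p_hat) / (p_hat * n) + (1 - q_hat) / (q_hat * m))

z = 1.959964      # 0.975-quantile of the standard normal
lo, hi = np.exp(log_rr - z * se), np.exp(log_rr + z * se)
print(f"relative risk {p_hat / q_hat:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```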
Alternative form
The delta method is often used in a form that is essentially identical to that above, but without the assumption that X_n or B is asymptotically normal. Often the only assumption available is that the variance is "small". The results then give approximations to the means and covariances of the transformed quantities. For example, the formulae presented in Klein (1953, p. 258) are:
\begin{aligned}
\operatorname{Var}(h_r) &= \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)^2 \operatorname{Var}(B_i)
 + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_r}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j) \\
\operatorname{Cov}(h_r, h_s) &= \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_i}\right) \operatorname{Var}(B_i)
 + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j)
\end{aligned}

where h_r is the r-th element of h(B) and B_i is the i-th element of B.
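These element-wise formulas are the entries of the matrix approximation Cov(h(B)) ≈ J Cov(B) Jᵀ, where J is the Jacobian of h evaluated at B. A minimal sketch (assuming NumPy; the two-component function h(B) = (B₁B₂, B₁/B₂) and the numbers are purely illustrative) is:

```python
# Propagation-of-error sketch: Cov(h(B)) ≈ J Cov(B) J^T, the matrix form of
# the element-wise formulas above (illustrative two-component h and numbers).
import numpy as np

B = np.array([2.0, 0.5])                 # estimate of beta
cov_B = np.array([[0.04, 0.01],
                  [0.01, 0.09]])         # Cov(B)

# h(B) = (B[0] * B[1], B[0] / B[1]); Jacobian evaluated at B.
J = np.array([[B[1],       B[0]],
              [1 / B[1], -B[0] / B[1] ** 2]])

cov_h = J @ cov_B @ J.T                  # approximate Cov(h(B))
print(cov_h)   # diagonal: Var(h_r); off-diagonal: Cov(h_r, h_s)
```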
Second-order delta method
When g′(θ) = 0 the delta method cannot be applied. However, if g′′(θ) exists and is not zero, the second-order delta method can be applied. By the Taylor expansion,
n\,[g(X_n) - g(\theta)] = \frac{1}{2}\, n\,[X_n - \theta]^2 \left[g''(\theta)\right] + o_p(1),
so that the variance of g(X_n) depends on up to the 4th moment of X_n.
The second-order delta method is also useful for a more accurate approximation of the distribution of g(X_n) when the sample size is small:
\sqrt{n}\,[g(X_n) - g(\theta)] = \sqrt{n}\,[X_n - \theta]\, g'(\theta) + \frac{1}{2}\,\sqrt{n}\,[X_n - \theta]^2\, g''(\theta) + o_p(1).
For example, when X_n follows the standard normal distribution, g(X_n) can be approximated as a weighted sum of a standard normal variable and a chi-squared variable with one degree of freedom.
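A minimal sketch of the degenerate case (assuming NumPy; the choices θ = 0, g(x) = x², and X_n ~ N(0, 1/n) are illustrative) shows the second-order limit numerically: with g′(0) = 0 and g″(0) = 2, the quantity n[g(X_n) − g(0)] should behave like a χ²₁ variable.

```python
# Second-order delta method sketch at a point where g'(theta) = 0:
# theta = 0, g(x) = x^2, and X_n ~ N(0, 1/n) (the distribution of the mean of
# n standard normal draws).  Then n[g(X_n) - g(theta)] -> (1/2) g''(0) chi2_1 = chi2_1.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1_000, 50_000
x_n = rng.normal(loc=0.0, scale=1.0 / np.sqrt(n), size=reps)
scaled = n * (x_n ** 2 - 0.0)

print("empirical mean, variance:", scaled.mean(), scaled.var())  # chi2_1: mean 1, variance 2
```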
Nonparametric delta method
A version of the delta method exists in nonparametric statistics. Let X_1, …, X_n ∼ F be independent and identically distributed random variables with empirical distribution function F̂_n, and let T be a functional. If T is Hadamard differentiable with respect to the Chebyshev metric, then
\frac{T(\hat{F}_n) - T(F)}{\widehat{\text{se}}} \ \xrightarrow{D}\ N(0, 1)
where

\widehat{\text{se}} = \frac{\hat{\tau}}{\sqrt{n}}, \qquad \hat{\tau}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{L}^2(X_i),

with L̂(x) = L_{F̂_n}(δ_x) denoting the empirical influence function for T. A nonparametric (1 − α) pointwise asymptotic confidence interval for T(F) is therefore given by
T(\hat{F}_n) \pm z_{\alpha/2}\, \widehat{\text{se}}
where z_q = Φ⁻¹(1 − q) denotes the upper q-quantile of the standard normal distribution. See Wasserman (2006) p. 19f. for details and examples.
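As a concrete sketch (assuming NumPy; the exponential data and all names are illustrative), consider the mean functional T(F) = ∫ x dF(x), whose empirical influence function is L̂(x) = x − x̄; the interval above then reduces to the usual normal interval for the mean.

```python
# Nonparametric delta-method interval for the mean functional T(F) = ∫ x dF(x).
# Its empirical influence function is L_hat(x) = x - x_bar (illustrative data).
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=400)   # sample from F
n = x.size

T_hat = x.mean()                           # plug-in estimate T(F_hat)
L_hat = x - T_hat                          # empirical influence function at the data
tau_hat = np.sqrt(np.mean(L_hat ** 2))
se_hat = tau_hat / np.sqrt(n)

z = 1.959964                               # 0.975-quantile of the standard normal
print("T(F_hat) =", T_hat, " 95% CI:", (T_hat - z * se_hat, T_hat + z * se_hat))
```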
See also
Taylor expansions for the moments of functions of random variables
Variance-stabilizing transformation
References
Further reading
Oehlert, G. W. (1992). "A Note on the Delta Method". The American Statistician. 46 (1): 27–29. doi:10.1080/00031305.1992.10475842. JSTOR 2684406.
Wolter, Kirk M. (1985). "Taylor Series Methods". Introduction to Variance Estimation. New York: Springer. pp. 221–247. ISBN 0-387-96119-4.
Wasserman, Larry (2006). All of Nonparametric Statistics. New York: Springer. pp. 19–20. ISBN 0-387-25145-6.
External links
Asmussen, Søren (2005). "Some Applications of the Delta Method" (PDF). Lecture notes. Aarhus University. Archived from the original (PDF) on May 25, 2015.
Feiveson, Alan H. "Explanation of the delta method". Stata Corp.