Central limit theorem for directional statistics
In probability theory, the central limit theorem states conditions under which the average of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.
Directional statistics is the subdiscipline of statistics that deals with directions (unit vectors in $\mathbb{R}^n$), axes (lines through the origin in $\mathbb{R}^n$) or rotations in $\mathbb{R}^n$. The means and variances of directional quantities are all finite, so the central limit theorem may be applied to the particular case of directional statistics.
This article deals only with unit vectors in 2-dimensional space ($\mathbb{R}^2$), but the method described can be extended to the general case.
The central limit theorem
A sample of angles $\theta_i$ is measured. Since each angle is defined only up to an additive multiple of $2\pi$, the well-defined complex quantity
$$z_i = e^{i\theta_i} = \cos(\theta_i) + i\sin(\theta_i)$$
is used as the random variate. The probability distribution from which the sample is drawn may be characterized by its moments, which may be expressed in Cartesian and polar form:
$$m_n = E(z^n) = C_n + iS_n = R_n e^{i\theta_n}$$
It follows that:
$$C_n = E(\cos(n\theta))$$
$$S_n = E(\sin(n\theta))$$
$$R_n = |E(z^n)| = \sqrt{C_n^2 + S_n^2}$$
$$\theta_n = \arg(E(z^n))$$
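The population moments can be evaluated for a concrete circular distribution. A minimal numerical sketch (the von Mises distribution and all parameter values are illustrative assumptions, not part of the article's argument), using simple quadrature over one period:

```python
import numpy as np

# Illustrative assumption: a von Mises(mu, kappa) circular distribution.
mu, kappa = 0.5, 2.0
theta = np.linspace(-np.pi, np.pi, 200_000, endpoint=False)
d = theta[1] - theta[0]

# Normalized density on the circle (normalization done numerically).
w = np.exp(kappa * np.cos(theta - mu))
pdf = w / (w.sum() * d)

# First moment m_1 = E(z) = C_1 + i S_1 = R_1 e^{i theta_1} by quadrature.
m1 = (np.exp(1j * theta) * pdf).sum() * d
C1, S1 = m1.real, m1.imag
R1, th1 = abs(m1), np.angle(m1)
```

For the von Mises distribution the mean direction $\theta_1$ recovered this way equals $\mu$, and $R_1$ lies strictly between 0 and 1.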
Sample moments for N trials are:
$$\overline{m_n} = \frac{1}{N}\sum_{i=1}^{N} z_i^n = \overline{C_n} + i\,\overline{S_n} = \overline{R_n}\,e^{i\overline{\theta_n}}$$
where
$$\overline{C_n} = \frac{1}{N}\sum_{i=1}^{N}\cos(n\theta_i)$$
$$\overline{S_n} = \frac{1}{N}\sum_{i=1}^{N}\sin(n\theta_i)$$
$$\overline{R_n} = |\overline{m_n}| = \sqrt{\overline{C_n}^2 + \overline{S_n}^2}$$
$$\overline{\theta_n} = \arg(\overline{m_n})$$
Note that $\overline{R_n}$ and $\overline{\theta_n}$ are the modulus and argument of the sample moment $\overline{m_n}$ itself; they are not averages of the individual moduli $|z_i^n|$ (which are identically 1) or of the individual arguments.
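The sample moments can be computed directly with NumPy. A minimal sketch (the function name `sample_moment` and the von Mises sample are illustrative assumptions):

```python
import numpy as np

def sample_moment(theta, n=1):
    """n-th sample moment of the angles theta, returned as the complex
    moment together with its Cartesian (C, S) and polar (R, arg) parts."""
    m = np.exp(1j * n * theta).mean()   # (1/N) * sum_i z_i^n
    return m, m.real, m.imag, np.abs(m), np.angle(m)

# Illustrative sample of angles (distribution choice is an assumption).
rng = np.random.default_rng(0)
theta = rng.vonmises(0.5, 2.0, size=1000)
m1, C1, S1, R1, th1 = sample_moment(theta, n=1)
```

By construction $\overline{m_1} = \overline{C_1} + i\,\overline{S_1}$ with $\overline{R_1} \le 1$, since the sample mean of unit vectors lies in the unit disk.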
The vector $[\overline{C_1}, \overline{S_1}]$ may be used as a representation of the sample mean $\overline{m_1}$ and may be taken as a 2-dimensional random variate. The bivariate central limit theorem states that the joint probability distribution for $\overline{C_1}$ and $\overline{S_1}$ in the limit of a large number of samples is given by:
$$[\overline{C_1}, \overline{S_1}] \;\xrightarrow{d}\; \mathcal{N}([C_1, S_1], \Sigma/N)$$
where $\mathcal{N}(\cdot)$ is the bivariate normal distribution and $\Sigma$ is the covariance matrix of the circular distribution:
$$\Sigma = \begin{bmatrix}\sigma_{CC} & \sigma_{CS}\\ \sigma_{SC} & \sigma_{SS}\end{bmatrix}$$
$$\sigma_{CC} = E(\cos^2\theta) - E(\cos\theta)^2$$
$$\sigma_{CS} = \sigma_{SC} = E(\cos\theta\sin\theta) - E(\cos\theta)E(\sin\theta)$$
$$\sigma_{SS} = E(\sin^2\theta) - E(\sin\theta)^2$$
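The convergence statement can be checked by simulation: over many independent trials of size $N$, the empirical covariance of the mean vectors should approach $\Sigma/N$. A minimal Monte Carlo sketch (the distribution, $N$, trial count, and seed are illustrative assumptions):

```python
import numpy as np

# Monte Carlo check of the bivariate CLT for the circular mean.
rng = np.random.default_rng(1)
N, trials = 200, 5000
theta = rng.vonmises(0.0, 1.5, size=(trials, N))

# Mean vector [C1-bar, S1-bar] for each trial.
means = np.column_stack([np.cos(theta).mean(axis=1),
                         np.sin(theta).mean(axis=1)])

# Covariance matrix Sigma of (cos theta, sin theta), estimated from the
# pooled sample of all angles.
c, s = np.cos(theta).ravel(), np.sin(theta).ravel()
cov_cs = np.mean(c * s) - c.mean() * s.mean()
Sigma = np.array([[c.var(), cov_cs],
                  [cov_cs, s.var()]])

# Empirical covariance of the per-trial means; should be close to Sigma/N.
emp = np.cov(means, rowvar=False)
```

With these trial counts the entries of `emp` agree with `Sigma / N` to within Monte Carlo noise.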
Note that the bivariate normal distribution is defined over the entire plane, while the mean $\overline{m_1}$ is confined to the closed unit disk (on or inside the unit circle). This means that the integral of the limiting (bivariate normal) distribution over the unit disk does not equal unity, but approaches unity as $N$ approaches infinity.
It is desired to state the limiting bivariate distribution in terms of the moments of the distribution.
Covariance matrix in terms of moments
Using the double-angle trigonometric identities
$$C_2 = E(\cos(2\theta)) = E(2\cos^2\theta - 1) = E(1 - 2\sin^2\theta)$$
$$S_2 = E(\sin(2\theta)) = E(2\cos\theta\sin\theta)$$
It follows that:
$$\sigma_{CC} = E(\cos^2\theta) - E(\cos\theta)^2 = \frac{1}{2}\left(1 + C_2 - 2C_1^2\right)$$
$$\sigma_{CS} = E(\cos\theta\sin\theta) - E(\cos\theta)E(\sin\theta) = \frac{1}{2}\left(S_2 - 2C_1 S_1\right)$$
$$\sigma_{SS} = E(\sin^2\theta) - E(\sin\theta)^2 = \frac{1}{2}\left(1 - C_2 - 2S_1^2\right)$$
The covariance matrix is now expressed in terms of the moments of the circular distribution.
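These expressions are exact algebraic identities (pointwise consequences of the double-angle formulas), so they hold for sample moments as well as population moments. A quick numerical check with an arbitrary sample (the sample's distribution and seed are illustrative assumptions):

```python
import numpy as np

# Verify sigma_CC = (1 + C2 - 2 C1^2)/2, etc., on sample moments.
# Any set of angles works; this particular sample is an assumption.
rng = np.random.default_rng(2)
theta = rng.vonmises(0.7, 2.0, size=1000)

c, s = np.cos(theta), np.sin(theta)
C1, S1 = c.mean(), s.mean()
C2, S2 = np.cos(2 * theta).mean(), np.sin(2 * theta).mean()

# Covariance entries computed directly from the definitions.
sigma_cc = (c**2).mean() - C1**2
sigma_cs = (c * s).mean() - C1 * S1
sigma_ss = (s**2).mean() - S1**2
```

Each directly computed entry matches the corresponding moment expression to machine precision.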
The central limit theorem may also be expressed in terms of the polar components of the mean. If $P(\overline{C_1}, \overline{S_1})\,d\overline{C_1}\,d\overline{S_1}$ is the probability of finding the mean in the area element $d\overline{C_1}\,d\overline{S_1}$, then that probability may also be written
$$P\left(\overline{R_1}\cos(\overline{\theta_1}),\,\overline{R_1}\sin(\overline{\theta_1})\right)\,\overline{R_1}\,d\overline{R_1}\,d\overline{\theta_1},$$
where $\overline{R_1}$ is the Jacobian of the transformation to polar coordinates.