- Source: V-statistic
V-statistics are a class of statistics named for Richard von Mises, who developed their asymptotic distribution theory in a fundamental paper in 1947. V-statistics are closely related to U-statistics (U for "unbiased"), introduced by Wassily Hoeffding in 1948. A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.
Statistical functions
Statistics that can be represented as functionals $T(F_n)$ of the empirical distribution function $F_n$ are called statistical functions; the map $T$ itself is the statistical functional. Differentiability of the functional $T$ plays a key role in the von Mises approach; thus von Mises considers differentiable statistical functionals.
= Examples of statistical functions =
The k-th central moment is the functional $T(F)=\int (x-\mu)^{k}\,dF(x)$, where $\mu = E[X]$ is the expected value of X. The associated statistical function is the sample k-th central moment,
$$T_{n}=m_{k}=T(F_{n})=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\overline{x})^{k}.$$
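The plug-in idea (evaluate the functional at the empirical distribution) can be sketched in a few lines of Python. This is only an illustration; the function name `sample_central_moment` and the data are assumptions made for the example, not part of the source.

```python
def sample_central_moment(xs, k):
    """Plug-in estimate T(F_n) of the k-th central moment.

    Integrating against the empirical distribution F_n puts mass 1/n on
    each observation, so the integral becomes an average over the sample.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    mean = sum(xs) / n  # the sample mean plays the role of mu
    return sum((x - mean) ** k for x in xs) / n


# Hypothetical sample, chosen so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m2 = sample_central_moment(data, 2)  # second central moment: 4.0
m3 = sample_central_moment(data, 3)  # third central moment: 5.25
```

Note the divisor is n, not n − 1: the statistical function uses the empirical distribution as is, without a bias correction.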
The chi-squared goodness-of-fit statistic is a statistical function $T(F_n)$, corresponding to the statistical functional
$$T(F)=\sum_{i=1}^{k}\frac{\left(\int_{A_{i}}dF-p_{i}\right)^{2}}{p_{i}},$$
where $A_i$ are the k cells and $p_i$ are the specified probabilities of the cells under the null hypothesis.
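As a rough sketch of how this functional is evaluated at the empirical distribution: the integral of $dF_n$ over a cell is just the observed relative frequency of that cell. The function name and the counts below are assumptions for illustration. The familiar Pearson statistic, written in terms of observed and expected counts, equals n times $T(F_n)$.

```python
def chi_squared_functional(cell_counts, null_probs):
    """Evaluate T(F_n) = sum_i (phat_i - p_i)^2 / p_i.

    phat_i = count_i / n is the empirical probability of cell A_i.
    """
    if len(cell_counts) != len(null_probs):
        raise ValueError("need one null probability per cell")
    n = sum(cell_counts)
    if n == 0:
        raise ValueError("sample must be non-empty")
    return sum((c / n - p) ** 2 / p for c, p in zip(cell_counts, null_probs))


# Hypothetical data: 100 observations in 4 cells, uniform null hypothesis.
counts = [18, 22, 20, 40]
probs = [0.25, 0.25, 0.25, 0.25]
n = sum(counts)

t_fn = chi_squared_functional(counts, probs)

# Pearson's statistic in its usual observed/expected-count form.
pearson = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
# n * t_fn and pearson agree up to floating-point rounding.
```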
The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional
$$T(F)=\int (F(x)-F_{0}(x))^{2}\,w(x;F_{0})\,dF_{0}(x),$$
where $w(x;F_0)$ is a specified weight function and $F_0$ is a specified null distribution. If the weight is constant, $w \equiv 1$, then $T(F_n)$ is the well-known Cramér–von Mises goodness-of-fit statistic; if $w(x;F_{0})=[F_{0}(x)(1-F_{0}(x))]^{-1}$ then $T(F_n)$ is the Anderson–Darling statistic.
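For the unweighted case ($w \equiv 1$) and a continuous null distribution, the integral can be evaluated exactly from the order statistics using the standard computational formula $T(F_n) = \frac{1}{12n^2} + \frac{1}{n}\sum_{i=1}^{n}\left(\frac{2i-1}{2n} - F_0(x_{(i)})\right)^2$. The sketch below assumes that setting; the function name, the uniform null, and the data are illustrative choices, not taken from the source.

```python
def cramer_von_mises_functional(xs, null_cdf):
    """Evaluate T(F_n) = integral of (F_n - F_0)^2 dF_0 with weight w = 1.

    Assumes null_cdf is a continuous distribution function. Uses the
    closed form in terms of the sorted values u_(i) = F_0(x_(i)).
    """
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    u = sorted(null_cdf(x) for x in xs)
    total = sum(((2 * i - 1) / (2 * n) - u_i) ** 2
                for i, u_i in enumerate(u, start=1))
    return 1.0 / (12 * n * n) + total / n


def uniform_cdf(x):
    """CDF of the Uniform(0, 1) distribution, used here as the null F_0."""
    return min(max(x, 0.0), 1.0)


# Hypothetical sample, tested against a Uniform(0, 1) null.
sample = [0.1, 0.35, 0.4, 0.8]
t_fn = cramer_von_mises_functional(sample, uniform_cdf)
w2 = len(sample) * t_fn  # n * T(F_n) is the scaling usually tabulated
```

For a single observation at 0.5 the function returns 1/12, which matches the direct integral of $(F_1(u) - u)^2$ over the unit interval.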
= Representation as a V-statistic =
Suppose $x_1, \dots, x_n$ is a sample. In typical applications the statistical function has a representation as the V-statistic
$$V_{mn}=\frac{1}{n^{m}}\sum_{i_{1}=1}^{n}\cdots\sum_{i_{m}=1}^{n}h(x_{i_{1}},x_{i_{2}},\dots,x_{i_{m}}),$$
where h is a symmetric kernel function. Serfling discusses how to find the kernel in practice. $V_{mn}$ is called a V-statistic of degree m.
A symmetric kernel of degree 2 is a function h(x, y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For samples $x_1, \dots, x_n$, the corresponding V-statistic is defined as
$$V_{2,n}=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}h(x_{i},x_{j}).$$
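The definition translates directly into code: average the kernel over all $n^m$ index tuples, repeated indices included. The following is a minimal brute-force sketch; the name `v_statistic` and the toy kernels are assumptions for illustration, and the $O(n^m)$ cost makes it suitable only for small samples.

```python
from itertools import product


def v_statistic(kernel, xs, degree):
    """Average of kernel(x_{i1}, ..., x_{im}) over ALL n**m index tuples.

    Tuples with repeated indices (for example i1 == i2) are included;
    this is what distinguishes a V-statistic from a U-statistic.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    total = sum(kernel(*args) for args in product(xs, repeat=degree))
    return total / n ** degree


sample = [1.0, 2.0, 4.0]

# Degree 1 with h(x) = x: the V-statistic is the sample mean.
v1 = v_statistic(lambda x: x, sample, 1)

# Degree 2 with the symmetric kernel h(x, y) = x * y: the double sum
# factorises, so the result equals the squared sample mean.
v2 = v_statistic(lambda x, y: x * y, sample, 2)
```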
= Example of a V-statistic =
An example of a degree-2 V-statistic is the second central moment $m_2$.
If $h(x, y) = (x - y)^2/2$, the corresponding V-statistic is
$$V_{2,n}=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{1}{2}(x_{i}-x_{j})^{2}=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2},$$
which is the maximum likelihood estimator of variance for a normal sample. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:
$$s^{2}=\binom{n}{2}^{-1}\sum_{i<j}\frac{1}{2}(x_{i}-x_{j})^{2}=\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}.$$
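The two identities above can be checked numerically. The sketch below computes both statistics by brute force from the kernel and compares them with the standard library's population and sample variance; the helper names and the data are assumptions made for the example.

```python
from itertools import combinations, product
from statistics import pvariance, variance


def half_squared_diff(x, y):
    """Symmetric kernel h(x, y) = (x - y)^2 / 2."""
    return 0.5 * (x - y) ** 2


def v_stat2(h, xs):
    """Degree-2 V-statistic: average of h over all n^2 ordered pairs."""
    n = len(xs)
    return sum(h(x, y) for x, y in product(xs, repeat=2)) / n ** 2


def u_stat2(h, xs):
    """Degree-2 U-statistic: average of h over the n-choose-2 pairs i < j."""
    pairs = list(combinations(xs, 2))
    return sum(h(x, y) for x, y in pairs) / len(pairs)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
n = len(data)

v = v_stat2(half_squared_diff, data)  # agrees with pvariance(data), divisor n
u = u_stat2(half_squared_diff, data)  # agrees with variance(data), divisor n - 1
```

The only difference between the two is the diagonal: the V-statistic includes the n terms with i = j, where this kernel is zero, so here V = (n − 1)/n × U.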
Asymptotic distribution
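In examples 1–3 above, the asymptotic distribution of the statistic differs: in (1), the central moment, it is normal; in (2), the chi-squared statistic, it is chi-squared; and in (3), the Cramér–von Mises and Anderson–Darling statistics, it is a weighted sum of chi-squared variables.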
Von Mises' approach is a unifying theory that covers all of the cases above. Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by which term is the first non-vanishing term in the Taylor expansion of the functional T. In case it is the linear term, the limit distribution is normal; otherwise higher order types of distributions arise (under suitable conditions such that a central limit theorem holds).
There is a hierarchy of cases parallel to the asymptotic theory of U-statistics. Let A(m) be the property defined by:
A(m):
1. $\operatorname{Var}(h(X_1, \dots, X_k)) = 0$ for k < m, and $\operatorname{Var}(h(X_1, \dots, X_k)) > 0$ for k = m;
2. $n^{m/2}R_{mn}$ tends to zero (in probability), where $R_{mn}$ is the remainder term in the Taylor series for T.
Case m = 1 (Non-degenerate kernel):
If A(1) is true, the leading term of the statistic is a sample mean and the Central Limit Theorem implies that $T(F_n)$ is asymptotically normal.
In the variance example (4), $m_2$ is asymptotically normal with mean $\sigma^{2}$ and variance $(\mu_{4}-\sigma^{4})/n$, where $\mu_{4}=E(X-E(X))^{4}$.
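A small Monte Carlo experiment gives a feel for this approximation. The sketch below assumes standard normal data, for which $\sigma^2 = 1$ and $\mu_4 = 3$, so the asymptotic variance is $2/n$. The sample size, replicate count, and seed are arbitrary illustrative choices.

```python
import random


def second_central_moment(xs):
    """m_2 = (1/n) * sum (x_i - xbar)^2, the degree-2 V-statistic."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def simulate_m2(n, replicates, seed=0):
    """Draw `replicates` independent N(0, 1) samples of size n; return m_2 of each."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    return [
        second_central_moment([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replicates)
    ]


n = 200
draws = simulate_m2(n, replicates=2000)

mc_mean = sum(draws) / len(draws)
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (len(draws) - 1)

# Asymptotic variance (mu_4 - sigma^4) / n with mu_4 = 3 and sigma^2 = 1.
asymptotic_var = (3.0 - 1.0) / n
```

With these settings `mc_mean` should be close to 1 and `mc_var` close to `asymptotic_var`, up to simulation noise and the small finite-sample bias of $m_2$.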
Case m = 2 (Degenerate kernel):
Suppose A(2) is true, and $E[h^{2}(X_{1},X_{2})]<\infty$, $E|h(X_{1},X_{1})|<\infty$, and $E[h(x,X_{1})]\equiv 0$. Then $nV_{2,n}$ converges in distribution to a weighted sum of independent chi-squared variables:
$$nV_{2,n}\;{\stackrel{d}{\longrightarrow}}\;\sum_{k=1}^{\infty}\lambda_{k}Z_{k}^{2},$$
where $Z_k$ are independent standard normal variables and $\lambda_k$ are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic $V_{2,n}$ is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional (Example 3) is an example of a degenerate kernel V-statistic.
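A very simple degenerate kernel, chosen here purely for illustration and not taken from the source, is $h(x, y) = xy$ with data of mean 0 and variance 1. Then $E[h(x, X_1)] = x\,E[X_1] = 0$ for every x, and $V_{2,n} = \bar{x}^2$, so $nV_{2,n} = (\sqrt{n}\,\bar{x})^2$ converges to $Z_1^2$: a single weight $\lambda_1 = 1$, with limiting mean 1. The sketch below simulates this under the assumption of standard normal data; sample size, replicate count, and seed are arbitrary.

```python
import random
from itertools import product


def product_kernel(x, y):
    """Illustrative degenerate kernel h(x, y) = x * y (needs E[X] = 0)."""
    return x * y


def v_stat2(h, xs):
    """Degree-2 V-statistic: average of h over all n^2 ordered pairs."""
    n = len(xs)
    return sum(h(x, y) for x, y in product(xs, repeat=2)) / n ** 2


rng = random.Random(1)  # fixed seed for repeatability
n = 30
scaled = []  # replicates of n * V_{2,n}
for _ in range(500):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scaled.append(n * v_stat2(product_kernel, xs))

# Monte Carlo mean of n * V_{2,n}; the chi-squared(1) limit has mean 1.
mc_mean = sum(scaled) / len(scaled)

# Sanity check of the identity n * V_{2,n} = n * xbar^2 on the last sample.
xbar = sum(xs) / n
identity_gap = abs(scaled[-1] - n * xbar ** 2)
```

Note the scaling: the statistic is multiplied by n rather than by the $\sqrt{n}$ used in the non-degenerate case, which is what produces a non-normal limit.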
See also
U-statistic
Asymptotic distribution
Asymptotic theory (statistics)