Ball divergence
Ball divergence is a non-parametric two-sample statistical test method defined on metric spaces. It measures the difference between two population probability distributions by integrating the squared difference of the two measures over all balls in the space, so its value is zero if and only if the two probability measures are the same. As with other common non-parametric tests, the ball divergence test obtains its p-value through a permutation test.
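The permutation step does not depend on the particular statistic being used. As a rough illustration (not taken from the ball divergence literature), the following Python sketch computes a permutation p-value for an arbitrary two-sample statistic; the function permutation_pvalue and its arguments are hypothetical names chosen here.

import numpy as np

def permutation_pvalue(x, y, statistic, n_permutations=999, seed=None):
    # x, y: (n, d) and (m, d) arrays holding the two samples.
    # statistic: a function mapping (x, y) to a scalar that is large
    # when the two samples look different.
    rng = np.random.default_rng(seed)
    n = len(x)
    pooled = np.concatenate([x, y])
    observed = statistic(x, y)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        if statistic(pooled[perm[:n]], pooled[perm[n:]]) >= observed:
            count += 1
    # add-one correction so the estimated p-value is never exactly zero
    return (count + 1) / (n_permutations + 1)

A concrete statistic to plug in is the sample ball divergence defined in the Definition section below.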
Background
Distinguishing between two unknown distributions on the basis of multivariate samples is an important and challenging task. A widely used non-parametric two-sample test is the energy distance test; however, its effectiveness relies on moment conditions, making it less effective for extremely imbalanced data (where one sample size is disproportionately larger than the other). To address this issue, Chen, Dou, and Qiao proposed a non-parametric multivariate test using ensemble subsampling nearest neighbors (ESS-NN) for imbalanced data. This method handles imbalanced data effectively and increases the test's power by fixing the size of the smaller group while increasing the size of the larger group.
Additionally, Gretton et al. introduced the maximum mean discrepancy (MMD) for the two-sample problem. Both methods require additional parameter choices, such as the number of groups $k$ in ESS-NN and the kernel function in MMD. Ball divergence addresses the two-sample test problem for extremely imbalanced samples without introducing additional tuning parameters.
Definition
Let us start with the population ball divergence. Suppose that we have a metric space $(V, \|\cdot\|)$, where the norm $\|\cdot\|$ induces a metric $\rho$ on $V$ by $\rho(u,v) = \|u - v\|$ for two points $u, v \in V$. Besides, we use $\bar{B}(u, \rho(u,v))$ to denote the closed ball with center $u$ and radius $\rho(u,v)$.
Then, the population ball divergence of Borel probability measures $\mu$ and $\nu$ is
$$BD(\mu,\nu) = \iint_{V\times V} [\mu-\nu]^{2}\big(\bar{B}(u,\rho(u,v))\big)\,\big(\mu(du)\mu(dv) + \nu(du)\nu(dv)\big).$$
For convenience, we can decompose the ball divergence into two parts:
$$A = \iint_{V\times V} [\mu-\nu]^{2}\big(\bar{B}(u,\rho(u,v))\big)\,\mu(du)\mu(dv)$$
and
$$C = \iint_{V\times V} [\mu-\nu]^{2}\big(\bar{B}(u,\rho(u,v))\big)\,\nu(du)\nu(dv).$$
Thus
$$BD(\mu,\nu) = A + C.$$
Next, we introduce the sample ball divergence. Let $\delta(x,y,z) = I\big(z \in \bar{B}(x,\rho(x,y))\big)$ indicate whether the point $z$ lies in the closed ball $\bar{B}(x,\rho(x,y))$. Given two independent samples $\{X_{1},\ldots,X_{n}\}$ from $\mu$ and $\{Y_{1},\ldots,Y_{m}\}$ from $\nu$, define
$$\begin{aligned}
A_{ij}^{X} &= \frac{1}{n}\sum_{u=1}^{n}\delta\left(X_{i},X_{j},X_{u}\right), &
A_{ij}^{Y} &= \frac{1}{m}\sum_{v=1}^{m}\delta\left(X_{i},X_{j},Y_{v}\right),\\
C_{kl}^{X} &= \frac{1}{n}\sum_{u=1}^{n}\delta\left(Y_{k},Y_{l},X_{u}\right), &
C_{kl}^{Y} &= \frac{1}{m}\sum_{v=1}^{m}\delta\left(Y_{k},Y_{l},Y_{v}\right),
\end{aligned}$$
where $A_{ij}^{X}$ is the proportion of the sample from the probability measure $\mu$ that falls in the ball $\bar{B}\left(X_{i},\rho\left(X_{i},X_{j}\right)\right)$, and $A_{ij}^{Y}$ is the proportion of the sample from the probability measure $\nu$ that falls in the same ball. Similarly, $C_{kl}^{X}$ and $C_{kl}^{Y}$ are the proportions of the samples from $\mu$ and $\nu$, respectively, that fall in the ball $\bar{B}\left(Y_{k},\rho\left(Y_{k},Y_{l}\right)\right)$. The sample versions of $A$ and $C$ are as follows:
$$A_{n,m} = \frac{1}{n^{2}}\sum_{i,j=1}^{n}\left(A_{ij}^{X}-A_{ij}^{Y}\right)^{2}, \qquad C_{n,m} = \frac{1}{m^{2}}\sum_{k,l=1}^{m}\left(C_{kl}^{X}-C_{kl}^{Y}\right)^{2}.$$
Finally, the sample ball divergence is
$$BD_{n,m} = A_{n,m} + C_{n,m}.$$
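Under the Euclidean norm, all of these quantities can be computed from pairwise distance matrices. The sketch below is a minimal, unoptimized NumPy/SciPy illustration of the sample ball divergence; the function name ball_divergence and its interface are chosen here for illustration and are not the authors' published implementation.

import numpy as np
from scipy.spatial.distance import cdist

def ball_divergence(x, y):
    # x: (n, d) array of observations from mu; y: (m, d) array from nu.
    # Uses the Euclidean distance as the metric rho.
    dxx = cdist(x, x)   # dxx[i, u] = ||X_u - X_i||
    dxy = cdist(x, y)   # dxy[i, v] = ||Y_v - X_i||
    dyy = cdist(y, y)   # dyy[k, v] = ||Y_v - Y_k||
    dyx = cdist(y, x)   # dyx[k, u] = ||X_u - Y_k||

    # A_ij^X / A_ij^Y: proportions of each sample inside B(X_i, rho(X_i, X_j)).
    # The broadcast comparisons build (n, n, n)- and (n, n, m)-shaped boolean
    # arrays, so this sketch is cubic in memory and meant for small samples.
    a_x = (dxx[:, None, :] <= dxx[:, :, None]).mean(axis=2)
    a_y = (dxy[:, None, :] <= dxx[:, :, None]).mean(axis=2)
    # C_kl^X / C_kl^Y: proportions of each sample inside B(Y_k, rho(Y_k, Y_l)).
    c_x = (dyx[:, None, :] <= dyy[:, :, None]).mean(axis=2)
    c_y = (dyy[:, None, :] <= dyy[:, :, None]).mean(axis=2)

    a_nm = ((a_x - a_y) ** 2).mean()   # (1/n^2) * sum over i, j
    c_nm = ((c_x - c_y) ** 2).mean()   # (1/m^2) * sum over k, l
    return a_nm + c_nm

Plugging this statistic into the permutation sketch from the introduction, e.g. permutation_pvalue(x, y, ball_divergence), gives an illustrative version of the ball divergence permutation test.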
Properties
1. $BD(\mu,\nu) \geq 0$, where equality holds if and only if $\mu = \nu$.
2. The square root of the ball divergence does not satisfy the triangle inequality, so ball divergence is a symmetric divergence but not a metric.
3. Ball divergence can be generalized to the $K$-sample problem. Suppose that $\mu_{1},\ldots,\mu_{K}$ are $K$ measures on a Banach space. We can define
$$D(\mu_{1},\ldots,\mu_{K}) = \sum_{1\leq l\leq k\leq K}\iint_{V\times V}[\mu_{k}-\mu_{l}]^{2}\big(\bar{B}(u,\rho(u,v))\big)\,\big(\mu_{k}(du)\mu_{k}(dv)+\mu_{l}(du)\mu_{l}(dv)\big).$$
Clearly, $D(\mu_{1},\ldots,\mu_{K}) = 0$ if and only if $\mu_{1} = \cdots = \mu_{K}$. A sample-level analogue is sketched below.
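At the sample level, one natural analogue of this $K$-sample quantity is the sum of the pairwise sample ball divergences. The sketch below reuses the illustrative ball_divergence function from the Definition section; the name k_sample_ball_divergence is likewise hypothetical.

from itertools import combinations

def k_sample_ball_divergence(samples):
    # samples: a sequence of (n_k, d) arrays, one per group.
    # Sum of the pairwise sample ball divergences over all pairs of groups.
    return sum(ball_divergence(a, b) for a, b in combinations(samples, 2))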
4. Consistency: We have
$$BD_{n,m}\ \xrightarrow[\,n,m\rightarrow\infty\,]{\text{a.s.}}\ BD(\mu,\nu),$$
where $\frac{n}{n+m}\rightarrow\tau$ for some $\tau\in[0,1]$.
Define $\xi(x,y,z_{1},z_{2}) = \delta(x,y,z_{1})\cdot\delta(x,y,z_{2})$, and then let
$$Q\left(x,y;x^{\prime},y^{\prime}\right) = \phi_{A}^{(2,0)}\left(x,x^{\prime}\right)+\phi_{A}^{(1,1)}(x,y)+\phi_{A}^{(1,1)}\left(x^{\prime},y^{\prime}\right)+\phi_{A}^{(0,2)}\left(y,y^{\prime}\right),$$
where
$$\begin{aligned}
\phi_{A}^{(2,0)}\left(x,x^{\prime}\right) ={}& E\left[\xi\left(X_{1},X_{2},x,x^{\prime}\right)\right]+E\left[\xi\left(X_{1},X_{2},Y,Y_{3}\right)\right]\\
&-E\left[\xi\left(X_{1},X_{2},x,Y\right)\right]-E\left[\xi\left(X_{1},X_{2},x^{\prime},Y_{3}\right)\right],\\
\phi_{A}^{(1,1)}(x,y) ={}& E\left[\xi\left(X_{1},X_{2},x,X_{3}\right)\right]+E\left[\xi\left(X_{1},X_{2},y,Y_{3}\right)\right]\\
&-E\left[\xi\left(X_{1},X_{2},x,y\right)\right]-E\left[\xi\left(X_{1},X_{2},X_{3},Y_{3}\right)\right],\\
\phi_{A}^{(0,2)}\left(y,y^{\prime}\right) ={}& E\left[\xi\left(X_{1},X_{2},X,X_{3}\right)\right]+E\left[\xi\left(X_{1},X_{2},y,y^{\prime}\right)\right]\\
&-E\left[\xi\left(X_{1},X_{2},X,y\right)\right]-E\left[\xi\left(X_{1},X_{2},X,y^{\prime}\right)\right].
\end{aligned}$$
The function $Q\left(x,y;x^{\prime},y^{\prime}\right)$ has the spectral decomposition
$$Q\left(x,y;x^{\prime},y^{\prime}\right) = \sum_{k=1}^{\infty}\lambda_{k}f_{k}(x,y)f_{k}\left(x^{\prime},y^{\prime}\right),$$
where $\lambda_{k}$ and $f_{k}$ are the eigenvalues and eigenfunctions of $Q$. For $k=1,2,\ldots$, $Z_{1k}$ and $Z_{2k}$ are i.i.d. $N(0,1)$, and
$$\begin{aligned}
a_{k}^{2}(\tau) &= (1-\tau)E_{X}\left[E_{Y}f_{k}(X,Y)\right]^{2},\quad b_{k}^{2}(\tau)=\tau E_{Y}\left[E_{X}f_{k}(X,Y)\right]^{2},\\
\theta &= 2E\left[E\left(\delta\left(X_{1},X_{2},X\right)\left(1-\delta\left(X_{1},X_{2},Y\right)\right)\mid X_{1},X_{2}\right)\right].
\end{aligned}$$
5. Asymptotic distribution under the null hypothesis: Suppose that both $n$ and $m\rightarrow\infty$ in such a way that $\frac{n}{n+m}\rightarrow\tau$, $0\leq\tau\leq 1$. Under the null hypothesis, we have
$$\frac{nm}{n+m}BD_{n,m}\ \xrightarrow[\,n\rightarrow\infty\,]{d}\ \sum_{k=1}^{\infty}2\lambda_{k}\left[\left(a_{k}(\tau)Z_{1k}+b_{k}(\tau)Z_{2k}\right)^{2}-\left(a_{k}^{2}(\tau)+b_{k}^{2}(\tau)\right)\right]+\theta.$$
6. Distribution under the alternative hypothesis: let
$$\delta_{1,0}^{2} = \operatorname{Var}\left(g^{(1,0)}(X)\right)\quad\text{and}\quad\delta_{0,1}^{2} = \operatorname{Var}\left(g^{(0,1)}(Y)\right).$$
Suppose that both $n$ and $m\rightarrow\infty$ in such a way that $\frac{n}{n+m}\rightarrow\tau$, $0\leq\tau\leq 1$. Under the alternative hypothesis, we have
$$\sqrt{\frac{nm}{n+m}}\left(BD_{n,m}-BD(\mu,\nu)\right)\ \xrightarrow[\,n\rightarrow\infty\,]{d}\ N\left(0,(1-\tau)\delta_{1,0}^{2}+\tau\delta_{0,1}^{2}\right).$$
7. The test based on $BD_{n,m}$ is consistent against any general alternative $H_{1}$. More specifically,
$$\lim_{n\rightarrow\infty}\operatorname{Var}_{H_{1}}\left(BD_{n,m}\right)=0$$
and
$$\Delta(\eta) := \liminf_{n\rightarrow\infty}\left(E_{H_{1}}BD_{n,m}-E_{H_{0}}BD_{n,m}\right)>0.$$
More importantly, $\Delta(\eta)$ can also be expressed as
$$\Delta(\eta)\equiv BD(\mu,\nu),$$
which is independent of $\eta$.