Bhattacharyya distance
In statistics, the Bhattacharyya distance is a quantity which represents a notion of similarity between two probability distributions. It is closely related to the Bhattacharyya coefficient, which is a measure of the amount of overlap between two statistical samples or populations.
It is not a metric, despite being named a "distance", since it does not obey the triangle inequality.
History
Both the Bhattacharyya distance and the Bhattacharyya coefficient are named after Anil Kumar Bhattacharyya, a statistician who worked in the 1930s at the Indian Statistical Institute. He developed the measure through a series of papers. He first devised a method to measure the distance between two non-normal distributions and illustrated it with classical multinomial populations; although submitted for publication in 1941, this work appeared in Sankhya almost five years later. Bhattacharyya then worked toward a distance measure for probability distributions that are absolutely continuous with respect to the Lebesgue measure, presented his progress in 1942 at the Proceedings of the Indian Science Congress, and published the final work in 1943 in the Bulletin of the Calcutta Mathematical Society.
Definition
For probability distributions $P$ and $Q$ on the same domain $\mathcal{X}$, the Bhattacharyya distance is defined as
$$D_B(P, Q) = -\ln\left(BC(P, Q)\right)$$
where
$$BC(P, Q) = \sum_{x \in \mathcal{X}} \sqrt{P(x)\,Q(x)}$$
is the Bhattacharyya coefficient for discrete probability distributions.
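The discrete-case definitions above translate directly into code. The following is a minimal sketch, assuming NumPy; the function names and the example distributions are illustrative choices, not part of the source.

```python
# Minimal sketch of the discrete Bhattacharyya coefficient and distance (assumes NumPy).
import numpy as np

def bhattacharyya_coefficient(p, q):
    """BC(P, Q) = sum over x of sqrt(P(x) * Q(x)) for discrete distributions given as arrays."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    """D_B(P, Q) = -ln BC(P, Q); infinite when the supports are disjoint."""
    bc = bhattacharyya_coefficient(p, q)
    return float("inf") if bc == 0.0 else -np.log(bc)

# Example: two distributions over the same four outcomes.
P = [0.1, 0.2, 0.3, 0.4]
Q = [0.25, 0.25, 0.25, 0.25]
print(bhattacharyya_coefficient(P, Q))  # a value in [0, 1]
print(bhattacharyya_distance(P, Q))     # a value in [0, infinity]
```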
For continuous probability distributions, with $P(dx) = p(x)\,dx$ and $Q(dx) = q(x)\,dx$ where $p(x)$ and $q(x)$ are the probability density functions, the Bhattacharyya coefficient is defined as
$$BC(P, Q) = \int_{\mathcal{X}} \sqrt{p(x)\,q(x)}\,dx.$$
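For the continuous case the coefficient can be approximated by numerical quadrature. A sketch, assuming SciPy and two arbitrarily chosen normal densities for $p$ and $q$:

```python
# Sketch: approximate BC(P, Q) = integral of sqrt(p(x) q(x)) dx by quadrature (assumes SciPy).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p = norm(loc=0.0, scale=1.0).pdf   # illustrative density p(x)
q = norm(loc=1.0, scale=2.0).pdf   # illustrative density q(x)

bc, _ = quad(lambda x: np.sqrt(p(x) * q(x)), -np.inf, np.inf)
d_b = -np.log(bc)
print(bc, d_b)
```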
More generally, given two probability measures $P$ and $Q$ on a measurable space $(\mathcal{X}, \mathcal{B})$, let $\lambda$ be a ($\sigma$-finite) measure such that $P$ and $Q$ are absolutely continuous with respect to $\lambda$, i.e. such that $P(dx) = p(x)\,\lambda(dx)$ and $Q(dx) = q(x)\,\lambda(dx)$ for probability density functions $p, q$ with respect to $\lambda$ defined $\lambda$-almost everywhere. Such a measure, even such a probability measure, always exists, e.g. $\lambda = \tfrac{1}{2}(P + Q)$. Then define the Bhattacharyya measure on $(\mathcal{X}, \mathcal{B})$ by
$$bc(dx \mid P, Q) = \sqrt{p(x)\,q(x)}\,\lambda(dx) = \sqrt{\frac{P(dx)}{\lambda(dx)}\,\frac{Q(dx)}{\lambda(dx)}}\,\lambda(dx).$$
It does not depend on the choice of $\lambda$: if $\mu$ is a measure with respect to which both $\lambda$ and another choice $\lambda'$ are absolutely continuous, i.e. $\lambda = l(x)\,\mu$ and $\lambda' = l'(x)\,\mu$, then
$$P(dx) = p(x)\,\lambda(dx) = p'(x)\,\lambda'(dx) = p(x)\,l(x)\,\mu(dx) = p'(x)\,l'(x)\,\mu(dx),$$
and similarly for $Q$. We then have
$$bc(dx \mid P, Q) = \sqrt{p(x)\,q(x)}\,\lambda(dx) = \sqrt{p(x)\,q(x)}\,l(x)\,\mu(dx) = \sqrt{p(x)\,l(x)\,q(x)\,l(x)}\,\mu(dx) = \sqrt{p'(x)\,l'(x)\,q'(x)\,l'(x)}\,\mu(dx) = \sqrt{p'(x)\,q'(x)}\,\lambda'(dx).$$
We finally define the Bhattacharyya coefficient as
$$BC(P, Q) = \int_{\mathcal{X}} bc(dx \mid P, Q) = \int_{\mathcal{X}} \sqrt{p(x)\,q(x)}\,\lambda(dx).$$
By the above, the quantity $BC(P, Q)$ does not depend on $\lambda$, and by the Cauchy–Schwarz inequality $0 \leq BC(P, Q) \leq 1$. Using $P(dx) = p(x)\,\lambda(dx)$ and $Q(dx) = q(x)\,\lambda(dx)$,
$$BC(P, Q) = \int_{\mathcal{X}} \sqrt{\frac{p(x)}{q(x)}}\,Q(dx) = \int_{\mathcal{X}} \sqrt{\frac{P(dx)}{Q(dx)}}\,Q(dx) = E_Q\!\left[\sqrt{\frac{P(dx)}{Q(dx)}}\right].$$
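The expectation form suggests a Monte Carlo estimate: draw samples from $Q$ and average $\sqrt{p(x)/q(x)}$. A sketch, assuming SciPy and the same illustrative normal densities as in the earlier quadrature sketch:

```python
# Sketch: Monte Carlo estimate of BC(P, Q) = E_Q[sqrt(p(X)/q(X))] (assumes NumPy/SciPy).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
P = norm(loc=0.0, scale=1.0)       # illustrative choice
Q = norm(loc=1.0, scale=2.0)       # illustrative choice

x = Q.rvs(size=200_000, random_state=rng)       # samples from Q
bc_mc = np.mean(np.sqrt(P.pdf(x) / Q.pdf(x)))   # Monte Carlo estimate of BC
print(bc_mc)   # should be close to the quadrature value from the earlier sketch
```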
Gaussian case
Let $p \sim \mathcal{N}(\mu_p, \sigma_p^2)$ and $q \sim \mathcal{N}(\mu_q, \sigma_q^2)$, where $\mathcal{N}(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$; then
$$D_B(p, q) = \frac{1}{4}\,\frac{(\mu_p - \mu_q)^2}{\sigma_p^2 + \sigma_q^2} + \frac{1}{2}\ln\!\left(\frac{\sigma_p^2 + \sigma_q^2}{2\,\sigma_p\,\sigma_q}\right).$$
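The univariate formula above is straightforward to transcribe; a sketch, with parameter values chosen only for illustration:

```python
# Closed-form Bhattacharyya distance between two univariate normal distributions.
import numpy as np

def bhattacharyya_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    var_p, var_q = sigma_p**2, sigma_q**2
    return (0.25 * (mu_p - mu_q)**2 / (var_p + var_q)
            + 0.5 * np.log((var_p + var_q) / (2.0 * sigma_p * sigma_q)))

print(bhattacharyya_gaussian(0.0, 1.0, 1.0, 2.0))  # should agree with the numerical sketches above
```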
In general, given two multivariate normal distributions $p_i = \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$,
$$D_B(p_1, p_2) = \frac{1}{8}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) + \frac{1}{2}\ln\!\left(\frac{\det \boldsymbol{\Sigma}}{\sqrt{\det \boldsymbol{\Sigma}_1\,\det \boldsymbol{\Sigma}_2}}\right),$$
where
$$\boldsymbol{\Sigma} = \frac{\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2}{2}.$$
Note that the first term is a squared Mahalanobis distance.
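A sketch of the multivariate formula, assuming NumPy; `slogdet` and `solve` are used instead of explicit determinants and inverses for numerical stability, and the two example Gaussians are arbitrary:

```python
# Closed-form Bhattacharyya distance between two multivariate normal distributions.
import numpy as np

def bhattacharyya_mvn(mu1, cov1, mu2, cov2):
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    cov = 0.5 * (cov1 + cov2)                          # Sigma = (Sigma_1 + Sigma_2) / 2
    diff = mu1 - mu2
    mahalanobis_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    logdet_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return mahalanobis_term + logdet_term

mu1, cov1 = [0.0, 0.0], [[1.0, 0.2], [0.2, 1.0]]
mu2, cov2 = [1.0, -1.0], [[2.0, 0.0], [0.0, 0.5]]
print(bhattacharyya_mvn(mu1, cov1, mu2, cov2))
```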
Properties
$0 \leq BC \leq 1$ and $0 \leq D_B \leq \infty$.
$D_B$ does not obey the triangle inequality, though the Hellinger distance $\sqrt{1 - BC(p, q)}$ does.
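A small numeric check of this property (the three discrete distributions below are arbitrary illustrative choices): $D_B$ can violate the triangle inequality while the Hellinger distance does not.

```python
# Check: D_B violates the triangle inequality for these distributions; Hellinger does not.
import numpy as np

def bc(p, q):
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

P, Q, R = [0.9, 0.1], [0.5, 0.5], [0.1, 0.9]

d_b = lambda a, b: -np.log(bc(a, b))
hellinger = lambda a, b: np.sqrt(1.0 - bc(a, b))

print(d_b(P, R), d_b(P, Q) + d_b(Q, R))                     # ~0.51 > ~0.22: inequality violated
print(hellinger(P, R), hellinger(P, Q) + hellinger(Q, R))   # ~0.63 <= ~0.65: inequality holds
```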
Bounds on Bayes error
The Bhattacharyya distance can be used to upper and lower bound the Bayes error rate:
$$\frac{1}{2} - \frac{1}{2}\sqrt{1 - 4\rho^2} \leq L^* \leq \rho,$$
where $\rho = \mathbb{E}\,\sqrt{\eta(X)(1 - \eta(X))}$ and $\eta(X) = \mathbb{P}(Y = 1 \mid X)$ is the posterior probability.
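A sketch checking these bounds numerically for a two-class problem with equal priors and normal class-conditional densities (an illustrative setup not taken from the source). It assumes the standard identities $\rho = \int \sqrt{\pi_0 p_0(x)\,\pi_1 p_1(x)}\,dx$ and $L^* = \int \min(\pi_0 p_0(x), \pi_1 p_1(x))\,dx$ for class priors $\pi_0, \pi_1$ and densities $p_0, p_1$:

```python
# Sketch: numerically verify the Bayes-error bounds for two normal classes with equal priors.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

pi0 = pi1 = 0.5                         # class priors (assumed equal here)
p0 = norm(loc=0.0, scale=1.0).pdf       # class-conditional density for Y = 0
p1 = norm(loc=1.5, scale=1.0).pdf       # class-conditional density for Y = 1

rho, _ = quad(lambda x: np.sqrt(pi0 * p0(x) * pi1 * p1(x)), -np.inf, np.inf)
bayes_error, _ = quad(lambda x: min(pi0 * p0(x), pi1 * p1(x)), -np.inf, np.inf)

lower = 0.5 - 0.5 * np.sqrt(1.0 - 4.0 * rho**2)
print(lower, "<=", bayes_error, "<=", rho)   # lower bound <= L* <= rho
```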
Applications
The Bhattacharyya coefficient quantifies the "closeness" of two random statistical samples.
Given two sequences of samples from distributions $P$ and $Q$, bin them into $n$ buckets, and let the frequency of samples from $P$ in bucket $i$ be $p_i$, and similarly for $q_i$; then the sample Bhattacharyya coefficient is
$$BC(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} \sqrt{p_i\,q_i},$$
which is an estimator of $BC(P, Q)$. The quality of estimation depends on the choice of buckets: too few buckets would overestimate $BC(P, Q)$, while too many would underestimate it.
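A sketch of this histogram estimator, assuming NumPy; the sample sizes, bin count, and the two normal distributions are illustrative choices:

```python
# Sketch: estimate BC(P, Q) from two samples via a shared histogram binning.
import numpy as np

rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=10_000)    # sample from P
x_q = rng.normal(1.0, 2.0, size=10_000)    # sample from Q

n_bins = 50
edges = np.linspace(min(x_p.min(), x_q.min()), max(x_p.max(), x_q.max()), n_bins + 1)
p_i, _ = np.histogram(x_p, bins=edges)
q_i, _ = np.histogram(x_q, bins=edges)
p_i = p_i / p_i.sum()                      # relative frequencies per bucket
q_i = q_i / q_i.sum()

bc_sample = np.sum(np.sqrt(p_i * q_i))
print(bc_sample)   # an estimate of BC(P, Q); compare with the closed-form Gaussian value
```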
A common task in classification is estimating the separability of classes. Up to a multiplicative factor, the squared Mahalanobis distance is a special case of the Bhattacharyya distance when the two classes are normally distributed with the same variances. When two classes have similar means but significantly different variances, the Mahalanobis distance would be close to zero, while the Bhattacharyya distance would not be.
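This difference is easy to see from the univariate Gaussian formula: with equal means the first (Mahalanobis-type) term vanishes, but the variance term does not. A sketch with illustrative parameters:

```python
# Two normal classes with the same mean but different variances:
# the Mahalanobis-type term is zero while the Bhattacharyya distance is not.
import numpy as np

mu = 0.0
sigma_p, sigma_q = 1.0, 5.0

mahalanobis_term = 0.25 * (mu - mu)**2 / (sigma_p**2 + sigma_q**2)            # = 0
variance_term = 0.5 * np.log((sigma_p**2 + sigma_q**2) / (2 * sigma_p * sigma_q))
print(mahalanobis_term, mahalanobis_term + variance_term)                     # 0.0, ~0.48
```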
The Bhattacharyya coefficient is used in the construction of polar codes.
The Bhattacharyya distance is used in feature extraction and selection, image processing, speaker recognition, phone clustering, and in genetics.
See also
Bhattacharyya angle
Kullback–Leibler divergence
Hellinger distance
Mahalanobis distance
Chernoff bound
Rényi entropy
F-divergence
Fidelity of quantum states
External links
"Bhattacharyya distance", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
Statistical Intuition of Bhattacharyya's distance
Some of the properties of Bhattacharyya Distance
Nielsen, F.; Boltz, S. (2010). "The Burbea–Rao and Bhattacharyya centroids". IEEE Transactions on Information Theory. 57 (8): 5455–5466.
Kailath, T. (1967). "The Divergence and Bhattacharyya Distance Measures in Signal Selection". IEEE Transactions on Communication Technology. 15 (1): 52–60.
Djouadi, A.; Snorrason, O.; Garber, F. (1990). "The quality of Training-Sample estimates of the Bhattacharyya coefficient". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (1): 92–97.