Variation of information
In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.
Definition
Suppose we have two partitions $X$ and $Y$ of a set $A$ into disjoint subsets, namely $X=\{X_{1},X_{2},\ldots ,X_{k}\}$ and $Y=\{Y_{1},Y_{2},\ldots ,Y_{l}\}$.
Let:

$n=\sum _{i}|X_{i}|=\sum _{j}|Y_{j}|=|A|$

$p_{i}=|X_{i}|/n$ and $q_{j}=|Y_{j}|/n$

$r_{ij}=|X_{i}\cap Y_{j}|/n$
Then the variation of information between the two partitions is:

$\mathrm{VI}(X;Y)=-\sum _{i,j}r_{ij}\left[\log(r_{ij}/p_{i})+\log(r_{ij}/q_{j})\right]$.

This is equivalent to the shared information distance between the random variables $i$ and $j$ with respect to the uniform probability measure on $A$ defined by $\mu(B):=|B|/n$ for $B\subseteq A$.
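For illustration, a minimal Python sketch of this computation, assuming each partition is represented as a list of disjoint Python sets covering the same underlying set (the function name is an illustrative choice), using natural logarithms:

    from math import log

    def variation_of_information(X, Y):
        # X and Y are lists of disjoint blocks (Python sets) covering the same set A.
        n = sum(len(block) for block in X)      # n = |A|
        vi = 0.0
        for Xi in X:
            p = len(Xi) / n                     # p_i = |X_i| / n
            for Yj in Y:
                q = len(Yj) / n                 # q_j = |Y_j| / n
                r = len(Xi & Yj) / n            # r_ij = |X_i ∩ Y_j| / n
                if r > 0:                       # empty intersections contribute nothing
                    vi -= r * (log(r / p) + log(r / q))
        return vi

    # Example: two partitions of {1, ..., 6}
    X = [{1, 2, 3}, {4, 5, 6}]
    Y = [{1, 2}, {3, 4}, {5, 6}]
    print(variation_of_information(X, Y))       # about 0.868 nats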
= Explicit information content =

We can rewrite this definition in terms that explicitly highlight the information content of this metric.

The set of all partitions of a set forms a compact lattice in which the partial order induces two operations, the meet $\wedge$ and the join $\vee$; the maximum $\overline{1}$ is the partition with only one block, i.e., all elements grouped together, and the minimum is $\overline{0}$, the partition consisting of all elements as singletons. The meet of two partitions $X$ and $Y$ is easy to understand as the partition formed by all pairwise intersections of a block $X_{i}$ of $X$ with a block $Y_{j}$ of $Y$. It then follows that $X\wedge Y\subseteq X$ and $X\wedge Y\subseteq Y$.
Let's define the entropy of a partition $X$ as

$H(X)=-\sum _{i}p_{i}\log p_{i}$,

where $p_{i}=|X_{i}|/n$. Clearly, $H(\overline{1})=0$ and $H(\overline{0})=\log n$. The entropy of a partition is a monotone function on the lattice of partitions, in the sense that $X\subseteq Y\Rightarrow H(X)\geq H(Y)$.
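A minimal sketch of this partition entropy, under the same list-of-sets representation used above, showing the two extreme values and an intermediate partition:

    from math import log

    def partition_entropy(X, n):
        # H(X) = -sum_i p_i log p_i  with  p_i = |X_i| / n
        return sum(-(len(b) / n) * log(len(b) / n) for b in X)

    n = 4
    one_bar  = [{1, 2, 3, 4}]            # the maximum: a single block
    zero_bar = [{1}, {2}, {3}, {4}]      # the minimum: all singletons
    middle   = [{1, 2}, {3, 4}]          # finer than one_bar, coarser than zero_bar

    print(partition_entropy(one_bar, n))    # 0.0
    print(partition_entropy(zero_bar, n))   # log 4 ≈ 1.386
    print(partition_entropy(middle, n))     # log 2 ≈ 0.693, between the two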
Then the VI distance between $X$ and $Y$ is given by

$\mathrm{VI}(X,Y)=2H(X\wedge Y)-H(X)-H(Y)$.

The difference $d(X,Y)\equiv |H(X)-H(Y)|$ is a pseudo-metric, as $d(X,Y)=0$ does not necessarily imply that $X=Y$. From the definition of $\overline{1}$, it follows that $\mathrm{VI}(X,\overline{1})=H(X)$.
If in the Hasse diagram we draw an edge from every partition to the maximum $\overline{1}$ and assign it a weight equal to the VI distance between the given partition and $\overline{1}$, we can interpret the VI distance as basically an average of differences of edge weights to the maximum.

For $H(X)$ as defined above, the joint information of two partitions coincides with the entropy of the meet,

$H(X,Y)=H(X\wedge Y)$,

and we also have that $d(X,X\wedge Y)=H(X\wedge Y|X)$ coincides with the conditional entropy of the meet (intersection) $X\wedge Y$ relative to $X$.
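A short sketch under the same assumptions, forming the meet as the non-empty pairwise block intersections and checking both statements numerically:

    from math import log

    def entropy(P, n):
        return sum(-(len(b) / n) * log(len(b) / n) for b in P)

    def meet(X, Y):
        # X ∧ Y: all non-empty intersections of a block of X with a block of Y
        return [Xi & Yj for Xi in X for Yj in Y if Xi & Yj]

    X = [{1, 2, 3}, {4, 5, 6}]
    Y = [{1, 2}, {3, 4}, {5, 6}]
    n = 6
    XY = meet(X, Y)                          # [{1, 2}, {3}, {4}, {5, 6}]

    print(entropy(XY, n))                    # H(X ∧ Y) = H(X, Y) ≈ 1.330
    print(entropy(XY, n) - entropy(X, n))    # d(X, X ∧ Y) = H(X ∧ Y | X) ≈ 0.637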
Identities
The variation of information satisfies

$\mathrm{VI}(X;Y)=H(X)+H(Y)-2I(X,Y)$,
where $H(X)$ is the entropy of $X$, and $I(X,Y)$ is the mutual information between $X$ and $Y$ with respect to the uniform probability measure on $A$. This can be rewritten as

$\mathrm{VI}(X;Y)=H(X,Y)-I(X,Y)$,
where $H(X,Y)$ is the joint entropy of $X$ and $Y$, or

$\mathrm{VI}(X;Y)=H(X|Y)+H(Y|X)$,
where $H(X|Y)$ and $H(Y|X)$ are the respective conditional entropies.
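As a numerical sanity check, the following sketch (same list-of-sets representation; the helper name is illustrative) evaluates all three expressions for a small pair of partitions and confirms that they agree:

    from math import log

    def vi_three_ways(X, Y):
        n = sum(len(b) for b in X)
        p = [len(b) / n for b in X]                          # block probabilities of X
        q = [len(b) / n for b in Y]                          # block probabilities of Y
        r = [[len(Xi & Yj) / n for Yj in Y] for Xi in X]     # joint distribution r_ij

        def H(dist):
            return sum(-v * log(v) for v in dist if v > 0)

        HX, HY = H(p), H(q)
        HXY = H([v for row in r for v in row])               # joint entropy H(X, Y)
        I = sum(r[i][j] * log(r[i][j] / (p[i] * q[j]))       # mutual information I(X, Y)
                for i in range(len(X)) for j in range(len(Y)) if r[i][j] > 0)

        return (HX + HY - 2 * I,                             # H(X) + H(Y) - 2 I(X, Y)
                HXY - I,                                     # H(X, Y) - I(X, Y)
                (HXY - HY) + (HXY - HX))                     # H(X|Y) + H(Y|X)

    X = [{1, 2, 3}, {4, 5, 6}]
    Y = [{1, 2}, {3, 4}, {5, 6}]
    print(vi_three_ways(X, Y))   # all three values ≈ 0.868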
The variation of information can also be bounded, either in terms of the number of elements:

$\mathrm{VI}(X;Y)\leq \log(n)$,

or with respect to a maximum number of clusters, $K^{*}$:

$\mathrm{VI}(X;Y)\leq 2\log(K^{*})$.
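Both bounds follow from the identities above: since $I(X,Y)\geq 0$, we have $\mathrm{VI}(X;Y)=H(X,Y)-I(X,Y)\leq H(X,Y)=H(X\wedge Y)\leq \log(n)$, because a partition of $n$ elements has entropy at most $\log(n)$; and if each partition has at most $K^{*}$ blocks, then $H(X|Y)\leq H(X)\leq \log(K^{*})$ and likewise $H(Y|X)\leq \log(K^{*})$, giving the second bound.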
= Triangle inequality =

To verify the triangle inequality $\mathrm{VI}(X;Z)\leq \mathrm{VI}(X;Y)+\mathrm{VI}(Y;Z)$, expand using the identity $\mathrm{VI}(X;Y)=H(X|Y)+H(Y|X)$. It suffices to prove

$H(X|Z)\leq H(X|Y)+H(Y|Z)$.

The right side has the lower bound

$H(X|Y)+H(Y|Z)\geq H(X|Y,Z)+H(Y|Z)=H(X,Y|Z)$,

which is no less than the left side, since $H(X,Y|Z)=H(X|Z)+H(Y|X,Z)\geq H(X|Z)$.
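A quick empirical spot-check, with the same representation of partitions as in the earlier sketches (the compact helper simply restates the definition, and the three partitions are arbitrary illustrative choices):

    from math import log

    def vi(X, Y):
        # log(r_ij / p_i) = log(|Xi ∩ Yj| / |Xi|), since the factors of n cancel
        n = sum(len(b) for b in X)
        return sum(-(len(Xi & Yj) / n) * (log(len(Xi & Yj) / len(Xi)) +
                                          log(len(Xi & Yj) / len(Yj)))
                   for Xi in X for Yj in Y if Xi & Yj)

    X = [{1, 2, 3}, {4, 5, 6}]
    Y = [{1, 2}, {3, 4}, {5, 6}]
    Z = [{1}, {2, 3, 4, 5}, {6}]

    print(vi(X, Z) <= vi(X, Y) + vi(Y, Z))   # True, as the triangle inequality requires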
External links
Partanalyzer includes a C++ implementation of VI and other metrics and indices for analyzing partitions and clusterings
C++ implementation with MATLAB mex files