- Source: Markov kernel
In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.
Formal definition
Let
(
X
,
A
)
{\displaystyle (X,{\mathcal {A}})}
and
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
be measurable spaces. A Markov kernel with source
(
X
,
A
)
{\displaystyle (X,{\mathcal {A}})}
and target
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
, sometimes written as
κ
:
(
X
,
A
)
→
(
Y
,
B
)
{\displaystyle \kappa :(X,{\mathcal {A}})\to (Y,{\mathcal {B}})}
, is a function
κ
:
B
×
X
→
[
0
,
1
]
{\displaystyle \kappa :{\mathcal {B}}\times X\to [0,1]}
with the following properties:
For every (fixed)
B
0
∈
B
{\displaystyle B_{0}\in {\mathcal {B}}}
, the map
x
↦
κ
(
B
0
,
x
)
{\displaystyle x\mapsto \kappa (B_{0},x)}
is
A
{\displaystyle {\mathcal {A}}}
-measurable
For every (fixed)
x
0
∈
X
{\displaystyle x_{0}\in X}
, the map
B
↦
κ
(
B
,
x
0
)
{\displaystyle B\mapsto \kappa (B,x_{0})}
is a probability measure on
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
In other words it associates to each point
x
∈
X
{\displaystyle x\in X}
a probability measure
κ
(
d
y
|
x
)
:
B
↦
κ
(
B
,
x
)
{\displaystyle \kappa (dy|x):B\mapsto \kappa (B,x)}
on
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
such that, for every measurable set
B
∈
B
{\displaystyle B\in {\mathcal {B}}}
, the map
x
↦
κ
(
B
,
x
)
{\displaystyle x\mapsto \kappa (B,x)}
is measurable with respect to the
σ
{\displaystyle \sigma }
-algebra
A
{\displaystyle {\mathcal {A}}}
.
Examples
= Simple random walk on the integers
=Take
X
=
Y
=
Z
{\displaystyle X=Y=\mathbb {Z} }
, and
A
=
B
=
P
(
Z
)
{\displaystyle {\mathcal {A}}={\mathcal {B}}={\mathcal {P}}(\mathbb {Z} )}
(the power set of
Z
{\displaystyle \mathbb {Z} }
). Then a Markov kernel is fully determined by the probability it assigns to singletons
{
m
}
,
m
∈
Y
=
Z
{\displaystyle \{m\},\,m\in Y=\mathbb {Z} }
for each
n
∈
X
=
Z
{\displaystyle n\in X=\mathbb {Z} }
:
κ
(
B
|
n
)
=
∑
m
∈
B
κ
(
{
m
}
|
n
)
,
∀
n
∈
Z
,
∀
B
∈
B
{\displaystyle \kappa (B|n)=\sum _{m\in B}\kappa (\{m\}|n),\qquad \forall n\in \mathbb {Z} ,\,\forall B\in {\mathcal {B}}}
.
Now the random walk
κ
{\displaystyle \kappa }
that goes to the right with probability
p
{\displaystyle p}
and to the left with probability
1
−
p
{\displaystyle 1-p}
is defined by
κ
(
{
m
}
|
n
)
=
p
δ
m
,
n
+
1
+
(
1
−
p
)
δ
m
,
n
−
1
,
∀
n
,
m
∈
Z
{\displaystyle \kappa (\{m\}|n)=p\delta _{m,n+1}+(1-p)\delta _{m,n-1},\quad \forall n,m\in \mathbb {Z} }
where
δ
{\displaystyle \delta }
is the Kronecker delta. The transition probabilities
P
(
m
|
n
)
=
κ
(
{
m
}
|
n
)
{\displaystyle P(m|n)=\kappa (\{m\}|n)}
for the random walk are equivalent to the Markov kernel.
= General Markov processes with countable state space
=More generally take
X
{\displaystyle X}
and
Y
{\displaystyle Y}
both countable and
A
=
P
(
X
)
,
B
=
P
(
Y
)
{\displaystyle {\mathcal {A}}={\mathcal {P}}(X),\ {\mathcal {B}}={\mathcal {P}}(Y)}
.
Again a Markov kernel is defined by the probability it assigns to singleton sets for each
i
∈
X
{\displaystyle i\in X}
κ
(
B
|
i
)
=
∑
j
∈
B
κ
(
{
j
}
|
i
)
,
∀
i
∈
X
,
∀
B
∈
B
{\displaystyle \kappa (B|i)=\sum _{j\in B}\kappa (\{j\}|i),\qquad \forall i\in X,\,\forall B\in {\mathcal {B}}}
,
We define a Markov process by defining a transition probability
P
(
j
|
i
)
=
K
j
i
{\displaystyle P(j|i)=K_{ji}}
where the numbers
K
j
i
{\displaystyle K_{ji}}
define a (countable) stochastic matrix
(
K
j
i
)
{\displaystyle (K_{ji})}
i.e.
K
j
i
≥
0
,
∀
(
j
,
i
)
∈
Y
×
X
,
∑
j
∈
Y
K
j
i
=
1
,
∀
i
∈
X
.
{\displaystyle {\begin{aligned}K_{ji}&\geq 0,\qquad &\forall (j,i)\in Y\times X,\\\sum _{j\in Y}K_{ji}&=1,\qquad &\forall i\in X.\\\end{aligned}}}
We then define
κ
(
{
j
}
|
i
)
=
K
j
i
=
P
(
j
|
i
)
,
∀
i
∈
X
,
∀
B
∈
B
{\displaystyle \kappa (\{j\}|i)=K_{ji}=P(j|i),\qquad \forall i\in X,\quad \forall B\in {\mathcal {B}}}
.
Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
= Markov kernel defined by a kernel function and a measure
=Let
ν
{\displaystyle \nu }
be a measure on
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
, and
k
:
Y
×
X
→
[
0
,
∞
]
{\displaystyle k:Y\times X\to [0,\infty ]}
a measurable function with respect to the product
σ
{\displaystyle \sigma }
-algebra
A
⊗
B
{\displaystyle {\mathcal {A}}\otimes {\mathcal {B}}}
such that
∫
Y
k
(
y
,
x
)
ν
(
d
y
)
=
1
,
∀
x
∈
X
{\displaystyle \int _{Y}k(y,x)\nu (\mathrm {d} y)=1,\qquad \forall x\in X}
,
then
κ
(
d
y
|
x
)
=
k
(
y
,
x
)
ν
(
d
y
)
{\displaystyle \kappa (dy|x)=k(y,x)\nu (dy)}
i.e. the mapping
{
κ
:
B
×
X
→
[
0
,
1
]
κ
(
B
|
x
)
=
∫
B
k
(
y
,
x
)
ν
(
d
y
)
{\displaystyle {\begin{cases}\kappa :{\mathcal {B}}\times X\to [0,1]\\\kappa (B|x)=\int _{B}k(y,x)\nu (\mathrm {d} y)\end{cases}}}
defines a Markov kernel. This example generalises the countable Markov process example where
ν
{\displaystyle \nu }
was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on
X
=
Y
=
R
{\displaystyle X=Y=\mathbb {R} }
with
ν
(
d
x
)
=
d
x
{\displaystyle \nu (dx)=dx}
standard Lebesgue measure and
k
t
(
y
,
x
)
=
1
2
π
t
e
−
(
y
−
x
)
2
/
(
2
t
2
)
{\displaystyle k_{t}(y,x)={\frac {1}{{\sqrt {2\pi }}t}}e^{-(y-x)^{2}/(2t^{2})}}
.
= Measurable functions
=Take
(
X
,
A
)
{\displaystyle (X,{\mathcal {A}})}
and
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
arbitrary measurable spaces, and let
f
:
X
→
Y
{\displaystyle f:X\to Y}
be a measurable function. Now define
κ
(
d
y
|
x
)
=
δ
f
(
x
)
(
d
y
)
{\displaystyle \kappa (dy|x)=\delta _{f(x)}(dy)}
i.e.
κ
(
B
|
x
)
=
1
B
(
f
(
x
)
)
=
1
f
−
1
(
B
)
(
x
)
=
{
1
if
f
(
x
)
∈
B
0
otherwise
{\displaystyle \kappa (B|x)=\mathbf {1} _{B}(f(x))=\mathbf {1} _{f^{-1}(B)}(x)={\begin{cases}1&{\text{if }}f(x)\in B\\0&{\text{otherwise}}\end{cases}}}
for all
B
∈
B
{\displaystyle B\in {\mathcal {B}}}
.
Note that the indicator function
1
f
−
1
(
B
)
{\displaystyle \mathbf {1} _{f^{-1}(B)}}
is
A
{\displaystyle {\mathcal {A}}}
-measurable for all
B
∈
B
{\displaystyle B\in {\mathcal {B}}}
iff
f
{\displaystyle f}
is measurable.
This example allows us to think of a Markov kernel as a generalised function with a (in general) random rather than certain value. That is, it is a multivalued function where the values are not equally weighted.
= Galton–Watson process
=As a less obvious example, take
X
=
N
,
A
=
P
(
N
)
{\displaystyle X=\mathbb {N} ,{\mathcal {A}}={\mathcal {P}}(\mathbb {N} )}
, and
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
the real numbers
R
{\displaystyle \mathbb {R} }
with the standard sigma algebra of Borel sets. Then
κ
(
B
|
n
)
=
{
1
B
(
0
)
n
=
0
Pr
(
ξ
1
+
⋯
+
ξ
x
∈
B
)
n
≠
0
{\displaystyle \kappa (B|n)={\begin{cases}\mathbf {1} _{B}(0)&n=0\\\Pr(\xi _{1}+\cdots +\xi _{x}\in B)&n\neq 0\\\end{cases}}}
where
x
{\displaystyle x}
is the number of element at the state
n
{\displaystyle n}
,
ξ
i
{\displaystyle \xi _{i}}
are i.i.d. random variables (usually with mean 0) and where
1
B
{\displaystyle \mathbf {1} _{B}}
is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
Composition of Markov Kernels
Given measurable spaces
(
X
,
A
)
{\displaystyle (X,{\mathcal {A}})}
,
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
we consider a Markov kernel
κ
:
B
×
X
→
[
0
,
1
]
{\displaystyle \kappa :{\mathcal {B}}\times X\to [0,1]}
as a morphism
κ
:
X
→
Y
{\displaystyle \kappa :X\to Y}
. Intuitively, rather than assigning to each
x
∈
X
{\displaystyle x\in X}
a sharply defined point
y
∈
Y
{\displaystyle y\in Y}
the kernel assigns a "fuzzy" point in
Y
{\displaystyle Y}
which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space
(
Z
,
C
)
{\displaystyle (Z,{\mathcal {C}})}
, and probability kernels
κ
:
X
→
Y
{\displaystyle \kappa :X\to Y}
and
λ
:
Y
→
Z
{\displaystyle \lambda :Y\to Z}
, we can define a composition
λ
∘
κ
:
X
→
Z
{\displaystyle \lambda \circ \kappa :X\to Z}
by the Chapman-Kolmogorov equation
(
λ
∘
κ
)
(
d
z
|
x
)
=
∫
Y
λ
(
d
z
|
y
)
κ
(
d
y
|
x
)
{\displaystyle (\lambda \circ \kappa )(dz|x)=\int _{Y}\lambda (dz|y)\kappa (dy|x)}
.
The composition is associative by the Monotone Convergence Theorem and the identity function considered as a Markov kernel (i.e. the delta measure
κ
1
(
d
x
′
|
x
)
=
δ
x
(
d
x
′
)
{\displaystyle \kappa _{1}(dx'|x)=\delta _{x}(dx')}
) is the unit for this composition.
This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere, the category of Markov kernels.
Probability Space defined by Probability Distribution and a Markov Kernel
A composition of a probability space
(
X
,
A
,
P
X
)
{\displaystyle (X,{\mathcal {A}},P_{X})}
and a probability kernel
κ
:
(
X
,
A
)
→
(
Y
,
B
)
{\displaystyle \kappa :(X,{\mathcal {A}})\to (Y,{\mathcal {B}})}
defines a probability space
(
Y
,
B
,
P
Y
=
κ
∘
P
X
)
{\displaystyle (Y,{\mathcal {B}},P_{Y}=\kappa \circ P_{X})}
, where the probability measure is given by
P
Y
(
B
)
=
∫
X
∫
B
κ
(
d
y
|
x
)
P
X
(
d
x
)
=
∫
X
κ
(
B
|
x
)
P
X
(
d
x
)
=
E
P
X
κ
(
B
|
⋅
)
.
{\displaystyle P_{Y}(B)=\int _{X}\int _{B}\kappa (dy|x)P_{X}(dx)=\int _{X}\kappa (B|x)P_{X}(dx)=\mathbb {E} _{P_{X}}\kappa (B|\cdot ).}
Properties
= Semidirect product
=Let
(
X
,
A
,
P
)
{\displaystyle (X,{\mathcal {A}},P)}
be a probability space and
κ
{\displaystyle \kappa }
a Markov kernel from
(
X
,
A
)
{\displaystyle (X,{\mathcal {A}})}
to some
(
Y
,
B
)
{\displaystyle (Y,{\mathcal {B}})}
. Then there exists a unique measure
Q
{\displaystyle Q}
on
(
X
×
Y
,
A
⊗
B
)
{\displaystyle (X\times Y,{\mathcal {A}}\otimes {\mathcal {B}})}
, such that:
Q
(
A
×
B
)
=
∫
A
κ
(
B
|
x
)
P
(
d
x
)
,
∀
A
∈
A
,
∀
B
∈
B
.
{\displaystyle Q(A\times B)=\int _{A}\kappa (B|x)\,P(dx),\quad \forall A\in {\mathcal {A}},\quad \forall B\in {\mathcal {B}}.}
= Regular conditional distribution
=Let
(
S
,
Y
)
{\displaystyle (S,Y)}
be a Borel space,
X
{\displaystyle X}
a
(
S
,
Y
)
{\displaystyle (S,Y)}
-valued random variable on the measure space
(
Ω
,
F
,
P
)
{\displaystyle (\Omega ,{\mathcal {F}},P)}
and
G
⊆
F
{\displaystyle {\mathcal {G}}\subseteq {\mathcal {F}}}
a sub-
σ
{\displaystyle \sigma }
-algebra. Then there exists a Markov kernel
κ
{\displaystyle \kappa }
from
(
Ω
,
G
)
{\displaystyle (\Omega ,{\mathcal {G}})}
to
(
S
,
Y
)
{\displaystyle (S,Y)}
, such that
κ
(
⋅
,
B
)
{\displaystyle \kappa (\cdot ,B)}
is a version of the conditional expectation
E
[
1
{
X
∈
B
}
∣
G
]
{\displaystyle \mathbb {E} [\mathbf {1} _{\{X\in B\}}\mid {\mathcal {G}}]}
for every
B
∈
Y
{\displaystyle B\in Y}
, i.e.
P
(
X
∈
B
∣
G
)
=
E
[
1
{
X
∈
B
}
∣
G
]
=
κ
(
⋅
,
B
)
,
P
-a.s.
∀
B
∈
G
.
{\displaystyle P(X\in B\mid {\mathcal {G}})=\mathbb {E} \left[\mathbf {1} _{\{X\in B\}}\mid {\mathcal {G}}\right]=\kappa (\cdot ,B),\qquad P{\text{-a.s.}}\,\,\forall B\in {\mathcal {G}}.}
It is called regular conditional distribution of
X
{\displaystyle X}
given
G
{\displaystyle {\mathcal {G}}}
and is not uniquely defined.
Generalizations
Transition kernels generalize Markov kernels in the sense that for all
x
∈
X
{\displaystyle x\in X}
, the map
B
↦
κ
(
B
|
x
)
{\displaystyle B\mapsto \kappa (B|x)}
can be any type of (non negative) measure, not necessarily a probability measure.
External links
Markov kernel in nLab.
References
Bauer, Heinz (1996), Probability Theory, de Gruyter, ISBN 3-11-013935-9
§36. Kernels and semigroups of kernels
See also
Category of Markov kernels
Kata Kunci Pencarian:
- Statistika komputasi
- Peladen
- Matriks (matematika)
- Analitik prediktif
- Daftar matriks yang dinamakan
- Markov kernel
- Category of Markov kernels
- Markov chains on a measurable state space
- Chapman–Kolmogorov equation
- Giry monad
- Transition kernel
- Generative adversarial network
- De Finetti's theorem
- Markov operator
- List of things named after Andrey Markov