Generalized Dirichlet distribution
In statistics, the generalized Dirichlet distribution (GD) is a generalization of the Dirichlet distribution with a more general covariance structure and almost twice the number of parameters. Random vectors with a GD distribution are completely neutral.
The density function of $p_1, \ldots, p_{k-1}$ is

$$\left[\prod_{i=1}^{k-1} B(a_i, b_i)\right]^{-1} p_k^{b_{k-1}-1} \prod_{i=1}^{k-1} \left[ p_i^{a_i - 1} \left( \sum_{j=i}^{k} p_j \right)^{b_{i-1} - (a_i + b_i)} \right]$$
where we define $p_k = 1 - \sum_{i=1}^{k-1} p_i$. Here $B(x, y)$ denotes the Beta function. This reduces to the standard Dirichlet distribution if $b_{i-1} = a_i + b_i$ for $2 \leq i \leq k-1$ ($b_0$ is arbitrary).
For example, if $k = 4$, then the density function of $p_1, p_2, p_3$ is

$$\left[\prod_{i=1}^{3} B(a_i, b_i)\right]^{-1} p_1^{a_1 - 1} p_2^{a_2 - 1} p_3^{a_3 - 1} p_4^{b_3 - 1} \left(p_2 + p_3 + p_4\right)^{b_1 - (a_2 + b_2)} \left(p_3 + p_4\right)^{b_2 - (a_3 + b_3)}$$

where $p_1 + p_2 + p_3 < 1$ and $p_4 = 1 - p_1 - p_2 - p_3$.
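For numerical work it is convenient to evaluate this density in log space. Below is a minimal sketch in Python; the function name `gd_logpdf` and its signature are ours, not from the source.

```python
import numpy as np
from scipy.special import betaln

def gd_logpdf(p, a, b):
    """Log-density of the Connor-Mosimann generalized Dirichlet at
    p = (p_1, ..., p_{k-1}), with p_k = 1 - sum(p) implied.
    a and b hold (a_1, ..., a_{k-1}) and (b_1, ..., b_{k-1})."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    p_full = np.append(p, 1.0 - p.sum())           # append p_k
    tails = np.cumsum(p_full[::-1])[::-1]          # tails[i-1] = p_i + ... + p_k
    # b_0 is arbitrary because the i = 1 tail equals 1; pick a dummy value
    # that makes its exponent b_0 - (a_1 + b_1) exactly zero.
    b_prev = np.append(a[0] + b[0], b[:-1])        # (b_0, b_1, ..., b_{k-2})
    return (-betaln(a, b).sum()
            + (b[-1] - 1.0) * np.log(p_full[-1])   # p_k^{b_{k-1} - 1}
            + ((a - 1.0) * np.log(p)).sum()        # prod of p_i^{a_i - 1}
            + ((b_prev - (a + b)) * np.log(tails[:-1])).sum())
```

For $k = 4$ this reproduces the example density displayed above.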
Connor and Mosimann motivate this form of the density as follows. Define random variables $z_1, \ldots, z_{k-1}$ with $z_1 = p_1$, $z_2 = p_2 / (1 - p_1)$, $z_3 = p_3 / (1 - (p_1 + p_2))$, $\ldots$, $z_i = p_i / (1 - (p_1 + \cdots + p_{i-1}))$. Then $p_1, \ldots, p_k$ have the generalized Dirichlet distribution as parametrized above if the $z_i$ are independent Beta with parameters $a_i, b_i$, for $i = 1, \ldots, k-1$.
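This construction doubles as a sampling recipe: draw the $z_i$ independently and peel off the corresponding fraction of the remaining stick. A minimal sketch, with a function name of our choosing:

```python
import numpy as np

def sample_gd(a, b, size=1, seed=None):
    """Sample p = (p_1, ..., p_k) via stick-breaking: z_i ~ Beta(a_i, b_i)
    independently, then p_i = z_i * (1 - p_1 - ... - p_{i-1})."""
    rng = np.random.default_rng(seed)
    a, b = np.atleast_1d(a), np.atleast_1d(b)
    z = rng.beta(a, b, size=(size, len(a)))   # independent Beta draws
    p = np.empty((size, len(a) + 1))
    remaining = np.ones(size)
    for i in range(len(a)):
        p[:, i] = z[:, i] * remaining         # take fraction z_i of the stick
        remaining = remaining - p[:, i]
    p[:, -1] = remaining                      # p_k is whatever is left
    return p
```

Each row sums to one, and empirical averages over many draws should match the mean formula given in the moment section below.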
Alternative form given by Wong
Wong gives the slightly more concise form for $x_1 + \cdots + x_k \leq 1$:

$$\prod_{i=1}^{k} \frac{x_i^{\alpha_i - 1} \left(1 - x_1 - \cdots - x_i\right)^{\gamma_i}}{B(\alpha_i, \beta_i)}$$

where $\gamma_j = \beta_j - \alpha_{j+1} - \beta_{j+1}$ for $1 \leq j \leq k-1$ and $\gamma_k = \beta_k - 1$.
Note that Wong defines a distribution over a $k$-dimensional space (implicitly defining $x_{k+1} = 1 - \sum_{i=1}^{k} x_i$), while Connor and Mosimann use a $(k-1)$-dimensional space with $x_k = 1 - \sum_{i=1}^{k-1} x_i$.
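Moving between the two forms only requires computing the exponents $\gamma_j$; a small helper (the name is ours):

```python
import numpy as np

def wong_gamma(alpha, beta):
    """Exponents of Wong's form: gamma_j = beta_j - alpha_{j+1} - beta_{j+1}
    for 1 <= j <= k-1, and gamma_k = beta_k - 1."""
    alpha, beta = np.asarray(alpha, dtype=float), np.asarray(beta, dtype=float)
    gamma = beta.copy()
    gamma[:-1] -= alpha[1:] + beta[1:]
    gamma[-1] -= 1.0
    return gamma
```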
General moment function
If $X = (X_1, \ldots, X_k) \sim GD_k(\alpha_1, \ldots, \alpha_k; \beta_1, \ldots, \beta_k)$, then
$$E\left[X_1^{r_1} X_2^{r_2} \cdots X_k^{r_k}\right] = \prod_{j=1}^{k} \frac{\Gamma(\alpha_j + \beta_j)\, \Gamma(\alpha_j + r_j)\, \Gamma(\beta_j + \delta_j)}{\Gamma(\alpha_j)\, \Gamma(\beta_j)\, \Gamma(\alpha_j + \beta_j + r_j + \delta_j)}$$

where $\delta_j = r_{j+1} + r_{j+2} + \cdots + r_k$ for $j = 1, 2, \ldots, k-1$ and $\delta_k = 0$. Thus
$$E\left(X_j\right) = \frac{\alpha_j}{\alpha_j + \beta_j} \prod_{m=1}^{j-1} \frac{\beta_m}{\alpha_m + \beta_m}.$$
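Since the moment is a product of Gamma-function ratios, it is best computed with log-Gamma functions to avoid overflow. A sketch (function name ours) using scipy.special.gammaln:

```python
import numpy as np
from scipy.special import gammaln

def gd_moment(r, alpha, beta):
    """General mixed moment E[X_1^{r_1} * ... * X_k^{r_k}]."""
    r, alpha, beta = (np.asarray(v, dtype=float) for v in (r, alpha, beta))
    # delta_j = r_{j+1} + ... + r_k, with delta_k = 0
    delta = np.append(np.cumsum(r[::-1])[::-1][1:], 0.0)
    log_terms = (gammaln(alpha + beta) + gammaln(alpha + r)
                 + gammaln(beta + delta)
                 - gammaln(alpha) - gammaln(beta)
                 - gammaln(alpha + beta + r + delta))
    return np.exp(log_terms.sum())
```

Setting r to the j-th unit vector recovers the expression for $E(X_j)$ above.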
Reduction to standard Dirichlet distribution
As stated above, if $b_{i-1} = a_i + b_i$ for $2 \leq i \leq k-1$, then the distribution reduces to a standard Dirichlet. This condition differs from the usual situation, in which setting the additional parameters of a generalized distribution to zero recovers the original distribution; for the GDD, setting parameters to zero instead produces a very complicated density function.
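As an illustrative numerical check (parameter values ours), one can choose the $b_i$ backwards so that the condition holds and compare against $\operatorname{Dir}(a_1, \ldots, a_{k-1}, b_{k-1})$, using the `gd_logpdf` sketch from earlier:

```python
import numpy as np
from scipy.stats import dirichlet

a = np.array([1.5, 2.0, 0.7])      # illustrative values, k = 4
b = np.empty(3)
b[2] = 1.3
b[1] = a[2] + b[2]                 # enforce b_{i-1} = a_i + b_i
b[0] = a[1] + b[1]
p = np.array([0.2, 0.3, 0.1])      # a test point; p_4 = 0.4
print(gd_logpdf(p, a, b))
print(dirichlet.logpdf(np.append(p, 1 - p.sum()), np.append(a, b[2])))
# the two log-densities agree
```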
Bayesian analysis
Suppose $X = (X_1, \ldots, X_k) \sim GD_k(\alpha_1, \ldots, \alpha_k; \beta_1, \ldots, \beta_k)$ is generalized Dirichlet, and that $Y \mid X$ is multinomial with $n$ trials (here $Y = (Y_1, \ldots, Y_k)$). Writing $Y_j = y_j$ for $1 \leq j \leq k$ and $y_{k+1} = n - \sum_{i=1}^{k} y_i$, the joint posterior of $X \mid Y$ is a generalized Dirichlet distribution with

$$X \mid Y \sim GD_k\left(\alpha'_1, \ldots, \alpha'_k; \beta'_1, \ldots, \beta'_k\right)$$

where $\alpha'_j = \alpha_j + y_j$ and $\beta'_j = \beta_j + \sum_{i=j+1}^{k+1} y_i$ for $1 \leq j \leq k$.
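The conjugate update is pure bookkeeping on the counts. A minimal sketch (function name ours), where y is the full count vector over all $k+1$ categories:

```python
import numpy as np

def gd_posterior(alpha, beta, y):
    """Posterior GD parameters given multinomial counts y = (y_1, ..., y_{k+1})."""
    alpha, beta, y = (np.asarray(v, dtype=float) for v in (alpha, beta, y))
    tail = np.cumsum(y[::-1])[::-1]      # tail[j-1] = y_j + ... + y_{k+1}
    return alpha + y[:-1], beta + tail[1:]
```

For example, gd_posterior([1, 2, 3], [2, 2, 1], [5, 0, 3, 2]) returns $\alpha' = (6, 2, 6)$ and $\beta' = (7, 7, 3)$.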
Sampling experiment
Wong gives the following system as an example of how the Dirichlet and generalized Dirichlet distributions differ. He posits that a large urn contains balls of $k+1$ different colours, where the proportion of each colour is unknown. Write $X_j$ for the proportion of balls of colour $j$ in the urn, and set $X = (X_1, \ldots, X_k)$.
Experiment 1. Analyst 1 believes that $X \sim D(\alpha_1, \ldots, \alpha_k, \alpha_{k+1})$ (i.e., $X$ is Dirichlet with parameters $\alpha_i$). The analyst then makes $k+1$ glass boxes and puts $\alpha_i$ marbles of colour $i$ in box $i$ (it is assumed that the $\alpha_i$ are integers $\geq 1$). Then analyst 1 draws a ball from the urn, observes its colour (say colour $j$) and puts it in box $j$. He can identify the correct box because the boxes are transparent and the colours of the marbles within are visible. The process continues until $n$ balls have been drawn. The posterior distribution is then Dirichlet, with parameters given by the number of marbles in each box.
Experiment 2. Analyst 2 believes that $X$ follows a generalized Dirichlet distribution: $X \sim GD(\alpha_1, \ldots, \alpha_k; \beta_1, \ldots, \beta_k)$. All parameters are again assumed to be positive integers. The analyst makes $k+1$ wooden boxes. The boxes have two areas: one for balls and one for marbles. The balls are coloured but the marbles are not. Then for $j = 1, \ldots, k$, he puts $\alpha_j$ balls of colour $j$, and $\beta_j$ marbles, into box $j$. He then puts a ball of colour $k+1$ in box $k+1$. The analyst then draws a ball from the urn. Because the boxes are wooden, the analyst cannot tell which box to put the ball in (as he could in experiment 1 above); he also has a poor memory and cannot remember which box contains which colour balls. He has to discover which box is the correct one to put the ball in. He does this by opening box 1 and comparing the balls in it to the drawn ball. If the colours differ, the box is the wrong one; the analyst places a marble in box 1 and proceeds to box 2. He repeats the process until the balls in the box match the drawn ball, at which point he places the ball in the box with the other balls of matching colour. The analyst then draws another ball from the urn and repeats until $n$ balls are drawn. The posterior is then generalized Dirichlet, with parameters $\alpha$ given by the number of balls, and $\beta$ by the number of marbles, in each box.
Note that in experiment 2, changing the order of the boxes has a non-trivial effect, unlike in experiment 1.
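A quick simulation of experiment 2 (setup values ours) shows the ball and marble counts landing exactly on the conjugate update from the Bayesian section:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 100                                  # 4 colours, 100 draws
alpha = np.array([2, 1, 3])
beta = np.array([1, 2, 2])
x_true = np.array([0.3, 0.2, 0.1, 0.4])        # unknown urn proportions
draws = rng.choice(k + 1, size=n, p=x_true)    # colours coded 0..k

balls, marbles = alpha.copy(), beta.copy()
for c in draws:
    marbles[:c] += 1       # one marble per wrong box opened along the way
    if c < k:
        balls[c] += 1      # the ball joins its matching box
print(balls, marbles)      # equals gd_posterior(alpha, beta, np.bincount(draws))
```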
See also
Dirichlet-multinomial distribution
Lukacs's proportion-sum independence theorem