- Source: Normal-inverse-gamma distribution
In probability theory and statistics, the normal-inverse-gamma distribution (or Gaussian-inverse-gamma distribution) is a four-parameter family of multivariate continuous probability distributions. It is the conjugate prior of a normal distribution with unknown mean and variance.
Definition
Suppose
$$x \mid \sigma^2, \mu, \lambda \sim \mathrm{N}(\mu, \sigma^2/\lambda)$$
has a normal distribution with mean $\mu$ and variance $\sigma^2/\lambda$, where
$$\sigma^2 \mid \alpha, \beta \sim \Gamma^{-1}(\alpha, \beta)$$
has an inverse-gamma distribution. Then $(x, \sigma^2)$ has a normal-inverse-gamma distribution, denoted as
$$(x, \sigma^2) \sim \text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta).$$
($\text{NIG}$ is also used instead of $\text{N-}\Gamma^{-1}$.)
The normal-inverse-Wishart distribution is a generalization of the normal-inverse-gamma distribution that is defined over multivariate random variables.
Characterization
= Probability density function =
$$f(x, \sigma^2 \mid \mu, \lambda, \alpha, \beta) = \frac{\sqrt{\lambda}}{\sigma\sqrt{2\pi}}\,\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\left(\frac{1}{\sigma^2}\right)^{\alpha+1}\exp\left(-\frac{2\beta + \lambda(x-\mu)^2}{2\sigma^2}\right)$$
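For numerical work it is usually the log-density that gets evaluated. A minimal transcription of the formula above, assuming NumPy and SciPy (the helper name `nig_logpdf` is just illustrative):

```python
import numpy as np
from scipy.special import gammaln

def nig_logpdf(x, sigma2, mu, lam, alpha, beta):
    """Log of the normal-inverse-gamma density given above."""
    return (0.5 * np.log(lam) - 0.5 * np.log(2 * np.pi * sigma2)  # sqrt(lam)/(sigma*sqrt(2*pi))
            + alpha * np.log(beta) - gammaln(alpha)               # beta^alpha / Gamma(alpha)
            - (alpha + 1) * np.log(sigma2)                        # (1/sigma^2)^(alpha+1)
            - (2 * beta + lam * (x - mu) ** 2) / (2 * sigma2))    # exponential term
```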
For the multivariate form where $\mathbf{x}$ is a $k \times 1$ random vector,
$$f(\mathbf{x}, \sigma^2 \mid \boldsymbol{\mu}, \mathbf{V}^{-1}, \alpha, \beta) = |\mathbf{V}|^{-1/2}(2\pi)^{-k/2}\,\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\left(\frac{1}{\sigma^2}\right)^{\alpha+1+k/2}\exp\left(-\frac{2\beta + (\mathbf{x}-\boldsymbol{\mu})^{T}\mathbf{V}^{-1}(\mathbf{x}-\boldsymbol{\mu})}{2\sigma^2}\right),$$
where $|\mathbf{V}|$ is the determinant of the $k \times k$ matrix $\mathbf{V}$. Note how this last equation reduces to the first form if $k = 1$, so that $\mathbf{x}, \mathbf{V}, \boldsymbol{\mu}$ are scalars.
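The same transcription extends to the multivariate density. A sketch assuming NumPy and SciPy, with the illustrative name `mv_nig_logpdf` and the precision matrix `V_inv` passed directly, as in the formula:

```python
import numpy as np
from scipy.special import gammaln

def mv_nig_logpdf(x, sigma2, mu, V_inv, alpha, beta):
    """Log of the multivariate normal-inverse-gamma density given above."""
    k = x.shape[0]
    quad = (x - mu) @ V_inv @ (x - mu)          # (x - mu)^T V^{-1} (x - mu)
    _, logdet_V_inv = np.linalg.slogdet(V_inv)  # |V|^{-1/2} = |V^{-1}|^{1/2}
    return (0.5 * logdet_V_inv - 0.5 * k * np.log(2 * np.pi)
            + alpha * np.log(beta) - gammaln(alpha)
            - (alpha + 1 + k / 2) * np.log(sigma2)
            - (2 * beta + quad) / (2 * sigma2))
```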
Alternative parameterization
It is also possible to let $\gamma = 1/\lambda$, in which case the pdf becomes
$$f(x, \sigma^2 \mid \mu, \gamma, \alpha, \beta) = \frac{1}{\sigma\sqrt{2\pi\gamma}}\,\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\left(\frac{1}{\sigma^2}\right)^{\alpha+1}\exp\left(-\frac{2\gamma\beta + (x-\mu)^2}{2\gamma\sigma^2}\right)$$
In the multivariate form, the corresponding change would be to regard the covariance matrix $\mathbf{V}$, instead of its inverse $\mathbf{V}^{-1}$, as a parameter.
= Cumulative distribution function =
$$F(x, \sigma^2 \mid \mu, \lambda, \alpha, \beta) = \frac{e^{-\frac{\beta}{\sigma^2}}\left(\frac{\beta}{\sigma^2}\right)^{\alpha}\left(\operatorname{erf}\left(\frac{\sqrt{\lambda}(x-\mu)}{\sqrt{2}\,\sigma}\right)+1\right)}{2\sigma^2\,\Gamma(\alpha)}$$
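Should this expression be needed numerically, a direct transcription is straightforward (a sketch assuming SciPy; the name `nig_cdf_expr` is illustrative):

```python
import numpy as np
from scipy.special import erf, gamma

def nig_cdf_expr(x, sigma2, mu, lam, alpha, beta):
    """Direct transcription of the expression above."""
    sigma = np.sqrt(sigma2)
    return (np.exp(-beta / sigma2) * (beta / sigma2) ** alpha
            * (erf(np.sqrt(lam) * (x - mu) / (np.sqrt(2) * sigma)) + 1)
            / (2 * sigma2 * gamma(alpha)))
```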
Properties
= Marginal distributions =
Given $(x, \sigma^2) \sim \text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)$ as above, $\sigma^2$ by itself follows an inverse-gamma distribution:
$$\sigma^2 \sim \Gamma^{-1}(\alpha, \beta)$$
while $\sqrt{\frac{\alpha\lambda}{\beta}}(x-\mu)$ follows a t distribution with $2\alpha$ degrees of freedom.
In the multivariate case, the marginal distribution of $\mathbf{x}$ is a multivariate t distribution:
$$\mathbf{x} \sim t_{2\alpha}\left(\boldsymbol{\mu}, \tfrac{\beta}{\alpha}\mathbf{V}\right)$$
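These marginals are easy to spot-check by simulation. A sketch assuming NumPy and SciPy: it draws $(x, \sigma^2)$ pairs by composition, standardizes $x$, and compares against a $t_{2\alpha}$ distribution with a Kolmogorov–Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, lam, alpha, beta = 1.0, 2.0, 3.0, 1.5
n = 100_000

# sigma^2 ~ Inv-Gamma(alpha, beta): reciprocal of a Gamma(alpha, scale=1/beta) draw.
sigma2 = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=n)
# x | sigma^2 ~ N(mu, sigma^2 / lam).
x = rng.normal(loc=mu, scale=np.sqrt(sigma2 / lam))

# sqrt(alpha*lam/beta) * (x - mu) should be t-distributed with 2*alpha dof.
z = np.sqrt(alpha * lam / beta) * (x - mu)
print(stats.kstest(z, stats.t(df=2 * alpha).cdf))  # expect a large p-value
```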
= Summation =
= Scaling =
Suppose $(x, \sigma^2) \sim \text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)$. Then for $c > 0$,
$$(cx, c\sigma^2) \sim \text{N-}\Gamma^{-1}(c\mu, \lambda/c, \alpha, c\beta).$$
Proof: To prove this, let $(x, \sigma^2) \sim \text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)$ and fix $c > 0$. Defining $Y = (Y_1, Y_2) = (cx, c\sigma^2)$, observe that by the change-of-variables formula the PDF of the random variable $Y$ evaluated at $(y_1, y_2)$ is given by $1/c^2$ times the PDF of a $\text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)$ random variable evaluated at $(y_1/c, y_2/c)$. Hence the PDF of $Y$ evaluated at $(y_1, y_2)$ is given by
$$f_Y(y_1, y_2) = \frac{1}{c^2}\frac{\sqrt{\lambda}}{\sqrt{2\pi y_2/c}}\,\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\left(\frac{1}{y_2/c}\right)^{\alpha+1}\exp\left(-\frac{2\beta + \lambda(y_1/c-\mu)^2}{2y_2/c}\right) = \frac{\sqrt{\lambda/c}}{\sqrt{2\pi y_2}}\,\frac{(c\beta)^{\alpha}}{\Gamma(\alpha)}\,\left(\frac{1}{y_2}\right)^{\alpha+1}\exp\left(-\frac{2c\beta + (\lambda/c)(y_1-c\mu)^2}{2y_2}\right).$$
The right-hand expression is the PDF of a $\text{N-}\Gamma^{-1}(c\mu, \lambda/c, \alpha, c\beta)$ random variable evaluated at $(y_1, y_2)$, which completes the proof.
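A numerical spot-check of the same identity, assuming NumPy and SciPy (the log-density helper is inlined so the snippet stands alone):

```python
import numpy as np
from scipy.special import gammaln

def nig_logpdf(x, sigma2, mu, lam, alpha, beta):
    """Log of the normal-inverse-gamma density from the Characterization section."""
    return (0.5 * np.log(lam) - 0.5 * np.log(2 * np.pi * sigma2)
            + alpha * np.log(beta) - gammaln(alpha)
            - (alpha + 1) * np.log(sigma2)
            - (2 * beta + lam * (x - mu) ** 2) / (2 * sigma2))

mu, lam, alpha, beta, c = 0.5, 2.0, 3.0, 1.5, 4.0
y1, y2 = 1.2, 0.7
# Density of (cx, c*sigma^2) at (y1, y2): change-of-variables factor 1/c^2 ...
lhs = nig_logpdf(y1 / c, y2 / c, mu, lam, alpha, beta) - 2 * np.log(c)
# ... should match the N-Gamma^{-1}(c*mu, lam/c, alpha, c*beta) density at (y1, y2).
rhs = nig_logpdf(y1, y2, c * mu, lam / c, alpha, c * beta)
assert np.isclose(lhs, rhs)
```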
= Exponential family =
Normal-inverse-gamma distributions form an exponential family with natural parameters $\theta_1 = -\frac{\lambda}{2}$, $\theta_2 = \lambda\mu$, $\theta_3 = \alpha$, and $\theta_4 = -\beta - \frac{\lambda\mu^2}{2}$, and sufficient statistics $T_1 = \frac{x^2}{\sigma^2}$, $T_2 = \frac{x}{\sigma^2}$, $T_3 = \log\big(\frac{1}{\sigma^2}\big)$, and $T_4 = \frac{1}{\sigma^2}$.
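To see this decomposition explicitly, the joint density can be written in canonical form as below; one convention (an assumption here, since the split is not unique) absorbs the leftover power of $\sigma$ into the base measure:
$$f(x, \sigma^2) = \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma^{3}}}_{h(x,\sigma^2)} \exp\big(\theta_1 T_1 + \theta_2 T_2 + \theta_3 T_3 + \theta_4 T_4 - A(\theta)\big), \qquad A(\theta) = \log\Gamma(\alpha) - \alpha\log\beta - \tfrac{1}{2}\log\lambda,$$
and expanding $\theta_1 T_1 + \theta_2 T_2 + \theta_4 T_4 = -\frac{2\beta + \lambda(x-\mu)^2}{2\sigma^2}$ recovers the pdf given above.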
= Information entropy =
= Kullback–Leibler divergence =
The Kullback–Leibler divergence measures the difference between two distributions.
Maximum likelihood estimation
Posterior distribution of the parameters
See the articles on normal-gamma distribution and conjugate prior.
Interpretation of the parameters
See the articles on normal-gamma distribution and conjugate prior.
Generating normal-inverse-gamma random variates
Generation of random variates is straightforward; a sketch in code follows the list:
- Sample $\sigma^2$ from an inverse-gamma distribution with parameters $\alpha$ and $\beta$.
- Sample $x$ from a normal distribution with mean $\mu$ and variance $\sigma^2/\lambda$.
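A minimal sketch of this two-step sampler, assuming NumPy (the helper name `sample_nig` is illustrative; note that NumPy parameterizes its gamma generator by shape and scale, so an inverse-gamma draw is the reciprocal of a $\mathrm{Gamma}(\alpha, 1/\beta)$ draw):

```python
import numpy as np

def sample_nig(mu, lam, alpha, beta, size=1, rng=None):
    """Draw (x, sigma^2) pairs from N-Gamma^{-1}(mu, lam, alpha, beta)."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: sigma^2 ~ Inv-Gamma(alpha, beta).
    sigma2 = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=size)
    # Step 2: x | sigma^2 ~ N(mu, sigma^2 / lam).
    x = rng.normal(loc=mu, scale=np.sqrt(sigma2 / lam))
    return x, sigma2

# Example: five joint draws.
x, sigma2 = sample_nig(mu=0.0, lam=1.0, alpha=3.0, beta=2.0, size=5)
```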
Related distributions
- The normal-gamma distribution is the same distribution parameterized by precision rather than variance.
- A generalization of this distribution which allows for a multivariate mean and a completely unknown positive-definite covariance matrix $\sigma^2\mathbf{V}$ (whereas in the multivariate normal-inverse-gamma distribution above the covariance matrix is regarded as known up to the scale factor $\sigma^2$) is the normal-inverse-Wishart distribution.
See also
Compound probability distribution