Normal-inverse-gamma distribution

In probability theory and statistics, the normal-inverse-gamma distribution (or Gaussian-inverse-gamma distribution) is a four-parameter family of multivariate continuous probability distributions. It is the conjugate prior of a normal distribution with unknown mean and variance.


      Definition


Suppose

$$x \mid \sigma^2, \mu, \lambda \sim \mathrm{N}(\mu, \sigma^2/\lambda)$$

has a normal distribution with mean $\mu$ and variance $\sigma^2/\lambda$, where

$$\sigma^2 \mid \alpha, \beta \sim \Gamma^{-1}(\alpha, \beta)$$

has an inverse-gamma distribution. Then $(x, \sigma^2)$ has a normal-inverse-gamma distribution, denoted as

$$(x, \sigma^2) \sim \text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta).$$

($\text{NIG}$ is also used instead of $\text{N-}\Gamma^{-1}$.)

The normal-inverse-Wishart distribution is a generalization of the normal-inverse-gamma distribution that is defined over multivariate random variables.


      Characterization




= Probability density function =

$$f(x, \sigma^2 \mid \mu, \lambda, \alpha, \beta) = \frac{\sqrt{\lambda}}{\sigma\sqrt{2\pi}} \, \frac{\beta^\alpha}{\Gamma(\alpha)} \, \left(\frac{1}{\sigma^2}\right)^{\alpha+1} \exp\left(-\frac{2\beta + \lambda(x-\mu)^2}{2\sigma^2}\right)$$
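As a quick numerical check, this joint density factors into the inverse-gamma density of $\sigma^2$ times the conditional normal density of $x$. A minimal sketch of that factorization, assuming NumPy and SciPy (the function names `nig_pdf` and `nig_pdf_closed_form` are illustrative, not library routines):

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import invgamma, norm

def nig_pdf(x, s2, mu, lam, alpha, beta):
    """Joint density f(x, sigma^2 | mu, lambda, alpha, beta) via the factorization."""
    return (invgamma.pdf(s2, a=alpha, scale=beta)            # sigma^2 ~ Gamma^{-1}(alpha, beta)
            * norm.pdf(x, loc=mu, scale=np.sqrt(s2 / lam)))  # x | sigma^2 ~ N(mu, sigma^2/lam)

def nig_pdf_closed_form(x, s2, mu, lam, alpha, beta):
    """Direct evaluation of the closed-form expression above."""
    return (np.sqrt(lam) / np.sqrt(2 * np.pi * s2)
            * beta**alpha / gamma(alpha)
            * s2 ** -(alpha + 1)
            * np.exp(-(2 * beta + lam * (x - mu)**2) / (2 * s2)))

# The two evaluations should agree to floating-point precision.
print(nig_pdf(0.5, 1.2, 0.0, 2.0, 3.0, 1.5))
print(nig_pdf_closed_form(0.5, 1.2, 0.0, 2.0, 3.0, 1.5))
```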


For the multivariate form where $\mathbf{x}$ is a $k \times 1$ random vector,

$$f(\mathbf{x}, \sigma^2 \mid \boldsymbol{\mu}, \mathbf{V}^{-1}, \alpha, \beta) = |\mathbf{V}|^{-1/2} (2\pi)^{-k/2} \, \frac{\beta^\alpha}{\Gamma(\alpha)} \, \left(\frac{1}{\sigma^2}\right)^{\alpha+1+k/2} \exp\left(-\frac{2\beta + (\mathbf{x}-\boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{x}-\boldsymbol{\mu})}{2\sigma^2}\right),$$

where $|\mathbf{V}|$ is the determinant of the $k \times k$ matrix $\mathbf{V}$. Note how this last equation reduces to the first form if $k = 1$, so that $\mathbf{x}, \mathbf{V}, \boldsymbol{\mu}$ are scalars.
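A corresponding sketch of the multivariate density, again assuming NumPy/SciPy (`mv_nig_pdf` is our own illustrative name):

```python
import numpy as np
from scipy.special import gamma

def mv_nig_pdf(x, s2, mu, V, alpha, beta):
    """Joint density f(x, sigma^2 | mu, V^{-1}, alpha, beta) for a k-vector x."""
    x, mu, V = np.asarray(x), np.asarray(mu), np.asarray(V)
    k = x.shape[0]
    d = x - mu
    quad = d @ np.linalg.solve(V, d)  # (x - mu)^T V^{-1} (x - mu)
    return (np.linalg.det(V) ** -0.5
            * (2 * np.pi) ** (-k / 2)
            * beta**alpha / gamma(alpha)
            * s2 ** -(alpha + 1 + k / 2)
            * np.exp(-(2 * beta + quad) / (2 * s2)))

# With k = 1 and V = [[1/lam]], this reduces to the scalar density above.
print(mv_nig_pdf([0.5], 1.2, [0.0], [[0.5]], 3.0, 1.5))
```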


Alternative parameterization

It is also possible to let $\gamma = 1/\lambda$, in which case the pdf becomes

$$f(x, \sigma^2 \mid \mu, \gamma, \alpha, \beta) = \frac{1}{\sigma\sqrt{2\pi\gamma}} \, \frac{\beta^\alpha}{\Gamma(\alpha)} \, \left(\frac{1}{\sigma^2}\right)^{\alpha+1} \exp\left(-\frac{2\gamma\beta + (x-\mu)^2}{2\gamma\sigma^2}\right)$$


In the multivariate form, the corresponding change would be to regard the covariance matrix $\mathbf{V}$, instead of its inverse $\mathbf{V}^{-1}$, as a parameter.


= Cumulative distribution function =

$$F(x, \sigma^2 \mid \mu, \lambda, \alpha, \beta) = \frac{e^{-\frac{\beta}{\sigma^2}} \left(\frac{\beta}{\sigma^2}\right)^{\alpha} \left(\operatorname{erf}\left(\frac{\sqrt{\lambda}\,(x-\mu)}{\sqrt{2}\,\sigma}\right) + 1\right)}{2\sigma^2\,\Gamma(\alpha)}$$



      Properties




= Marginal distributions =

Given $(x, \sigma^2) \sim \text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)$ as above, $\sigma^2$ by itself follows an inverse-gamma distribution:

$$\sigma^2 \sim \Gamma^{-1}(\alpha, \beta),$$

while $\sqrt{\frac{\alpha\lambda}{\beta}}\,(x - \mu)$ follows a t distribution with $2\alpha$ degrees of freedom.

In the multivariate case, the marginal distribution of $\mathbf{x}$ is a multivariate t distribution:

$$\mathbf{x} \sim t_{2\alpha}\!\left(\boldsymbol{\mu}, \frac{\beta}{\alpha}\mathbf{V}\right)$$
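A Monte Carlo sketch of the scalar marginals, assuming SciPy (the Kolmogorov–Smirnov test here is only an illustrative consistency check, not part of the article):

```python
import numpy as np
from scipy.stats import invgamma, norm, t, kstest

rng = np.random.default_rng(0)
mu, lam, alpha, beta = 1.0, 2.0, 3.0, 1.5
n = 100_000

# Draw (x, sigma^2) from the NIG hierarchy.
s2 = invgamma.rvs(a=alpha, scale=beta, size=n, random_state=rng)
x = norm.rvs(loc=mu, scale=np.sqrt(s2 / lam), random_state=rng)

# sqrt(alpha*lam/beta) * (x - mu) should look like t with 2*alpha degrees of freedom.
z = np.sqrt(alpha * lam / beta) * (x - mu)
print(kstest(z, t(df=2 * alpha).cdf))  # large p-value: consistent with t_{2 alpha}
```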



= Summation =


= Scaling =

Suppose $(x, \sigma^2) \sim \text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)$. Then for $c > 0$,

$$(cx, c\sigma^2) \sim \text{N-}\Gamma^{-1}(c\mu, \lambda/c, \alpha, c\beta).$$


Proof: Let $(x, \sigma^2) \sim \text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)$ and fix $c > 0$. Defining $Y = (Y_1, Y_2) = (cx, c\sigma^2)$, observe that the PDF of the random variable $Y$ evaluated at $(y_1, y_2)$ is given by $1/c^2$ times the PDF of a $\text{N-}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)$ random variable evaluated at $(y_1/c, y_2/c)$, since the Jacobian of the map $(x, \sigma^2) \mapsto (cx, c\sigma^2)$ has determinant $c^2$. Hence the PDF of $Y$ evaluated at $(y_1, y_2)$ is given by

$$f_Y(y_1, y_2) = \frac{1}{c^2}\,\frac{\sqrt{\lambda}}{\sqrt{2\pi y_2/c}} \, \frac{\beta^\alpha}{\Gamma(\alpha)} \, \left(\frac{1}{y_2/c}\right)^{\alpha+1} \exp\left(-\frac{2\beta + \lambda(y_1/c - \mu)^2}{2 y_2/c}\right) = \frac{\sqrt{\lambda/c}}{\sqrt{2\pi y_2}} \, \frac{(c\beta)^\alpha}{\Gamma(\alpha)} \, \left(\frac{1}{y_2}\right)^{\alpha+1} \exp\left(-\frac{2c\beta + (\lambda/c)\,(y_1 - c\mu)^2}{2 y_2}\right).$$


The right-hand expression is the PDF of a $\text{N-}\Gamma^{-1}(c\mu, \lambda/c, \alpha, c\beta)$ random variable evaluated at $(y_1, y_2)$, which completes the proof.
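A numerical spot-check of this identity, assuming SciPy (the helper `nig_pdf` below is our own, mirroring the density factorization used earlier):

```python
import numpy as np
from scipy.stats import invgamma, norm

def nig_pdf(x, s2, mu, lam, alpha, beta):
    """Joint NIG density via its inverse-gamma times conditional-normal factorization."""
    return (invgamma.pdf(s2, a=alpha, scale=beta)
            * norm.pdf(x, loc=mu, scale=np.sqrt(s2 / lam)))

mu, lam, alpha, beta, c = 1.0, 2.0, 3.0, 1.5, 2.5
y1, y2 = 0.7, 0.9

# Change-of-variables side: (1/c^2) * f(y1/c, y2/c | mu, lam, alpha, beta)
lhs = nig_pdf(y1 / c, y2 / c, mu, lam, alpha, beta) / c**2
# Claimed scaled distribution: f(y1, y2 | c*mu, lam/c, alpha, c*beta)
rhs = nig_pdf(y1, y2, c * mu, lam / c, alpha, c * beta)
print(np.isclose(lhs, rhs))  # True
```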


= Exponential family =

Normal-inverse-gamma distributions form an exponential family with natural parameters $\theta_1 = \frac{-\lambda}{2}$, $\theta_2 = \lambda\mu$, $\theta_3 = \alpha$, and $\theta_4 = -\beta + \frac{-\lambda\mu^2}{2}$, and sufficient statistics $T_1 = \frac{x^2}{\sigma^2}$, $T_2 = \frac{x}{\sigma^2}$, $T_3 = \log\big(\frac{1}{\sigma^2}\big)$, and $T_4 = \frac{1}{\sigma^2}$.
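To see why, one can expand the square in the exponent of the density and collect terms (a sketch of the rearrangement, not in the original text):

$$-\frac{2\beta + \lambda(x-\mu)^2}{2\sigma^2} = \frac{-\lambda}{2}\cdot\frac{x^2}{\sigma^2} + \lambda\mu\cdot\frac{x}{\sigma^2} + \left(-\beta + \frac{-\lambda\mu^2}{2}\right)\cdot\frac{1}{\sigma^2} = \theta_1 T_1 + \theta_2 T_2 + \theta_4 T_4,$$

while the prefactor contributes $\left(\frac{1}{\sigma^2}\right)^{\alpha+3/2} = \left(\frac{1}{\sigma^2}\right)^{3/2} e^{\theta_3 T_3}$, leaving $\frac{\sqrt{\lambda}\,\beta^\alpha}{\sqrt{2\pi}\,\Gamma(\alpha)}$ as the normalizing factor and $(1/\sigma^2)^{3/2}$ as the base measure.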


= Information entropy =


= Kullback–Leibler divergence =

The Kullback–Leibler divergence measures the difference between two distributions.


      Maximum likelihood estimation




      Posterior distribution of the parameters


      See the articles on normal-gamma distribution and conjugate prior.


      Interpretation of the parameters


      See the articles on normal-gamma distribution and conjugate prior.


Generating normal-inverse-gamma random variates

Generation of random variates is straightforward:

Sample $\sigma^2$ from an inverse-gamma distribution with parameters $\alpha$ and $\beta$.
Sample $x$ from a normal distribution with mean $\mu$ and variance $\sigma^2/\lambda$.


Related distributions

The normal-gamma distribution is the same distribution parameterized by precision rather than variance.
A generalization of this distribution which allows for a multivariate mean and a completely unknown positive-definite covariance matrix $\sigma^2\mathbf{V}$ (whereas in the multivariate inverse-gamma distribution the covariance matrix is regarded as known up to the scale factor $\sigma^2$) is the normal-inverse-Wishart distribution.


      See also


      Compound probability distribution


