• Source: Range (statistics)
    • In descriptive statistics, the range of a set of data is the size of the narrowest interval which contains all the data.
      It is calculated as the difference between the largest and smallest values (also known as the sample maximum and minimum).
      It is expressed in the same units as the data.
      The range provides an indication of statistical dispersion. Since it only depends on two of the observations, it is most useful in representing the dispersion of small data sets.
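As a minimal illustration of the definition, the sketch below computes the range of a small data set as its largest value minus its smallest. The function name `sample_range` and the height data are hypothetical. Because only the two extreme observations enter the result, a single added outlier can change it markedly.

```python
def sample_range(data):
    """Return max(data) - min(data): the width of the narrowest interval containing every value."""
    values = list(data)  # materialise iterables so max and min see the same values
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


# Hypothetical heights in centimetres; the range is expressed in centimetres too.
heights_cm = [172, 165, 180, 159, 175]
print(sample_range(heights_cm))          # 21
print(sample_range(heights_cm + [210]))  # 51: one outlier more than doubles the range
```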


      For continuous IID random variables


      For n independent and identically distributed continuous random variables X1, X2, ..., Xn with cumulative distribution function G(x) and probability density function g(x), let T denote their range, that is, T = max(X1, X2, ..., Xn) - min(X1, X2, ..., Xn).


      Distribution
      The range, T, has the cumulative distribution function




      $$F(t) = n\int_{-\infty}^{\infty} g(x)\,[G(x+t) - G(x)]^{n-1}\,\mathrm{d}x.$$


      Gumbel notes that the "beauty of this formula is completely marred by the facts that, in general, we cannot express G(x + t) by G(x), and that the numerical integration is lengthy and tiresome.": 385 
      If the distribution of each Xi is limited to the right (or left) then the asymptotic distribution of the range is equal to the asymptotic distribution of the largest (smallest) value. For more general distributions the asymptotic distribution can be expressed as a Bessel function.
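Gumbel's complaint concerns hand computation; on a computer the integral can be approximated directly. The sketch below is a minimal, hypothetical implementation (`range_cdf` and its parameters are illustrative names) that applies the composite midpoint rule, assuming g vanishes outside a finite interval [lo, hi]. For distributions with unbounded support, lo and hi would have to be chosen wide enough that the truncated tails are negligible. For the standard uniform distribution the integral can be done by hand, giving F(t) = n t^(n-1) - (n-1) t^n for 0 ≤ t ≤ 1, which serves as a check.

```python
def range_cdf(t, n, g, G, lo, hi, steps=20_000):
    """Approximate F(t) = n * integral of g(x) [G(x+t) - G(x)]^(n-1) dx.

    Assumes g is zero outside [lo, hi], so the infinite integral can be
    truncated to that interval; uses the composite midpoint rule.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += g(x) * (G(x + t) - G(x)) ** (n - 1)
    return n * total * h


def g_unif(x):
    """Density of the standard uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def G_unif(x):
    """Cumulative distribution function of the standard uniform distribution."""
    return min(max(x, 0.0), 1.0)


n, t = 5, 0.6
approx = range_cdf(t, n, g_unif, G_unif, 0.0, 1.0)
exact = n * t ** (n - 1) - (n - 1) * t ** n  # closed form for the uniform case
print(approx, exact)
```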


      Moments
      The mean range is given by




      $$n\int_{0}^{1} x(G)\,[G^{n-1} - (1-G)^{n-1}]\,\mathrm{d}G$$


      where x(G) is the inverse of the cumulative distribution function G(x), that is, the quantile function. In the case where each of the Xi has a standard normal distribution, the mean range is given by














      $$\int_{-\infty}^{\infty} \left(1 - (1-\Phi(x))^{n} - \Phi(x)^{n}\right)\mathrm{d}x.$$
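The two mean-range formulas can be checked against each other numerically. The sketch below assumes Python's standard-library `statistics.NormalDist` for Φ and its inverse, and uses the midpoint rule, which never evaluates the quantile function at G = 0 or G = 1, where it diverges. The function names are hypothetical, and the second integral is truncated to [-10, 10], outside which its integrand is negligible for the standard normal.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def mean_range_quantile(n, quantile, steps=100_000):
    """n * integral over (0, 1) of x(G) [G^(n-1) - (1-G)^(n-1)] dG, midpoint rule.

    Midpoints never reach G = 0 or G = 1, where a quantile function may be infinite.
    """
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        p = (k + 0.5) * h
        total += quantile(p) * (p ** (n - 1) - (1 - p) ** (n - 1))
    return n * total * h


def mean_range_cdf(n, cdf, lo=-10.0, hi=10.0, steps=20_000):
    """Integral of 1 - (1 - Phi(x))^n - Phi(x)^n, truncated to [lo, hi], midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        c = cdf(lo + (k + 0.5) * h)
        total += 1 - (1 - c) ** n - c ** n
    return total * h


print(mean_range_quantile(2, STD_NORMAL.inv_cdf))
print(mean_range_cdf(2, STD_NORMAL.cdf))
print(2 / math.sqrt(math.pi))  # exact value for n = 2
print(mean_range_quantile(5, lambda p: p))  # uniform on [0, 1]: (n - 1)/(n + 1)
```

For n = 2 both numerical results should be close to 2/√π ≈ 1.1284, the mean absolute difference of two independent standard normal variables. With the quantile function x(G) = G of the standard uniform distribution, the first formula gives (n - 1)/(n + 1).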



      For continuous non-IID random variables


      For n nonidentically distributed independent continuous random variables X1, X2, ..., Xn with cumulative distribution functions G1(x), G2(x), ..., Gn(x) and probability density functions g1(x), g2(x), ..., gn(x), the range has cumulative distribution function




      $$F(t) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} g_i(x) \prod_{j=1,\, j\neq i}^{n} [G_j(x+t) - G_j(x)]\,\mathrm{d}x.$$
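The non-identical formula can be evaluated the same way, with one integral per variable. The sketch below is a minimal implementation under the assumption that every density vanishes outside [lo, hi]; the helper `uniform` and the example distributions are hypothetical. For X1 ~ U(0, 1) and X2 ~ U(0, 2) the range is |X1 - X2|, and a direct area calculation gives P(|X1 - X2| ≤ 1/2) = 7/16 = 0.4375. When all n distributions are identical, the n terms of the sum are equal and the formula reduces to the IID one.

```python
def range_cdf_nonidentical(t, densities, cdfs, lo, hi, steps=20_000):
    """F(t) = sum_i integral of g_i(x) * prod_{j != i} [G_j(x+t) - G_j(x)] dx.

    Assumes every density vanishes outside [lo, hi]; composite midpoint rule.
    """
    n = len(densities)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        for i in range(n):
            gi = densities[i](x)
            if gi == 0.0:
                continue  # this term contributes nothing at x
            prod = 1.0
            for j in range(n):
                if j != i:
                    prod *= cdfs[j](x + t) - cdfs[j](x)
            total += gi * prod
    return total * h


def uniform(a, b):
    """Return (density, CDF) of the uniform distribution on [a, b]."""
    def pdf(x):
        return 1.0 / (b - a) if a <= x <= b else 0.0

    def cdf(x):
        return min(max((x - a) / (b - a), 0.0), 1.0)

    return pdf, cdf


# X1 ~ U(0, 1) and X2 ~ U(0, 2): for n = 2 the range is |X1 - X2|.
(g1, G1), (g2, G2) = uniform(0, 1), uniform(0, 2)
print(range_cdf_nonidentical(0.5, [g1, g2], [G1, G2], 0.0, 2.0))  # close to 7/16
```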



      For discrete IID random variables


      For n independent and identically distributed discrete random variables X1, X2, ..., Xn with cumulative distribution function G(x) and probability mass function g(x), the range of the Xi is the range of a sample of size n from a population with distribution function G(x). We can assume without loss of generality that the support of each Xi is {1, 2, 3, ..., N}, where N is a positive integer or infinity.


      Distribution
      The range has probability mass function




      $$f(t) = \begin{cases}
      \sum_{x=1}^{N} [g(x)]^{n} & t = 0, \\
      \sum_{x=1}^{N-t} \Big( [G(x+t) - G(x-1)]^{n} - [G(x+t) - G(x)]^{n} - [G(x+t-1) - G(x-1)]^{n} + [G(x+t-1) - G(x)]^{n} \Big) & t = 1, 2, 3, \ldots, N-1.
      \end{cases}$$
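The four terms of the t ≥ 1 case form an inclusion-exclusion: the probability that all samples lie in [x, x + t], minus the probability that they all lie in [x + 1, x + t], minus the probability that they all lie in [x, x + t - 1], plus the probability that they all lie in [x + 1, x + t - 1]. What remains is the probability that the minimum is exactly x and the maximum exactly x + t. The sketch below (function names hypothetical) represents g as a list with pmf[x - 1] = g(x), uses exact `fractions.Fraction` arithmetic, and cross-checks the formula against brute-force enumeration of all N^n outcomes with `itertools.product`.

```python
from fractions import Fraction
from itertools import product


def range_pmf(t, n, pmf):
    """P(range = t) for n IID draws on {1, ..., N}, where pmf[x - 1] = g(x)."""
    N = len(pmf)

    def G(x):
        # CDF on the integers: g(1) + ... + g(x), with G(x) = 0 for x <= 0.
        return sum(pmf[: max(0, min(x, N))], Fraction(0))

    if t == 0:
        return sum(p ** n for p in pmf)
    total = Fraction(0)
    for x in range(1, N - t + 1):
        total += ((G(x + t) - G(x - 1)) ** n
                  - (G(x + t) - G(x)) ** n
                  - (G(x + t - 1) - G(x - 1)) ** n
                  + (G(x + t - 1) - G(x)) ** n)
    return total


def range_pmf_brute_force(t, n, pmf):
    """Same probability by enumerating all N**n outcomes (small N and n only)."""
    N = len(pmf)
    total = Fraction(0)
    for outcome in product(range(1, N + 1), repeat=n):
        if max(outcome) - min(outcome) == t:
            prob = Fraction(1)
            for x in outcome:
                prob *= pmf[x - 1]
            total += prob
    return total


# A hypothetical non-uniform pmf on {1, 2, 3, 4} and n = 3 draws.
g = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
n = 3
pmf_of_range = [range_pmf(t, n, g) for t in range(len(g))]
print(pmf_of_range, sum(pmf_of_range))
```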



      Example


      If we suppose that g(x) = 1/N for all x, that is, the discrete uniform distribution, then we find




      $$f(t) = \begin{cases}
      \dfrac{1}{N^{n-1}} & t = 0, \\
      \sum_{x=1}^{N-t} \left( \left[\dfrac{t+1}{N}\right]^{n} - 2\left[\dfrac{t}{N}\right]^{n} + \left[\dfrac{t-1}{N}\right]^{n} \right) & t = 1, 2, 3, \ldots, N-1.
      \end{cases}$$
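In the uniform case the summand does not depend on x, so the sum over x = 1, ..., N - t is simply N - t copies of it. The sketch below (names hypothetical) uses that simplification with exact fractions and compares it with direct counting of the N^n equally likely outcomes, taking three rolls of a fair six-sided die as an illustration.

```python
from fractions import Fraction
from itertools import product


def uniform_range_pmf(t, n, N):
    """P(range = t) for n IID draws from the discrete uniform distribution on {1, ..., N}."""
    if t == 0:
        return Fraction(1, N ** (n - 1))
    # The summand does not depend on x, so the sum over x = 1..N-t is (N - t) copies.
    term = (Fraction(t + 1, N) ** n
            - 2 * Fraction(t, N) ** n
            + Fraction(t - 1, N) ** n)
    return (N - t) * term


def uniform_range_pmf_by_counting(t, n, N):
    """Count the equally likely outcomes in {1..N}^n whose range equals t."""
    hits = sum(1 for s in product(range(1, N + 1), repeat=n) if max(s) - min(s) == t)
    return Fraction(hits, N ** n)


n, N = 3, 6  # three rolls of a fair six-sided die
pmf = [uniform_range_pmf(t, n, N) for t in range(N)]
print(pmf)
print(sum(pmf))  # 1
```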



      Derivation


      The probability of having a specific range value, t, can be determined by summing (integrating) over all possible values x of the smallest sample the probabilities that two samples differ by exactly t and that every other sample lies between these two extremes.
      The probability of one of the n samples having a value of x is $ng(x)$, where the factor n accounts for which of the samples takes that value. The probability of one of the remaining n - 1 samples having a value t greater than x is:




      $$(n-1)\,g(x+t).$$


      The probability of all other values lying between these two extremes is:






      $$\left(\int_{x}^{x+t} g(u)\,\mathrm{d}u\right)^{n-2} = \left(G(x+t) - G(x)\right)^{n-2}.$$


      Multiplying the three factors together and integrating over all possible values of x yields:




      $$f(t) = n(n-1)\int_{-\infty}^{\infty} g(x)\,g(x+t)\,[G(x+t) - G(x)]^{n-2}\,\mathrm{d}x.$$
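The derived density can be checked numerically in the same way as the distribution function. The sketch below assumes a density that vanishes outside [lo, hi] and uses the midpoint rule; all names are hypothetical. For the standard uniform distribution, g(x + t) vanishes once x > 1 - t, so the integral reduces to n(n - 1) t^(n-2) (1 - t) for 0 ≤ t ≤ 1, which is the derivative of the uniform-case distribution function n t^(n-1) - (n-1) t^n.

```python
def range_pdf(t, n, g, G, lo, hi, steps=100_000):
    """f(t) = n(n-1) * integral of g(x) g(x+t) [G(x+t) - G(x)]^(n-2) dx.

    Assumes g vanishes outside [lo, hi]; uses the composite midpoint rule.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += g(x) * g(x + t) * (G(x + t) - G(x)) ** (n - 2)
    return n * (n - 1) * total * h


def g_unif(x):
    """Density of the standard uniform distribution."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def G_unif(x):
    """Cumulative distribution function of the standard uniform distribution."""
    return min(max(x, 0.0), 1.0)


n, t = 4, 0.3
print(range_pdf(t, n, g_unif, G_unif, 0.0, 1.0))
print(n * (n - 1) * t ** (n - 2) * (1 - t))  # closed form for the uniform case
```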



      Related quantities


      The range is a specific example of order statistics. In particular, the range is a linear function of order statistics, which brings it into the scope of L-estimation.
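The link to L-estimation can be made concrete: an L-statistic is a weighted sum of the sorted observations, and the range corresponds to weight -1 on the smallest, +1 on the largest and 0 elsewhere. The sketch below is illustrative only; `l_statistic` and `range_weights` are hypothetical names, not a library API.

```python
def l_statistic(data, weights):
    """Linear combination sum_i w_i * X_(i) of the order statistics X_(1) <= ... <= X_(n)."""
    ordered = sorted(data)
    if len(weights) != len(ordered):
        raise ValueError("need exactly one weight per observation")
    return sum(w * x for w, x in zip(weights, ordered))


def range_weights(n):
    """Weights (-1, 0, ..., 0, +1), which pick out X_(n) - X_(1), the range."""
    if n == 1:
        return [0]  # a single observation has range 0
    return [-1] + [0] * (n - 2) + [1]


data = [4, 9, 1, 7]
print(l_statistic(data, range_weights(len(data))))  # 9 - 1 = 8
```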


      See also



      Interdecile range
      Interquartile range
      Studentized range


      References
