Source: Misconceptions about the normal distribution

Students of statistics and probability theory sometimes develop misconceptions about the normal distribution, ideas that may seem plausible but are mathematically untrue. For example, it is sometimes mistakenly thought that two linearly uncorrelated, normally distributed random variables must be statistically independent. However, this is untrue, as can be demonstrated by counterexample. Likewise, it is sometimes mistakenly thought that a linear combination of normally distributed random variables will itself be normally distributed, but again, counterexamples prove this wrong.
To say that the pair $(X,Y)$ of random variables has a bivariate normal distribution means that every linear combination $aX+bY$ of $X$ and $Y$ for constant (i.e. not random) coefficients $a$ and $b$ (not both equal to zero) has a univariate normal distribution. In that case, if $X$ and $Y$ are uncorrelated then they are independent. However, it is possible for two random variables $X$ and $Y$ to be so distributed jointly that each one alone is marginally normally distributed, and they are uncorrelated, but they are not independent; examples are given below.

Examples

= A symmetric example =

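Suppose $X$ has a normal distribution with expected value 0 and variance 1. Let $W$ be independent of $X$ and take the values $1$ and $-1$, each with probability $1/2$, and let $Y=WX$. Then $X$ and $Y$ are uncorrelated and both have the same normal distribution, yet they are not independent. To see this, observe that $|Y|=|X|$, or that $\Pr(Y>1\mid -1/2<X<1/2)=\Pr(X>1\mid -1/2<X<1/2)=0$.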
Finally, the distribution of the simple linear combination $X+Y$ concentrates positive probability at 0: $\Pr(X+Y=0)=1/2$. Therefore, the random variable $X+Y$ is not normally distributed, and so also $X$ and $Y$ are not jointly normally distributed (by the definition above).
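A quick simulation can make this concrete. The sketch below assumes the construction $Y=WX$, with $W$ a fair random sign independent of $X$; the function names (`simulate_symmetric`, `sample_corr`), the sample size, and the seed are illustrative choices, not part of the source.

```python
import random


def simulate_symmetric(n=100_000, seed=0):
    """Draw n pairs (X, Y) with Y = W * X, where W is a fair random sign
    independent of X (assumed construction for this example)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        w = rng.choice((-1.0, 1.0))
        xs.append(x)
        ys.append(w * x)
    return xs, ys


def sample_corr(us, vs):
    """Plain sample correlation coefficient of two equal-length lists."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    var_u = sum((u - mu) ** 2 for u in us) / n
    var_v = sum((v - mv) ** 2 for v in vs) / n
    return cov / (var_u * var_v) ** 0.5


xs, ys = simulate_symmetric()
corr_xy = sample_corr(xs, ys)
# Fraction of draws where X + Y is exactly 0 (this happens when W = -1).
frac_zero = sum(1 for x, y in zip(xs, ys) if x + y == 0.0) / len(xs)

print("sample corr(X, Y):", round(corr_xy, 4))
print("fraction with X + Y == 0:", round(frac_zero, 4))
```

The sample correlation should come out close to 0 and the fraction of exact zeros close to 1/2, which a normally distributed $X+Y$ could not produce.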

= An asymmetric example =

Suppose $X$ has a normal distribution with expected value 0 and variance 1. Let

$$Y=\begin{cases}X&\text{if }|X|\leq c\\-X&\text{if }|X|>c\end{cases}$$

where $c$ is a positive number to be specified below. If $c$ is very small, then the correlation $\operatorname{corr}(X,Y)$ is near $-1$; if $c$ is very large, then $\operatorname{corr}(X,Y)$ is near 1. Since the correlation is a continuous function of $c$, the intermediate value theorem implies there is some particular value of $c$ that makes the correlation 0. That value is approximately 1.54. In that case, $X$ and $Y$ are uncorrelated, but they are clearly not independent, since $X$ completely determines $Y$.
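The zero-correlation threshold can be located numerically. The sketch below relies on a closed form that is derived here rather than taken from the source: since $X$ and $Y$ both have mean 0 and variance 1, $\operatorname{corr}(X,Y)=E[XY]=2E[X^{2}\mathbf{1}\{|X|\leq c\}]-1$, and integration by parts gives $E[X^{2}\mathbf{1}\{|X|\leq c\}]=2\Phi(c)-1-2c\varphi(c)$, so $\operatorname{corr}(X,Y)=4\Phi(c)-4c\varphi(c)-3$. The function names and the bisection bracket are illustrative.

```python
import math


def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def corr_xy(c):
    """corr(X, Y) when Y = X for |X| <= c and Y = -X otherwise
    (closed form derived in the lead-in, not quoted from the source)."""
    return 4.0 * std_normal_cdf(c) - 4.0 * c * std_normal_pdf(c) - 3.0


def find_zero_corr(lo=0.1, hi=5.0, tol=1e-10):
    """Bisection: corr_xy is increasing in c, negative at lo, positive at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if corr_xy(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c_star = find_zero_corr()
print("c with zero correlation:", round(c_star, 4))
```

Bisection is enough here because the derivative of the closed form is $4c^{2}\varphi(c)\geq 0$, so the correlation increases monotonically from $-1$ to $1$ and crosses zero exactly once.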
To see that $Y$ is normally distributed (indeed, that its distribution is the same as that of $X$), one may compute its cumulative distribution function:

$$\begin{aligned}\Pr(Y\leq x)&=\Pr(\{|X|\leq c\text{ and }X\leq x\}\text{ or }\{|X|>c\text{ and }-X\leq x\})\\&=\Pr(|X|\leq c\text{ and }X\leq x)+\Pr(|X|>c\text{ and }-X\leq x)\\&=\Pr(|X|\leq c\text{ and }X\leq x)+\Pr(|X|>c\text{ and }X\leq x)\\&=\Pr(X\leq x),\end{aligned}$$

where the next-to-last equality follows from the symmetry of the distribution of $X$ and the symmetry of the condition that $|X|\leq c$.
In this example, the difference $X-Y$ is nowhere near being normally distributed, since it has a substantial probability (about 0.88) of being equal to 0. By contrast, the normal distribution, being a continuous distribution, has no discrete part; that is, it does not concentrate more than zero probability at any single point. Consequently $X$ and $Y$ are not jointly normally distributed, even though they are separately normally distributed.
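The figure of about 0.88 can be checked directly: $X-Y=0$ exactly when $|X|\leq c$, so the probability is $2\Phi(c)-1$. The sketch below uses the rounded threshold $c=1.54$ from the text; the helper name `flip_outside`, the sample size, and the seed are illustrative.

```python
import math
import random

C = 1.54  # rounded zero-correlation threshold quoted in the text


def flip_outside(x, c=C):
    """Y = X when |X| <= c, and Y = -X otherwise."""
    return x if abs(x) <= c else -x


# P(X - Y = 0) = P(|X| <= c) = 2*Phi(c) - 1 = erf(c / sqrt(2))
p_exact = math.erf(C / math.sqrt(2.0))

rng = random.Random(1)
n = 100_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [flip_outside(x) for x in xs]

# Empirical point mass of X - Y at 0.
frac_equal = sum(1 for x, y in zip(xs, ys) if x - y == 0.0) / n
# X and Y both have mean 0 and variance 1, so E[XY] is their correlation.
mean_xy = sum(x * y for x, y in zip(xs, ys)) / n

print("P(X - Y = 0), exact:", round(p_exact, 4))
print("P(X - Y = 0), simulated:", round(frac_equal, 4))
print("estimate of E[XY]:", round(mean_xy, 4))
```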

= Examples with support almost everywhere in the plane =

Suppose that the coordinates $(X,Y)$ of a random point in the plane are chosen according to the probability density function

$$p(x,y)=\frac{1}{2\pi\sqrt{3}}\left[\exp\left(-\frac{2}{3}(x^{2}+xy+y^{2})\right)+\exp\left(-\frac{2}{3}(x^{2}-xy+y^{2})\right)\right].$$

Then the random variables $X$ and $Y$ are uncorrelated, and each of them is normally distributed (with mean 0 and variance 1), but they are not independent.
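One way to sample from this density is to read it as an equal-weight mixture of two bivariate normal densities with unit variances and correlations $-1/2$ and $+1/2$ (an observation made here by matching the exponents and the normalizing constant, not a statement from the source). The sketch below draws from that mixture and compares the correlation of $X$ and $Y$ with the correlation of their squares; all names, the sample size, and the seed are illustrative.

```python
import math
import random


def sample_mixture(n, seed=2):
    """Draw n points from an equal mixture of two bivariate normals with
    unit variances and correlation -1/2 or +1/2 (assumed reading of p)."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - 0.25)  # sqrt(1 - rho^2) for |rho| = 1/2
    points = []
    for _ in range(n):
        rho = rng.choice((-0.5, 0.5))
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        points.append((z1, rho * z1 + s * z2))
    return points


def sample_corr(us, vs):
    """Plain sample correlation coefficient of two equal-length lists."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    var_u = sum((u - mu) ** 2 for u in us) / n
    var_v = sum((v - mv) ** 2 for v in vs) / n
    return cov / (var_u * var_v) ** 0.5


points = sample_mixture(100_000)
xs = [p[0] for p in points]
ys = [p[1] for p in points]

corr_xy = sample_corr(xs, ys)
corr_sq = sample_corr([x * x for x in xs], [y * y for y in ys])

print("corr(X, Y):", round(corr_xy, 4))
print("corr(X^2, Y^2):", round(corr_sq, 4))
```

Under this reading, the correlation of $X$ and $Y$ should be near 0 while the correlation of $X^{2}$ and $Y^{2}$ should be near 0.25, which is only possible if $X$ and $Y$ are dependent.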
It is well known that the ratio $C$ of two independent standard normal random deviates $X_{i}$ and $Y_{i}$ has a Cauchy distribution. One can equally well start with the Cauchy random variable $C$ and derive the conditional distribution of $Y_{i}$ to satisfy the requirement that $X_{i}=CY_{i}$ with $X_{i}$ and $Y_{i}$ independent and standard normal. It follows that

$$Y_{i}=W_{i}\sqrt{\frac{\chi_{i}^{2}(k=2)}{1+C^{2}}}$$

in which $W_{i}$ is a Rademacher random variable and $\chi_{i}^{2}(k=2)$ is a chi-squared random variable with two degrees of freedom.
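The construction can be checked by simulation. The sketch below draws $C$ by the inverse-CDF method, uses the fact that a chi-squared variable with two degrees of freedom is an exponential variable with mean 2, and then inspects the marginals of $X_{i}$ and $Y_{i}$; the function name `sample_pair`, the sample size, and the seed are illustrative.

```python
import math
import random


def sample_pair(rng):
    """Build (X, Y) from a Cauchy C, a random sign W and a chi-squared(2)
    variable, with Y = W * sqrt(chi2 / (1 + C^2)) and X = C * Y."""
    c = math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy
    w = rng.choice((-1.0, 1.0))                   # Rademacher sign
    chi2 = rng.expovariate(0.5)                   # chi-squared with 2 d.o.f.
    y = w * math.sqrt(chi2 / (1.0 + c * c))
    return c * y, y


rng = random.Random(3)
n = 100_000
pairs = [sample_pair(rng) for _ in range(n)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

var_x = sum(x * x for x in xs) / n
var_y = sum(y * y for y in ys) / n
mean_xy = sum(x * y for x, y in pairs) / n
# For a standard normal variable, P(|Y| < 1) is about 0.6827.
frac_within_one = sum(1 for y in ys if abs(y) < 1.0) / n

print("second moments of X and Y:", round(var_x, 3), round(var_y, 3))
print("estimate of E[XY]:", round(mean_xy, 4))
print("P(|Y| < 1):", round(frac_within_one, 4))
```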
Consider two sets of $(X_{i},Y_{i})$, $i\in\{1,2\}$. Note that $C$ is not indexed by $i$; that is, the same Cauchy random variable $C$ is used in the definition of both $(X_{1},Y_{1})$ and $(X_{2},Y_{2})$. This sharing of $C$ results in dependences across indices: neither $X_{1}$ nor $Y_{1}$ is independent of $Y_{2}$. Nevertheless all of the $X_{i}$ and $Y_{i}$ are uncorrelated, as the bivariate distributions all have reflection symmetry across the axes.
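The cross-index dependence can be seen in a simulation that shares one Cauchy draw between the two pairs. The sketch below is an illustration under the construction described above, with independent signs and chi-squared variables for each index; the function names, the sample size, and the seed are illustrative.

```python
import math
import random


def sample_shared_cauchy(rng):
    """Return [(X1, Y1), (X2, Y2)] built from one shared Cauchy draw C,
    with independent signs and chi-squared(2) variables per index."""
    c = math.tan(math.pi * (rng.random() - 0.5))  # shared standard Cauchy
    pairs = []
    for _ in range(2):
        w = rng.choice((-1.0, 1.0))
        chi2 = rng.expovariate(0.5)  # chi-squared with 2 d.o.f.
        y = w * math.sqrt(chi2 / (1.0 + c * c))
        pairs.append((c * y, y))
    return pairs


def sample_corr(us, vs):
    """Plain sample correlation coefficient of two equal-length lists."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    var_u = sum((u - mu) ** 2 for u in us) / n
    var_v = sum((v - mv) ** 2 for v in vs) / n
    return cov / (var_u * var_v) ** 0.5


rng = random.Random(4)
draws = [sample_shared_cauchy(rng) for _ in range(100_000)]
x1 = [d[0][0] for d in draws]
y1 = [d[0][1] for d in draws]
y2 = [d[1][1] for d in draws]


def squares(vs):
    return [v * v for v in vs]


corr_y1_y2 = sample_corr(y1, y2)
corr_x1_y2 = sample_corr(x1, y2)
corr_sq_y1_y2 = sample_corr(squares(y1), squares(y2))
corr_sq_x1_y2 = sample_corr(squares(x1), squares(y2))

print("corr(Y1, Y2):", round(corr_y1_y2, 4))
print("corr(X1, Y2):", round(corr_x1_y2, 4))
print("corr(Y1^2, Y2^2):", round(corr_sq_y1_y2, 4))
print("corr(X1^2, Y2^2):", round(corr_sq_x1_y2, 4))
```

The plain correlations should be near 0, while the correlations of the squares should be clearly nonzero. A short calculation with $1/(1+C^{2})=\cos^{2}\theta$ for a uniformly distributed angle $\theta$ suggests values near $0.25$ and $-0.25$ respectively.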

The figure shows scatterplots of samples drawn from the above distribution. This furnishes two examples of bivariate distributions that are uncorrelated and have normal marginal distributions but are not independent. The left panel shows the joint distribution of $X_{1}$ and $Y_{2}$; the distribution has support everywhere but at the origin. The right panel shows the joint distribution of $Y_{1}$ and $Y_{2}$; the distribution has support everywhere except along the axes and has a discontinuity at the origin: the density diverges when the origin is approached along any straight path except along the axes.