Normal distribution

Normal (summary)
Figure: probability density function for the normal distribution (the red line is the standard normal distribution).
Figure: cumulative distribution function for the normal distribution (colors match the density figure).
Parameters: μ location (real); σ² > 0 squared scale (real)
Support: x \in \mathbb{R}
Probability density function (pdf): \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
Cumulative distribution function (cdf): \frac12 \left(1+\mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt2}\right)\right)
Mean: μ
Median: μ
Mode: μ
Variance: σ²
Skewness: 0
Excess kurtosis: 0
Entropy: \ln\left(\sigma\sqrt{2\,\pi\,e}\right)
Moment-generating function (mgf): M_X(t) = \exp\left(\mu\,t+\frac{\sigma^2 t^2}{2}\right)
Characteristic function: \chi_X(t) = \exp\left(\mu\,i\,t-\frac{\sigma^2 t^2}{2}\right)

The normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields. Each member of the family may be defined by two parameters, location and scale: the mean ("average", μ) and variance (standard deviation squared, σ²) respectively. The standard normal distribution is the normal distribution with a mean of zero and a variance of one (the red curves in the plots to the right). Carl Friedrich Gauss became associated with this set of distributions when he analyzed astronomical data using them,[1] and defined the equation of its probability density function. It is often called the bell curve because the graph of its probability density resembles a bell.

The importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due in part to the central limit theorem. Many measurements, ranging from psychological[2] to physical phenomena (in particular, thermal noise) can be approximated, to varying degrees, by the normal distribution. While the mechanisms underlying these phenomena are often unknown, the use of the normal model can be theoretically justified by assuming that many small, independent effects are additively contributing to each observation. The normal distribution is also important for its relationship to least-squares estimation, one of the simplest and oldest methods of statistical estimation.

The normal distribution also arises in many areas of statistics. For example, the sampling distribution of the sample mean is approximately normal, even if the distribution of the population from which the sample is taken is not normal. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. The normal distribution is the most widely used family of distributions in statistics and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions.

History

The normal distribution was first introduced by Abraham de Moivre in a 1733 article, reprinted in the second edition of his The Doctrine of Chances (1738), in the context of approximating certain binomial distributions for large n. His result was extended by Laplace in his book Analytical Theory of Probabilities (1812), and is now called the theorem of de Moivre-Laplace.

Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors. The fact that the distribution is sometimes called Gaussian is an example of Stigler's Law.

The name "bell curve" goes back to Esprit Jouffret who first used the term "bell surface" in 1872 for a bivariate normal with independent components. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875.[citation needed] Despite this terminology, other probability distributions may be more appropriate in some contexts; see the discussion of occurrence, below.

Characterization

There are various ways to characterize a probability distribution. The most visual is the probability density function (PDF). Equivalent ways are the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, the cumulant-generating function, and Maxwell's theorem. See probability distribution for a discussion.

To indicate that a real-valued random variable X is normally distributed with mean μ and variance σ² ≥ 0, we write

X \sim N(\mu, \sigma^2).\,\!

While it is certainly useful for certain limit theorems (e.g. asymptotic normality of estimators) and for the theory of Gaussian processes to consider the probability distribution concentrated at μ (see Dirac measure) as a normal distribution with mean μ and variance σ² = 0, this degenerate case is often excluded from the considerations because no density with respect to the Lebesgue measure exists.

The normal distribution may also be parameterized using a precision parameter τ, defined as the reciprocal of σ². This parameterization has an advantage in numerical applications where σ² is very close to zero and is more convenient to work with in analysis as τ is a natural parameter of the normal distribution.

Probability density function

Probability density function for the normal distribution

The continuous probability density function of the normal distribution is the Gaussian function

\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}} \,e^{ -\frac{(x- \mu)^2}{2\sigma^2}} = \frac{1}{\sigma} \varphi\left(\frac{x - \mu}{\sigma}\right),\quad x\in\mathbb{R},

where σ > 0 is the standard deviation, the real parameter μ is the expected value, and

\varphi(x)=\varphi_{0,1}(x)=\frac{e^{-x^2/2}}{\sqrt{2\pi\,}}, \,\quad x\in\mathbb{R},

is the density function of the "standard" normal distribution: i.e., the normal distribution with μ = 0 and σ = 1. The integral of \varphi_{\mu,\sigma^2} over the real line is equal to one as shown in the Gaussian integral article.

As a Gaussian function with the denominator of the exponent equal to 2, the standard normal density function \varphi is an eigenfunction of the Fourier transform.

The probability density function has notable properties including:

  • symmetry about its mean μ
  • the mode and median both equal the mean μ
  • the inflection points of the curve occur one standard deviation away from the mean, i.e. at μ − σ and μ + σ.
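The density and the properties listed above can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper names `normal_pdf`, `trapezoid` and `second_difference`, and the parameter values μ = 1.5 and σ = 2, are assumptions chosen only for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x; sigma must be positive."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for f on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


mu, sigma = 1.5, 2.0  # arbitrary illustrative parameters (assumption)


def density(x):
    return normal_pdf(x, mu, sigma)


# Total probability: integrate over mu +/- 10 sigma, where the tails are negligible.
area = trapezoid(density, mu - 10 * sigma, mu + 10 * sigma)

# The curvature should vanish at the inflection point mu + sigma.
curvature_at_inflection = second_difference(density, mu + sigma)
```

Here `area` should be very close to one, and the estimated second derivative should be close to zero at μ + σ, negative at μ and positive at μ + 2σ, in line with the stated inflection points.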

Cumulative distribution function

Cumulative distribution function for the normal distribution

The cumulative distribution function (cdf) of a probability distribution, evaluated at a number (lower-case) x, is the probability of the event that a random variable (capital) X with that distribution is less than or equal to x. The cumulative distribution function of the normal distribution is expressed in terms of the density function as follows:

 \begin{align}
\Phi_{\mu,\sigma^2}(x)
&{}=\int_{-\infty}^x\varphi_{\mu,\sigma^2}(u)\,du\\
&{}=\frac{1}{\sigma\sqrt{2\pi}}
\int_{-\infty}^x
\exp
  \Bigl( -\frac{(u - \mu)^2}{2\sigma^2}
\ \Bigr)\, du ,\quad x\in\mathbb{R}\\
\end{align}

The standard normal cdf is just the general cdf evaluated with μ = 0 and σ = 1:


\Phi(x) = \Phi_{0,1}(x)
= \frac{1}{\sqrt{2\pi}}
\int_{-\infty}^x
\exp\Bigl(-\frac{u^2}{2}\Bigr)
\, du, \quad x\in\mathbb{R}.

The standard normal cdf can be expressed in terms of a special function called the error function, as


\Phi(x)
=\frac{1}{2} \Bigl[ 1 + \operatorname{erf} \Bigl( \frac{x}{\sqrt{2}} \Bigr) \Bigr],
\quad x\in\mathbb{R},

and the cdf itself can hence be expressed as


\Phi_{\mu,\sigma^2}(x)
=\frac{1}{2} \Bigl[ 1 + \operatorname{erf} \Bigl( \frac{x-\mu}{\sigma\sqrt{2}} \Bigr) \Bigr],
\quad x\in\mathbb{R}.

The complement of the standard normal cdf, 1 − Φ(x), is often denoted Q(x), and is sometimes referred to simply as the Q-function, especially in engineering texts.[3][4] This represents the tail probability of the Gaussian distribution. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.[5]
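As a hedged illustration of the error-function form of the cdf and of the Q-function, the sketch below uses Python's standard `math.erf` and `math.erfc`. The function names `normal_cdf` and `q_function` and the example parameters are assumptions; the cross-check uses `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Phi_{mu, sigma^2}(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * SQRT2)))


def q_function(x):
    """Upper tail 1 - Phi(x) of the standard normal, computed via erfc."""
    return 0.5 * math.erfc(x / SQRT2)


# Cross-check against the standard library's normal cdf (illustrative values).
reference = NormalDist(mu=1.0, sigma=2.5).cdf(3.0)
ours = normal_cdf(3.0, mu=1.0, sigma=2.5)
```

Computing the tail through `erfc` rather than as `1.0 - normal_cdf(x)` is a deliberate choice: for large x the subtraction rounds to zero in floating point, while the complementary error function still returns a small positive value.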

The inverse standard normal cumulative distribution function, or quantile function, can be expressed in terms of the inverse error function:


\Phi^{-1}(p)
= \sqrt2
\;\operatorname{erf}^{-1} (2p - 1),
\quad p\in(0,1),

and the inverse cumulative distribution function can hence be expressed as


\Phi_{\mu,\sigma^2}^{-1}(p)
= \mu + \sigma\Phi^{-1}(p)
= \mu + \sigma\sqrt2
\; \operatorname{erf}^{-1}(2p - 1),
\quad p\in(0,1).

This quantile function is sometimes called the probit function. The probit function has no elementary primitive: this is a proven non-existence result, not merely a statement that none is known. Several accurate methods exist for approximating the quantile function for the normal distribution; see quantile function for a discussion and references.

The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and continued fractions.
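Python's standard library does not provide an inverse error function, so one simple way to illustrate the quantile function is to invert Φ by bisection. This is only a sketch under that assumption and is not one of the specialised approximation methods mentioned above. The names `probit_bisect` and `normal_quantile` are hypothetical.

```python
import math
from statistics import NormalDist


def std_normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_bisect(p, lo=-40.0, hi=40.0, iters=200):
    """Approximate Phi^{-1}(p) for 0 < p < 1 by bisection on the cdf."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
    return 0.5 * (lo + hi)


def normal_quantile(p, mu=0.0, sigma=1.0):
    """Quantile of N(mu, sigma^2): mu + sigma * Phi^{-1}(p)."""
    return mu + sigma * probit_bisect(p)


# Comparison with the standard library's inverse cdf.
library_value = NormalDist().inv_cdf(0.9)
bisection_value = probit_bisect(0.9)
```

Bisection is slow but robust. Because Φ(x) rounds to 1 for large x, this sketch loses accuracy for p extremely close to 1, which is one reason dedicated approximations are used in practice.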

Strict lower and upper bounds for the cdf

For large x the standard normal cdf \scriptstyle\Phi(x) is close to 1 and \scriptstyle\Phi(-x)\,{=}\,1\,{-}\,\Phi(x) is close to 0. The elementary bounds


\frac{x}{1+x^2}\varphi(x)<1-\Phi(x)<\frac{\varphi(x)}{x}, \qquad x>0,

in terms of the density \scriptstyle\varphi are useful.

Using the substitution v = u²/2, the upper bound is derived as follows:


\begin{align}
1-\Phi(x)
&=\int_x^\infty\varphi(u)\,du\\
&<\int_x^\infty\frac ux\varphi(u)\,du
=\int_{x^2/2}^\infty\frac{e^{-v}}{x\sqrt{2\pi}}\,dv
=-\biggl.\frac{e^{-v}}{x\sqrt{2\pi}}\biggr|_{x^2/2}^\infty
=\frac{\varphi(x)}{x}.
\end{align}

Similarly, using \scriptstyle\varphi'(u)\,{=}\,-u\,\varphi(u) and the quotient rule,


\begin{align}
\Bigl(1+\frac1{x^2}\Bigr)(1-\Phi(x))
&=\int_x^\infty \Bigl(1+\frac1{x^2}\Bigr)\varphi(u)\,du\\
&>\int_x^\infty \Bigl(1+\frac1{u^2}\Bigr)\varphi(u)\,du
=-\biggl.\frac{\varphi(u)}u\biggr|_x^\infty
=\frac{\varphi(x)}x.
\end{align}

Solving for \scriptstyle 1\,{-}\,\Phi(x)\, provides the lower bound.
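The two bounds can be spot-checked numerically. The sketch below evaluates the lower bound, the exact tail and the upper bound at a few arbitrarily chosen positive x; the helper `tail_bounds` is a hypothetical name, and the tail is computed with `math.erfc` to stay accurate for large x.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    """1 - Phi(x), computed with erfc to avoid cancellation."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_bounds(x):
    """Return (lower bound, exact tail, upper bound) for x > 0."""
    if x <= 0:
        raise ValueError("the bounds are stated for x > 0")
    lower = x / (1.0 + x * x) * phi(x)
    upper = phi(x) / x
    return lower, upper_tail(x), upper


for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    lower, tail, upper = tail_bounds(x)
    print(f"x={x}: {lower:.6e} < {tail:.6e} < {upper:.6e}")
```

The printed values show the bounds tightening around the tail probability as x grows, which is why they are most useful for large x.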

Generating functions

Moment generating function

The moment generating function is defined as the expected value of exp(tX). For a normal distribution, the moment generating function is


\begin{align}
M_X(t) & {} = \mathrm{E} \left[ \exp{(tX)} \right] \\
& {} = \int_{-\infty}^{\infty}  \frac{1}{\sigma \sqrt{2\pi} }
\exp{\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)}
\exp{(tx)} \, dx \\
& {} = \exp{ \left(  \mu t + \frac{\sigma^2 t^2}{2} \right)}
\end{align}

as can be seen by completing the square in the exponent.
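One way to fill in the completing-the-square step, using only the symbols already defined, is the following sketch. Collecting the terms in x in the combined exponent gives

\begin{align}
-\frac{(x-\mu)^2}{2\sigma^2} + tx
&= -\frac{x^2 - 2(\mu+\sigma^2 t)\,x + \mu^2}{2\sigma^2} \\
&= -\frac{\bigl(x-(\mu+\sigma^2 t)\bigr)^2}{2\sigma^2} + \frac{(\mu+\sigma^2 t)^2-\mu^2}{2\sigma^2} \\
&= -\frac{\bigl(x-(\mu+\sigma^2 t)\bigr)^2}{2\sigma^2} + \mu t + \frac{\sigma^2 t^2}{2}.
\end{align}

The first term is the exponent of the density of a normal distribution with mean μ + σ²t and variance σ², which integrates to one, so only the factor \exp(\mu t + \sigma^2 t^2/2) remains.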

Cumulant generating function

The cumulant generating function is the logarithm of the moment generating function: g(t) = μt + σ²t²/2. Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.

Characteristic function

The characteristic function is defined as the expected value of exp(itX), where i is the imaginary unit. So the characteristic function is obtained by replacing t with it in the moment-generating function.

For a normal distribution, the characteristic function is

\begin{align}
\chi_X(t;\mu,\sigma) &{} = M_X(i t) = \mathrm{E}
\left[ \exp(i t X) \right] \\
&{}=
\int_{-\infty}^{\infty}
\frac{1}{\sigma \sqrt{2\pi}}
\exp
\left(- \frac{(x - \mu)^2}{2\sigma^2}
\right)
\exp(i t x)
\, dx \\
&{}=
\exp
\left(
i \mu t - \frac{\sigma^2 t^2}{2}
\right).
\end{align}

Properties

Some properties of the normal distribution:

  1. If X \sim N(\mu, \sigma^2) and a and b are real numbers, then a X + b \sim N(a \mu + b, (a \sigma)^2) (see expected value and variance).
  2. If X \sim N(\mu_X, \sigma^2_X) and Y \sim N(\mu_Y, \sigma^2_Y) are independent normal random variables, then:
    • Their sum is normally distributed with U = X + Y \sim N(\mu_X + \mu_Y, \sigma^2_X + \sigma^2_Y) (proof). Interestingly, the converse holds: if two independent random variables have a normally-distributed sum, then they must be normal themselves — this is known as Cramér's theorem.
    • Their difference is normally distributed with V = X - Y \sim N(\mu_X - \mu_Y, \sigma^2_X + \sigma^2_Y).
    • If the variances of X and Y are equal, then U and V are independent of each other.
    • The Kullback-Leibler divergence, D_{\rm KL}( X \| Y ) =
{ 1 \over 2 } \left( \log \left( { \sigma^2_Y \over \sigma^2_X } \right) + \frac{\sigma^2_X}{\sigma^2_Y} +
\frac{\left(\mu_Y - \mu_X\right)^2}{\sigma^2_Y} - 1\right).
  3. If X \sim N(0, \sigma^2_X) and Y \sim N(0, \sigma^2_Y) are independent normal random variables, then:
  4. If X_1, \dots, X_n are independent standard normal variables, then X_1^2 + \cdots + X_n^2 has a chi-square distribution with n degrees of freedom.
  5. If X_1,\dots,X_n are independent standard normal variables, then the sample mean \bar{X}=(X_1+\cdots+X_n)/n and sample variance S^2=((X_1-\bar{X})^2+\cdots+(X_n-\bar{X})^2)/(n-1) are independent. This property characterizes normal distributions (and helps to explain why the F-test is non-robust with respect to non-normality!)
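As a hedged numerical check of the Kullback-Leibler formula in Property 2, the sketch below compares the closed form with a direct numerical integration of p log(p/q). The function names and the example parameters are assumptions made for illustration, and log-densities are used to avoid underflow in the tails.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def kl_normal(mu_x, sigma_x, mu_y, sigma_y):
    """Closed-form D_KL(X || Y) for X ~ N(mu_x, sigma_x^2), Y ~ N(mu_y, sigma_y^2)."""
    var_x, var_y = sigma_x ** 2, sigma_y ** 2
    return 0.5 * (math.log(var_y / var_x) + var_x / var_y
                  + (mu_y - mu_x) ** 2 / var_y - 1.0)


def kl_numeric(mu_x, sigma_x, mu_y, sigma_y, n=40000):
    """Trapezoidal estimate of the integral of p * log(p / q) over mu_x +/- 12 sigma_x."""
    a, b = mu_x - 12.0 * sigma_x, mu_x + 12.0 * sigma_x
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        log_p = normal_logpdf(x, mu_x, sigma_x)
        log_q = normal_logpdf(x, mu_y, sigma_y)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(log_p) * (log_p - log_q)
    return total * h


closed_form = kl_normal(0.0, 1.0, 1.0, 2.0)
numeric = kl_numeric(0.0, 1.0, 1.0, 2.0)
```

The two values should agree closely, and the divergence of a distribution from itself is zero, as the formula requires.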

Standardizing normal random variables

As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal.

If X ~ N(μ, σ²), then

Z = \frac{X - \mu}{\sigma} \!

is a standard normal random variable: Z ~ N(0,1). An important consequence is that the cdf of a general normal distribution is therefore

\Pr(X \le x)
=
\Phi
\left(
\frac{x-\mu}{\sigma}
\right)
=
\frac{1}{2}
\left(
1 + \operatorname{erf}
\left(
  \frac{x-\mu}{\sigma\sqrt{2}}
\right)
\right)
.

Conversely, if Z is a standard normal random variable, Z ~ N(0,1), then

X = σZ + μ

is a normal random variable with mean μ and variance σ².

The standard normal distribution has been tabulated (usually in the form of value of the cumulative distribution function Φ), and the other normal distributions are the simple transformations, as described above, of the standard one. Therefore, one can use tabulated values of the cdf of the standard normal distribution to find values of the cdf of a general normal distribution.
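The table-lookup procedure described above amounts to standardizing first and then evaluating Φ. A minimal sketch follows, with `math.erf` standing in for the printed table; the parameters μ = 100 and σ = 15 and the function names are hypothetical and chosen only for illustration.

```python
import math


def std_normal_cdf(z):
    """Phi(z); plays the role of the tabulated standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standardize(x, mu, sigma):
    """Map a value of X ~ N(mu, sigma^2) to the standard scale: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def unstandardize(z, mu, sigma):
    """Inverse map: x = sigma * z + mu."""
    return sigma * z + mu


mu, sigma = 100.0, 15.0        # hypothetical parameters
z = standardize(130.0, mu, sigma)   # two standard deviations above the mean
prob = std_normal_cdf(z)       # Pr(X <= 130) = Phi(2)
```

Only one table (or one function) for Φ is needed; every other normal distribution is handled by the change of variable.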

Moments

The first few moments of the normal distribution are:

Number  Raw moment                                   Central moment  Cumulant
0       1                                            1
1       μ                                            0               μ
2       μ² + σ²                                      σ²              σ²
3       μ³ + 3μσ²                                    0               0
4       μ⁴ + 6μ²σ² + 3σ⁴                             3σ⁴             0
5       μ⁵ + 10μ³σ² + 15μσ⁴                          0               0
6       μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶                  15σ⁶            0
7       μ⁷ + 21μ⁵σ² + 105μ³σ⁴ + 105μσ⁶               0               0
8       μ⁸ + 28μ⁶σ² + 210μ⁴σ⁴ + 420μ²σ⁶ + 105σ⁸      105σ⁸           0

All cumulants of the normal distribution beyond the second are zero.

Higher central moments (of order 2k with μ = 0) are given by the formula

 E\left[X^{2k}\right]=\frac{(2k)!}{2^k k!} \sigma^{2k}.
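The even-moment formula can be compared against the central-moment column of the table and against direct numerical integration. The sketch below is illustrative only; `even_central_moment`, `moment_numeric` and the value σ = 1.5 are assumptions.

```python
import math


def even_central_moment(k, sigma):
    """E[X^(2k)] for X ~ N(0, sigma^2): (2k)! / (2^k k!) * sigma^(2k)."""
    coefficient = math.factorial(2 * k) // (2 ** k * math.factorial(k))
    return coefficient * sigma ** (2 * k)


def moment_numeric(order, sigma, n=20000):
    """Trapezoidal estimate of E[X^order] for X ~ N(0, sigma^2) over +/- 12 sigma."""
    a, b = -12.0 * sigma, 12.0 * sigma
    h = (b - a) / n
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * x ** order * math.exp(-0.5 * (x / sigma) ** 2) / norm
    return total * h


# Coefficients of sigma^(2k) for k = 1..4.
coefficients = [even_central_moment(k, 1) for k in range(1, 5)]

sigma = 1.5  # arbitrary illustrative value
fourth_formula = even_central_moment(2, sigma)
fourth_numeric = moment_numeric(4, sigma)
```

The coefficients reproduce the factors 1, 3, 15 and 105 that multiply σ², σ⁴, σ⁶ and σ⁸ in the table.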

The central limit theorem

Plot of the pdf of a normal distribution with μ = 12 and σ = 3, approximating the pdf of a binomial distribution with n = 48 and p = 1/4

Under certain conditions (such as being independent and identically-distributed with finite variance), the sum of a large number of random variables is approximately normally distributed — this is the central limit theorem.

The practical importance of the central limit theorem is that the normal cumulative distribution function can be used as an approximation to some other cumulative distribution functions, for example:

  • A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0 (some books recommend using this approximation only if np and n(1 − p) are both at least 5; in this case, a continuity correction should be applied).
    The approximating normal distribution has parameters μ = np, σ² = np(1 − p).
  • A Poisson distribution with parameter λ is approximately normal for large λ.
    The approximating normal distribution has parameters μ = σ² = λ.

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution. A general upper bound of the approximation error of the cumulative distribution function is given by the Berry–Esséen theorem.
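To make the binomial case concrete, the sketch below uses the parameters from the plot caption (n = 48, p = 1/4, hence μ = 12 and σ = 3) and compares the exact binomial cdf with the normal approximation, with and without a continuity correction. The evaluation point k = 10 and the function names are assumptions chosen for illustration.

```python
import math


def binomial_cdf(k, n, p):
    """Exact Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k + 1))


def normal_cdf(x, mu, sigma):
    """Cdf of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


n, p = 48, 0.25
mu = n * p                              # 12
sigma = math.sqrt(n * p * (1.0 - p))    # 3

k = 10  # illustrative evaluation point
exact = binomial_cdf(k, n, p)
plain = normal_cdf(k, mu, sigma)            # no continuity correction
corrected = normal_cdf(k + 0.5, mu, sigma)  # with continuity correction

print(f"exact={exact:.4f} plain={plain:.4f} corrected={corrected:.4f}")
```

In this example the continuity-corrected value lies noticeably closer to the exact probability than the uncorrected one, which illustrates why the correction is recommended for discrete distributions.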

Infinite divisibility

The normal distributions are infinitely divisible probability distributions: given a mean μ, a variance σ² ≥ 0, and a natural number n, the sum X_1 + \cdots + X_n of n independent random variables

X_1,X_2,\dots,X_n \sim N(\mu/n, \sigma^2\!/n)\,

has this specified normal distribution (to verify this, use characteristic functions or convolution and mathematical induction).
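The characteristic-function argument can be illustrated numerically: the characteristic function of a sum of independent variables is the product of their characteristic functions, so the n-th power of the characteristic function of N(μ/n, σ²/n) should equal that of N(μ, σ²). The sketch below checks this at a few points; the function names and test values are assumptions, and a numerical spot check is of course not a proof.

```python
import cmath


def normal_cf(t, mu, var):
    """Characteristic function exp(i*mu*t - var*t^2/2) of N(mu, var)."""
    return cmath.exp(1j * mu * t - 0.5 * var * t * t)


def cf_of_sum_of_parts(t, mu, var, n):
    """CF of X_1 + ... + X_n for independent X_i ~ N(mu/n, var/n)."""
    return normal_cf(t, mu / n, var / n) ** n


mu, var = 2.0, 3.0  # arbitrary illustrative parameters
max_gap = max(
    abs(cf_of_sum_of_parts(t, mu, var, n) - normal_cf(t, mu, var))
    for n in (2, 7, 50)
    for t in (-1.5, 0.3, 2.0)
)
```

The largest discrepancy `max_gap` should be at the level of floating-point rounding error.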

Stability

The normal distributions are strictly stable probability distributions.

Standard deviation and confidence intervals

Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for about 68% of the set (dark blue) while two standard deviations from the mean (medium and dark blue) account for about 95% and three standard deviations (light, medium, and dark blue) account for about 99.7%.

About 68% of values drawn from a normal distribution are within one standard deviation σ > 0 away from the mean μ; about 95% of the values are within two standard deviations and about 99.7% lie within three standard deviations. This is known as the "68-95-99.7 rule" or the "empirical rule."

To be more precise, the area under the bell curve between μ − nσ and μ + nσ in terms of the cumulative normal distribution function is given by

\begin{align}&\Phi_{\mu,\sigma^2}(\mu+n\sigma)-\Phi_{\mu,\sigma^2}(\mu-n\sigma)\\
&=\Phi(n)-\Phi(-n)=2\Phi(n)-1=\mathrm{erf}\bigl(n/\sqrt{2}\,\bigr),\end{align}

where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:

n    \mathrm{erf}\bigl(n/\sqrt{2}\,\bigr)
1    0.682689492137
2    0.954499736104
3    0.997300203937
4    0.999936657516
5    0.999999426697
6    0.999999998027

The next table gives the reverse relation of sigma multiples corresponding to a few often used values for the area under the bell curve. These values are useful to determine (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:

\mathrm{erf}\bigl(n/\sqrt{2}\,\bigr)    n
0.80       1.28155
0.90       1.64485
0.95       1.95996
0.98       2.32635
0.99       2.57583
0.995      2.80703
0.998      3.09023
0.999      3.29052
0.9999     3.8906
0.99999    4.4172

where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.
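Both tables can be regenerated with a few lines of code. The sketch below computes the coverage erf(n/√2) directly and obtains the reverse relation from the standard library's inverse normal cdf, using the identity 2Φ(n) − 1 = coverage. The function names `coverage` and `sigma_multiple` are assumptions.

```python
import math
from statistics import NormalDist


def coverage(n):
    """Probability that a normal variable falls within n standard deviations of its mean."""
    return math.erf(n / math.sqrt(2.0))


def sigma_multiple(level):
    """Multiple n of sigma such that the interval mu +/- n*sigma has the given coverage."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # 2*Phi(n) - 1 = level  =>  n = Phi^{-1}((1 + level) / 2)
    return NormalDist().inv_cdf(0.5 * (1.0 + level))


for n in range(1, 7):
    print(n, f"{coverage(n):.12f}")

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, f"{sigma_multiple(level):.5f}")
```

For example, the 95% level corresponds to roughly 1.96 standard deviations, the familiar multiplier for a two-sided 95% confidence interval.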

Exponential family form

The normal distribution is a two-parameter exponential family with natural parameters μ and 1/σ², and natural statistics X and X². The canonical form has parameters {\mu \over \sigma^2} and {1 \over \sigma^2} and sufficient statistics \sum x and -{1 \over 2} \sum x^2.

Complex Gaussian process

Consider a complex Gaussian random variable


Z=X+iY\,

where X and Y are real and independent Gaussian variables with equal variances \sigma_r^2. The pdf of the joint variables is then


\frac{1}{2\,\pi\,\sigma_r^2} e^{-(x^2+y^2)/(2 \sigma_r ^2)}

Because \sigma_Z =\sqrt{2}\sigma_r, the resulting pdf for the complex Gaussian variable Z is


\frac{1}{\pi\,\sigma_Z^2} e^{-|Z|^2\!/\sigma_Z^2}.

Related distributions

Descriptive and inferential statistics

Scores

Many scores are derived from the normal distribution, including percentile ranks ("percentiles" or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs (see below). Bell curve grading assigns relative grades based on a normal distribution of scores.

GFDL.
