## The Normal Distribution

In this section, we introduce a continuous distribution that plays a central role in a very large body of statistical analysis. For example, suppose that a large group of students takes a test. A large proportion of their scores are likely to be concentrated about the mean, and the numbers of scores in ranges of a fixed width are likely to "tail off" away from the mean. If the average score on the test is 60, we would expect to find, for instance, more students with scores in the range 55-65 than in the range 85-95. These considerations suggest a probability density function that peaks at the mean and tails off at its extremities. One distribution with these properties is the normal distribution, whose probability density function is shown in Figure 5.8. As can be seen, this density function is bell-shaped.

FIGURE 5.8 Probability density function for a normal distribution

Probability Density Function of the Normal Distribution

If the random variable $X$ has probability density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)} \quad \text{for } -\infty < x < \infty$$

where $\mu$ and $\sigma^2$ are any numbers such that $-\infty < \mu < \infty$ and $0 < \sigma^2 < \infty$, and where $e$ and $\pi$ are mathematical constants, $e = 2.71828\ldots$ and $\pi = 3.14159\ldots$, then $X$ is said to follow a normal distribution.

It can be seen from the definition that there is not a single normal distribution but a whole family of distributions, resulting from different specifications of $\mu$ and $\sigma^2$. These two parameters have very convenient interpretations.
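As a minimal sketch of how the density formula can be evaluated, the following Python code implements it directly and compares the result with the standard library's `statistics.NormalDist`. The function name `normal_pdf` and the parameter values (mean 60, variance 100) are illustrative assumptions, not values taken from the text.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma2):
    """Evaluate the normal density with mean mu and variance sigma2 at x."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


# Illustrative parameters (an assumption for this sketch): mean 60, variance 100.
mu, sigma2 = 60.0, 100.0

# NormalDist is parameterized by the standard deviation, not the variance.
library_dist = NormalDist(mu, math.sqrt(sigma2))

for x in (40.0, 60.0, 80.0):
    print(x, normal_pdf(x, mu, sigma2), library_dist.pdf(x))
```

The two columns should agree up to floating-point error, and the values at 40 and 80 should coincide because both points lie the same distance from the mean.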

Some Properties of the Normal Distribution

Suppose that the random variable $X$ follows a normal distribution with parameters $\mu$ and $\sigma^2$. The following properties hold:

(i) The mean of the random variable is $\mu$; that is,

$$E(X) = \mu$$

(ii) The variance of the random variable is $\sigma^2$; that is,

$$\mathrm{Var}(X) = E[(X - \mu)^2] = \sigma^2$$

(iii) The shape of the probability density function is a symmetric bell-shaped curve (see Figure 5.8) centered on the mean $\mu$.

It follows from these properties that given the mean and variance of a normal random variable, an individual member of the family of normal distributions is specified. This allows use of a convenient notation.

Notation

If the random variable $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, we write

$$X \sim N(\mu, \sigma^2)$$

Now, the mean of any distribution provides a measure of central location, while the variance gives a measure of spread or dispersion about the mean. Thus, the values taken by the parameters $\mu$ and $\sigma^2$ have different effects on the probability density function of a normal random variable. Figure 5.9(a) shows probability density functions for two normal distributions with a common variance but different means. It can be seen that increasing the mean while holding the variance fixed shifts the density function but does not alter its shape. In Figure 5.9(b), the two density functions are of normal random variables with a common mean but different variances. Each is symmetric about the common mean, but the one with the larger variance is more dispersed.

FIGURE 5.9 Effects of $\mu$ and $\sigma^2$ on the probability density function of a normal random variable. (a) Probability density functions for two normal distributions with means 5 and 6; each distribution has variance 1. (b) Probability density functions for normal distributions with variances 1/4 and 1; each distribution has mean 10.
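The two effects described for Figure 5.9 can be checked numerically. The sketch below uses the parameter values from the figure caption together with Python's `statistics.NormalDist`; the variable names and the evaluation points are assumptions made for illustration.

```python
from statistics import NormalDist

# Figure 5.9(a): common variance 1, means 5 and 6.
low_mean = NormalDist(mu=5, sigma=1)
high_mean = NormalDist(mu=6, sigma=1)

# Raising the mean by 1 shifts the curve one unit to the right:
# the density of N(5, 1) at x equals the density of N(6, 1) at x + 1.
shift_pairs = [(low_mean.pdf(x), high_mean.pdf(x + 1)) for x in (3.0, 5.0, 6.5)]

# Figure 5.9(b): common mean 10, variances 1/4 and 1,
# that is, standard deviations 1/2 and 1.
narrow = NormalDist(mu=10, sigma=0.5)
wide = NormalDist(mu=10, sigma=1)

# The smaller variance gives a taller peak at the mean ...
peak_ratio = narrow.pdf(10) / wide.pdf(10)
# ... while the larger variance puts more density far from the mean.
tail_narrow, tail_wide = narrow.pdf(12), wide.pdf(12)

print(shift_pairs)
print(peak_ratio, tail_narrow, tail_wide)
```

Since the height of the density at the mean is $1/(\sigma\sqrt{2\pi})$, halving the standard deviation doubles the peak, which is what `peak_ratio` reports.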

An extremely important practical question concerns the determination of probabilities from a specified normal distribution. As a first step in determining probabilities, we introduce the cumulative distribution function.

Cumulative Distribution Function of the Normal Distribution

Suppose that $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$; that is, $X \sim N(\mu, \sigma^2)$. Then the cumulative distribution function $F_X(x_0)$ is

$$F_X(x_0) = P(X \le x_0)$$

This is the area under the probability density function to the left of $x_0$, as illustrated in Figure 5.10. As for any proper density function, the total area under the curve is 1; that is,

$$F_X(\infty) = 1$$

There is no simple algebraic expression for calculating the cumulative distribution function of a normally distributed random variable.[7] The general shape of the cumulative distribution function is shown in Figure 5.11.
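Although no simple algebraic expression exists, the area under the density can be approximated numerically. The following sketch uses the trapezoidal rule, which is one of many possible numerical methods and is chosen here only for simplicity. The function names, the integration range of ten standard deviations below the mean, the number of steps, and the example parameters (mean 60, standard deviation 10) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf_numeric(x0, mu, sigma, steps=20000):
    """Approximate F_X(x0) with the trapezoidal rule.

    The integral is started at mu - 10*sigma; the area below that
    point is far too small to affect the result at this precision.
    """
    lower = mu - 10 * sigma
    if x0 <= lower:
        return 0.0
    h = (x0 - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(x0, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h, mu, sigma)
    return total * h


# Illustrative parameters (assumed): mean 60, standard deviation 10.
approx = normal_cdf_numeric(70.0, mu=60.0, sigma=10.0)
reference = NormalDist(60.0, 10.0).cdf(70.0)
print(approx, reference)
```

The approximation can be compared with the library value `reference`; both estimate the area to the left of a point one standard deviation above the mean.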

We have already seen that for any continuous random variable, probabilities can be expressed in terms of the cumulative distribution function.

Range Probabilities for Normal Random Variables

Let $X$ be a normal random variable with cumulative distribution function $F_X(x)$, and let $a$ and $b$ be two possible values of $X$, with $a < b$. Then

$$P(a < X < b) = F_X(b) - F_X(a)$$

FIGURE 5.10 The shaded area is the probability that $X$ does not exceed $x_0$ for a normal random variable

[7] That is to say, the integral of the probability density function does not have a simple algebraic form.

The probability is the area under the corresponding probability density function between a and b, as illustrated in Figure 5.12.

Any required probability can be obtained from the cumulative distribution function. However, a crucial difficulty remains because there does not exist a convenient formula for determining the cumulative distribution function. In principle, for any specific normal distribution, probabilities could be obtained by numerical methods using an electronic computer. However, it would be enormously tedious if we had to carry out such an operation for every normal distribution we encountered. Fortunately, probabilities for any normal distribution can always be expressed in terms of probabilities for a single normal distribution for which the cumulative distribution function has been evaluated and tabulated. We now introduce the particular distribution that is used for this purpose.

The Standard Normal Distribution

FIGURE 5.11 Cumulative distribution function for a normal random variable

FIGURE 5.12 The shaded area is the probability that $X$ lies between $a$ and $b$ for a normal random variable

Let $Z$ be a normal random variable with mean 0 and variance 1; that is,

$$Z \sim N(0, 1)$$

Then $Z$ is said to follow the standard normal distribution.

If the cumulative distribution function of this random variable is denoted $F_Z(z)$, and $a^*$ and $b^*$ are two numbers with $a^* < b^*$, then

$$P(a^* < Z < b^*) = F_Z(b^*) - F_Z(a^*)$$

The cumulative distribution function of the standard normal distribution is tabulated in Table 3 in the Appendix. This table gives values of

$$F_Z(z) = P(Z \le z)$$

for nonnegative values of $z$. For example,

$$F_Z(1.25) = .8944$$

Thus, the probability is .8944 that the standard normal random variable takes a value less than 1.25. Values of the cumulative distribution function for negative values of $z$ can be inferred from the symmetry of the probability density function. Let $z_0$ be any positive number, and suppose that we require

$$F_Z(-z_0) = P(Z \le -z_0)$$

As illustrated in Figure 5.13, because the density function of the standard normal random variable is symmetric about its mean, 0, the area under the curve to the left of $-z_0$ is the same as the area under the curve to the right of $z_0$; that is,

$$P(Z \le -z_0) = P(Z \ge z_0)$$

Moreover, since the total area under the curve is 1,

$$P(Z \ge z_0) = 1 - P(Z \le z_0) = 1 - F_Z(z_0)$$

Hence, it follows that

$$F_Z(-z_0) = 1 - F_Z(z_0)$$

FIGURE 5.13 Probability density function for the standard normal random variable $Z$; the shaded areas, which are equal, show the probability that $Z$ does not exceed $-z_0$ and the probability that $Z$ is greater than $z_0$

### For example

$$P(Z < -1.25) = F_Z(-1.25) = 1 - F_Z(1.25) = 1 - .8944 = .1056$$

If $Z$ is a standard normal random variable, find $P(-.50 < Z < .75)$. The required probability is

$$P(-.50 < Z < .75) = F_Z(.75) - F_Z(-.50) = F_Z(.75) - [1 - F_Z(.50)]$$

Then, using Table 3 of the Appendix, we obtain

$$P(-.50 < Z < .75) = .7734 - (1 - .6915) = .4649$$
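The table lookups in these examples can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for Table 3; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

# Standard normal random variable: mean 0, standard deviation 1.
Z = NormalDist()

# F_Z(1.25), the value read from the table in the text.
f_pos = Z.cdf(1.25)

# F_Z(-1.25), computed directly and via the symmetry rule 1 - F_Z(1.25).
f_neg_direct = Z.cdf(-1.25)
f_neg_symmetry = 1 - f_pos

# P(-.50 < Z < .75) = F_Z(.75) - [1 - F_Z(.50)]
range_prob = Z.cdf(0.75) - (1 - Z.cdf(0.50))

print(round(f_pos, 4), round(f_neg_direct, 4), round(f_neg_symmetry, 4))
print(round(range_prob, 4))
```

Because the table entries are rounded to four decimal places, a result assembled from several table values, such as .4649, may differ in the last digit from the unrounded computation.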

We now show how probabilities for any normal random variable can be expressed in terms of those for the standard normal random variable. Let the random variable $X$ be normally distributed with mean $\mu$ and variance $\sigma^2$. We saw in Section 5.3 that subtracting the mean and dividing by the standard deviation yields a random variable $Z$ that has mean 0 and variance 1. It can also be shown that if $X$ is normally distributed, so is $Z$. Hence, $Z$ has a standard normal distribution. Suppose, then, that we require the probability that $X$ lies between the numbers $a$ and $b$. This is equivalent to $(X - \mu)/\sigma$ lying between $(a - \mu)/\sigma$ and $(b - \mu)/\sigma$, so that the probability of interest is

$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$$

Finding Range Probabilities for Normal Random Variables

Let $X$ be a normal random variable with mean $\mu$ and variance $\sigma^2$. Then the random variable $Z = (X - \mu)/\sigma$ has a standard normal distribution; that is, $Z \sim N(0, 1)$. It follows that if $a$ and $b$ are any numbers with $a < b$, then

$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = F_Z\left(\frac{b - \mu}{\sigma}\right) - F_Z\left(\frac{a - \mu}{\sigma}\right)$$

where $Z$ is the standard normal random variable and $F_Z(z)$ denotes its cumulative distribution function.

The result is illustrated in Figure 5.14. Part (a) of the figure shows the probability density function of a normal random variable $X$ with mean $\mu = 3$ and standard deviation $\sigma = 2$. The shaded area shows the probability that $X$ lies between 4 and 6. This is the same as the probability that a standard normal random variable lies between $(4 - \mu)/\sigma$ and $(6 - \mu)/\sigma$, that is, between .5 and 1.5. This probability is the shaded area under the standard normal curve in Figure 5.14(b).
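The standardization step for this example can be sketched in Python as follows. The helper name `range_probability` is a hypothetical name introduced for illustration, and `statistics.NormalDist` stands in for the tabulated standard normal cumulative distribution function.

```python
from statistics import NormalDist


def range_probability(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), obtained by standardizing.

    The limits a and b are converted to (a - mu)/sigma and (b - mu)/sigma,
    and the standard normal cumulative distribution function is applied.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if a > b:
        raise ValueError("a must not exceed b")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_low = (a - mu) / sigma
    z_high = (b - mu) / sigma
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


# Values from the Figure 5.14 example: mean 3, standard deviation 2.
via_standardizing = range_probability(4, 6, mu=3, sigma=2)

# The same probability computed directly from the distribution of X.
x_dist = NormalDist(mu=3, sigma=2)
direct = x_dist.cdf(6) - x_dist.cdf(4)

print(via_standardizing, direct)
```

Both routes should give the same number up to floating-point error, which corresponds to the equality of the shaded areas in parts (a) and (b) of the figure.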

FIGURE 5.14 Finding range probabilities for normal random variables. (a) Probability density function for normal random variable $X$ with mean 3 and standard deviation 2; shaded area is probability that $X$ lies between 4 and 6. (b) Probability density function for standard normal random variable $Z$; shaded area is probability that $Z$ lies between .5 and 1.5 and is equal to shaded area in part (a).