## Info

Note: $D_2 = 1$ for states in the Northeast and North Central; 0 otherwise. $D_3 = 1$ for states in the South; 0 otherwise.

Source: National Educational Association, as reported by the *Albuquerque Tribune*, Nov. 7, 1986.

statistically different from one another? There are various statistical techniques for comparing two or more mean values, which generally go by the name of analysis of variance.⁵ But the same objective can be accomplished within the framework of regression analysis. To see this, consider the following model:

$$Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i \tag{9.2.1}$$

where $Y_i$ = (average) salary of public school teachers in state $i$; $D_{2i}$ = 1 if the state is in the Northeast or North Central and 0 otherwise (i.e., in other regions of the country); and $D_{3i}$ = 1 if the state is in the South and 0 otherwise (i.e., in other regions of the country).


⁵For an applied treatment, see John Fox, *Applied Regression Analysis, Linear Models, and Related Methods*, Sage Publications, 1997, Chap. 8.

PART ONE: SINGLE-EQUATION REGRESSION MODELS

EXAMPLE 9.1 (Continued)

Note that (9.2.1) is like any multiple regression model considered previously, except that, instead of quantitative regressors, we have only qualitative, or dummy, regressors, taking the value of 1 if the observation belongs to a particular category and 0 if it does not belong to that category or group. Hereafter, we shall designate all dummy variables by the letter D. Table 9.1 shows the dummy variables thus constructed.
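As an illustration of the coding scheme just described, here is a minimal Python sketch showing how the two dummies in (9.2.1) could be built from a region label, with the West as the omitted (base) category. The region strings and the helper names `make_dummies` and `design_row` are hypothetical and do not come from the original example.

```python
from __future__ import annotations

# Hypothetical region labels standing in for the regions of Table 9.1.
# West is the omitted (base) category in model (9.2.1).
BASE = "West"
NE_NC = "Northeast/North Central"
SOUTH = "South"
REGIONS = (BASE, NE_NC, SOUTH)


def make_dummies(region: str) -> tuple[int, int]:
    """Return (D2, D3): D2 flags Northeast/North Central, D3 flags South."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    d2 = 1 if region == NE_NC else 0
    d3 = 1 if region == SOUTH else 0
    return d2, d3


def design_row(region: str) -> list[int]:
    """One row of the regressor matrix for (9.2.1): [intercept, D2, D3]."""
    d2, d3 = make_dummies(region)
    return [1, d2, d3]


if __name__ == "__main__":
    for r in REGIONS:
        print(f"{r:<24} -> {design_row(r)}")
```

Each state has a 1 in at most one of $D_2$ and $D_3$. A West state has zeros in both, so its row reduces to the intercept alone. The sketch creates only two dummies for the three regions because a third dummy for the West would make the three dummy columns sum to the intercept column, leaving the regressors perfectly collinear.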

What does the model (9.2.1) tell us? Assuming that the error term satisfies the usual OLS assumptions, and taking the expectation of both sides of (9.2.1), we obtain:

Mean salary of public school teachers in the Northeast and North Central:

$$E(Y_i \mid D_{2i} = 1, D_{3i} = 0) = \beta_1 + \beta_2 \tag{9.2.2}$$

Mean salary of public school teachers in the South:

$$E(Y_i \mid D_{2i} = 0, D_{3i} = 1) = \beta_1 + \beta_3 \tag{9.2.3}$$

You might wonder how we find the mean salary of teachers in the West. If you guessed that it is equal to $\beta_1$, you would be absolutely right, for

Mean salary of public school teachers in the West:

$$E(Y_i \mid D_{2i} = 0, D_{3i} = 0) = \beta_1 \tag{9.2.4}$$

In other words, the mean salary of public school teachers in the West is given by the intercept, $\beta_1$, in the multiple regression (9.2.1), and the "slope" coefficients $\beta_2$ and $\beta_3$ tell by how much the mean salaries of teachers in the Northeast and North Central and in the South differ from the mean salary of teachers in the West. But how do we know whether these differences are statistically significant? Before we answer this question, let us present the results based on the regression (9.2.1). Using the data given in Table 9.1, we obtain the following results:

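$$
\begin{aligned}
\hat{Y}_i &= 26{,}158.62 - 1734.473\,D_{2i} - 3264.615\,D_{3i} \\
\text{se} &= \;\;(1128.523) \qquad\;\; (1435.953) \qquad\;\;\; (1499.615)
\end{aligned}
\tag{9.2.5}
$$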

where * indicates the p values.
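The claim that the intercept is the base-category mean and each "slope" is a difference in means can be checked numerically. The sketch below fits (9.2.1) by solving the OLS normal equations in plain Python on a small set of made-up salaries (hypothetical numbers, not the Table 9.1 data) and compares the estimates with the group means. The helper names `design_row`, `solve`, and `ols` are illustrative assumptions.

```python
from statistics import mean

# Hypothetical toy data (region, salary); these are NOT the Table 9.1 figures.
DATA = [
    ("West", 27000.0), ("West", 25000.0), ("West", 26000.0),
    ("Northeast/North Central", 24000.0), ("Northeast/North Central", 25000.0),
    ("South", 22000.0), ("South", 23000.0), ("South", 24000.0),
]


def design_row(region):
    """[1, D2, D3] with West as the omitted base category."""
    return [1.0,
            1.0 if region == "Northeast/North Central" else 0.0,
            1.0 if region == "South" else 0.0]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


X = [design_row(region) for region, _ in DATA]
y = [salary for _, salary in DATA]
b1, b2, b3 = ols(X, y)

group_mean = {g: mean(s for r, s in DATA if r == g) for g in {r for r, _ in DATA}}
print(b1, group_mean["West"])                                           # intercept vs West mean
print(b2, group_mean["Northeast/North Central"] - group_mean["West"])  # slope vs mean gap
print(b3, group_mean["South"] - group_mean["West"])                    # slope vs mean gap
```

With only mutually exclusive dummies as regressors, the fitted value for every observation in a group is that group's sample mean. That is why each printed pair should agree up to floating-point rounding.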

As these regression results show, the mean salary of teachers in the West is about \$26,158, that of teachers in the Northeast and North Central is lower by about \$1734, and that of teachers in the South is lower by about \$3265. The actual mean salaries in the last two regions can be obtained by adding these differential salaries to the mean salary of teachers in the West, as shown in Eqs. (9.2.2) and (9.2.3). Doing this, we find that the mean salaries in the latter two regions are about \$24,424 and \$22,894, respectively.
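The arithmetic behind these figures consists of evaluating the conditional-mean expressions at the estimates. This minimal Python sketch uses only the coefficients reported in (9.2.5). The function name `fitted_mean` is a hypothetical label.

```python
# Coefficients as reported in regression (9.2.5); West is the base category.
INTERCEPT = 26_158.62     # beta1_hat: estimated mean salary, West
DIFF_NE_NC = -1_734.473   # beta2_hat: Northeast & North Central minus West
DIFF_SOUTH = -3_264.615   # beta3_hat: South minus West


def fitted_mean(d2: int, d3: int) -> float:
    """Estimated mean salary from (9.2.5) for a given (D2, D3) combination."""
    return INTERCEPT + DIFF_NE_NC * d2 + DIFF_SOUTH * d3


region_means = {
    "West": fitted_mean(0, 0),
    "Northeast and North Central": fitted_mean(1, 0),
    "South": fitted_mean(0, 1),
}
for region, m in region_means.items():
    print(f"{region:<28} ${m:,.0f}")  # rounded to whole dollars
```

Rounded to whole dollars, the Northeast and North Central and South values come to \$24,424 and \$22,894, matching the text. The West intercept rounds to \$26,159.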

But how do we know that these mean salaries are statistically different from the mean salary of teachers in the West, the comparison category? That is easy enough. All we have to do is find out whether each of the "slope" coefficients in (9.2.5) is statistically significant. As this regression shows, the estimated slope coefficient for the Northeast and North Central is not statistically significant, as its p value is about 23 percent, whereas that for the South is statistically significant at the 5 percent level, as its p value is only about 3.5 percent. Therefore, the overall conclusion is that, statistically, the mean salaries of public school teachers in the West and in the Northeast and North Central are about the same, but the mean salary of teachers in the South is statistically significantly lower, by about \$3265. Diagrammatically, the situation is shown in Figure 9.1.
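The p values quoted above can be approximately reproduced from the estimates and standard errors in (9.2.5). Each t ratio is the estimate divided by its standard error, and the two-sided p value comes from Student's t distribution with $n - 3$ degrees of freedom. This sketch *assumes* $n = 51$ (50 states plus the District of Columbia), which this excerpt does not state. To stay within the standard library, it integrates the t density with Simpson's rule rather than calling a statistics package. The names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

# Estimates and standard errors as reported in (9.2.5).
COEFS = {"Northeast and North Central": (-1734.473, 1435.953),
         "South": (-3264.615, 1499.615)}

# ASSUMPTION: n = 51 observations (50 states plus D.C.) and k = 3 parameters,
# giving df = n - k = 48. The excerpt itself does not state n.
DF = 51 - 3


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p value P(|T| > |t|), via Simpson's rule on [0, |t|]."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = s * h / 3         # P(0 < T < |t|)
    return 1.0 - 2.0 * area  # symmetry: P(|T| > |t|) = 1 - 2 P(0 < T < |t|)


results = {}
for region, (coef, se) in COEFS.items():
    t = coef / se
    p = two_sided_p(t, DF)
    results[region] = (t, p)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{region:<28} t = {t:6.3f}  p = {p:.3f}  ({verdict} at 5%)")
```

Under the df = 48 assumption, the printed p values should land close to the 23 percent and 3.5 percent quoted in the text. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` computes the same two-sided p value directly.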

A caution is in order in interpreting these differences. The dummy variables will simply point out the differences, if they exist, but they do not suggest the reasons for the differences.

CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS


[Figure 9.1: mean salaries of public school teachers by region; categories: West, Northeast and North Central, South]