## Table 8.3 ANOVA Table for the Child Mortality Example

| Source of variation | SS | df | MSS |
|---|---|---|---|
| Due to regression | 257,362.4 | 2 | 128,681.2 |
| Due to residuals | 106,315.6 | 61 | 1,742.88 |
| Total | 363,678 | 63 | |
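As a quick arithmetic check, the following minimal Python sketch recomputes the mean sums of squares and the F ratio from the SS and df columns of the table. The variable names (`ess`, `rss`, `mss_reg`, `mss_res`, `f_value`) are illustrative choices and do not come from the text.

```python
# ANOVA quantities taken from the table for the child mortality example.
ess, df_reg = 257_362.4, 2      # SS due to regression, df = k - 1
rss, df_res = 106_315.6, 61     # SS due to residuals,  df = n - k

mss_reg = ess / df_reg          # mean sum of squares, regression
mss_res = rss / df_res          # mean sum of squares, residuals
f_value = mss_reg / mss_res     # F ratio with (2, 61) df

print(f"MSS (regression) = {mss_reg:.1f}")
print(f"MSS (residuals)  = {mss_res:.2f}")
print(f"F                = {f_value:.2f}")
```

The ratio comes out at roughly 73.8, far above the tabulated critical F values for 2 and 61 df at conventional significance levels.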


We can generalize the preceding F-testing procedure as follows.

Testing the Overall Significance of a Multiple Regression: The F Test

Decision Rule. Given the k-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$

To test the hypothesis

$$H_0\colon \beta_2 = \beta_3 = \cdots = \beta_k = 0$$

(i.e., all slope coefficients are simultaneously zero) versus

$H_1$: Not all slope coefficients are simultaneously zero

compute

$$F = \frac{\text{ESS}/\text{df}}{\text{RSS}/\text{df}} = \frac{\text{ESS}/(k-1)}{\text{RSS}/(n-k)} \tag{8.5.7}$$

If $F > F_\alpha(k-1,\, n-k)$, reject $H_0$; otherwise, do not reject it. Here $F_\alpha(k-1,\, n-k)$ is the critical F value at the $\alpha$ level of significance with $(k-1)$ numerator df and $(n-k)$ denominator df. Alternatively, if the p value of F obtained from (8.5.7) is sufficiently low, one can reject $H_0$.
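The decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `overall_f_test` and its parameters are hypothetical, and the critical value is passed in by the caller (for instance, read from an F table) rather than computed. In the usage lines, 3.15 is the tabulated 5 percent critical value for 2 and 60 df, used as a close approximation for 2 and 61 df.

```python
def overall_f_test(ess, rss, n, k, f_critical):
    """Return (F, reject_H0) for H0: all slope coefficients are zero.

    ess, rss   : explained and residual sums of squares
    n          : number of observations
    k          : number of parameters, including the intercept
    f_critical : critical value F_alpha(k - 1, n - k), supplied by the caller
    """
    if k < 2 or n <= k:
        raise ValueError("need k >= 2 and n > k")
    f_value = (ess / (k - 1)) / (rss / (n - k))
    return f_value, f_value > f_critical


# Child mortality example: n = 64 observations, k = 3 parameters.
# ASSUMPTION: 3.15 is the 5% table value for (2, 60) df, standing in for (2, 61) df.
f_value, reject = overall_f_test(257_362.4, 106_315.6, n=64, k=3, f_critical=3.15)
print(f"F = {f_value:.2f}, reject H0: {reject}")
```

Passing the critical value as an argument keeps the sketch free of external libraries; a statistical package could supply either the exact critical value or the p value directly.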

Needless to say, in the three-variable case ($Y$ and $X_2$, $X_3$) $k$ is 3, in the four-variable case $k$ is 4, and so on.

In passing, note that most regression packages routinely calculate the F value (given in the analysis of variance table) along with the usual regression output, such as the estimated coefficients, their standard errors, t values, etc. The null hypothesis for the t computation is usually assumed to be $\beta_i = 0$.

Individual versus Joint Testing of Hypotheses. In Section 8.4 we discussed the test of significance of a single regression coefficient, and in Section 8.5 we discussed the joint or overall test of significance of the estimated regression (i.e., the test that all slope coefficients are simultaneously equal to zero). We reiterate that these tests are different. Thus, on the basis of the t test or confidence interval (of Section 8.4), it is possible to accept the hypothesis that a particular slope coefficient, $\beta_k$, is zero, and yet reject the joint hypothesis that all slope coefficients are zero.

The lesson to be learned is that the joint "message" of individual confidence intervals is no substitute for a joint confidence region [implied by the F test] in performing joint tests of hypotheses and making joint confidence statements.⁸


### An Important Relationship between $R^2$ and $F$

There is an intimate relationship between the coefficient of determination $R^2$ and the F test used in the analysis of variance. Assuming the normal distribution for the disturbances $u_i$ and the null hypothesis that $\beta_2 = \beta_3 = 0$, we have seen that

$$F = \frac{\text{ESS}/2}{\text{RSS}/(n-3)}$$

is distributed as the F distribution with 2 and $n - 3$ df.

More generally, in the k-variable case (including intercept), if we assume that the disturbances are normally distributed and that the null hypothesis is

$$H_0\colon \beta_2 = \beta_3 = \cdots = \beta_k = 0$$

then it follows that

$$F = \frac{\text{ESS}/(k-1)}{\text{RSS}/(n-k)} \tag{8.5.10}$$

follows the F distribution with $k - 1$ and $n - k$ df. (Note: The total number of parameters to be estimated is $k$, of which one is the intercept term.) Let us manipulate (8.5.10) as follows:

$$F = \frac{n-k}{k-1}\cdot\frac{\text{ESS}}{\text{RSS}}$$