CHAPTER TEN: MULTICOLLINEARITY 373
6 auxiliary regressions, again suggesting that the Longley data are indeed plagued by the multicollinearity problem. Incidentally, applying the F test given in (10.7.3), the reader should verify that the R2 values given in the preceding tables are all statistically significantly different from zero.
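The exact numbered formula (10.7.3) is not reproduced here, but the test it refers to is the standard F test for the overall significance of an auxiliary regression: if one regressor is regressed on the remaining m regressors with n observations, then under the null hypothesis that the auxiliary R2 is zero, F = (R2/m) / [(1 − R2)/(n − m − 1)] follows the F distribution with (m, n − m − 1) degrees of freedom. A minimal sketch, with an illustrative (made-up) R2 value rather than one taken from the Longley tables:

```python
# F test for whether an auxiliary regression's R^2 differs from zero.
# An auxiliary regression explains one regressor X_i with the remaining
# m regressors; under H0 (R^2 = 0) the statistic below is F(m, n-m-1).

def aux_f_stat(r2, n, m):
    """F statistic for H0: R^2 = 0, with m slopes and n observations."""
    return (r2 / m) / ((1.0 - r2) / (n - m - 1))

# Longley-style setup: n = 16 observations, each X_i regressed on the
# remaining m = 5 regressors; r2 = 0.99 is illustrative, not from the text.
f = aux_f_stat(r2=0.99, n=16, m=5)
print(round(f, 1))  # -> 198.0, far beyond conventional critical values
```

With a sample this small, an auxiliary R2 near unity translates into an enormous F value, which is why the auxiliary regressions in the Longley example are all significant.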
We noted earlier that the OLS estimators and their standard errors are sensitive to small changes in the data. In exercise 10.32 the reader is asked to rerun the regression of Y on all six X variables but drop the last observation, that is, run the regression for the period 1947-1961. You will see how the regression results change by dropping just a single year's observations.
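This sensitivity can be demonstrated on synthetic data (not the Longley series; the variable names and numbers below are illustrative). When two regressors are nearly collinear, dropping a single observation can swing the individual slope estimates, even though their sum stays stable:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
x1 = np.linspace(1, 16, n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
y = 2.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

def ols(y, *cols):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

full = ols(y, x1, x2)
trimmed = ols(y[:-1], x1[:-1], x2[:-1])    # drop the last "year"
print(full[1:], trimmed[1:])  # individual slopes can change dramatically
```

Note that the sum of the two slope estimates remains close to its true value of 2 in both fits: the estimable linear combination is well determined even when the individual coefficients are not, echoing the point about estimable functions in the summary below.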
Now that we have established that we have the multicollinearity problem, what "remedial" actions can we take? Let us reconsider our original model. First of all, we could express GNP not in nominal terms, but in real terms, which we can do by dividing nominal GNP by the implicit price deflator. Second, since noninstitutional population over 14 years of age grows over time because of natural population growth, it will be highly correlated with time, the variable X6 in our model. Therefore, instead of keeping both these variables, we will keep the variable X5 and drop X6. Third, there is no compelling reason to include X3, the number of people unemployed; perhaps the unemployment rate would have been a better measure of labor market conditions. But we have no data on the latter. So, we will drop the variable X3. Making these changes, we obtain the following regression results (RGNP = real GNP)45:
Dependent Variable: Y
Sample: 1947-1962

Variable              Coefficient    Std. Error    t-Statistic    Prob.
C                       65720.37      10624.81       6.185558    0.0000
RGNP                    9.736496      1.791552       5.434671    0.0002
X4                     -0.687966      0.322238      -2.134965    0.0541
X5                     -0.299537      0.141761      -2.112965    0.0562

Adjusted R-squared      0.976755     Mean dependent var       65317.00
S.E. of regression      535.4492     S.D. dependent var       3511.968
Sum squared resid       3440470.     Akaike info criterion    15.61641
Log likelihood         -120.9313     Schwarz criterion        15.80955
Durbin-Watson stat      1.654069
Although the R2 value has declined slightly compared with the original R2, it is still very high. Now all the estimated coefficients are significant and the signs of the coefficients make economic sense.
45The coefficient of correlation between X5 and X6 is about 0.9939, a very high correlation indeed.
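The coefficient and standard-error columns in an output like the one above can be computed directly from the OLS formulas. A minimal numpy sketch, using illustrative synthetic data rather than the actual Longley series:

```python
import numpy as np

def ols_summary(y, X):
    """OLS coefficients, standard errors, and t-statistics.
    X must include a column of ones for the intercept."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # s^2, unbiased error variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))   # standard errors
    return beta, se, beta / se                # t = coefficient / std. error

# Illustrative data: y depends on two regressors plus an intercept.
rng = np.random.default_rng(1)
n = 16
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.1, size=n)

beta, se, t = ols_summary(y, X)
print(np.round(beta, 2))  # close to the true values [1, 2, -3]
```

With well-conditioned regressors, as here, X'X inverts cleanly and the standard errors are small; under severe collinearity the diagonal elements of (X'X)^-1 blow up, which is exactly the mechanism behind the large standard errors discussed in this chapter.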
We leave it for the reader to devise alternative models and see how the results change. Also keep in mind the warning sounded earlier about using the ratio method of transforming the data to alleviate the problem of collinearity. We will revisit this question in Chapter 11.
1. One of the assumptions of the classical linear regression model is that there is no multicollinearity among the explanatory variables, the X's. Broadly interpreted, multicollinearity refers to the situation where there is either an exact or approximately exact linear relationship among the X variables.
2. The consequences of multicollinearity are as follows: If there is perfect collinearity among the X's, their regression coefficients are indeterminate and their standard errors are not defined. If collinearity is high but not perfect, estimation of regression coefficients is possible but their standard errors tend to be large. As a result, the population values of the coefficients cannot be estimated precisely. However, if the objective is to estimate linear combinations of these coefficients, the estimable functions, this can be done even in the presence of perfect multicollinearity.
3. Although there are no sure methods of detecting collinearity, there are several indicators of it, which are as follows:
(a) The clearest sign of multicollinearity is when R2 is very high but none of the regression coefficients is statistically significant on the basis of the conventional t test. This case is, of course, extreme.
(b) In models involving just two explanatory variables, a fairly good idea of collinearity can be obtained by examining the zero-order, or simple, correlation coefficient between the two variables. If this correlation is high, multicollinearity is generally the culprit.
(c) However, the zero-order correlation coefficients can be misleading in models involving more than two X variables since it is possible to have low zero-order correlations and yet find high multicollinearity. In situations like these, one may need to examine the partial correlation coefficients.
(d) If R2 is high but the partial correlations are low, multicollinearity is a possibility. Here one or more variables may be superfluous. But if R2 is high and the partial correlations are also high, multicollinearity may not be readily detectable. Also, as pointed out by C. Robert Wichers, Krishna Kumar, John O'Hagan, and Brendan McCabe, there are some statistical problems with the partial correlation test suggested by Farrar and Glauber.
(e) Therefore, one may regress each of the Xi variables on the remaining X variables in the model and find out the corresponding coefficients of determination R2. A high R2 would suggest that Xi
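The auxiliary-regression diagnostic in point (e) can be sketched as follows; the variance inflation factor, VIF_i = 1/(1 − R_i^2), is an equivalent reading of the same auxiliary R2 values. The data below are illustrative, not the Longley series:

```python
import numpy as np

def aux_r2(X):
    """R^2 from regressing each column of X on the remaining columns
    (with an intercept). High values flag collinear regressors."""
    n, k = X.shape
    out = []
    for i in range(k):
        y = X[:, i]
        Z = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        out.append(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))
    return out

rng = np.random.default_rng(2)
x1 = rng.normal(size=30)
x2 = x1 + rng.normal(scale=0.05, size=30)   # nearly a copy of x1
x3 = rng.normal(size=30)                    # unrelated regressor
r2s = aux_r2(np.column_stack([x1, x2, x3]))
vifs = [1.0 / (1.0 - r) for r in r2s]
print([round(r, 3) for r in r2s])  # first two near 1, third near 0
```

A common rule of thumb treats a VIF above 10 (auxiliary R2 above 0.9) as a sign of problematic collinearity, though, as the chapter stresses, no single cutoff is a sure method of detection.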