## R23

It is not difficult to see that (10.7.2) is satisfied by $r_{42} = 0.5$, $r_{43} = 0.5$, and $r_{23} = -0.5$, which are not very high values.

Therefore, in models involving more than two explanatory variables, the simple or zero-order correlation will not provide an infallible guide to the presence of multicollinearity. Of course, if there are only two explanatory variables, the zero-order correlations will suffice.
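As a quick arithmetic check, the sketch below evaluates the multiple $R^2$ implied by these three zero-order correlations. It assumes (10.7.2) is the standard expression $R^2_{4.23} = (r_{42}^2 + r_{43}^2 - 2 r_{42} r_{43} r_{23}) / (1 - r_{23}^2)$; the function name is hypothetical.

```python
def r2_from_zero_order(r42, r43, r23):
    """Multiple R^2 of X4 on X2 and X3 from zero-order correlations.

    Assumed form of (10.7.2):
    (r42^2 + r43^2 - 2*r42*r43*r23) / (1 - r23^2)
    """
    return (r42**2 + r43**2 - 2 * r42 * r43 * r23) / (1 - r23**2)


# Values quoted in the text: none of them is large in absolute value.
r2 = r2_from_zero_order(0.5, 0.5, -0.5)
print(r2)
```

With these inputs the numerator and the denominator are both 0.75, so the implied $R^2$ equals 1 even though no pairwise correlation exceeds 0.5 in absolute value.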

3. Examination of partial correlations. Because of the problem just mentioned in relying on zero-order correlations, Farrar and Glauber have suggested that one should look at the partial correlation coefficients.19 Thus, in the regression of $Y$ on $X_2$, $X_3$, and $X_4$, a finding that $R^2_{1.234}$ is very high but $r^2_{12.34}$, $r^2_{13.24}$, and $r^2_{14.23}$ are comparatively low may suggest that the variables $X_2$, $X_3$, and $X_4$ are highly intercorrelated and that at least one of these variables is superfluous.

Although a study of the partial correlations may be useful, there is no guarantee that they will provide an infallible guide to multicollinearity, for it may happen that both $R^2$ and all the partial correlations are sufficiently high. But more importantly, C. Robert Wichers has shown20 that the Farrar-Glauber partial correlation test is ineffective in that a given partial correlation may be compatible with different multicollinearity patterns. The Farrar-Glauber test has also been severely criticized by T. Krishna Kumar21 and John O'Hagan and Brendan McCabe.22
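For readers who want to compute the partial correlations themselves, the following is a minimal sketch using the standard recursion that builds a higher-order partial correlation from lower-order ones. The function name and the correlation matrix are hypothetical, chosen only for illustration; this is not the Farrar-Glauber procedure itself.

```python
import math


def partial_corr(R, i, j, controls):
    """Partial correlation of variables i and j, holding `controls` fixed.

    R is a correlation matrix given as a list of lists. The recursion
    removes one control variable at a time:
    r_ij.Kl = (r_ij.K - r_il.K * r_jl.K)
              / sqrt((1 - r_il.K^2) * (1 - r_jl.K^2))
    """
    if not controls:
        return R[i][j]
    *rest, l = controls
    r_ij = partial_corr(R, i, j, rest)
    r_il = partial_corr(R, i, l, rest)
    r_jl = partial_corr(R, j, l, rest)
    return (r_ij - r_il * r_jl) / math.sqrt((1 - r_il**2) * (1 - r_jl**2))


# Hypothetical correlations among three variables (indices 0, 1, 2).
R = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.8],
    [0.5, 0.8, 1.0],
]
r01_2 = partial_corr(R, 0, 1, [2])
print(round(r01_2, 4))
```

In this made-up example the zero-order correlation of 0.6 between variables 0 and 1 drops to roughly 0.38 once variable 2 is held fixed. Passing a longer list of controls gives the higher-order coefficients discussed in the text.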

19D. E. Farrar and R. R. Glauber, "Multicollinearity in Regression Analysis: The Problem Revisited," Review of Economics and Statistics, vol. 49, 1967, pp. 92-107.

20"The Detection of Multicollinearity: A Comment," Review of Economics and Statistics, vol. 57, 1975, pp. 365-366.

21"Multicollinearity in Regression Analysis," Review of Economics and Statistics, vol. 57, 1975, pp. 366-368.

22"Tests for the Severity of Multicollinearity in Regression Analysis: A Comment," Review of Economics and Statistics, vol. 57, 1975, pp. 368-370.

CHAPTER TEN: MULTICOLLINEARITY 361

4. Auxiliary regressions. Since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to other X variables is to regress each $X_i$ on the remaining X variables and compute the corresponding $R^2$, which we designate as $R_i^2$; each one of these regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X's. Then, following the relationship between $F$ and $R^2$ established in (8.5.11), the variable

$$F_i = \frac{R^2_{x_i \cdot x_2 x_3 \cdots x_k} / (k-2)}{\left(1 - R^2_{x_i \cdot x_2 x_3 \cdots x_k}\right) / (n-k+1)} \tag{10.7.3}$$

follows the F distribution with $k-2$ and $n-k+1$ df. In Eq. (10.7.3) $n$ stands for the sample size, $k$ stands for the number of explanatory variables including the intercept term, and $R^2_{x_i \cdot x_2 x_3 \cdots x_k}$ is the coefficient of determination in the regression of variable $X_i$ on the remaining X variables.23

If the computed $F$ exceeds the critical $F_i$ at the chosen level of significance, it is taken to mean that the particular $X_i$ is collinear with other X's; if it does not exceed the critical $F_i$, we say that it is not collinear with other X's, in which case we may retain that variable in the model. If $F_i$ is statistically significant, we will still have to decide whether the particular $X_i$ should be dropped from the model. This question will be taken up in Section 10.8. But this method is not without its drawbacks, for

... if the multicollinearity involves only a few variables so that the auxiliary regressions do not suffer from extensive multicollinearity, the estimated coefficients may reveal the nature of the linear dependence among the regressors. Unfortunately, if there are several complex linear associations, this curve fitting exercise may not prove to be of much value as it will be difficult to identify the separate interrelationships.24
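Once an auxiliary $R^2$ is in hand, the statistic labelled (10.7.3) in the text is simple arithmetic. The sketch below assumes the form $F_i = [R_i^2/(k-2)] / [(1-R_i^2)/(n-k+1)]$, which matches the degrees of freedom stated above; the function name and the numbers ($n = 30$, $k = 4$, $R_i^2 = 0.90$) are made up for illustration.

```python
def auxiliary_f(r2_aux, n, k):
    """F statistic for one auxiliary regression.

    r2_aux : R^2 from regressing X_i on the remaining regressors
    n      : sample size
    k      : number of explanatory variables including the intercept
    Assumed form: [R^2 / (k - 2)] / [(1 - R^2) / (n - k + 1)]
    """
    return (r2_aux / (k - 2)) / ((1.0 - r2_aux) / (n - k + 1))


# Hypothetical inputs, not taken from the text.
f_i = auxiliary_f(r2_aux=0.90, n=30, k=4)
print(round(f_i, 2))
```

The result would then be compared with the critical F value for $k-2$ and $n-k+1$ df (here 2 and 27) at the chosen significance level, read from an F table or statistical software.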

Instead of formally testing all auxiliary $R^2$ values, one may adopt Klein's rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the $R^2$ obtained from an auxiliary regression is greater than the overall $R^2$, that is, that obtained from the regression of Y on all the regressors.25 Of course, like all other rules of thumb, this one should be used judiciously.
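Klein's rule amounts to a single comparison per regressor. A minimal sketch follows, with a hypothetical function name and made-up $R^2$ values:

```python
def klein_flags(aux_r2, overall_r2):
    """Return the regressors whose auxiliary R^2 exceeds the overall R^2."""
    return [name for name, r2 in aux_r2.items() if r2 > overall_r2]


# Hypothetical auxiliary R^2 values and overall R^2 of the main regression.
aux_r2 = {"X2": 0.95, "X3": 0.97, "X4": 0.40}
overall_r2 = 0.92

flagged = klein_flags(aux_r2, overall_r2)
print(flagged)
```

Here X2 and X3 would be flagged as potentially troublesome, while X4 would not.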

5. Eigenvalues and condition index. If you examine the SAS output of the Cobb-Douglas production function given in Appendix 7A.5 you will see

23For example, $R^2_{x_2}$ can be obtained by regressing $X_{2i}$ as follows: $X_{2i} = a_1 + a_3 X_{3i} + a_4 X_{4i} + \cdots + a_k X_{ki} + \hat{u}_i$.

24George G. Judge, R. Carter Hill, William E. Griffiths, Helmut Lutkepohl, and Tsoung-Chao Lee, Introduction to the Theory and Practice of Econometrics, John Wiley & Sons, New York, 1982, p. 621.

25Lawrence R. Klein, An Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, N.J., 1962, p. 101.

362 PART TWO: RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL

that SAS uses eigenvalues and the condition index to diagnose multicollinearity. We will not discuss eigenvalues here, for that would take us into topics in matrix algebra that are beyond the scope of this book. From these eigenvalues, however, we can derive what is known as the condition number $k$ defined as

$$k = \frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}$$

and the condition index (CI) defined as

$$\text{CI} = \sqrt{\frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}} = \sqrt{k}$$

Then we have this rule of thumb. If $k$ is between 100 and 1000 there is moderate to strong multicollinearity and if it exceeds 1000 there is severe multicollinearity. Alternatively, if the CI ($= \sqrt{k}$) is between 10 and 30, there is moderate to strong multicollinearity and if it exceeds 30 there is severe multicollinearity.

For the illustrative example, $k = 3.0/0.00002422$ or about 123,864, and $\text{CI} = \sqrt{123{,}864}$ = about 352; both $k$ and the CI therefore suggest severe multicollinearity. Of course, $k$ and CI can be calculated between the maximum eigenvalue and any other eigenvalue, as is done in the printout. (Note: The printout does not explicitly compute $k$, but that is simply the square of CI.) Incidentally, note that a low eigenvalue (in relation to the maximum eigenvalue) is generally an indication of near-linear dependencies in the data.
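The arithmetic of the illustrative example can be reproduced in a few lines. The eigenvalues 3.0 and 0.00002422 are those quoted in the text; the function names and the wording of the labels are hypothetical.

```python
import math


def condition_number(max_eig, min_eig):
    """Condition number k = maximum eigenvalue / minimum eigenvalue."""
    return max_eig / min_eig


def classify_ci(ci):
    """Apply the rule of thumb for the condition index stated in the text."""
    if ci > 30:
        return "severe"
    if ci >= 10:
        return "moderate to strong"
    return "below the rule-of-thumb range"


k = condition_number(3.0, 0.00002422)
ci = math.sqrt(k)  # the condition index is the square root of k
print(round(k), round(ci), classify_ci(ci))
```

Both figures land far above the thresholds of 1000 for $k$ and 30 for the CI.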

Some authors believe that the condition index is the best available multicollinearity diagnostic. But this opinion is not shared widely. For us, then, the CI is just a rule of thumb, a bit more sophisticated perhaps. But for further details, the reader may consult the references.26

6. Tolerance and variance inflation factor. We have already introduced TOL and VIF. As $R_j^2$, the coefficient of determination in the regression of regressor $X_j$ on the remaining regressors in the model, increases toward unity, that is, as the collinearity of $X_j$ with the other regressors increases, VIF also increases and in the limit it can be infinite.

Some authors therefore use the VIF as an indicator of multicollinearity. The larger the value of $\text{VIF}_j$, the more "troublesome" or collinear the variable $X_j$. As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if $R_j^2$ exceeds 0.90, that variable is said to be highly collinear.27

Of course, one could use $\text{TOL}_j$ as a measure of multicollinearity in view of its intimate connection with $\text{VIF}_j$. The closer $\text{TOL}_j$ is to zero, the greater the degree of collinearity of that variable with the other regressors. On the

26See especially D. A. Belsley, E. Kuh, and R. E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, John Wiley & Sons, New York, 1980, Chap. 3. However, this book is not for the beginner.

27See David G. Kleinbaum, Lawrence L. Kupper, and Keith E. Muller, Applied Regression Analysis and other Multivariate Methods, 2d ed., PWS-Kent, Boston, Mass., 1988, p. 210.


other hand, the closer $\text{TOL}_j$ is to 1, the greater the evidence that $X_j$ is not collinear with the other regressors.
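The link between the auxiliary $R_j^2$, the VIF, and the tolerance can be tabulated directly. This sketch assumes the usual definitions $\text{VIF}_j = 1/(1 - R_j^2)$ and $\text{TOL}_j = 1/\text{VIF}_j = 1 - R_j^2$; the function names are hypothetical.

```python
def vif(r2_j):
    """Variance inflation factor, assuming VIF_j = 1 / (1 - R_j^2)."""
    return 1.0 / (1.0 - r2_j)


def tol(r2_j):
    """Tolerance, assuming TOL_j = 1 / VIF_j = 1 - R_j^2."""
    return 1.0 - r2_j


# Illustrative auxiliary R^2 values, from no collinearity to near-exact.
for r2_j in (0.0, 0.5, 0.9, 0.99):
    print(f"R2_j={r2_j:.2f}  VIF={vif(r2_j):.1f}  TOL={tol(r2_j):.2f}")
```

At $R_j^2 = 0.90$ the VIF reaches the rule-of-thumb threshold of 10 and the tolerance falls to 0.10; as $R_j^2$ approaches 1 the VIF grows without bound and the tolerance approaches zero.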

VIF (or tolerance) as a measure of collinearity is not free of criticism. As (10.5.4) shows, $\text{var}(\hat{\beta}_j)$ depends on three factors: $\sigma^2$, $\sum x_j^2$, and $\text{VIF}_j$. A high VIF can be counterbalanced by a low $\sigma^2$ or a high $\sum x_j^2$. To put it differently, a high VIF is neither necessary nor sufficient to get high variances and high standard errors. Therefore, high multicollinearity, as measured by a high VIF, may not necessarily cause high standard errors. In all this discussion, the terms high and low are used in a relative sense.
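A small numerical comparison makes the point concrete. The sketch assumes (10.5.4) has the form $\text{var}(\hat{\beta}_j) = (\sigma^2 / \sum x_j^2)\,\text{VIF}_j$; the function name and all numbers are hypothetical.

```python
def coef_variance(sigma2, sum_xj2, vif_j):
    """Variance of the j-th coefficient, assuming (sigma^2 / sum x_j^2) * VIF_j."""
    return sigma2 / sum_xj2 * vif_j


# Case A: high VIF, but small error variance and ample variation in X_j.
var_a = coef_variance(sigma2=1.0, sum_xj2=100.0, vif_j=20.0)

# Case B: low VIF, but larger error variance and little variation in X_j.
var_b = coef_variance(sigma2=4.0, sum_xj2=10.0, vif_j=2.0)

print(var_a, var_b)
```

In this made-up comparison the coefficient with the VIF of 20 has the smaller variance, because the other two factors more than offset the inflation.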

To conclude our discussion of detecting multicollinearity, we stress that the various methods we have discussed are essentially in the nature of "fishing expeditions," for we cannot tell which of these methods will work in any particular application. Alas, not much can be done about it, for multicollinearity is specific to a given sample over which the researcher may not have much control, especially if the data are nonexperimental in nature, which is the usual fate of researchers in the social sciences.

Again as a parody of multicollinearity, Goldberger cites numerous ways of detecting micronumerosity, such as developing critical values of the sample size, $n^*$, such that micronumerosity is a problem only if the actual sample size, $n$, is smaller than $n^*$. The point of Goldberger's parody is to emphasize that small sample size and lack of variability in the explanatory variables may cause problems that are at least as serious as those due to multicollinearity.

What can be done if multicollinearity is serious? We have two choices: (1) do nothing or (2) follow some rules of thumb.

The "do nothing" school of thought is expressed by Blanchard as follows28:

When students run their first ordinary least squares (OLS) regression, the first problem that they usually encounter is that of multicollinearity. Many of them conclude that there is something wrong with OLS; some resort to new and often creative techniques to get around the problem. But, we tell them, this is wrong. Multicollinearity is God's will, not a problem with OLS or statistical technique in general.

What Blanchard is saying is that multicollinearity is essentially a data deficiency problem (micronumerosity, again) and sometimes we have no choice over the data we have available for empirical analysis.

Also, it is not that all the coefficients in a regression model are statistically insignificant. Moreover, even if we cannot estimate one or more regression coefficients with greater precision, a linear combination of them (i.e., estimable function) can be estimated relatively efficiently. As we saw in