## Detection of Multicollinearity

Having studied the nature and consequences of multicollinearity, the natural question is: How does one know that collinearity is present in any given situation, especially in models involving more than two explanatory variables? Here it is useful to bear in mind Kmenta's warning:

1. Multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and the absence of multicollinearity, but between its various degrees.

2. Since multicollinearity refers to the condition of the explanatory variables that are assumed to be nonstochastic, it is a feature of the sample and not of the population.

Therefore, we do not "test for multicollinearity" but can, if we wish, measure its degree in any particular sample.17

Since multicollinearity is essentially a sample phenomenon, arising out of the largely nonexperimental data collected in most social sciences, we do not have one unique method of detecting it or measuring its strength. What we have are some rules of thumb, some informal and some formal, but rules of thumb all the same. We now consider some of these rules.

1. High R² but few significant t ratios. As noted, this is the "classic" symptom of multicollinearity. If R² is high, say, in excess of 0.8, the F test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t tests will show that none or very few of the partial slope coefficients are statistically different from zero. This fact was clearly demonstrated by our consumption-income-wealth example.

Although this diagnostic is sensible, its disadvantage is that "it is too strong in the sense that multicollinearity is considered as harmful only when all of the influences of the explanatory variables on Y cannot be disentangled."18
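This symptom can be reproduced with a minimal numpy sketch. The data below are invented for illustration (they are not the book's consumption-income-wealth numbers): two almost-proportional regressors yield a high R² and a large overall F statistic, while the individual slope t ratios are typically small.

```python
import numpy as np

# Hypothetical simulation: two nearly collinear regressors produce a
# high R^2 and a significant overall F, yet the slope t ratios tend
# to be individually insignificant.
rng = np.random.default_rng(0)
n = 30

x2 = rng.normal(100.0, 10.0, n)            # "income"-like regressor
x3 = 10.0 * x2 + rng.normal(0.0, 1.0, n)   # "wealth": nearly proportional to x2
y = 5.0 + 0.5 * x2 + 0.01 * x3 + rng.normal(0.0, 2.0, n)

X = np.column_stack([np.ones(n), x2, x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df = n - X.shape[1]                        # residual degrees of freedom
s2 = resid @ resid / df                    # estimate of sigma^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = beta / se                              # individual t ratios

tss = ((y - y.mean()) ** 2).sum()
r2 = 1.0 - resid @ resid / tss
f_stat = (r2 / 2) / ((1.0 - r2) / df)      # overall F for the two slopes

print(f"R^2 = {r2:.3f}, F = {f_stat:.1f}")   # R^2 high, F strongly significant
print("slope t ratios:", np.round(t[1:], 2))  # typically small in magnitude
```

Because x2 and x3 move almost in lockstep, the regression pins down their combined influence on y precisely (hence the high R² and F) but cannot apportion it between them (hence the inflated standard errors and small t ratios).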

2. High pair-wise correlations among regressors. Another suggested rule of thumb is that if the pair-wise or zero-order correlation coefficient between two regressors is high, say, in excess of 0.8, then multicollinearity is a serious problem. The problem with this criterion is that, although high zero-order correlations may suggest collinearity, it is not necessary that they be high to have collinearity in any specific case. To put the matter somewhat technically, high zero-order correlations are a sufficient but not a necessary condition for the existence of multicollinearity because it can exist even though the zero-order or simple correlations are comparatively low (say, less than 0.50). To see this relationship, suppose we have a four-variable model:

17Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, p. 431.


and suppose that

$$X_{4i} = \lambda_2 X_{2i} + \lambda_3 X_{3i}$$

where $\lambda_2$ and $\lambda_3$ are constants, not both zero. Obviously, $X_4$ is an exact linear combination of $X_2$ and $X_3$, giving $R^2_{4.23} = 1$, the coefficient of determination in the regression of $X_4$ on $X_2$ and $X_3$.

Now recalling the formula (7.11.5) from Chapter 7, we can write

$$R^2_{4.23} = \frac{r^2_{42} + r^2_{43} - 2\,r_{42}\,r_{43}\,r_{23}}{1 - r^2_{23}}$$

But since $R^2_{4.23} = 1$ because of perfect collinearity, we obtain
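A short numpy check of this identity, using invented data: X4 is built as an exact linear combination of X2 and X3, with X2 and X3 negatively correlated so that all three pairwise correlations stay modest (roughly 0.45 to 0.6 in magnitude), yet formula (7.11.5) still returns R²₄.₂₃ = 1. Perfect collinearity coexists with low zero-order correlations, exactly as the rule-of-thumb caveat warns.

```python
import numpy as np

# Hypothetical data: X4 = X2 + X3 exactly (lambda2 = lambda3 = 1), with
# X2, X3 negatively correlated so every pairwise correlation stays modest
# even though the collinearity is perfect.
rng = np.random.default_rng(7)
n = 500

x2 = rng.normal(size=n)
x3 = -0.5 * x2 + rng.normal(size=n)
x4 = x2 + x3                       # exact linear combination

r42 = np.corrcoef(x4, x2)[0, 1]
r43 = np.corrcoef(x4, x3)[0, 1]
r23 = np.corrcoef(x2, x3)[0, 1]
print("pairwise r:", round(r42, 2), round(r43, 2), round(r23, 2))

# Formula (7.11.5): R^2 of X4 on X2 and X3 from the zero-order correlations
r2_423 = (r42**2 + r43**2 - 2 * r42 * r43 * r23) / (1 - r23**2)
print("R^2_4.23 =", round(r2_423, 10))   # 1.0: collinearity is perfect
```

No single pairwise correlation here exceeds about 0.6, so scanning the correlation matrix alone would not flag a problem, yet one regressor is an exact linear combination of the other two.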