## Remedial Measures

### Do Nothing

28 O. J. Blanchard, Comment, Journal of Business and Economic Statistics, vol. 5, 1987, pp. 449–451. The quote is reproduced from Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 190.

364 PART TWO: RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL

(10.2.3), we can estimate α uniquely, even if we cannot estimate its two components individually. Sometimes this is the best we can do with a given set of data.29

### Rule-of-Thumb Procedures

One can try the following rules of thumb to address the problem of multicollinearity, their success depending on the severity of the collinearity problem.

1. A priori information. Suppose we consider the model

Yi = β1 + β2X2i + β3X3i + ui

where Y = consumption, X2 = income, and X3 = wealth. As noted before, income and wealth variables tend to be highly collinear. But suppose a priori we believe that β3 = 0.10β2; that is, the rate of change of consumption with respect to wealth is one-tenth the corresponding rate with respect to income. We can then run the following regression:

Yi = β1 + β2X2i + 0.10β2X3i + ui = β1 + β2Xi + ui   (10.8.1)

where Xi = X2i + 0.10X3i. Once we obtain β̂2, we can estimate β̂3 from the postulated relationship between β2 and β3.

How does one obtain a priori information? It could come from previous empirical work in which the collinearity problem happens to be less serious or from the relevant theory underlying the field of study. For example, in the Cobb-Douglas-type production function (7.9.1), if one expects constant returns to scale to prevail, then (β2 + β3) = 1, in which case we could run the regression (8.7.14), regressing the output-labor ratio on the capital-labor ratio. If there is collinearity between labor and capital, as generally is the case in most sample data, such a transformation may reduce or eliminate the collinearity problem. But a warning is in order here regarding imposing such a priori restrictions, ". . . since in general we will want to test economic theory's a priori predictions rather than simply impose them on data for which they may not be true."30 However, we know from Section 8.7 how to test for the validity of such restrictions explicitly.
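To make the device concrete, here is a minimal sketch in Python (simulated data; the parameter values and variable names are invented for illustration). It imposes the restriction β3 = 0.10β2 by regressing Y on the combined regressor Xi = X2i + 0.10X3i and then recovers β̂3 from β̂2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulate highly collinear income (X2) and wealth (X3) series:
# wealth tracks income closely, as in typical consumption data.
income = rng.normal(100, 10, n)
wealth = 10 * income + rng.normal(0, 5, n)

# Hypothetical true model: Y = b1 + b2*X2 + b3*X3 + u with b3 = 0.10*b2.
b1, b2 = 5.0, 0.8
b3 = 0.10 * b2
y = b1 + b2 * income + b3 * wealth + rng.normal(0, 1, n)

# Impose the a priori restriction: regress Y on Xi = X2 + 0.10*X3.
x_combined = income + 0.10 * wealth
X = np.column_stack([np.ones(n), x_combined])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b2_hat = coef[1]
b3_hat = 0.10 * b2_hat   # recovered from the postulated relationship

print(b2_hat, b3_hat)
```

Because the restriction holds in the simulated data, the single combined coefficient recovers both parameters even though income and wealth are nearly collinear.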

2. Combining cross-sectional and time series data. A variant of the extraneous or a priori information technique is the combination of cross-sectional and time-series data, known as pooling the data. Suppose we want

29 For an interesting discussion on this, see John Conlisk, "When Collinearity Is Desirable," Western Economic Journal, vol. 9, 1971, pp. 393–407.

30 Mark B. Stewart and Kenneth F. Wallis, Introductory Econometrics, 2d ed., John Wiley & Sons, A Halstead Press Book, New York, 1981, p. 154.


CHAPTER TEN: MULTICOLLINEARITY 365

to study the demand for automobiles in the United States and assume we have time series data on the number of cars sold, average price of the car, and consumer income. Suppose also that

ln Yt = β1 + β2 ln Pt + β3 ln It + ut

where Y = number of cars sold, P = average price, I = income, and t = time. Our objective is to estimate the price elasticity β2 and income elasticity β3.

In time series data the price and income variables generally tend to be highly collinear. Therefore, if we run the preceding regression, we shall be faced with the usual multicollinearity problem. A way out of this has been suggested by Tobin.31 He says that if we have cross-sectional data (for example, data generated by consumer panels, or budget studies conducted by various private and governmental agencies), we can obtain a fairly reliable estimate of the income elasticity β3 because in such data, which are at a point in time, the prices do not vary much. Let the cross-sectionally estimated income elasticity be β̂3. Using this estimate, we may write the preceding time series regression as

Y*t = β1 + β2 ln Pt + ut

where Y*t = ln Yt − β̂3 ln It, that is, Y*t represents the value of Y after removing from it the effect of income. We can now obtain an estimate of the price elasticity β2 from the preceding regression.

Although it is an appealing technique, pooling the time series and cross-sectional data in the manner just suggested may create problems of interpretation, because we are assuming implicitly that the cross-sectionally estimated income elasticity is the same thing as that which would be obtained from a pure time series analysis.32 Nonetheless, the technique has been used in many applications and is worthy of consideration in situations where the cross-sectional estimates do not vary substantially from one cross section to another. An example of this technique is provided in exercise 10.26.
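A rough sketch of Tobin's two-step idea, with simulated data (the elasticities, sample sizes, and trend values are hypothetical): estimate the income elasticity from a cross section in which prices barely vary, net the income effect out of the time series, and then regress the adjusted series on log price alone.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Cross-sectional step: prices roughly constant, income varies ---
n_cs = 300
ln_income_cs = rng.normal(10, 0.5, n_cs)
beta_income = 1.2                      # hypothetical true income elasticity
ln_y_cs = 2.0 + beta_income * ln_income_cs + rng.normal(0, 0.1, n_cs)
Xcs = np.column_stack([np.ones(n_cs), ln_income_cs])
beta_income_hat = np.linalg.lstsq(Xcs, ln_y_cs, rcond=None)[0][1]

# --- Time-series step: price and income trend together (collinear) ---
n_ts = 100
t = np.arange(n_ts)
ln_income_ts = 0.02 * t + rng.normal(0, 0.01, n_ts)
ln_price_ts = 0.015 * t + rng.normal(0, 0.01, n_ts)
beta_price = -0.8                      # hypothetical true price elasticity
ln_y_ts = (1.0 + beta_price * ln_price_ts + beta_income * ln_income_ts
           + rng.normal(0, 0.05, n_ts))

# Net out the income effect with the cross-sectional estimate,
# then regress the adjusted series on ln price alone.
y_star = ln_y_ts - beta_income_hat * ln_income_ts
Xts = np.column_stack([np.ones(n_ts), ln_price_ts])
beta_price_hat = np.linalg.lstsq(Xts, y_star, rcond=None)[0][1]
print(beta_price_hat)
```

Note that the sketch builds in the implicit assumption criticized in the text: the cross-sectional and time series income elasticities are the same by construction here, which need not be true in real data.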

3. Dropping a variable(s) and specification bias. When faced with severe multicollinearity, one of the "simplest" things to do is to drop one of the collinear variables. Thus, in our consumption-income-wealth illustration, when we drop the wealth variable, we obtain regression (10.6.4), which shows that, whereas in the original model the income variable was statistically insignificant, it is now "highly" significant.

But in dropping a variable from the model we may be committing a specification bias or specification error. Specification bias arises from

31 J. Tobin, "A Statistical Demand Function for Food in the U.S.A.," Journal of the Royal Statistical Society, Ser. A, 1950, pp. 113–141.

32 For a thorough discussion and application of the pooling technique, see Edwin Kuh, Capital Stock Growth: A Micro-Econometric Approach, North-Holland Publishing Company, Amsterdam, 1963, Chaps. 5 and 6.


incorrect specification of the model used in the analysis. Thus, if economic theory says that income and wealth should both be included in the model explaining the consumption expenditure, dropping the wealth variable would constitute specification bias.

Although we will discuss the topic of specification bias in Chapter 13, we caught a glimpse of it in Section 7.7. If, for example, the true model is

Yi = β1 + β2X2i + β3X3i + ui

but we mistakenly fit the model

Yi = b1 + b12X2i + ûi

then it can be shown that (see Appendix 13A.1)

E(b12) = β2 + β3b32   (10.8.2)

where b32 = slope coefficient in the regression of X3 on X2. Therefore, it is obvious from (10.8.2) that b12 will be a biased estimate of β2 as long as b32 is different from zero (it is assumed that β3 is different from zero; otherwise there is no sense in including X3 in the original model).33 Of course, if b32 is zero, we have no multicollinearity problem to begin with. It is also clear from (10.8.2) that if both b32 and β3 are positive (or both are negative), E(b12) will be greater than β2; hence, on the average b12 will overestimate β2, leading to a positive bias. Similarly, if the product b32β3 is negative, on the average b12 will underestimate β2, leading to a negative bias.

From the preceding discussion it is clear that dropping a variable from the model to alleviate the problem of multicollinearity may lead to specification bias. Hence the remedy may be worse than the disease in some situations because, whereas multicollinearity may prevent precise estimation of the parameters of the model, omitting a variable may seriously mislead us as to the true values of the parameters. Recall that OLS estimators are BLUE despite near collinearity.
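The bias formula E(b12) = β2 + β3b32 is easy to check by simulation. The sketch below (simulated data, hypothetical coefficients) fits the short regression that wrongly omits X3 and compares its slope with the formula's prediction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
beta1, beta2, beta3 = 1.0, 2.0, 0.5   # hypothetical true coefficients

x2 = rng.normal(0, 1, n)
x3 = 0.6 * x2 + rng.normal(0, 1, n)   # X3 correlated with X2
y = beta1 + beta2 * x2 + beta3 * x3 + rng.normal(0, 1, n)

# Mistakenly omit X3: regress Y on X2 alone.
X = np.column_stack([np.ones(n), x2])
b12 = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Auxiliary regression of X3 on X2 gives b32.
b32 = np.linalg.lstsq(X, x3, rcond=None)[0][1]

# Omitted-variable formula: E(b12) = beta2 + beta3 * b32.
predicted = beta2 + beta3 * b32
print(b12, predicted)
```

With b32 and β3 both positive here, the short-regression slope overestimates β2, illustrating the positive bias described above.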

4. Transformation of variables. Suppose we have time series data on consumption expenditure, income, and wealth. One reason for high multicollinearity between income and wealth in such data is that over time both the variables tend to move in the same direction. One way of minimizing this dependence is to proceed as follows.

If the relation

Yt = β1 + β2X2t + β3X3t + ut   (10.8.3)

33 Note further that if b32 does not approach zero as the sample size is increased indefinitely, then b12 will be not only biased but also inconsistent.


holds at time t, it must also hold at time t - 1 because the origin of time is arbitrary anyway. Therefore, we have

Yt−1 = β1 + β2X2,t−1 + β3X3,t−1 + ut−1   (10.8.4)

Subtracting (10.8.4) from (10.8.3), we obtain

Yt − Yt−1 = β2(X2t − X2,t−1) + β3(X3t − X3,t−1) + vt   (10.8.5)

where vt = ut - ut-1. Equation (10.8.5) is known as the first difference form because we run the regression, not on the original variables, but on the differences of successive values of the variables.

The first difference regression model often reduces the severity of multicollinearity because, although the levels of X2 and X3 may be highly correlated, there is no a priori reason to believe that their differences will also be highly correlated.

As we shall see in the chapters on time series econometrics, an incidental advantage of the first-difference transformation is that it may make a nonstationary time series stationary. In those chapters we will see the importance of stationary time series. As noted in Chapter 1, loosely speaking, a time series, say, Yt, is stationary if its mean and variance do not change systematically over time.
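A small simulation (invented trend and noise values) illustrates the point: two trending series that are almost perfectly correlated in levels become essentially uncorrelated after first differencing.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
t = np.arange(n)

# Trending levels: income and wealth move together over time.
x2 = 50 + 0.5 * t + rng.normal(0, 1.0, n)
x3 = 500 + 5.0 * t + rng.normal(0, 10.0, n)

corr_levels = np.corrcoef(x2, x3)[0, 1]

# First differences strip out the common trend.
dx2 = np.diff(x2)
dx3 = np.diff(x3)
corr_diff = np.corrcoef(dx2, dx3)[0, 1]

print(corr_levels, corr_diff)
```

In this artificial example the level correlation is nearly 1 because the deterministic trends dominate, while the differenced series retain only the (independent) noise and so show little correlation.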

Another commonly used transformation in practice is the ratio transformation. Consider the model:

Yt = β1 + β2X2t + β3X3t + ut   (10.8.6)

where Y is consumption expenditure in real dollars, X2 is GDP, and X3 is total population. Since GDP and population grow over time, they are likely to be correlated. One "solution" to this problem is to express the model on a per capita basis, that is, by dividing (10.8.6) by X3, to obtain:

Yt/X3t = β1(1/X3t) + β2(X2t/X3t) + β3 + (ut/X3t)   (10.8.7)

Such a transformation may reduce collinearity in the original variables.
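The per-capita device can be checked numerically. In the sketch below (simulated data, hypothetical coefficients), the model is refitted in ratio form; note that after dividing through by population, the intercept of the transformed regression plays the role of the population coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
t = np.arange(n)

# GDP (X2) and population (X3) both grow over time, so their levels
# are highly correlated; the trend shapes are invented.
pop = 100.0 + 1.0 * t + rng.normal(0, 0.5, n)
gdp = 1000.0 + 20.0 * t + 0.05 * t**2 + rng.normal(0, 5.0, n)

beta1, beta2, beta3 = 50.0, 0.6, 2.0   # hypothetical true coefficients
y = beta1 + beta2 * gdp + beta3 * pop + rng.normal(0, 1.0, n)

# Ratio form: Y/X3 = beta1*(1/X3) + beta2*(X2/X3) + beta3 + u/X3,
# so the intercept of the transformed regression estimates beta3.
Xr = np.column_stack([np.ones(n), 1.0 / pop, gdp / pop])
coef = np.linalg.lstsq(Xr, y / pop, rcond=None)[0]
beta3_hat, beta1_hat, beta2_hat = coef
print(beta2_hat, beta3_hat)
```

Because dividing by X3 is an exact algebraic transformation of the original equation, the transformed fit recovers the same coefficients; what changes, as discussed next, are the properties of the error term.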

But the first-difference or ratio transformations are not without problems. For instance, the error term vt in (10.8.5) may not satisfy one of the assumptions of the classical linear regression model, namely, that the disturbances are serially uncorrelated. As we will see in Chapter 12, if the original disturbance term ut is serially uncorrelated, the error term vt obtained previously will in most cases be serially correlated. Therefore, the remedy may be worse than the disease. Moreover, there is a loss of one observation due to the differencing procedure, and therefore the degrees of freedom are


reduced by one. In a small sample, this could be a factor one would wish at least to take into consideration. Furthermore, the first-differencing procedure may not be appropriate in cross-sectional data where there is no logical ordering of the observations.

Similarly, in the ratio model (10.8.7), the error term (ut/X3t) will be heteroscedastic even if the original error term ut is homoscedastic, as we shall see in Chapter 11. Again, the remedy may be worse than the disease of collinearity.

In short, one should be careful in using the first difference or ratio method of transforming the data to resolve the problem of multicollinearity.

5. Additional or new data. Since multicollinearity is a sample feature, it is possible that in another sample involving the same variables collinearity may not be so serious as in the first sample. Sometimes simply increasing the size of the sample (if possible) may attenuate the collinearity problem. For example, in the three-variable model we saw that var(β̂2) =