ρ̂ = [n²(1 − d/2) + k²] / (n² − k²)

where n = total number of observations, d = Durbin-Watson d, and k = number of coefficients (including the intercept) to be estimated.

Show that for large n, this estimate of ρ is equal to the one obtained by the simpler formula (1 − d/2).
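The limiting claim can be checked numerically. Below is a throwaway sketch; the d and k values are illustrative, not from the text:

```python
# Compare the estimator [n^2(1 - d/2) + k^2] / (n^2 - k^2) with the
# simpler 1 - d/2 for growing n; d and k are held fixed (illustrative values).

def rho_hat(n, d, k):
    """The estimate of rho given above."""
    return (n ** 2 * (1 - d / 2) + k ** 2) / (n ** 2 - k ** 2)

def rho_simple(d):
    """The large-sample approximation 1 - d/2."""
    return 1 - d / 2

d, k = 0.8, 3  # illustrative Durbin-Watson d and coefficient count
for n in (20, 100, 1000):
    print(n, round(rho_hat(n, d, k), 5), rho_simple(d))
```

For these values the gap shrinks from roughly 0.037 at n = 20 to roughly 0.00001 at n = 1000, consistent with the claim.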

12.7. Estimating ρ: The Hildreth-Lu scanning or search procedure.^ Since in the first-order autoregressive scheme u_t = ρu_{t−1} + ε_t the coefficient ρ is expected to lie between −1 and +1, Hildreth and Lu suggest a systematic "scanning" or search procedure to locate it. They recommend selecting ρ between −1 and +1 in steps of, say, 0.1 and transforming the data by the generalized difference equation (12.6.5). Thus, one may choose ρ from −0.9, −0.8, ..., 0.8, 0.9. For each chosen ρ we run the generalized difference equation and obtain the associated RSS, Σû_t². Hildreth and Lu suggest choosing the ρ that minimizes the RSS (and hence maximizes the R²). If further refinement is needed, they suggest smaller steps, say of 0.01, such as −0.99, −0.98, ..., 0.90, 0.91, and so on.

a. What are the advantages of the Hildreth-Lu procedure?

b. How does one know that the ρ value ultimately chosen to transform the data will, in fact, guarantee the minimum RSS, Σû_t²?
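A minimal sketch of the scanning idea, using simulated two-variable data (the seed, sample size, coefficients, and helper names are my own choices, not the wages-productivity series from the text):

```python
import random

# Hildreth-Lu scan: for each rho on a coarse grid, quasi-difference the
# data, run OLS on the transformed series, and keep the rho with the
# smallest residual sum of squares (RSS).

random.seed(0)
n, beta1, beta2, true_rho = 200, 2.0, 0.5, 0.6
x = [float(t) for t in range(n)]
u = [0.0] * n
y = [0.0] * n
for t in range(n):
    u[t] = (true_rho * u[t - 1] if t else 0.0) + random.gauss(0, 1)
    y[t] = beta1 + beta2 * x[t] + u[t]

def ols_rss(xs, ys):
    """Two-variable OLS; returns the residual sum of squares."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    sxx = sum((a - xbar) ** 2 for a in xs)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return sum((b - b1 - b2 * a) ** 2 for a, b in zip(xs, ys))

best_rho, best_rss = None, float("inf")
for i in range(-9, 10):                 # rho = -0.9, -0.8, ..., 0.9
    rho = i / 10
    ystar = [y[t] - rho * y[t - 1] for t in range(1, n)]
    xstar = [x[t] - rho * x[t - 1] for t in range(1, n)]
    rss = ols_rss(xstar, ystar)
    if rss < best_rss:
        best_rho, best_rss = rho, rss

print("rho chosen by the scan:", best_rho)
```

A second, finer pass over best_rho ± 0.1 in steps of 0.01 would refine the answer, exactly as Hildreth and Lu suggest.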

12.8. Estimating p: The Cochrane-Orcutt (C-O) iterative procedure.*

As an illustration of this procedure, consider the two-variable model:

Y_t = β₁ + β₂X_t + u_t (1)

The table may be found in Johnston, op. cit., 3d ed., p. 559.

^G. Hildreth and J. Y. Lu, "Demand Relations with Autocorrelated Disturbances," Michigan State University, Agricultural Experiment Station, Tech. Bull. 276, November 1960.

*D. Cochrane and G. H. Orcutt, "Applications of Least-Squares Regressions to Relationships Containing Autocorrelated Error Terms," Journal of the American Statistical Association, vol. 44, 1949, pp. 32–61.

CHAPTER TWELVE: AUTOCORRELATION 493

and the AR(1) scheme

u_t = ρu_{t−1} + ε_t,  −1 < ρ < 1 (2)

Cochrane and Orcutt then recommend the following steps to estimate ρ.

1. Estimate (1) by the usual OLS routine and obtain the residuals, û_t. Incidentally, note that you can have more than one X variable in the model.

2. Using the residuals obtained in step 1, run the following regression:

û_t = ρ̂û_{t−1} + v_t (3)

which is the empirical counterpart of (2).*

3. Using ρ̂ obtained in (3), estimate the generalized difference equation (12.9.6).

4. Since a priori it is not known that the ρ̂ obtained from (3) is the best estimate of ρ, substitute the values of β̂₁* and β̂₂* obtained in step 3 into the original regression (1) and obtain the new residuals, say û_t*, as

û_t* = Y_t − β̂₁* − β̂₂*X_t (4)

which can be easily computed since Y_t, X_t, β̂₁*, and β̂₂* are all known.

5. Now estimate the following regression:

û_t* = ρ̂*û_{t−1}* + w_t (5)

which is similar to (3) and thus provides the second-round estimate of ρ.

Since we do not know whether this second-round estimate of ρ is the best estimate of the true ρ, we go on to the third-round estimate, and so on. That is why the C-O procedure is called an iterative procedure. But how long should we go on this (merry-)go-round? The general recommendation is to stop iterating when the successive estimates of ρ differ by a small amount, say, less than 0.01 or 0.005. In our wages-productivity example, it took about seven iterations before we stopped.
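The five steps above can be sketched as a loop. The data below are simulated (seed, sample size, and coefficients are my own choices), so this illustrates the iteration rather than reproducing the wages-productivity numbers:

```python
import random

# Cochrane-Orcutt iteration: OLS residuals give a rho-hat; the generalized
# difference regression gives new coefficients; those give new residuals and
# a new rho-hat; repeat until successive rho-hats differ by less than tol.

random.seed(42)
n, beta1, beta2, true_rho = 300, 2.0, 0.5, 0.7
x = [random.gauss(0, 1) for _ in range(n)]
u = [0.0] * n
y = [0.0] * n
for t in range(n):
    u[t] = (true_rho * u[t - 1] if t else 0.0) + random.gauss(0, 1)
    y[t] = beta1 + beta2 * x[t] + u[t]

def ols2(xs, ys):
    """Two-variable OLS; returns (intercept, slope)."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    sxx = sum((a - xbar) ** 2 for a in xs)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

b1, b2 = ols2(x, y)                      # step 1: OLS on the original data
rho_hat, tol = 0.0, 0.005
for _ in range(50):
    resid = [y[t] - b1 - b2 * x[t] for t in range(n)]   # steps 1/4: residuals
    new_rho = (sum(resid[t] * resid[t - 1] for t in range(1, n))
               / sum(resid[t - 1] ** 2 for t in range(1, n)))  # steps 2/5
    if abs(new_rho - rho_hat) < tol:     # stopping rule from the text
        rho_hat = new_rho
        break
    rho_hat = new_rho
    # step 3: generalized difference regression (first observation dropped)
    ystar = [y[t] - rho_hat * y[t - 1] for t in range(1, n)]
    xstar = [x[t] - rho_hat * x[t - 1] for t in range(1, n)]
    b1star, b2 = ols2(xstar, ystar)
    b1 = b1star / (1 - rho_hat)          # recover the original intercept

print("C-O rho-hat:", round(rho_hat, 3))
```

Note the intercept of the transformed regression estimates β₁(1 − ρ), so it is divided by (1 − ρ̂) before computing new residuals from the original data.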

a. Using software of your choice, verify that the estimated ρ values of 0.8919 for Eq. (12.9.16) and 0.9610 for Eq. (12.9.17) are approximately correct.

b. Does the ρ value obtained by the C-O procedure guarantee the global minimum of the RSS or only a local minimum?

c. Optional: Apply the C-O method to the log-linear wages-productivity regression given in (12.5.2), retaining the first observation as well as dropping it. Compare your results with those of regression (12.5.1).

12.9. Estimating ρ: The Cochrane-Orcutt two-step procedure. This is a shortened version of the C-O iterative procedure. In step 1, we estimate ρ from the first iteration, that is, from Eq. (3) in the preceding exercise,

Note that ρ̂ = Σû_tû_{t−1}/Σû_t² (why?). Although biased, ρ̂ is a consistent estimator of the true ρ.

494 PART TWO: RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL

and in step 2 we use that estimate of ρ to run the generalized difference equation, as in Eq. (4) in the preceding exercise. Sometimes in practice this two-step method gives results quite similar to those obtained from the more elaborate C-O iterative procedure.

Apply the C-O two-step method to the illustrative wages-productivity regression given in the text and compare your results with those obtained from the iterative method. Pay special attention to the first observation in the transformation.
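On the first-observation point: when the first observation is retained, it is commonly transformed by the Prais-Winsten scaling √(1 − ρ²) rather than quasi-differenced. A small sketch (the function name and demo numbers are mine):

```python
import math

def generalized_difference(series, rho, keep_first=True):
    """Quasi-difference a series: z*_t = z_t - rho * z_{t-1} for t >= 2.
    With keep_first=True the first observation is retained, scaled by
    sqrt(1 - rho**2) (the Prais-Winsten transformation); otherwise the
    first observation is dropped, as in the plain generalized difference."""
    out = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    if keep_first:
        out.insert(0, math.sqrt(1 - rho ** 2) * series[0])
    return out

y = [10.0, 11.0, 12.5, 13.0]
print(generalized_difference(y, 0.5))                    # keeps all 4 points
print(generalized_difference(y, 0.5, keep_first=False))  # drops the first
```

Dropping the first observation loses a data point, which can matter in small samples; the scaled first observation keeps the transformed errors homoscedastic.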

12.10. Estimating ρ: Durbin's two-step method.* To explain this method, we can write the generalized difference equation (12.9.5) equivalently as follows:

Y_t = β₁(1 − ρ) + β₂X_t − ρβ₂X_{t−1} + ρY_{t−1} + ε_t (1)

Durbin suggests the following two-step procedure to estimate ρ. First, treat (1) as a multiple regression model, regressing Y_t on X_t, X_{t−1}, and Y_{t−1}, and treat the estimated regression coefficient of Y_{t−1} (= ρ̂) as an estimate of ρ. Second, having obtained ρ̂, use it to estimate the parameters of the generalized difference equation (12.9.5) or its equivalent (12.9.6).

a. Apply the Durbin two-step method to the wages-productivity example discussed in the text and compare your results with those obtained from the Cochrane-Orcutt iterative procedure and the C-O two-step method. Comment on the "quality" of your results.

b. If you examine Eq. (1) above, you will observe that the coefficient of X_{t−1} (= −ρβ₂) is equal to −1 times the product of the coefficient of X_t (= β₂) and the coefficient of Y_{t−1} (= ρ). How would you test that the estimated coefficients obey the preceding restriction?
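A sketch of step 1 of Durbin's method on simulated data (the simulation and the small OLS solver are my own scaffolding; any regression package would do the same job):

```python
import random

# Durbin step 1: regress Y_t on a constant, X_t, X_{t-1}, and Y_{t-1};
# the coefficient on Y_{t-1} serves as the estimate of rho.

random.seed(1)
n, b1, b2, rho = 300, 1.0, 0.8, 0.5
x = [random.gauss(0, 1) for _ in range(n)]
u = [0.0] * n
y = [0.0] * n
for t in range(n):
    u[t] = (rho * u[t - 1] if t else 0.0) + random.gauss(0, 1)
    y[t] = b1 + b2 * x[t] + u[t]

def ols(X, yv):
    """OLS via normal equations, solved by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yy for r, yy in zip(X, yv)) for i in range(k)]
    for i in range(k):                       # forward elimination w/ pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for j in range(i, k):
                A[r][j] -= f * A[i][j]
            c[r] -= f * c[i]
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):           # back substitution
        beta[i] = (c[i] - sum(A[i][j] * beta[j]
                              for j in range(i + 1, k))) / A[i][i]
    return beta

X = [[1.0, x[t], x[t - 1], y[t - 1]] for t in range(1, n)]
Y = [y[t] for t in range(1, n)]
coef = ols(X, Y)
rho_hat = coef[3]                            # coefficient on Y_{t-1}
print("Durbin rho-hat:", round(rho_hat, 3))
# Part (b): in (1) the coefficient on X_{t-1} should be close to
# -1 times (coef on X_t) * (coef on Y_{t-1}), up to sampling error.
print("coef on X_{t-1}:", round(coef[2], 3),
      "vs -rho*beta2:", round(-coef[3] * coef[1], 3))
```

Step 2 would then feed rho_hat into the generalized difference equation; a formal test of the restriction in part (b) would use a nonlinear (e.g., Wald-type) test rather than the eyeball comparison printed here.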

12.11. In measuring returns to scale in electricity supply, Nerlove used cross-sectional data on 145 privately owned utilities in the United States for 1955 and regressed the log of total cost on the logs of output, wage rate, price of capital, and price of fuel. He found that the residuals estimated from this regression exhibited "serial" correlation, as judged by the Durbin-Watson d. To seek a remedy, he plotted the estimated residuals against the log of output and obtained Figure 12.11.

a. What does Figure 12.11 show?

b. How can you get rid of "serial" correlation in the preceding situation?

12.12. The residuals from a regression when plotted against time gave the scattergram in Figure 12.12. The encircled "extreme" residual is called an outlier. An outlier is an observation whose value exceeds the values of other observations in the sample by a large amount, perhaps three or four standard deviations away from the mean value of all the observations.

a. What are the reasons for the existence of the outlier(s)?

b. If there is an outlier(s), should that observation(s) be discarded and the regression run on the remaining observations?

c. Is the Durbin-Watson d applicable in the presence of the outlier(s)?

*J. Durbin, "Estimation of Parameters in Time-Series Regression Models," Journal of the Royal Statistical Society, series B, vol. 22, 1960, pp. 139–153.

FIGURE 12.11 Regression residuals from the Nerlove study. (Adapted from Marc Nerlove, "Return to Scale in Electric Supply," in Carl F. Christ et al., Measurement in Economics, Stanford University Press, Stanford, Calif., 1963.)