Model Misspecification Versus Pure Autocorrelation
476 PART TWO: RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL
then we need to include the time or trend, t, variable in the model to see the relationship between wages and productivity net of the trends in the two variables.
To test this, we included the trend variable in (12.5.1) and obtained the following results:

$$\hat{Y}_t = 1.4752 + 1.3057X_t - 0.9032t \qquad (12.8.1)$$
se = (13.18)   (0.2765)   (0.4203)
t = (0.1119)   (4.7230)   (−2.1490)

The interpretation of this model is straightforward: Over time, the index of real wages has been decreasing by about 0.90 units per year. After allowing for this, if the productivity index went up by a unit, on average, the real wage index went up by about 1.30 units, although this number is not statistically different from one (why?). What is interesting to note is that even allowing for the trend variable, the d value is still very low, suggesting that (12.8.1) suffers from pure autocorrelation and not necessarily from specification error.

How do we know that (12.8.1) is the correct specification? To test this, we regress Y on X and X² to test for the possibility that the real wage index may be nonlinearly related to the productivity index. The results of this regression are as follows:

$$\hat{Y}_t = -16.2181 + 1.9488X_t - 0.0079X_t^2 \qquad (12.8.2)$$
t = (−5.4891)   (24.9868)   (−15.9363)

These results are interesting. All the coefficients are statistically highly significant, the p values being extremely small. From the negative quadratic term, it seems that although the real wage index increases as the productivity index increases, it increases at a decreasing rate. But look at the d value. It still suggests positive autocorrelation in the residuals, for $d_L = 1.391$ and $d_U = 1.60$, and the estimated d value lies below $d_L$.

It may be safe to conclude from the preceding analysis that our wages-productivity regression probably suffers from pure autocorrelation and not necessarily from specification bias. Knowing the consequences of autocorrelation, we may therefore want to take some corrective action. We will do so shortly.

Incidentally, for all the wages-productivity regressions that we have presented above, we applied the Jarque-Bera test of normality and found that the residuals were normally distributed, which is comforting because the d test assumes normality of the error term.
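The coefficient arithmetic behind (12.8.1) and (12.8.2) can be checked directly. The sketch below only reuses the point estimates and standard errors printed above (nothing is re-estimated); it shows the t ratio for the hypothesis that the productivity slope equals one, and the slope of the quadratic fit at two illustrative productivity levels (X = 50 and X = 100 are arbitrary choices, not values from the text).

```python
# Arithmetic check on the coefficients reported in (12.8.1) and (12.8.2).
# All numbers are copied from the text; nothing is re-estimated.

# (12.8.1): Y_t = 1.4752 + 1.3057 X_t - 0.9032 t
b2, se_b2 = 1.3057, 0.2765   # productivity slope and its standard error
b3, se_b3 = -0.9032, 0.4203  # trend coefficient and its standard error

t_slope_zero = b2 / se_b2          # H0: beta2 = 0 (the printed t ratio, up to rounding)
t_slope_one = (b2 - 1.0) / se_b2   # H0: beta2 = 1, the "(why?)" in the text
t_trend = b3 / se_b3               # H0: trend coefficient = 0

# (12.8.2): Y_t = -16.2181 + 1.9488 X_t - 0.0079 X_t^2
c1, c2 = 1.9488, -0.0079

def marginal_effect(x: float) -> float:
    """Slope dY/dX = c1 + 2*c2*x of the quadratic fit at productivity level x."""
    return c1 + 2.0 * c2 * x

x_turning = -c1 / (2.0 * c2)  # level of X at which the fitted slope reaches zero

print(f"t (beta2 = 0): {t_slope_zero:.3f}")
print(f"t (beta2 = 1): {t_slope_one:.3f}")
print(f"t (trend = 0): {t_trend:.3f}")
print(f"dY/dX at X = 50:  {marginal_effect(50):.4f}")
print(f"dY/dX at X = 100: {marginal_effect(100):.4f}")
print(f"turning point:    X = {x_turning:.2f}")
```

The t ratio for $\beta_2 = 1$ comes out at about 1.11, well below conventional 5 percent critical values of roughly 2, which is why the slope of 1.30 is not statistically different from one. The quadratic fit's slope falls from about 1.16 at X = 50 to about 0.37 at X = 100, which is what "increases at a decreasing rate" means; whether the implied turning point (X ≈ 123) lies inside the sample is not stated in the text.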
CHAPTER TWELVE: AUTOCORRELATION 477
12.9 CORRECTING FOR (PURE) AUTOCORRELATION: THE METHOD OF GENERALIZED LEAST SQUARES (GLS)
Knowing the consequences of autocorrelation, especially the lack of efficiency of OLS estimators, we may need to remedy the problem. The remedy depends on the knowledge one has about the nature of interdependence among the disturbances, that is, knowledge about the structure of autocorrelation.
As a starter, consider the two-variable regression model:

$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (12.9.1)$$

and assume that the error term follows the AR(1) scheme, namely,

$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1 \qquad (12.9.2)$$
Now we consider two cases: (1) ρ is known and (2) ρ is not known but has to be estimated.
When ρ Is Known
If the coefficient of first-order autocorrelation is known, the problem of autocorrelation can be easily solved. If (12.9.1) holds true at time t, it also holds true at time (t − 1). Hence,

$$Y_{t-1} = \beta_1 + \beta_2 X_{t-1} + u_{t-1} \qquad (12.9.3)$$

Multiplying (12.9.3) by ρ on both sides, we obtain

$$\rho Y_{t-1} = \rho\beta_1 + \rho\beta_2 X_{t-1} + \rho u_{t-1} \qquad (12.9.4)$$

Subtracting (12.9.4) from (12.9.1) gives

$$(Y_t - \rho Y_{t-1}) = \beta_1(1 - \rho) + \beta_2(X_t - \rho X_{t-1}) + \varepsilon_t \qquad (12.9.5)$$

where $\varepsilon_t = u_t - \rho u_{t-1}$, which by (12.9.2) is a well-behaved error term. We can express (12.9.5) as

$$Y_t^* = \beta_1^* + \beta_2^* X_t^* + \varepsilon_t \qquad (12.9.6)$$

where $\beta_1^* = \beta_1(1 - \rho)$, $Y_t^* = (Y_t - \rho Y_{t-1})$, $X_t^* = (X_t - \rho X_{t-1})$, and $\beta_2^* = \beta_2$.

Since the error term in (12.9.6) satisfies the usual OLS assumptions, we can apply OLS to the transformed variables $Y^*$ and $X^*$ and obtain estimators with all the optimum properties, namely, BLUE. In effect, running (12.9.6) is tantamount to using generalized least squares (GLS) discussed in the previous chapter: recall that GLS is nothing but OLS applied to the transformed model that satisfies the classical assumptions.
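To see the quasi-difference transformation at work, here is a minimal sketch on simulated data, assuming ρ is known. The parameter values (β1 = 2, β2 = 0.5, ρ = 0.8), the uniform regressor, the start value $u_0 = 0$, and the helper names `ols_simple`, `simulate_ar1_model`, and `quasi_difference` are all hypothetical illustrations, not taken from the text. It applies (12.9.5), runs OLS on (12.9.6), and recovers β1 from $\beta_1^* = \beta_1(1 - \rho)$.

```python
import random

def ols_simple(y, x):
    """OLS of y on a constant and a single regressor x; returns (intercept, slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate_ar1_model(n, beta1, beta2, rho, seed):
    """Hypothetical data: Y_t = beta1 + beta2*X_t + u_t, u_t = rho*u_{t-1} + eps_t, u_0 = 0."""
    rng = random.Random(seed)
    x, y, u = [], [], 0.0
    for _ in range(n):
        xt = rng.uniform(0.0, 10.0)
        u = rho * u + rng.gauss(0.0, 1.0)
        x.append(xt)
        y.append(beta1 + beta2 * xt + u)
    return y, x

def quasi_difference(z, rho):
    """Generalized difference z_t - rho*z_{t-1}, t = 2..n; the first observation is dropped."""
    return [z[t] - rho * z[t - 1] for t in range(1, len(z))]

rho = 0.8  # treated as known, the case covered by (12.9.5)
y, x = simulate_ar1_model(n=500, beta1=2.0, beta2=0.5, rho=rho, seed=42)

y_star = quasi_difference(y, rho)
x_star = quasi_difference(x, rho)
b1_star, b2_star = ols_simple(y_star, x_star)  # OLS on the transformed model (12.9.6)
beta1_hat = b1_star / (1.0 - rho)              # undo beta1* = beta1 * (1 - rho)

print(f"observations used: {len(y_star)}")
print(f"beta1 estimate: {beta1_hat:.3f}  (true 2.0)")
print(f"beta2 estimate: {b2_star:.3f}  (true 0.5)")
```

Note that the intercept of the transformed regression estimates $\beta_1(1 - \rho)$, not β1 itself, so it must be divided by $1 - \rho$; the slope needs no adjustment because $\beta_2^* = \beta_2$.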
Regression (12.9.5) is known as the generalized, or quasi, difference equation. It involves regressing Y on X, not in the original form, but in the difference form, which is obtained by subtracting a proportion (= ρ) of the value of a variable in the previous time period from its value in the current time period. In this differencing procedure we lose one observation because the first observation has no antecedent. To avoid this loss of one observation, the first observation on Y and X is transformed as follows:³⁵ $Y_1\sqrt{1-\rho^2}$ and $X_1\sqrt{1-\rho^2}$. This transformation is known as the Prais-Winsten transformation.
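The Prais-Winsten treatment of the first observation is easy to add to the quasi-difference step. The sketch below assumes a known ρ; the function name `prais_winsten_transform` and the toy series are hypothetical. It keeps all n observations: the first is scaled by $\sqrt{1-\rho^2}$ and the rest are generalized differences as in (12.9.5).

```python
import math

def prais_winsten_transform(z, rho):
    """Transform a series for AR(1) errors while keeping all n observations.

    t = 1:     z_1 * sqrt(1 - rho**2)   (Prais-Winsten treatment of the first observation)
    t = 2..n:  z_t - rho * z_{t-1}       (generalized difference, as in (12.9.5))
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("the transformation requires -1 < rho < 1")
    first = z[0] * math.sqrt(1.0 - rho ** 2)
    return [first] + [z[t] - rho * z[t - 1] for t in range(1, len(z))]

rho = 0.6
y = [1.0, 2.0, 3.0]
ones = [1.0] * len(y)  # the intercept column is transformed the same way

print([round(v, 6) for v in prais_winsten_transform(y, rho)])     # [0.8, 1.4, 1.8]
print([round(v, 6) for v in prais_winsten_transform(ones, rho)])  # [0.8, 0.4, 0.4]
```

Because the column of ones becomes $\sqrt{1-\rho^2}$ in the first row and $1-\rho$ in the others, it is no longer constant. The transformed regression is therefore run through the origin with this column as an extra regressor, and its coefficient estimates β1 directly.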
³⁵The loss of one observation may not be very serious in large samples but can make a substantial difference in the results in small samples. Without transforming the first observation as indicated, the error variance will not be homoscedastic. On this, see Jeffrey Wooldridge, op. cit., p. 388. For some Monte Carlo results on the importance of the first observation, see Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993, Table 10.1, p. 349.

When ρ Is Not Known

Although conceptually straightforward to apply, the method of generalized difference given in (12.9.5) is difficult to implement because ρ is rarely known in practice. Therefore, we need to find ways of estimating ρ. We have several possibilities.

The First-Difference Method. Since ρ lies between 0 and ±1, one could start from two extreme positions. At one extreme, one could assume that ρ = 0, that is, no (first-order) serial correlation, and at the other extreme we could let ρ = ±1, that is, perfect positive or negative correlation. As a matter of fact, when a regression is run, one generally assumes that there is no autocorrelation and then lets the Durbin-Watson or other test show whether this assumption is justified. If, however, ρ = +1, the generalized difference equation (12.9.5) reduces to the first-difference equation:

$$Y_t - Y_{t-1} = \beta_2(X_t - X_{t-1}) + (u_t - u_{t-1})$$

or

$$\Delta Y_t = \beta_2 \Delta X_t + \varepsilon_t \qquad (12.9.7)$$

where Δ is the first-difference operator introduced in (12.1.10).

Since the error term in (12.9.7) is free from (first-order) serial correlation (why?), to run the regression (12.9.7) all one has to do is form the first differences of both the regressand and regressor(s) and run the regression on these first differences.

The first-difference transformation may be appropriate if the coefficient of autocorrelation is very high, say in excess of 0.8, or the Durbin-Watson d is quite low. Maddala has proposed this rough rule of thumb: Use the first-difference form whenever d < R².³⁶ This is the case in our wages-productivity regression (12.5.1), where we found that d = 0.1229 and r² = 0.9584. The first-difference regression for our illustrative example will be presented shortly.
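Maddala's rule of thumb only needs d and R², and d itself is a simple function of the residuals. The sketch below uses the values reported for (12.5.1); the helper names are hypothetical. The link ρ ≈ 1 − d/2 is the standard large-sample approximation $d \approx 2(1-\hat{\rho})$, which is not stated in this section and is included only to connect d = 0.1229 to the "ρ in excess of 0.8" guideline.

```python
def durbin_watson(resid):
    """Durbin-Watson d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

def rho_from_d(d):
    """Large-sample approximation d ≈ 2(1 - rho), i.e. rho ≈ 1 - d/2."""
    return 1.0 - d / 2.0

def maddala_prefers_first_difference(d, r2):
    """Maddala's rough rule of thumb: use the first-difference form whenever d < R^2."""
    return d < r2

# Values reported in the text for the level-form regression (12.5.1)
d, r2 = 0.1229, 0.9584

print(round(rho_from_d(d), 3))                  # 0.939
print(maddala_prefers_first_difference(d, r2))  # True

# Sanity checks of d on toy residual series
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))    # 3.0 (alternating signs: negative autocorrelation)
print(durbin_watson([1.0, 1.1, 1.2, 1.3]))      # small value: smooth, positively correlated residuals
```

An implied ρ of roughly 0.94 is well above the 0.8 guideline, so both rules of thumb point toward the first-difference form for this regression.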
An interesting feature of the first-difference model (12.9.7) is that there is no intercept in it. Hence, to estimate (12.9.7), you have to use the regression-through-the-origin routine (that is, suppress the intercept term), which is now available in most software packages. If, however, you forget to drop the intercept term in the model and estimate the following model that includes the intercept term,

$$\Delta Y_t = \beta_1 + \beta_2 \Delta X_t + \varepsilon_t \qquad (12.9.8)$$

then a nonzero $\beta_1$ implies that the original model has a trend in it, and $\beta_1$ represents the coefficient of the trend variable.³⁷ Therefore, one "accidental" benefit of introducing the intercept term in the first-difference model is to test for the presence of a trend variable in the original model.
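The trend-detection property of (12.9.8) can be checked by simulation. The sketch below assumes the model of footnote 37, $Y_t = \alpha_1 + \beta_1 t + \beta_2 X_t + u_t$ with ρ = 1, and uses hypothetical parameter values (α1 = 10, β1 = 0.3, β2 = 0.7) and helper names. The intercept of the first-difference regression should land near the trend coefficient 0.3, while the through-the-origin version (12.9.7) estimates only the slope.

```python
import random

def ols_simple(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def ols_through_origin(y, x):
    """Regression through the origin: slope = sum(x*y) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def first_difference(z):
    """Delta z_t = z_t - z_{t-1}, t = 2..n."""
    return [z[t] - z[t - 1] for t in range(1, len(z))]

# Hypothetical parameter values for Y_t = alpha1 + beta1*t + beta2*X_t + u_t
alpha1, beta1, beta2 = 10.0, 0.3, 0.7
rng = random.Random(7)
n = 400
x = [rng.uniform(0.0, 50.0) for _ in range(n)]
y, u = [], 0.0
for t in range(1, n + 1):
    u += rng.gauss(0.0, 1.0)  # rho = 1: u_t = u_{t-1} + eps_t
    y.append(alpha1 + beta1 * t + beta2 * x[t - 1] + u)

dy, dx = first_difference(y), first_difference(x)
const_fd, slope_fd = ols_simple(dy, dx)    # (12.9.8): intercept estimates the trend coefficient
slope_origin = ols_through_origin(dy, dx)  # (12.9.7): no intercept

print(f"with intercept: const = {const_fd:.3f} (trend beta1 = {beta1}), slope = {slope_fd:.3f}")
print(f"through origin: slope = {slope_origin:.3f}")
```

In this simulation the regressor is drawn independently each period, so its first differences average close to zero and the through-the-origin slope is barely affected by the omitted constant; with a trending regressor the two slope estimates could diverge more.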
Returning to our wages-productivity regression (12.5.1), and given the AR(1) scheme and a low d value in relation to r², we rerun (12.5.1) in the first-difference form without the intercept term; remember that (12.5.1) is in the level form. The results are as follows:³⁸

$$\Delta\hat{Y}_t = 0.7199\,\Delta X_t \qquad (12.9.9)$$
t = (9.2073)   r² = 0.3610   d = 1.5096

Compared with the level-form regression (12.5.1), we see that the slope coefficient has not changed much, but the r² value has dropped considerably. This is generally the case because by taking the first differences we are essentially studying the behavior of variables around their (linear) trend values. Of course, we cannot compare the r² of (12.9.9) directly with that of (12.5.1) because the dependent variables in the two models are different.³⁹ Also, notice that compared with the original regression, the d value has increased dramatically, perhaps indicating that there is little autocorrelation in the first-difference regression.⁴⁰

³⁷This is easy to show. Let $Y_t = \alpha_1 + \beta_1 t + \beta_2 X_t + u_t$. Therefore, $Y_{t-1} = \alpha_1 + \beta_1(t-1) + \beta_2 X_{t-1} + u_{t-1}$. Subtracting the latter from the former, you will obtain $\Delta Y_t = \beta_1 + \beta_2 \Delta X_t + \varepsilon_t$, which shows that the intercept term in this equation is indeed the coefficient of the trend variable in the original model. Remember that we are assuming that ρ = 1.

³⁸In Exercise 12.38 you are asked to run this model, including the constant term.

³⁹The comparison of r² in the level and first-difference form is slightly involved. For an extended discussion on this, see Maddala, op. cit., Chap. 6.

⁴⁰It is not clear whether the computed d in the first-difference regression can be interpreted in the same way as it was in the original, level-form regression. However, applying the runs test, it can be seen that there is no evidence of autocorrelation in the residuals of the first-difference regression.

Another interesting aspect of the first-difference transformation relates to the stationarity properties of the underlying time series. Return to Eq. (12.2.1), which describes the AR(1) scheme. Now if in fact ρ = 1, then it is clear from Eqs. (12.2.3) and (12.2.4) that the series $u_t$ is nonstationary, for the variances and covariances become infinite. That is why, when we discussed this topic, we put the restriction that |ρ| < 1. But it is clear from (12.2.1) that if the autocorrelation coefficient is in fact 1, then (12.2.1) becomes $u_t = u_{t-1} + \varepsilon_t$, or

$$(u_t - u_{t-1}) = \Delta u_t = \varepsilon_t \qquad (12.9.10)$$
That is, it is the first-differenced $u_t$ that becomes stationary, for it is equal to $\varepsilon_t$, which is a white noise error term.
The point of the preceding discussion is that if the original time series are nonstationary, very often their first differences become stationary. Therefore, the first-difference transformation serves a dual purpose in that it might get rid of (first-order) autocorrelation and also render the time series stationary. We will revisit this topic in Part V, where we discuss the econometrics of time series analysis in some depth.
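A small Monte Carlo makes the stationarity point concrete. The sketch below assumes ρ = 1, standard normal $\varepsilon_t$, and a fixed start $u_0 = 0$ (all illustrative choices, not from the text). Under those assumptions $\operatorname{Var}(u_t) = t\sigma^2$ keeps growing with t, whereas $\Delta u_t = \varepsilon_t$ has the same variance in every period.

```python
import random

def variance(values):
    """Sample variance (divisor n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(123)
n_paths, n_periods = 2000, 100

u_at_10, u_at_100, du_at_100 = [], [], []
for _ in range(n_paths):
    u = 0.0                       # fixed start u_0 = 0 (an assumption of this sketch)
    eps = 0.0
    for t in range(1, n_periods + 1):
        eps = rng.gauss(0.0, 1.0)
        u = u + eps               # u_t = u_{t-1} + eps_t  (rho = 1)
        if t == 10:
            u_at_10.append(u)
    u_at_100.append(u)            # level of the random walk at t = 100
    du_at_100.append(eps)         # first difference at t = 100: u_100 - u_99 = eps_100

print(f"Var(u_10)  approx {variance(u_at_10):.1f}   (theory: 10)")
print(f"Var(u_100) approx {variance(u_at_100):.1f}  (theory: 100)")
print(f"Var(du_100) approx {variance(du_at_100):.2f}  (theory: 1)")
```

Across simulated paths the spread of the level series is roughly ten times larger at t = 100 than at t = 10, while the spread of the first difference stays near one: the level is nonstationary, its first difference is not.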
We mentioned that the first-difference transformation may be appropriate if ρ is high or d is low. Strictly speaking, the first-difference transformation is valid only if ρ = 1. As a matter of fact, there is a test, called the Berenblut-Webb test,⁴¹ to test the hypothesis that ρ = 1. The test statistic they use is called the g statistic, which is defined as follows:

$$g = \frac{\sum_{t=2}^{n} \hat{e}_t^2}{\sum_{t=1}^{n} \hat{u}_t^2} \qquad (12.9.11)$$

where $\hat{u}_t$ are the OLS residuals from the original (i.e., level-form) regression and $\hat{e}_t$ are the OLS residuals from the first-difference regression. Keep in mind that in the first-difference form there is no intercept.
To test the significance of the g statistic, assuming that the level-form regression contains the intercept term, we can use the Durbin-Watson tables except that now the null hypothesis is that ρ = 1 rather than the Durbin-Watson hypothesis that ρ = 0.
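Computing g only requires the residuals of the two regressions. The sketch below uses simulated data with hypothetical values ($Y_t = 1 + 0.5X_t + u_t$, $u_0 = 0$) and hypothetical helper names; it is not a reproduction of the wages-productivity numbers. It follows (12.9.11): the level regression includes an intercept, the first-difference regression is run through the origin. A g near zero is what one expects when ρ = 1, and a g near 2 is what one expects when the errors are serially uncorrelated.

```python
import random

def ols_with_intercept(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def berenblut_webb_g(y, x):
    """g = sum(e_hat^2) / sum(u_hat^2), as in (12.9.11).

    u_hat: residuals of the level-form regression of y on x (with intercept).
    e_hat: residuals of the first-difference regression (through the origin).
    """
    a, b = ols_with_intercept(y, x)
    u_hat = [yi - a - b * xi for yi, xi in zip(y, x)]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    b_fd = sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)
    e_hat = [q - b_fd * p for p, q in zip(dx, dy)]
    return sum(e * e for e in e_hat) / sum(u * u for u in u_hat)

def simulate(n, rho, seed):
    """Hypothetical data: Y_t = 1 + 0.5*X_t + u_t, u_t = rho*u_{t-1} + eps_t, u_0 = 0."""
    rng = random.Random(seed)
    x, y, u = [], [], 0.0
    for _ in range(n):
        u = rho * u + rng.gauss(0.0, 1.0)
        xt = rng.uniform(0.0, 10.0)
        x.append(xt)
        y.append(1.0 + 0.5 * xt + u)
    return y, x

g_random_walk = berenblut_webb_g(*simulate(2000, rho=1.0, seed=1))
g_white_noise = berenblut_webb_g(*simulate(2000, rho=0.0, seed=1))
print(f"g when rho = 1: {g_random_walk:.4f}")
print(f"g when rho = 0: {g_white_noise:.4f}")
```

In an actual application the computed g would then be compared with the Durbin-Watson bounds for the relevant sample size and number of regressors, as described above.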
Revisiting our wages-productivity regression, for the original regression (12.5.1) we obtain $\sum \hat{u}_t^2 = 272.0220$ and for the first regression (12.7.11) we obtain $\sum \hat{e}_t^2 = 0.334270$. Putting these values into the g statistic given in (12.9.11), we obtain