## Tests Of Specification Errors

Knowing the consequences of specification errors is one thing; finding out whether one has committed such errors is quite another, for we do not deliberately set out to commit them. Very often specification biases arise inadvertently, perhaps from our inability to formulate the model as precisely as possible because the underlying theory is weak or because we do not have the right kind of data to test the model. As Davidson notes, "Because of the non-experimental nature of economics, we are never sure how the observed data were generated. The test of any hypothesis in economics always turns out to depend on additional assumptions necessary to specify a reasonably parsimonious model, which may or may not be justified."14

13Michael D. Intriligator, Econometric Models, Techniques and Applications, Prentice Hall, Englewood Cliffs, N.J., 1978, p. 189. Recall the Occam's razor principle.

14James Davidson, Econometric Theory, Blackwell Publishers, Oxford, U.K., 2000, p. 153.

The practical question then is not why specification errors are made, for they generally are, but how to detect them. Once it is found that specification errors have been made, the remedies often suggest themselves. If, for example, it can be shown that a variable is inappropriately omitted from a model, the obvious remedy is to include that variable in the analysis, assuming, of course, that data on that variable are available.

In this section we discuss some tests that one may use to detect specification errors.

Detecting the Presence of Unnecessary Variables (Overfitting a Model)

Suppose we develop a $k$-variable model to explain a phenomenon:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i \qquad (13.4.1)$$

However, we are not totally sure that, say, the variable $X_k$ really belongs in the model. One simple way to find out is to test the significance of the estimated $\beta_k$ with the usual t test: $t = \hat{\beta}_k / \operatorname{se}(\hat{\beta}_k)$. But suppose that we are not sure whether, say, $X_3$ and $X_4$ legitimately belong in the model. This can be easily ascertained by the F test discussed in Chapter 8. Thus, detecting the presence of an irrelevant variable (or variables) is not a difficult task.

It is, however, very important to remember that in carrying out these tests of significance we have a specific model in mind. We accept that model as the maintained hypothesis or the "truth," however tentative it may be. Given that model, then, we can find out whether one or more regressors are really relevant by the usual t and F tests. But note carefully that we should not use the t and F tests to build a model iteratively. That is, we should not say that initially Y is related to $X_2$ only because $\hat{\beta}_2$ is statistically significant, then expand the model to include $X_3$ and keep that variable if $\hat{\beta}_3$ turns out to be statistically significant, and so on. This strategy of building a model is called the bottom-up approach (starting with a smaller model and expanding it as one goes along) or, somewhat pejoratively, data mining (other names are regression fishing, data grubbing, data snooping, and number crunching).
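Given a maintained model, the t and F checks for possibly irrelevant regressors described in this subsection can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (Y depends on $X_2$ only, while $X_3$ and $X_4$ are irrelevant by construction), the helper names `invert`, `ols`, and `f_stat` are hypothetical, and plain Python is used so the block runs without third-party packages. In applied work one would normally rely on a regression library instead.

```python
import random

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """OLS by the normal equations; returns (coefficients, standard errors, RSS)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    sigma2 = rss / (n - k)  # unbiased estimate of the error variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se, rss

def f_stat(rss_restricted, rss_unrestricted, m, df_unrestricted):
    """F statistic for m zero restrictions on the coefficients."""
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / df_unrestricted)

# Hypothetical data: Y depends on X2 only; X3 and X4 are irrelevant by construction.
rng = random.Random(0)
n = 60
rows = [[1.0, rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n)]
y = [1.0 + 2.0 * r[1] + rng.gauss(0, 1) for r in rows]

beta, se, rss_ur = ols(rows, y)                 # maintained (unrestricted) model
t_stats = [b / s for b, s in zip(beta, se)]     # t = beta_hat / se(beta_hat)

_, _, rss_r = ols([r[:2] for r in rows], y)     # restricted model: drop X3 and X4
F_joint = f_stat(rss_r, rss_ur, m=2, df_unrestricted=n - 4)

_, _, rss_r4 = ols([r[:3] for r in rows], y)    # restricted model: drop X4 only
F_single = f_stat(rss_r4, rss_ur, m=1, df_unrestricted=n - 4)

print("t statistics:", [round(t, 3) for t in t_stats])
print("F for H0: beta3 = beta4 = 0:", round(F_joint, 3))
print("F for H0: beta4 = 0:", round(F_single, 3), "vs t4 squared:", round(t_stats[3] ** 2, 3))
```

The statistics are then compared with tabulated critical values of the t distribution (n − 4 degrees of freedom) and the F distribution (2 and n − 4 degrees of freedom). For a single restriction the F statistic equals the square of the corresponding t statistic, which the last line displays.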

516 PART TWO: RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL

The primary objective of data mining is to develop the "best" model after several diagnostic tests so that the model finally chosen is a "good" model in the sense that all the estimated coefficients have the "right" signs, they are statistically significant on the basis of the t and F tests, the $R^2$ value is reasonably high, the Durbin-Watson d has an acceptable value (around 2), and so on. The purists in the profession look down on the practice of data mining. In the words of William Pool, ". . . making an empirical regularity the foundation, rather than an implication of economic theory, is always dangerous."15 One reason for "condemning" data mining is as follows.

Nominal versus True Level of Significance in the Presence of Data Mining. A danger of data mining that the unwary researcher faces is that the conventional levels of significance ($\alpha$) such as 1, 5, or 10 percent are not the true levels of significance. Lovell has suggested that if there are $c$ candidate regressors out of which $k$ are finally selected ($k \le c$) on the basis of data mining, then the true level of significance ($\alpha^*$) is related to the nominal level of significance ($\alpha$) as follows:16

$$\alpha^* = 1 - (1 - \alpha)^{c/k} \qquad (13.4.2)$$

or approximately as

$$\alpha^* \approx (c/k)\,\alpha \qquad (13.4.3)$$

For example, if $c = 15$, $k = 5$, and $\alpha = 5$ percent, from (13.4.3) the true level of significance is (15/5)(5) = 15 percent. Therefore, if a researcher data-mines and selects 5 out of 15 regressors, reports only the results of the condensed model at the nominal 5 percent level of significance, and declares that the results are statistically significant, one should take this conclusion with a big grain of salt, for we know the (true) level of significance is in fact 15 percent. It should be noted that if $c = k$, that is, there is no data mining, the true and nominal levels of significance are the same. Of course, in practice most researchers report only the results of their "final" regression without necessarily telling about all the data mining, or pretesting, that has gone before.17
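The arithmetic of the example can be checked with a few lines of Python. The sketch assumes Lovell's relation in the form $\alpha^* = 1 - (1 - \alpha)^{c/k}$ together with its approximation $\alpha^* \approx (c/k)\alpha$. The function names are hypothetical.

```python
def true_significance(alpha, c, k):
    """Assumed exact relation: alpha* = 1 - (1 - alpha)**(c / k)."""
    return 1.0 - (1.0 - alpha) ** (c / k)

def approx_true_significance(alpha, c, k):
    """Approximation used in the example: alpha* is roughly (c / k) * alpha."""
    return (c / k) * alpha

# The example from the text: 15 candidate regressors, 5 retained, nominal 5 percent.
alpha, c, k = 0.05, 15, 5
exact = true_significance(alpha, c, k)
approx = approx_true_significance(alpha, c, k)
print(f"exact: {exact:.4f}, approximate: {approx:.4f}")

# With no data mining (c = k) the true and nominal levels coincide.
no_mining = true_significance(alpha, 5, 5)
print(f"c = k: {no_mining:.4f}")
```

The approximation gives 15 percent, as in the text, while the exact relation gives a slightly smaller figure of about 14.3 percent. The gap between the two widens as $\alpha$ or $c/k$ grows.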

Despite some of its obvious drawbacks, there is increasing recognition, especially among applied econometricians, that the purist (i.e., non-data mining) approach to model building is not tenable. As Zaman notes:

Unfortunately, experience with real data sets shows that such a [purist approach] is neither feasible nor desirable. It is not feasible because it is a rare economic theory which leads to a unique model. It is not desirable because a crucial aspect of learning from the data is learning what types of models are and are not supported by data. Even if, by rare luck, the initial model shows a good fit, it is frequently important to explore and learn the types of the models the data does or does not agree with.18

15William Pool, "Is Inflation Too Low," the Cato Journal, vol. 18, no. 3, Winter 1999, p. 456.

16M. Lovell, "Data Mining," Review of Economics and Statistics, vol. 65, 1983, pp. 1-12.

17For a detailed discussion of pretesting and the biases it can lead to, see T. D. Wallace, "Pretest Estimation in Regression: A Survey," American Journal of Agricultural Economics, vol. 59, 1977, pp. 431-443.

A similar view is expressed by Kerry Patterson who maintains that:

This [data mining] approach suggests that economic theory and empirical specification interact rather than be kept in separate compartments.19

Instead of getting caught in the data mining versus the purist approach to model-building controversy, one can endorse the view expressed by Peter Kennedy:

[that model specification] needs to be a well-thought-out combination of theory and data, and that testing procedures used in specification searches should be designed to minimize the costs of data mining. Examples of such procedures are setting aside data for out-of-sample prediction tests, adjusting significance levels [à la Lovell], and avoiding questionable criteria such as maximizing $R^2$.20

If we look at data mining in a broader perspective as a process of discovering empirical regularities that might suggest errors and/or omissions in (existing) theoretical models, it has a very useful role to play. To quote Kennedy again, "The art of the applied econometrician is to allow for data-driven theory while avoiding the considerable dangers in data mining."21
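One of the safeguards Kennedy mentions, setting aside data for out-of-sample prediction tests, can be sketched as follows. The data are simulated and the function names are hypothetical. The point is only the mechanics: the hold-out observations are separated before any specification search and are used solely to judge predictive performance.

```python
import random

def fit_line(x, y):
    """Two-variable OLS: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def rmse(x, y, intercept, slope):
    """Root mean squared prediction error of the fitted line on (x, y)."""
    return (sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / len(x)) ** 0.5

# Hypothetical data: 40 observations generated from Y = 3 + 0.5 X + u.
rng = random.Random(1)
x = [float(i) for i in range(40)]
y = [3.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

# Set aside the last 10 observations before any specification search.
x_fit, y_fit, x_hold, y_hold = x[:30], y[:30], x[30:], y[30:]

intercept, slope = fit_line(x_fit, y_fit)
rmse_in = rmse(x_fit, y_fit, intercept, slope)
rmse_out = rmse(x_hold, y_hold, intercept, slope)
print("in-sample RMSE:", round(rmse_in, 3), "out-of-sample RMSE:", round(rmse_out, 3))
```

An out-of-sample error far larger than the in-sample error would suggest that the chosen specification owes more to the search than to the data-generating process.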

18Asad Zaman, Statistical Foundations for Econometric Techniques, Academic Press, New York, 1996, p. 226.

19Kerry Patterson, An Introduction to Applied Econometrics, St. Martin's Press, New York, 2000, p. 10.

20Peter Kennedy, "Sinning in the Basement: What Are the Rules? The Ten Commandments of Applied Econometrics," unpublished manuscript.

Tests for Omitted Variables and Incorrect Functional Form

In practice we are never sure that the model adopted for empirical testing is "the truth, the whole truth and nothing but the truth." On the basis of theory or introspection and prior empirical work, we develop a model that we believe captures the essence of the subject under study. We then subject the model to empirical testing. After we obtain the results, we begin the postmortem, keeping in mind the criteria of a good model discussed earlier. It is at this stage that we come to know if the chosen model is adequate. In determining model adequacy, we look at some broad features of the results, such as the $R^2$ value, the estimated t ratios, the signs of the estimated coefficients in relation to their prior expectations, the Durbin-Watson statistic, and the like. If these diagnostics are reasonably good, we proclaim that the chosen model is a fair representation of reality. By the same token, if the results do not look encouraging because the $R^2$ value is too low or because very few coefficients are statistically significant or have the correct signs or because the Durbin-Watson d is too low, then we begin to worry about model adequacy and look for remedies: Maybe we have omitted an important variable, or have used the wrong functional form, or have not first-differenced the time series (to remove serial correlation), and so on. To aid us in determining whether model inadequacy is on account of one or more of these problems, we can use some of the following methods.

Examination of Residuals. As noted in Chapter 12, examination of the residuals is a good visual diagnostic to detect autocorrelation or heteroscedasticity. But these residuals can also be examined, especially in cross-sectional data, for model specification errors, such as omission of an important variable or incorrect functional form. If in fact there are such errors, a plot of the residuals will exhibit distinct patterns.

To illustrate, let us reconsider the cubic total cost of production function first considered in Chapter 7. Assume that the true total cost function is described as follows, where Y = total cost and X = output:

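$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i \qquad (13.4.4)$$

but a researcher fits the following quadratic function:

$$Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 X_i^2 + u_{2i} \qquad (13.4.5)$$

and another researcher fits the following linear function:

$$Y_i = \lambda_1 + \lambda_2 X_i + u_{3i} \qquad (13.4.6)$$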

Although we know that both researchers have made specification errors, for pedagogical purposes let us see how the estimated residuals look in the three models. (The cost-output data are given in Table 7.4.) Figure 13.1 speaks for itself: As we move from left to right, that is, as we approach the truth, not only are the residuals smaller (in absolute value) but also they do not exhibit the pronounced cyclical swings associated with the misfitted models.

The utility of examining the residual plot is thus clear: If there are specification errors, the residuals will exhibit noticeable patterns.
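The residual comparison can be reproduced in outline with the sketch below. It does not use the cost-output data of Table 7.4. Instead it simulates ten observations from a hypothetical cubic cost function, fits linear, quadratic, and cubic specifications by OLS, and prints the residuals of each. It also reports the Durbin-Watson d of each residual series as a one-number summary of the pattern. The coefficient values and the helper names `invert`, `poly_residuals`, and `durbin_watson` are assumptions made for illustration.

```python
import random

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def poly_residuals(x, y, degree):
    """Fit Y on 1, X, ..., X**degree by OLS and return the residuals."""
    X = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]

def durbin_watson(e):
    """d = sum of squared successive differences over the sum of squared residuals."""
    return sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e))) / sum(v * v for v in e)

# Hypothetical cubic total cost data (NOT the data of Table 7.4).
rng = random.Random(2)
x = [float(i) for i in range(1, 11)]
y = [140 + 63 * xi - 13 * xi ** 2 + 0.94 * xi ** 3 + rng.gauss(0, 2) for xi in x]

results = {}
for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
    e = poly_residuals(x, y, degree)
    results[name] = {"resid": e, "rss": sum(v * v for v in e), "d": durbin_watson(e)}
    print(name, "residuals:", [round(v, 1) for v in e])
    print(name, "RSS:", round(results[name]["rss"], 1), "d:", round(results[name]["d"], 3))
```

In the misfitted specifications the residuals come in long runs of the same sign, which is the kind of systematic pattern the text describes. Because the three models are nested, the residual sum of squares cannot increase as terms are added.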

The Durbin-Watson d Statistic Once Again. If we examine the routinely calculated Durbin-Watson d in Table 13.1, we see that for the linear cost function the estimated d is 0.716, suggesting that there is positive "correlation" in the estimated residuals: for n = 10 and k' = 1, the 5 percent

CHAPTER THIRTEEN: ECONOMETRIC MODELING 519

TABLE 13.1 ESTIMATED RESIDUALS FROM THE LINEAR, QUADRATIC, AND CUBIC TOTAL COST FUNCTIONS

Observation number | $\hat{u}_i$, linear model* | $\hat{u}_i$, quadratic model† | $\hat{u}_i$, cubic model**