Consequences Of Model Specification Errors

Whatever the sources of specification errors, what are the consequences? To keep the discussion simple, we will answer this question in the context of the three-variable model and consider in this section the first two types of specification errors discussed earlier, namely, (1) underfitting a model, that is, omitting relevant variables, and (2) overfitting a model, that is, including unnecessary variables. Our discussion here can be easily generalized to more than two regressors, but with tedious algebra;⁶ matrix algebra becomes almost a necessity once we go beyond the three-variable case.

Underfitting a Model (Omitting a Relevant Variable)

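Suppose the true model is:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \tag{13.3.1}$$

but for some reason we fit the following model:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i \tag{13.3.2}$$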

The consequences of omitting variable X3 are as follows:

1. If the left-out, or omitted, variable X3 is correlated with the included variable X2, that is, if $r_{23}$, the correlation coefficient between the two variables, is nonzero, $\hat{\alpha}_1$ and $\hat{\alpha}_2$ are biased as well as inconsistent. That is, $E(\hat{\alpha}_1) \neq \beta_1$ and $E(\hat{\alpha}_2) \neq \beta_2$, and the bias does not disappear as the sample size gets larger.

2. Even if X2 and X3 are not correlated, $\hat{\alpha}_1$ is biased, although $\hat{\alpha}_2$ is now unbiased.

3. The disturbance variance $\sigma^2$ is incorrectly estimated.

4. The conventionally measured variance of $\hat{\alpha}_2$ ($= \sigma^2 / \sum x_{2i}^2$) is a biased estimator of the variance of the true estimator $\hat{\beta}_2$.

5. In consequence, the usual confidence interval and hypothesis-testing procedures are likely to give misleading conclusions about the statistical significance of the estimated parameters.

6. As another consequence, the forecasts based on the incorrect model and the forecast (confidence) intervals will be unreliable.

⁶ But see exercise 13.32.
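Consequences 2 and 3 can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration and does not come from the text: the true model is Y = 1 + 1·X2 + 2·X3 + u with Var(u) = 1, and X3 is drawn independently of X2 with mean 3 and variance 1. The helper names `fit_simple_ols` and `average_fit` are hypothetical.

```python
import random

def fit_simple_ols(x, y):
    """Return (intercept, slope, residual variance) for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, rss / (n - 2)

def average_fit(reps=300, n=200, seed=1):
    """Average the estimates from the model that omits X3 over repeated samples.

    Assumed true model (illustrative only): Y = 1 + 1*X2 + 2*X3 + u,
    Var(u) = 1, X3 independent of X2 with mean 3 and variance 1.
    """
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        x2 = [rng.gauss(0, 1) for _ in range(n)]
        x3 = [rng.gauss(3, 1) for _ in range(n)]
        y = [1 + a + 2 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
        # Fit the underfitted model: Y on X2 only
        for k, value in enumerate(fit_simple_ols(x2, y)):
            totals[k] += value
    return tuple(t / reps for t in totals)

avg_intercept, avg_slope, avg_sigma2 = average_fit()
print(avg_intercept, avg_slope, avg_sigma2)
```

Under these assumptions the average slope should stay close to the true value of 1, while the average intercept should sit near 1 + 2·3 = 7 rather than 1, and the estimated disturbance variance near 1 + 2²·1 = 5 rather than 1, because the omitted term is absorbed into the intercept and the error.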

Although proofs of each of the above statements will take us far afield,⁷ it is shown in Appendix 13A, Section 13A.1, that

$$E(\hat{\alpha}_2) = \beta_2 + \beta_3 b_{32} \tag{13.3.3}$$

where $b_{32}$ is the slope in the regression of the excluded variable X3 on the included variable X2 ($b_{32} = \sum x_{3i} x_{2i} / \sum x_{2i}^2$). As (13.3.3) shows, $\hat{\alpha}_2$ is biased unless $\beta_3$ or $b_{32}$ or both are zero. We rule out $\beta_3$ being zero, because in that case we do not have specification error to begin with. The coefficient $b_{32}$ will be zero if X2 and X3 are uncorrelated, which is unlikely in most economic data.
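For readers who want the intuition behind (13.3.3) without turning to the appendix, the following is a brief sketch, not the appendix's own proof. It assumes the usual classical setup (nonstochastic regressors and a zero-mean disturbance), which this section does not restate, and it writes lowercase letters for deviations from sample means.

```latex
\begin{aligned}
\hat{\alpha}_2 &= \frac{\sum x_{2i} y_i}{\sum x_{2i}^2},
\qquad y_i = \beta_2 x_{2i} + \beta_3 x_{3i} + (u_i - \bar{u}) \\
&= \beta_2 + \beta_3 \frac{\sum x_{2i} x_{3i}}{\sum x_{2i}^2}
 + \frac{\sum x_{2i}(u_i - \bar{u})}{\sum x_{2i}^2} \\
E(\hat{\alpha}_2) &= \beta_2 + \beta_3 b_{32}
\end{aligned}
```

The last term in the second line has zero expectation under the stated assumptions, which leaves the true slope plus the product of the omitted variable's coefficient and the slope of X3 on X2.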

Generally, however, the extent of the bias will depend on the bias term $\beta_3 b_{32}$. If, for instance, $\beta_3$ is positive (i.e., X3 has a positive effect on Y) and $b_{32}$ is positive (i.e., X2 and X3 are positively correlated), $\hat{\alpha}_2$, on average, will overestimate the true $\beta_2$ (i.e., positive bias). But this result should not be surprising, for X2 represents not only its direct effect on Y but also its indirect effect (via X3) on Y. In short, X2 gets credit for the influence that is rightly attributable to X3, the latter prevented from showing its effect explicitly because it is not "allowed" to enter the model. As a concrete illustration, consider the child mortality example discussed in Chapter 7.
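The positive-bias case can be checked numerically. The sketch below is a hypothetical simulation, not taken from the text: it assumes β2 = 1, β3 = 2, and X3 = 0.5·X2 + noise, so that the slope of X3 on X2 is about 0.5, then compares the average slope from the regression that omits X3 with β2 + β3·b32. The function names are illustrative.

```python
import random

def ols_slope(x, y):
    """Slope from the simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_short_regression(beta2=1.0, beta3=2.0, gamma=0.5,
                              n=200, reps=500, seed=0):
    """Return (mean slope of Y on X2 alone, mean slope of X3 on X2).

    Assumed data-generating process (illustrative only):
    X3 = gamma * X2 + noise,  Y = 1 + beta2 * X2 + beta3 * X3 + u.
    """
    rng = random.Random(seed)
    slopes, b32s = [], []
    for _ in range(reps):
        x2 = [rng.gauss(0, 1) for _ in range(n)]
        x3 = [gamma * a + rng.gauss(0, 1) for a in x2]
        y = [1.0 + beta2 * a + beta3 * b + rng.gauss(0, 1)
             for a, b in zip(x2, x3)]
        slopes.append(ols_slope(x2, y))   # slope when X3 is omitted
        b32s.append(ols_slope(x2, x3))    # slope of excluded on included variable
    return sum(slopes) / reps, sum(b32s) / reps

mean_alpha2, mean_b32 = simulate_short_regression()
implied = 1.0 + 2.0 * mean_b32            # beta2 + beta3 * b32
print(mean_alpha2, implied)
```

With these assumed values the two printed numbers should be close to each other and to 2, that is, well above the true β2 of 1: X2 is picking up credit for the effect of X3.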


Regressing child mortality (CM) on per capita GNP (PGNP) and female literacy rate (FLR), we obtained the regression results shown in Eq. (7.6.2), giving the partial slope coefficient values of the two variables as -0.0056 and -2.2316, respectively. But if we now drop the FLR variable, we obtain the results shown in Eq. (7.7.2). If we regard (7.6.2) as the correct model, then (7.7.2) is a misspecified model in that it omits the relevant variable FLR. In the correct model the coefficient of the PGNP variable is -0.0056, whereas in the "incorrect" model (7.7.2) it is -0.0114.

In absolute terms, PGNP now appears to have a greater impact on CM than in the true model. But if we regress FLR on PGNP (regression of the excluded variable on the included variable), the slope coefficient in this regression [$b_{32}$ in terms of Eq. (13.3.3)] is 0.00256.⁸ This suggests that as PGNP increases by a unit, on average, FLR goes up by 0.00256 units. But if FLR goes up by this amount, its effect on CM will be $(-2.2316)(0.00256) = \hat{\beta}_3 b_{32} \approx -0.00571$.

Therefore, from (13.3.3) we finally have $(\hat{\beta}_2 + \hat{\beta}_3 b_{32}) = [-0.0056 + (-2.2316)(0.00256)] \approx -0.0113$, which is about the value of the PGNP coefficient obtained in the incorrect model (7.7.2).⁹ As this example illustrates, the true impact of PGNP on CM is much less (-0.0056) than that suggested by the incorrect model (7.7.2), namely, (-0.0114).
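The arithmetic in this example can be rechecked directly from the rounded coefficients quoted above. The short sketch below only multiplies them out and compares the implied value with the PGNP coefficient reported for (7.7.2). The variable names are illustrative, and because the inputs are rounded, agreement is expected only to about the fourth decimal place.

```python
# Rounded coefficients as quoted in the text
beta2_hat = -0.0056    # PGNP coefficient in the full model (7.6.2)
beta3_hat = -2.2316    # FLR coefficient in the full model (7.6.2)
b32 = 0.00256          # slope from regressing FLR on PGNP
alpha2_hat = -0.0114   # PGNP coefficient when FLR is omitted (7.7.2)

# Indirect effect of PGNP on CM that operates through FLR
indirect = beta3_hat * b32

# Value of the short-regression coefficient implied by (13.3.3)
implied = beta2_hat + indirect

print(f"indirect effect: {indirect:.5f}")
print(f"implied alpha2:  {implied:.4f}")
print(f"gap vs (7.7.2):  {abs(implied - alpha2_hat):.5f}")
```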

⁷ For an algebraic treatment, see Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, pp. 391-399. Those with a matrix algebra background may want to consult J. Johnston, Econometric Methods, 4th ed., McGraw-Hill, New York, 1997, pp. 119-112.

⁸ The regression results are:

⁹ Note that in the true model $\hat{\beta}_2$ and $\hat{\beta}_3$ are unbiased estimates of their true values.


Now let us examine the variances of $\hat{\alpha}_2$ and $\hat{\beta}_2$.
