## Specification Tests For Binary Choice Models

In the linear regression model, we considered two important specification problems: the effect of omitted variables and the effect of heteroscedasticity. In the classical model, $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$, when least squares estimates $b_1$ are computed omitting $X_2$,

$$E[b_1] = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2.$$

Unless $X_1$ and $X_2$ are orthogonal or $\beta_2 = 0$, $b_1$ is biased. If we ignore heteroscedasticity, then although the least squares estimator is still unbiased and consistent, it is inefficient and the usual estimate of its sampling covariance matrix is inappropriate. Yatchew and Griliches (1984) have examined these same issues in the setting of the probit and logit models. Their general results are far more pessimistic. In the context of a binary choice model, they find the following:
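The linear-model bias expression can be checked numerically. The sketch below is only an illustration: the sample size, coefficient values, and the correlation between the regressors are arbitrary assumptions, and the disturbance is set to zero so that $b_1$ equals its conditional expectation exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Included regressors X1 (constant and one variable); omitted regressor X2,
# deliberately correlated with X1 so that the bias term is nonzero.
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = (0.5 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)

beta1 = np.array([1.0, 2.0])   # illustrative values, not from the text
beta2 = np.array([3.0])

# With no disturbance, b1 coincides with E[b1 | X], isolating the bias term.
y = X1 @ beta1 + X2 @ beta2

# Short regression of y on X1 only.
b1 = np.linalg.lstsq(X1, y, rcond=None)[0]

# Bias term (X1'X1)^{-1} X1'X2 beta2.
bias = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2)

print("b1 - beta1:", b1 - beta1)
print("bias term :", bias)
```

The two printed vectors should agree up to floating-point error. Replacing `X2` with a column orthogonal to `X1` would drive both to zero.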

1. If $x_2$ is omitted from a model containing $x_1$ and $x_2$ (i.e., $\beta_2 \neq 0$), then $\operatorname{plim} \hat{\beta}_1 = c_1\beta_1 + c_2\beta_2$, where $c_1$ and $c_2$ are complicated functions of the unknown parameters. The implication is that even if the omitted variable is uncorrelated with the included one, the coefficient on the included variable will be inconsistent.

2. If the disturbances in the underlying regression are heteroscedastic, then the maximum likelihood estimators are inconsistent and the covariance matrix is inappropriate.

The second result is particularly troubling because the probit model is most often used with microeconomic data, which are frequently heteroscedastic.
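A small simulation can make the first result concrete. The sketch below assumes a probit model with two independent standard normal regressors and unit slopes (hypothetical values chosen for illustration) and uses a hand-written Fisher scoring routine, `fit_probit`, rather than any library estimator. Under these assumptions the omitted term $\beta_2 x_2$ is absorbed into the disturbance, whose variance rises from 1 to 2, so the slope in the short model should approach $1/\sqrt{2}$ rather than 1, even though $x_1$ and $x_2$ are uncorrelated.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal CDF, built from math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def fit_probit(X, y, iters=15):
    """Probit MLE by Fisher scoring (illustrative helper, not a library API)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        z = X @ b
        F = np.clip(norm_cdf(z), 1e-10, 1.0 - 1e-10)
        f = norm_pdf(z)
        w = f / (F * (1.0 - F))
        score = X.T @ ((y - F) * w)            # gradient of the log-likelihood
        info = (X * (f * w)[:, None]).T @ X    # expected information matrix
        b = b + np.linalg.solve(info, score)
    return b

rng = np.random.default_rng(42)
n = 20_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # independent of x1
y = (1.0 * x1 + 1.0 * x2 + rng.normal(size=n) > 0).astype(float)

const = np.ones(n)
b_full = fit_probit(np.column_stack([const, x1, x2]), y)
b_short = fit_probit(np.column_stack([const, x1]), y)   # x2 omitted

print("slope on x1, full model :", b_full[1])
print("slope on x1, short model:", b_short[1])
print("1/sqrt(2)               :", 1.0 / math.sqrt(2.0))
```

In this special case $c_2 = 0$ and $c_1 = 1/\sqrt{1 + \beta_2^2}$; other designs give different constants.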

Any of the three methods of hypothesis testing discussed above can be used to analyze these specification problems. The Lagrange multiplier test has the advantage that it can be carried out using the estimates from the restricted model, which sometimes brings a large saving in computational effort. This is especially true for the test for heteroscedasticity.¹³

To reiterate, the Lagrange multiplier statistic is computed as follows. Let the null hypothesis, $H_0$, be a specification of the model, and let $H_1$ be the alternative. For example, $H_0$ might specify that only variables $x_1$ appear in the model, whereas $H_1$ might specify that $x_2$ appears in the model as well. The statistic is

$$\mathrm{LM} = g_0' V_0^{-1} g_0,$$

where $g_0$ is the vector of derivatives of the log-likelihood as specified by $H_1$ but evaluated at the maximum likelihood estimator of the parameters assuming that $H_0$ is true, and $V_0^{-1}$ is any of the three consistent estimators of the asymptotic variance matrix of the maximum likelihood estimator under $H_1$, also computed using the maximum likelihood estimators based on $H_0$. The statistic is asymptotically distributed as chi-squared with degrees of freedom equal to the number of restrictions.
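As a rough sketch of how this quadratic form might be computed, the example below tests for an omitted variable in a logit model. Several things here are assumptions made for illustration: the logit specification, the simulated data, the use of the negative Hessian as the information matrix, and the helper names `fit_logit` and `lm_omitted`.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Logit MLE by Newton's method (illustrative helper)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # negative Hessian
        b = b + np.linalg.solve(hess, X.T @ (y - p))
    return b

def lm_omitted(X1, X2, y):
    """LM statistic for H0: coefficients on X2 are zero.

    Only the restricted model (X1 alone) is estimated. The score and
    information are those of the H1 model, evaluated at [b1_hat, 0].
    """
    b1 = fit_logit(X1, y)
    X = np.column_stack([X1, X2])
    b0 = np.concatenate([b1, np.zeros(X2.shape[1])])
    p = 1.0 / (1.0 + np.exp(-(X @ b0)))
    g0 = X.T @ (y - p)                            # score under H1 at b0
    info = (X * (p * (1.0 - p))[:, None]).T @ X   # information under H1 at b0
    lm = float(g0 @ np.linalg.solve(info, g0))    # g0' (info)^{-1} g0
    return lm, g0

rng = np.random.default_rng(1)
n = 2_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Data generated with x2 in the model, so H0 is false by construction.
y = (0.5 + x1 + x2 + rng.logistic(size=n) > 0).astype(float)

X1 = np.column_stack([np.ones(n), x1])
X2 = x2.reshape(-1, 1)
lm, g0 = lm_omitted(X1, X2, y)
print("LM statistic:", lm, "(compare with chi-squared, 1 d.f.)")
```

The leading elements of `g0` are numerically zero because they are the first-order conditions of the restricted model; only the elements corresponding to `X2` contribute to the statistic.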

¹³ The results in this section are based on Davidson and MacKinnon (1984) and Engle (1984). A symposium on the subject of specification tests in discrete choice models is Blundell (1987).

21.4.4.a Omitted Variables

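The hypothesis to be tested is

$$H_0: y^* = \beta_1'x_1 + \varepsilon,$$
$$H_1: y^* = \beta_1'x_1 + \beta_2'x_2 + \varepsilon,$$

so the test is of the null hypothesis that $\beta_2 = 0$. The Lagrange multiplier test would be carried out as follows: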

1. Estimate the model in $H_0$ by maximum likelihood. The restricted coefficient vector is $[\hat{\beta}_1, 0]$.

The statistic is then computed according to (21-29) or (21-30). It is noteworthy that in this case, as in many others, the Lagrange multiplier statistic is the coefficient of determination in a regression.
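The link to a regression $R^2$ can be illustrated with the outer-product-of-gradients form of the statistic. This is one standard instance of the connection, assumed here for illustration rather than taken from (21-29) or (21-30): when the variance estimator is built from the individual scores, the LM statistic equals $n$ times the uncentered $R^2$ from regressing a column of ones on those scores. The logit model, the simulated data, and the helper name `fit_logit` are all hypothetical.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Logit MLE by Newton's method (illustrative helper)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        b = b + np.linalg.solve(hess, X.T @ (y - p))
    return b

rng = np.random.default_rng(2)
n = 2_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = (0.5 + x1 + x2 + rng.logistic(size=n) > 0).astype(float)

# Step 1: restricted estimates, padded with a zero for the omitted variable.
X1 = np.column_stack([np.ones(n), x1])
X = np.column_stack([X1, x2])
b0 = np.concatenate([fit_logit(X1, y), [0.0]])

# Individual scores of the H1 model at the restricted estimates (n x K).
p = 1.0 / (1.0 + np.exp(-(X @ b0)))
G = (y - p)[:, None] * X

# Direct quadratic form with the outer-product variance estimator.
g0 = G.sum(axis=0)
lm_opg = float(g0 @ np.linalg.solve(G.T @ G, g0))

# Same number from an auxiliary regression of ones on the scores.
ones = np.ones(n)
coef = np.linalg.lstsq(G, ones, rcond=None)[0]
fitted = G @ coef
r2_uncentered = float(fitted @ fitted) / n   # explained SS / total SS (= n)

print("LM (quadratic form):", lm_opg)
print("n * uncentered R^2 :", n * r2_uncentered)
```

The two printed values should coincide up to rounding, since both equal $i'G(G'G)^{-1}G'i$, where $i$ is the column of ones.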

21.4.4.b Heteroscedasticity

We use the general formulation analyzed by Harvey (1976),¹⁴