That is, Yi follows the Bernoulli probability distribution.

Now, by the definition of mathematical expectation, we obtain:

$$E(Y_i \mid X_i) = 0\,(1 - P_i) + 1\,(P_i) = P_i$$

that is, the conditional expectation of the model (15.2.1) can, in fact, be interpreted as the conditional probability of $Y_i$, namely the probability that $Y_i = 1$ given $X_i$. In general, the expectation of a Bernoulli random variable is the probability that the random variable equals 1. In passing, note that if there are n independent trials, each with a probability p of success and probability (1 - p) of failure, and X is the number of successes in these trials, then X is said to follow the binomial distribution. The mean of the binomial distribution is np and its variance is np(1 - p). The term success is defined in the context of the problem.

Since the probability $P_i$ must lie between 0 and 1, we have the restriction

$$0 \le E(Y_i \mid X_i) \le 1$$

that is, the conditional expectation (or conditional probability) must lie between 0 and 1.

From the preceding discussion it would seem that OLS can be easily extended to binary dependent variable regression models. So, perhaps there is nothing new here. Unfortunately, this is not the case, for the LPM poses several problems, which are as follows:

Non-Normality of the Disturbances ui

Although OLS does not require the disturbances ($u_i$) to be normally distributed, we assumed them to be so distributed for the purpose of statistical inference.3 But the assumption of normality for $u_i$ is not tenable for the LPMs because, like $Y_i$, the disturbances $u_i$ also take only two values; that is, they also follow the Bernoulli distribution. This can be seen clearly if we write (15.2.1) as

$$u_i = Y_i - \beta_1 - \beta_2 X_i \qquad (15.2.6)$$

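The probability distribution of $u_i$ is

$$
\begin{array}{lll}
 & u_i & \text{Probability} \\
\text{When } Y_i = 1 & 1 - \beta_1 - \beta_2 X_i & P_i \\
\text{When } Y_i = 0 & -\beta_1 - \beta_2 X_i & 1 - P_i
\end{array}
\qquad (15.2.7)
$$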

Obviously, ui cannot be assumed to be normally distributed; they follow the Bernoulli distribution.

But the nonfulfillment of the normality assumption may not be so critical as it appears because we know that the OLS point estimates still remain unbiased (recall that, if the objective is point estimation, the normality assumption is not necessary). Besides, as the sample size increases indefinitely, statistical theory shows that the OLS estimators tend to be normally distributed generally.4 As a result, in large samples the statistical inference of the LPM will follow the usual OLS procedure under the normality assumption.

Heteroscedastic Variances of the Disturbances

Even if $E(u_i) = 0$ and $\operatorname{cov}(u_i, u_j) = 0$ for $i \ne j$ (i.e., no serial correlation), it can no longer be maintained that in the LPM the disturbances are homoscedastic. This is, however, not surprising. As statistical theory shows, for a Bernoulli distribution the theoretical mean and variance are, respectively, $p$ and $p(1 - p)$, where $p$ is the probability of success (i.e., something happening), showing that the variance is a function of the mean. Hence the error variance is heteroscedastic.

3 Recall that we have recommended that the normality assumption be checked in an application by suitable normality tests, such as the Jarque-Bera test.

4 The proof is based on the central limit theorem and may be found in E. Malinvaud, Statistical Methods of Econometrics, Rand McNally, Chicago, 1966, pp. 195-197. If the regressors are deemed stochastic and are jointly normally distributed, the F and t tests can still be used even though the disturbances are non-normal. Also keep in mind that as the sample size increases indefinitely, the binomial distribution converges to the normal distribution.

For the distribution of the error term given in (15.2.7), applying the definition of variance, the reader should verify that (see exercise 15.10)

$$\operatorname{var}(u_i) = P_i(1 - P_i)$$

That is, the variance of the error term in the LPM is heteroscedastic. Since $P_i = E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$, the variance of $u_i$ ultimately depends on the values of X and hence is not homoscedastic.
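One way to carry out this verification is sketched below. It assumes, as the model implies, that $u_i$ takes the value $1 - P_i$ with probability $P_i$ (when $Y_i = 1$) and the value $-P_i$ with probability $1 - P_i$ (when $Y_i = 0$), where $P_i = \beta_1 + \beta_2 X_i$.

```latex
\begin{align*}
E(u_i) &= (1 - P_i)\,P_i + (-P_i)(1 - P_i) = 0, \\
\operatorname{var}(u_i) &= E(u_i^2) - [E(u_i)]^2 \\
  &= (1 - P_i)^2 P_i + (-P_i)^2 (1 - P_i) \\
  &= P_i (1 - P_i)\,[(1 - P_i) + P_i] \\
  &= P_i (1 - P_i).
\end{align*}
```

Because $P_i$ changes with $X_i$, so does this variance, which is exactly the heteroscedasticity described in the text.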

We already know that, in the presence of heteroscedasticity, the OLS estimators, although unbiased, are not efficient; that is, they do not have minimum variance. But the problem of heteroscedasticity, like the problem of non-normality, is not insurmountable. In Chapter 11 we discussed several methods of handling the heteroscedasticity problem. Since the variance of $u_i$ depends on $E(Y_i \mid X_i)$, one way to resolve the heteroscedasticity problem is to transform the model (15.2.1) by dividing it through by

$$\sqrt{E(Y_i \mid X_i)\,[1 - E(Y_i \mid X_i)]} = \sqrt{P_i(1 - P_i)} = \sqrt{w_i}, \text{ say,}$$

that is,

$$\frac{Y_i}{\sqrt{w_i}} = \frac{\beta_1}{\sqrt{w_i}} + \beta_2 \frac{X_i}{\sqrt{w_i}} + \frac{u_i}{\sqrt{w_i}} \qquad (15.2.9)$$

As you can readily verify, the transformed error term in (15.2.9) is homoscedastic. Therefore, after estimating (15.2.1), we can now estimate (15.2.9) by OLS, which is nothing but the weighted least squares (WLS) with wi serving as the weights.
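A short check of the homoscedasticity claim, treating $w_i = P_i(1 - P_i)$ as a known constant given $X_i$ (an assumption that holds only for the true, not the estimated, weights):

```latex
\begin{align*}
\operatorname{var}\!\left(\frac{u_i}{\sqrt{w_i}}\right)
  &= \frac{\operatorname{var}(u_i)}{w_i}
   = \frac{P_i(1 - P_i)}{P_i(1 - P_i)}
   = 1 \quad \text{for every } i.
\end{align*}
```

The transformed disturbance therefore has the same variance for all observations, whatever the value of $X_i$.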

In theory, what we have just described is fine. But in practice the true E(Yi | Xi) is unknown; hence the weights wi are unknown. To estimate wi, we can use the following two-step procedure5:

Step 1. Run the OLS regression (15.2.1) despite the heteroscedasticity problem and obtain $\hat{Y}_i$ = estimate of the true $E(Y_i \mid X_i)$. Then obtain $\hat{w}_i = \hat{Y}_i(1 - \hat{Y}_i)$, the estimate of $w_i$.

Step 2. Use the estimated $\hat{w}_i$ to transform the data as shown in (15.2.9) and estimate the transformed equation by OLS (i.e., weighted least squares).

5 For the justification of this procedure, see Arthur S. Goldberger, Econometric Theory, John Wiley & Sons, New York, 1964, pp. 249-250. The justification is basically a large-sample one that we discussed under the topic of feasible or estimated generalized least squares in the chapter on heteroscedasticity (see Sec. 11.6).
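The two steps can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the function names `ols` and `lpm_two_step_wls` are hypothetical, the data are simulated rather than taken from the text, and observations whose fitted value falls outside the open interval (0, 1) are simply dropped, because their estimated weight would be zero or negative. Dropping them is one possible convention, not something the procedure itself prescribes.

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients for y = X @ b + u."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def lpm_two_step_wls(x, y):
    """Two-step WLS for the LPM Y = b1 + b2*X + u (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept column plus regressor

    # Step 1: OLS on the original model, then the estimated weights.
    b_ols = ols(X, y)
    y_hat = X @ b_ols
    w_hat = y_hat * (1.0 - y_hat)

    # Assumption: drop observations with w_hat <= 0 (fitted value outside (0, 1)).
    keep = w_hat > 0
    sqrt_w = np.sqrt(w_hat[keep])

    # Step 2: divide every term, including the intercept column, by sqrt(w_hat)
    # and run OLS on the transformed data.
    X_star = X[keep] / sqrt_w[:, None]
    y_star = y[keep] / sqrt_w
    b_wls = ols(X_star, y_star)
    return b_ols, b_wls, keep

# Simulated (invented) data: income in thousands, ownership as a 0/1 outcome.
rng = np.random.default_rng(0)
income = rng.uniform(6.0, 22.0, size=200)
owns = rng.binomial(1, 0.1 + 0.04 * income).astype(float)

b_ols, b_wls, keep = lpm_two_step_wls(income, owns)
print("OLS coefficients:", b_ols)
print("WLS coefficients:", b_wls, "| observations kept:", int(keep.sum()))
```

Note that the transformed regression has no separate constant term: the original intercept becomes the coefficient on $1/\sqrt{\hat{w}_i}$.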

We will illustrate this procedure for our example shortly. But there is another problem with LPM that we need to address first.

Nonfulfillment of 0 ≤ E(Yi | Xi) ≤ 1

Since $E(Y_i \mid X_i)$ in the linear probability model measures the conditional probability of the event Y occurring given X, it must necessarily lie between 0 and 1. Although this is true a priori, there is no guarantee that $\hat{Y}_i$, the estimators of $E(Y_i \mid X_i)$, will necessarily fulfill this restriction, and this is the real problem with the OLS estimation of the LPM. There are two ways of handling this problem. One is to estimate the LPM by the usual OLS method and find out whether the estimated $\hat{Y}_i$ lie between 0 and 1. If some are less than 0 (that is, negative), $\hat{Y}_i$ is assumed to be zero for those cases; if they are greater than 1, they are assumed to be 1. The second procedure is to devise an estimating technique that will guarantee that the estimated conditional probabilities $\hat{Y}_i$ will lie between 0 and 1. The logit and probit models discussed later will guarantee that the estimated probabilities will indeed lie between the logical limits 0 and 1.
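The first approach, checking the OLS fitted values and setting those outside the unit interval to 0 or 1, might look like the sketch below. The eight data points are invented for illustration and the function name `fitted_probabilities` is hypothetical.

```python
import numpy as np

def fitted_probabilities(x, y):
    """OLS fitted values of the LPM Y = b1 + b2*X + u (hypothetical helper)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Invented data: a regressor and a binary outcome.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

y_hat = fitted_probabilities(x, y)

# Flag fitted "probabilities" that violate the 0-1 restriction.
outside = (y_hat < 0) | (y_hat > 1)

# Set negative values to 0 and values above 1 to 1, as described in the text.
y_hat_clipped = np.clip(y_hat, 0.0, 1.0)

print("fitted values:", np.round(y_hat, 3))
print("number outside [0, 1]:", int(outside.sum()))
print("after clipping:", np.round(y_hat_clipped, 3))
```

In this small example the fitted line is $-0.25 + 0.0833X$, so the smallest and largest X values produce fitted values just below 0 and just above 1.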

Questionable Value of $R^2$ as a Measure of Goodness of Fit

The conventionally computed $R^2$ is of limited value in the dichotomous response models. To see why, consider the following figure. Corresponding to a given X, Y is either 0 or 1. Therefore, all the Y values will lie either along the X axis or along the line corresponding to 1. Therefore, generally no LPM is expected to fit such a scatter well, whether it is the unconstrained LPM (Figure 15.1a) or the truncated or constrained LPM (Figure 15.1b), an LPM estimated in such a way that it will not fall outside the logical band 0-1. As a result, the conventionally computed $R^2$ is likely to be much lower than 1 for such models. In most practical applications the $R^2$ ranges between 0.2 and 0.6. $R^2$ in such models will be high, say, in excess of 0.8, only when the actual scatter is very closely clustered around points A and B (Figure 15.1c), for in that case it is easy to fix the straight line by joining the two points A and B. In this case the predicted $\hat{Y}_i$ will be very close to either 0 or 1.
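A small numerical illustration of this point follows. Both data sets are invented, and the function name `lpm_r_squared` is hypothetical. The first scatter spreads the 0s and 1s over overlapping X values; the second clusters them tightly around two distant X values, in the spirit of points A and B.

```python
import numpy as np

def lpm_r_squared(x, y):
    """Conventional R^2 from an OLS fit of Y = b1 + b2*X + u (hypothetical helper)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    total = y - y.mean()
    return 1.0 - (residuals @ residuals) / (total @ total)

# Invented scatter 1: zeros and ones overlap in the middle of the X range.
x_spread = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0])
y_spread = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

# Invented scatter 2: zeros cluster near X = 2, ones cluster near X = 16.
x_clustered = np.array([1.0, 2.0, 3.0, 2.0, 15.0, 16.0, 17.0, 16.0])
y_clustered = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

r2_spread = lpm_r_squared(x_spread, y_spread)
r2_clustered = lpm_r_squared(x_clustered, y_clustered)

print("R^2, overlapping scatter:", round(r2_spread, 3))
print("R^2, clustered scatter:  ", round(r2_clustered, 3))
```

Even though X is clearly related to Y in the first scatter, its $R^2$ stays below 0.6; only the tightly clustered scatter pushes it above 0.8.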

For these reasons John Aldrich and Forrest Nelson contend that "use of the coefficient of determination as a summary statistic should be avoided in models with qualitative dependent variable."6

6 Aldrich and Nelson, op. cit., p. 15. For other measures of goodness of fit in models involving dummy regressands, see T. Amemiya, "Qualitative Response Models," Journal of Economic Literature, vol. 19, 1981, pp. 331-354.


FIGURE 15.1 Linear probability models. [Figure not reproduced; panel titles: LPM (unconstrained), LPM (constrained).]


To illustrate some of the points made about the LPM in the preceding section, we present a numerical example. Table 15.1 gives invented data on home ownership Y (1 = owns a house, 0 = does not own a house) and family income X (thousands of dollars) for 40 families.

From these data the LPM estimated by OLS was as follows:



