
9For illustration, we are assuming merely that the us are distributed symmetrically as shown in Figure 3.3. But shortly we will assume that the us are distributed normally.

68 PART ONE: SINGLE-EQUATION REGRESSION MODELS

values cancel out the negative ui values so that their average or mean effect on Y is zero.10

In passing, note that the assumption E(ui | Xi) = 0 implies that E(Yi | Xi) = β1 + β2Xi. (Why?) Therefore, the two assumptions are equivalent.

Assumption 4: Homoscedasticity or equal variance of ui. Given the value of X, the variance of ui is the same for all observations. That is, the conditional variances of ui are identical. Symbolically, we have

var(ui | Xi) = E[ui − E(ui | Xi)]²
= E(ui² | Xi)   because of Assumption 3
= σ²   (3.2.2)

where var stands for variance.

Eq. (3.2.2) states that the variance of ui for each Xi (i.e., the conditional variance of ui) is some positive constant number equal to σ². Technically, (3.2.2) represents the assumption of homoscedasticity, or equal (homo) spread (scedasticity) or equal variance. The word comes from the Greek verb skedannumi, which means to disperse or scatter. Stated differently, (3.2.2) means that the Y populations corresponding to various X values have the same variance. Put simply, the variation around the regression line (which is the line of average relationship between Y and X) is the same across the X values; it neither increases nor decreases as X varies. Diagrammatically, the situation is as depicted in Figure 3.4.

FIGURE 3.4 Homoscedasticity.
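The constant-variance condition in (3.2.2) can be illustrated by simulation. The following sketch is not from the text; the PRF parameters and the error variance of 100 are chosen arbitrarily for illustration. It draws several Y populations, one per X value, all with the same conditional error variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical PRF: E(Y | X) = 2 + 0.5 X, with homoscedastic errors.
# Ten X "populations", 500 draws each.
X = np.repeat(np.arange(80, 261, 20), 500)
u = rng.normal(0.0, 10.0, size=X.size)   # var(u | X) = 100 for every X
Y = 2.0 + 0.5 * X + u

# The conditional variance of u is (approximately) the same at every X value
cond_vars = [u[X == x].var() for x in np.unique(X)]
```

Every entry of `cond_vars` hovers around 100, which is the diagrammatic content of Figure 3.4.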

10For a more technical reason why Assumption 3 is necessary see E. Malinvaud, Statistical Methods of Econometrics, Rand McNally, Chicago, 1966, p. 75. See also exercise 3.3.

CHAPTER THREE: TWO-VARIABLE REGRESSION MODEL 69

FIGURE 3.5 Heteroscedasticity.

In contrast, consider Figure 3.5, where the conditional variance of the Y population varies with X. This situation is known appropriately as heteroscedasticity, or unequal spread, or variance. Symbolically, in this situation (3.2.2) can be written as

var(ui | Xi) = σi²   (3.2.3)

Notice the subscript i on σ² in Eq. (3.2.3), which indicates that the variance of the Y population is no longer constant.

To make the difference between the two situations clear, let Y represent weekly consumption expenditure and X weekly income. Figures 3.4 and 3.5 show that as income increases the average consumption expenditure also increases. But in Figure 3.4 the variance of consumption expenditure remains the same at all levels of income, whereas in Figure 3.5 it increases with increase in income. In other words, richer families on the average consume more than poorer families, but there is also more variability in the consumption expenditure of the former.
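The income-consumption contrast can be simulated as well. In this hypothetical sketch (the numbers are made up for illustration) the error standard deviation is made proportional to income, so the conditional variance of consumption grows with income as in Figure 3.5:

```python
import numpy as np

rng = np.random.default_rng(1)

# Four hypothetical income levels, 1000 families at each
income = np.repeat(np.array([80.0, 140.0, 200.0, 260.0]), 1000)
sigma_i = 0.05 * income                  # error spread grows with income
u = rng.normal(0.0, sigma_i)             # var(u | X) = sigma_i**2: not constant
consumption = 10.0 + 0.6 * income + u

# Conditional variance of consumption rises with income, as in Figure 3.5
cond_var = [consumption[income == x].var() for x in [80.0, 140.0, 200.0, 260.0]]
```

Richer (hypothetical) families consume more on average, and their consumption is also more variable: `cond_var` is strictly increasing.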

To understand the rationale behind this assumption, refer to Figure 3.5. As this figure shows, var(u | X1) < var(u | X2) < ··· < var(u | Xi). Therefore, the likelihood is that the Y observations coming from the population with X = X1 would be closer to the PRF than those coming from populations corresponding to X = X2, X = X3, and so on. In short, not all Y values corresponding to the various X's will be equally reliable, reliability being judged by how closely or distantly the Y values are distributed around their means, that is, the points on the PRF. If this is in fact the case, would we not prefer to sample from those Y populations that are closer to their mean than those that are widely spread? But doing so might restrict the variation we obtain across X values.


By invoking Assumption 4, we are saying that at this stage all Y values corresponding to the various X's are equally important. In Chapter 11 we shall see what happens if this is not the case, that is, where there is heteroscedasticity.

In passing, note that Assumption 4 implies that the conditional variances of Yi are also homoscedastic. That is,

var(Yi | Xi) = σ²   (3.2.4)

Of course, the unconditional variance of Y is σY². Later we will see the importance of distinguishing between conditional and unconditional variances of Y (see Appendix A for details of conditional and unconditional variances).

Assumption 5: No autocorrelation between the disturbances. Given any two X values, Xi and Xj (i ≠ j), the correlation between any two ui and uj (i ≠ j) is zero. Symbolically,

cov(ui, uj | Xi, Xj) = E{[ui − E(ui)] | Xi}{[uj − E(uj)] | Xj}
= E(ui | Xi)E(uj | Xj)   (why?)
= 0   (3.2.5)

where i and j are two different observations and where cov means covariance.

In words, (3.2.5) postulates that the disturbances ui and uj are uncorrelated. Technically, this is the assumption of no serial correlation, or no autocorrelation. This means that, given Xi, the deviations of any two Y values from their mean value do not exhibit patterns such as those shown in Figure 3.6a and b. In Figure 3.6a, we see that the us are positively correlated, a positive u followed by a positive u or a negative u followed by a negative u. In Figure 3.6b, the us are negatively correlated, a positive u followed by a negative u and vice versa.

If the disturbances (deviations) follow systematic patterns, such as those shown in Figure 3.6a and b, there is auto- or serial correlation, and what Assumption 5 requires is that such correlations be absent. Figure 3.6c shows that there is no systematic pattern to the us, thus indicating zero correlation.
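The three patterns in Figure 3.6 can be mimicked with a simple first-order autoregressive scheme, ut = ρut−1 + εt. The sketch below is illustrative only; the values ρ = 0.8, −0.8, and 0 are assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000
eps = rng.normal(size=T)   # white-noise innovations

def ar1(rho, eps):
    """Generate disturbances u_t = rho * u_{t-1} + eps_t."""
    u = np.zeros_like(eps)
    for t in range(1, len(eps)):
        u[t] = rho * u[t - 1] + eps[t]
    return u

u_pos = ar1(0.8, eps)    # positive serial correlation (Figure 3.6a)
u_neg = ar1(-0.8, eps)   # negative serial correlation (Figure 3.6b)
u_none = eps             # no serial correlation (Figure 3.6c)

def lag1_corr(u):
    """Sample correlation between u_t and u_{t-1}."""
    return np.corrcoef(u[:-1], u[1:])[0, 1]
```

`lag1_corr` is strongly positive for `u_pos`, strongly negative for `u_neg`, and near zero for `u_none`, exactly the three cases Assumption 5 distinguishes.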

The full import of this assumption will be explained thoroughly in Chapter 12. But intuitively one can explain this assumption as follows. Suppose in our PRF (Yt = β1 + β2Xt + ut) that ut and ut−1 are positively correlated. Then Yt depends not only on Xt but also on ut−1, for ut−1 to some extent determines ut. At this stage of the development of the subject matter, by invoking Assumption 5, we are saying that we will consider the systematic effect, if any, of Xt on Yt and not worry about the other influences that might act on Y as a result of the possible intercorrelations among the us. But, as noted in Chapter 12, we will see how intercorrelations among the disturbances can be brought into the analysis and with what consequences.

Gujarati: Basic Econometrics, Fourth Edition



FIGURE 3.6 Patterns of correlation among the disturbances. (a) positive serial correlation; (b) negative serial correlation; (c) zero correlation.

Assumption 6: Zero covariance between ui and Xi, or E(uiXi) = 0. Formally,

cov(ui, Xi) = E[ui − E(ui)][Xi − E(Xi)]

= E[ui(Xi − E(Xi))]   since E(ui) = 0
= E(uiXi) − E(Xi)E(ui)   since E(Xi) is nonstochastic
= E(uiXi)   since E(ui) = 0
= 0   by assumption   (3.2.6)

Assumption 6 states that the disturbance ui and explanatory variable Xi are uncorrelated. The rationale for this assumption is as follows: When we expressed the PRF as in (2.4.2), we assumed that X and u (which may represent the influence of all the omitted variables) have separate (and additive) influence on Y. But if X and u are correlated, it is not possible to assess their individual effects on Y. Thus, if X and u are positively correlated, X increases


when u increases and it decreases when u decreases. Similarly, if X and u are negatively correlated, X increases when u decreases and it decreases when u increases. In either case, it is difficult to isolate the influence of X and u on Y.
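A small simulation suggests why correlation between X and u is troublesome. In this hypothetical data-generating process (all parameter values are assumed for illustration), cov(X, u) > 0, and the least-squares slope absorbs part of u's influence:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical DGP: true slope is 1.5, but a common factor z makes
# X and u positively correlated, violating Assumption 6.
z = rng.normal(size=n)
u = 0.5 * z + rng.normal(size=n)
X = z + rng.normal(size=n)               # cov(X, u) = 0.5 > 0
Y = 2.0 + 1.5 * X + u

# OLS slope = sample cov(X, Y) / sample var(X); it picks up u's effect
slope_biased = np.cov(X, Y, bias=True)[0, 1] / X.var()

# With X drawn independently of u, the same formula recovers 1.5
X_clean = rng.normal(size=n)
Y_clean = 2.0 + 1.5 * X_clean + u
slope_clean = np.cov(X_clean, Y_clean, bias=True)[0, 1] / X_clean.var()
```

Here the biased slope settles near 1.75 rather than 1.5: the extra 0.25 is cov(X, u)/var(X), the part of u's effect wrongly attributed to X.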

Assumption 6 is automatically fulfilled if the X variable is nonrandom or nonstochastic and Assumption 3 holds, for in that case, cov(ui, Xi) = [Xi − E(Xi)]E[ui − E(ui)] = 0. (Why?) But since we have assumed that our X variable not only is nonstochastic but also assumes fixed values in repeated samples,11 Assumption 6 is not very critical for us; it is stated here merely to point out that the regression theory presented in the sequel holds true even if the X's are stochastic or random, provided they are independent of, or at least uncorrelated with, the disturbances ui.12 (We shall examine the consequences of relaxing Assumption 6 in Part II.)

Assumption 7: The number of observations n must be greater than the number of parameters to be estimated. Alternatively, the number of observations n must be greater than the number of explanatory variables.

This assumption is not so innocuous as it seems. In the hypothetical example of Table 3.1, imagine that we had only the first pair of observations on Y and X (4 and 1). From this single observation there is no way to estimate the two unknowns, β1 and β2. We need at least two pairs of observations to estimate the two unknowns. In a later chapter we will see the critical importance of this assumption.
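A quick numerical check of this point (the observation values below are illustrative, echoing the single pair (1, 4)):

```python
import numpy as np

# One observation: the design matrix [intercept column, X] has rank 1,
# so the two unknowns beta1 and beta2 cannot be determined.
X1 = np.array([[1.0, 1.0]])
design_rank = np.linalg.matrix_rank(X1)   # rank 1 < 2 parameters

# With two distinct observations the two unknowns are exactly determined:
# the line through (1, 4) and (2, 5) has intercept 3 and slope 1.
X2 = np.array([[1.0, 1.0], [1.0, 2.0]])
y2 = np.array([4.0, 5.0])
beta = np.linalg.solve(X2.T @ X2, X2.T @ y2)
```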

Assumption 8: Variability in X values. The X values in a given sample must not all be the same. Technically, var(X) must be a finite positive number.13

This assumption too is not so innocuous as it looks. Look at Eq. (3.1.6). If all the X values are identical, then Xi = X̄ (Why?) and the denominator of that equation will be zero, making it impossible to estimate β2 and therefore β1. Intuitively, we readily see why this assumption is important. Looking at

"Recall that in obtaining the samples shown in Tables 2.4 and 2.5, we kept the same X values.

12As we will discuss in Part II, if the X's are stochastic but distributed independently of ui, the properties of the least-squares estimators discussed shortly continue to hold, but if the stochastic X's are merely uncorrelated with ui, the properties of OLS estimators hold true only if the sample size is very large. At this stage, however, there is no need to get bogged down with this theoretical point.

13The sample variance of X is var(X) = Σ(Xi − X̄)²/(n − 1), where n is the sample size.

our family consumption expenditure example in Chapter 2, if there is very little variation in family income, we will not be able to explain much of the variation in the consumption expenditure. The reader should keep in mind that variation in both Y and X is essential to use regression analysis as a research tool. In short, the variables must vary!
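A direct check of the denominator of Eq. (3.1.6), with made-up numbers: when every X value is the same, the sum of squared deviations is zero and the slope is undefined.

```python
import numpy as np

# All X values identical: sum((X - Xbar)**2) = 0, so the OLS slope
# in Eq. (3.1.6) cannot be computed (division by zero).
X_const = np.array([5.0, 5.0, 5.0, 5.0])
denom_const = np.sum((X_const - X_const.mean()) ** 2)

# X values that vary: the denominator is positive and the slope is estimable.
X_varied = np.array([1.0, 3.0, 5.0, 7.0])
denom_varied = np.sum((X_varied - X_varied.mean()) ** 2)
```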

Assumption 9: The regression model is correctly specified. Alternatively, there is no specification bias or error in the model used in empirical analysis.

As we discussed in the Introduction, the classical econometric methodology assumes implicitly, if not explicitly, that the model used to test an economic theory is "correctly specified." This assumption can be explained informally as follows. An econometric investigation begins with the specification of the econometric model underlying the phenomenon of interest. Some important questions that arise in the specification of the model include the following: (1) What variables should be included in the model? (2) What is the functional form of the model? Is it linear in the parameters, the variables, or both? (3) What are the probabilistic assumptions made about the Yi, the Xi, and the ui entering the model?

These are extremely important questions, for, as we will show in Chapter 13, by omitting important variables from the model, or by choosing the wrong functional form, or by making wrong stochastic assumptions about the variables of the model, the validity of interpreting the estimated regression will be highly questionable. To get an intuitive feeling about this, refer to the Phillips curve shown in Figure 1.3. Suppose we choose the following two models to depict the underlying relationship between the rate of change of money wages and the unemployment rate:

Yi = α1 + α2Xi + ui   (3.2.7)
Yi = β1 + β2(1/Xi) + ui   (3.2.8)

where Yi = the rate of change of money wages, and Xi = the unemployment rate.

The regression model (3.2.7) is linear both in the parameters and the variables, whereas (3.2.8) is linear in the parameters (hence a linear regression model by our definition) but nonlinear in the variable X. Now consider Figure 3.7.

If model (3.2.8) is the "correct" or the "true" model, fitting the model (3.2.7) to the scatterpoints shown in Figure 3.7 will give us wrong predictions: Between points A and B, for any given Xi the model (3.2.7) is going to overestimate the true mean value of Y, whereas to the left of A (or to the right of B) it is going to underestimate (or overestimate, in absolute terms) the true mean value of Y.


The preceding example is an instance of what is called a specification bias or a specification error; here the bias consists in choosing the wrong functional form. We will see other types of specification errors in Chapter 13.

Unfortunately, in practice one rarely knows the correct variables to include in the model or the correct functional form of the model or the correct probabilistic assumptions about the variables entering the model, for the theory underlying the particular investigation (e.g., the Phillips-type money wage change-unemployment rate tradeoff) may not be strong or robust enough to answer all these questions. Therefore, in practice, the econometrician has to use some judgment in choosing the number of variables entering the model and the functional form of the model and has to make some assumptions about the stochastic nature of the variables included in the model. To some extent, there is some trial and error involved in choosing the "right" model for empirical analysis.14

If judgment is required in selecting a model, what is the need for Assumption 9? Without going into details here (see Chapter 13), this assumption is there to remind us that our regression analysis and therefore the results based on that analysis are conditional upon the chosen model and to warn us that we should give very careful thought in formulating econometric

14But one should avoid what is known as "data mining," that is, trying every possible model with the hope that at least one will fit the data well. That is why it is essential that there be some economic reasoning underlying the chosen model and that any modifications in the model should have some economic justification. A purely ad hoc model may be difficult to justify on theoretical or a priori grounds. In short, theory should be the basis of estimation. But we will have more to say about data mining in Chap. 13, for there are some who argue that in some situations data mining can serve a useful purpose.

models, especially when there may be several competing theories trying to explain an economic phenomenon, such as the inflation rate, or the demand for money, or the determination of the appropriate or equilibrium value of a stock or a bond. Thus, econometric model-building, as we shall discover, is more often an art than a science.

Our discussion of the assumptions underlying the classical linear regression model is now completed. It is important to note that all these assumptions pertain to the PRF only and not the SRF. But it is interesting to observe that the method of least squares discussed previously has some properties that are similar to the assumptions we have made about the PRF. For example, the finding that Σûi = 0, and, therefore, that the mean of the residuals is zero, is akin to the assumption that E(ui | Xi) = 0. Likewise, the finding that ΣûiXi = 0 is similar to the assumption that cov(ui, Xi) = 0. It is comforting to note that the method of least squares thus tries to "duplicate" some of the assumptions we have imposed on the PRF.
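These two numerical properties of the least-squares residuals are easy to verify on made-up data (the PRF parameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative data from Y = 1 + 2X + u
X = rng.uniform(0.0, 10.0, size=200)
Y = 1.0 + 2.0 * X + rng.normal(size=X.size)

# OLS fit with an intercept
design = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
resid = Y - design @ beta

sum_resid = resid.sum()            # ~0: mirrors the assumption E(u_i | X_i) = 0
sum_resid_x = (resid * X).sum()    # ~0: mirrors the assumption cov(u_i, X_i) = 0
```

Both sums are zero up to floating-point rounding whenever the regression includes an intercept; they are mechanical consequences of the normal equations, not of any assumption about the true disturbances.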

Of course, the SRF does not duplicate all the assumptions of the CLRM. As we will show later, although cov(ui, uj) = 0 (i ≠ j) by assumption, it is not true that the sample cov(ûi, ûj) = 0 (i ≠ j). As a matter of fact, we will show later that the residuals not only are autocorrelated but also are heteroscedastic (see Chapter 12).

When we go beyond the two-variable model and consider multiple regression models, that is, models containing several regressors, we add the following assumption.

Assumption 10: There is no perfect multicollinearity. That is, there are no perfect linear relationships among the explanatory variables.

We will discuss this assumption in Chapter 7, where we discuss multiple regression models.
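Perfect multicollinearity is easy to exhibit numerically. With made-up regressors where X2 is an exact linear function of X1, the design matrix loses rank and the normal equations have no unique solution:

```python
import numpy as np

# Two regressors that stand in a perfect linear relationship
X1 = np.array([1.0, 2.0, 3.0, 4.0])
X2 = 3.0 * X1 + 5.0                       # exact linear function of X1

design = np.column_stack([np.ones_like(X1), X1, X2])
rank = np.linalg.matrix_rank(design)      # 2, not 3: the columns are dependent
xtx_det = np.linalg.det(design.T @ design)  # X'X is singular, so det is ~0
```

Because X'X cannot be inverted, the individual coefficients on X1 and X2 are not identified; only certain linear combinations of them are.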

The million-dollar question is: How realistic are all these assumptions? The "reality of assumptions" is an age-old question in the philosophy of science. Some argue that it does not matter whether the assumptions are realistic; what matters are the predictions based on those assumptions. Notable among the proponents of the "irrelevance-of-assumptions thesis" is Milton Friedman. To him, unreality of assumptions is a positive advantage: "to be important... a hypothesis must be descriptively false in its assumptions."15

One may not subscribe to this viewpoint fully, but recall that in any scientific study we make certain assumptions because they facilitate the

15Milton Friedman, Essays in Positive Economics, University of Chicago Press, Chicago, 1953, p. 14.


development of the subject matter in gradual steps, not because they are necessarily realistic in the sense that they replicate reality exactly. As one author notes, ". . . if simplicity is a desirable criterion of good theory, all good theories idealize and oversimplify outrageously."16

What we plan to do is first study the properties of the CLRM thoroughly, and then in later chapters examine in depth what happens if one or more of the assumptions of CLRM are not fulfilled. At the end of this chapter, we provide in Table 3.4 a guide to where one can find out what happens to the CLRM if a particular assumption is not satisfied.

As a colleague pointed out to me, when we review research done by others, we need to consider whether the assumptions made by the researcher are appropriate to the data and problem. All too often, published research is based on implicit assumptions about problem and data that are likely not correct and that produce estimates based on these assumptions. Clearly, the knowledgeable reader should, realizing these problems, adopt a skeptical attitude toward the research. The assumptions listed in Table 3.4 therefore provide a checklist for guiding our research and for evaluating the research of others.

With this backdrop, we are now ready to study the CLRM. In particular, we want to find out the statistical properties of OLS compared with the purely numerical properties discussed earlier. The statistical properties of OLS are based on the assumptions of CLRM already discussed and are enshrined in the famous Gauss-Markov theorem. But before we turn to this theorem, which provides the theoretical justification for the popularity of OLS, we first need to consider the precision or standard errors of the least-squares estimates.