[Figure 2.1 (image not reproduced). Horizontal axis: weekly income, \$ (80 to 260).]

FIGURE 2.1 Conditional distribution of expenditure for various levels of income (data of Table 2.1).

CHAPTER TWO: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS 39

tion expenditure within each income bracket, on the average, weekly consumption expenditure increases as income increases. To see this clearly, in Table 2.1 we have given the mean, or average, weekly consumption expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the weekly income level of \$80, the mean consumption expenditure is \$65, while corresponding to the income level of \$200, it is \$137. In all we have 10 mean values for the 10 subpopulations of Y. We call these mean values conditional expected values, as they depend on the given values of the (conditioning) variable X. Symbolically, we denote them as E(Y | X), which is read as the expected value of Y given the value of X (see also Table 2.2).
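The computation of the conditional means can be sketched in a few lines of Python. The expenditure figures below are hypothetical and are not the Table 2.1 data. They were chosen only so that the group averages reproduce the three conditional means quoted above (\$65, \$101 and \$137 for incomes of \$80, \$140 and \$200). The mechanics are the point: group the observations by the value of X, then average Y within each group to obtain E(Y | X).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (weekly income X, weekly expenditure Y) pairs in dollars.
# NOT the Table 2.1 data; chosen so the group means match the text's examples.
observations = [
    (80, 55), (80, 65), (80, 75),
    (140, 90), (140, 101), (140, 112),
    (200, 125), (200, 137), (200, 149),
]

def conditional_means(pairs):
    """Return {x: E(Y | X = x)} by averaging y within each x-group."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)  # collect the subpopulation of Y for this X
    return {x: mean(ys) for x, ys in sorted(groups.items())}

print(conditional_means(observations))
```

Each key of the returned dictionary is one conditioning value of X, and each value is the mean of the corresponding subpopulation of Y.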

It is important to distinguish these conditional expected values from the unconditional expected value of weekly consumption expenditure, E(Y). If we add the weekly consumption expenditures for all 60 families in the population and divide the total by 60, we get \$121.20 (\$7272/60), which is the unconditional mean, or expected, value of weekly consumption expenditure, E(Y); it is unconditional in the sense that in arriving at this number we have disregarded the income levels of the various families.3 Obviously, the various conditional expected values of Y given in Table 2.1 are different from the unconditional expected value of Y of \$121.20. When we ask the question, "What is the expected value of weekly consumption expenditure of a family?" we get the answer \$121.20 (the unconditional mean). But if we ask the question, "What is the expected value of weekly consumption expenditure of a family whose weekly income is, say, \$140?" we get the answer \$101 (the conditional mean). To put it differently, if we ask the question, "What is the best (mean) prediction of weekly expenditure of families with a weekly income of \$140?" the answer would be \$101. Thus knowledge of the income level may enable us to predict the mean value of consumption expenditure better than we could without that knowledge.4 This is probably the essence of regression analysis, as we shall discover throughout this text.

3 As shown in App. A, in general the conditional and unconditional mean values are different.

40 PART ONE: SINGLE-EQUATION REGRESSION MODELS
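The contrast between E(Y) and E(Y | X), and the claim that knowing X improves the mean prediction, can be checked numerically. The sketch below again uses hypothetical data (not Table 2.1). It computes the unconditional mean as the plain average of all observations, the analogue of \$7272/60, and then compares the mean squared error (MSE) of two prediction rules: predicting every family's expenditure by E(Y), and predicting it by the conditional mean for that family's income. Using MSE as the yardstick for "best (mean) prediction" is an assumption made for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (weekly income, weekly expenditure) pairs in dollars; NOT Table 2.1.
observations = [
    (80, 55), (80, 65), (80, 75),
    (140, 90), (140, 101), (140, 112),
    (200, 125), (200, 137), (200, 149),
]

# The text's unconditional mean for its 60 families: total expenditure / count.
table_2_1_mean = 7272 / 60

# Unconditional mean E(Y): income is ignored entirely.
unconditional = mean(y for _, y in observations)

# Conditional means E(Y | X = x): average y within each income group.
groups = defaultdict(list)
for x, y in observations:
    groups[x].append(y)
conditional = {x: mean(ys) for x, ys in groups.items()}

def mse(predict):
    """Mean squared error of a prediction rule over all observations."""
    return mean((y - predict(x)) ** 2 for x, y in observations)

mse_unconditional = mse(lambda x: unconditional)  # same guess for everyone
mse_conditional = mse(lambda x: conditional[x])   # guess depends on income

print(table_2_1_mean, unconditional)
print(mse_unconditional, mse_conditional)
```

On these toy numbers the conditional rule has a much smaller error. It can never do worse on squared error, because within each income group the group mean is the value that minimizes the sum of squared deviations.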

The dark circled points in Figure 2.1 show the conditional mean values of Y against the various X values. If we join these conditional mean values, we obtain what is known as the population regression line (PRL), or more generally, the population regression curve.5 More simply, it is the regression of Y on X. The adjective "population" comes from the fact that in this example we are dealing with the entire population of 60 families. Of course, in reality a population may contain many more families.

Geometrically, then, a population regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s). More simply, it is the curve connecting the means of the subpopulations of Y corresponding to the given values of the regressor X. It can be depicted as in Figure 2.2. 
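The idea of the regression curve as the locus of conditional means can be sketched by joining adjacent (X, E(Y | X)) points with straight segments. The sketch uses only the three conditional means quoted in the text; the function name `population_regression_curve` is hypothetical. Values between the listed incomes are simply points on the drawn line, not additional conditional means of the population.

```python
from fractions import Fraction

# Conditional means E(Y | X) quoted in the text: (weekly income X, mean expenditure), in $.
points = [(80, 65), (140, 101), (200, 137)]

def population_regression_curve(points):
    """Trace the locus of conditional means by joining adjacent
    (x, E(Y | x)) points with straight segments."""
    pts = sorted(points)

    def curve(x):
        if not pts[0][0] <= x <= pts[-1][0]:
            raise ValueError("x lies outside the range of the given X values")
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                # Point on the segment joining two adjacent conditional means.
                return y0 + Fraction(y1 - y0, x1 - x0) * (x - x0)
        return pts[-1][1]  # reached only when a single point is given

    return curve

prl = population_regression_curve(points)

# Slopes between adjacent conditional means; equal slopes mean the points are collinear.
slopes = [Fraction(y1 - y0, x1 - x0)
          for (x0, y0), (x1, y1) in zip(points, points[1:])]

print(slopes)    # both slopes equal 3/5
print(prl(110))  # 65 + (3/5) * 30 = 83
```

For these three points both slopes are 3/5, so the quoted conditional means lie on the straight line E(Y | X) = 17 + 0.6X. Whether the remaining conditional means also fall on that line would have to be checked against the full table.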