The quantity $r^2$ thus defined is known as the (sample) coefficient of determination and is the most commonly used measure of the goodness of fit of a regression line. Verbally, $r^2$ measures the proportion or percentage of the total variation in Y explained by the regression model.

Two properties of $r^2$ may be noted:

1. It is a nonnegative quantity. (Why?)

2. Its limits are $0 \le r^2 \le 1$. An $r^2$ of 1 means a perfect fit, that is, $\hat{Y}_i = Y_i$ for each $i$. On the other hand, an $r^2$ of zero means that there is no relationship between the regressand and the regressor whatsoever (i.e., $\hat{\beta}_2 = 0$). In this case, as (3.1.9) shows, $\hat{Y}_i = \hat{\beta}_1 = \bar{Y}$, that is, the best prediction of any Y value is simply its mean value. In this situation, therefore, the regression line will be horizontal, that is, parallel to the X axis.
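The two limiting cases can be checked numerically. The following is a minimal sketch, not the text's own procedure: it assumes $r^2$ is computed as ESS/TSS, uses a hypothetical helper `fit_and_r2`, and runs on two small made-up data sets chosen so that one lies exactly on a line and the other has zero sample covariance with X.

```python
def fit_and_r2(xs, ys):
    """Fit Y on X by ordinary least squares; return (intercept, slope, r2).

    Hypothetical helper for illustration; r2 is taken to be ESS / TSS.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviations from the sample means (the lowercase x_i, y_i).
    x = [xi - x_bar for xi in xs]
    y = [yi - y_bar for yi in ys]
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in xs]
    ess = sum((f - y_bar) ** 2 for f in fitted)  # explained sum of squares
    tss = sum(b * b for b in y)                  # total sum of squares
    return intercept, slope, ess / tss


X = [1.0, 2.0, 3.0, 4.0, 5.0]

# Made-up data lying exactly on Y = 1 + 2X: every fitted value equals Y_i.
perfect = fit_and_r2(X, [3.0, 5.0, 7.0, 9.0, 11.0])

# Made-up data with zero sample covariance between X and Y: the slope is 0,
# so every fitted value equals the mean of Y (here 3) and the line is flat.
flat = fit_and_r2(X, [2.0, 4.0, 3.0, 4.0, 2.0])

print("perfect fit:", perfect)
print("no relationship:", flat)
```

In the first case $r^2$ comes out as 1; in the second the estimated slope is 0, the intercept equals the sample mean of Y, and $r^2$ is 0.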

Although $r^2$ can be computed directly from its definition given in (3.5.5), it can be obtained more quickly from the following formula:

$$r^2 = \hat{\beta}_2^2 \left( \frac{\sum x_i^2}{\sum y_i^2} \right) \qquad (3.5.6)$$

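If we divide the numerator and the denominator of (3.5.6) by the sample size $n$ (or $n - 1$ if the sample size is small), we obtain

$$r^2 = \hat{\beta}_2^2 \left( \frac{S_x^2}{S_y^2} \right)$$

where $S_y^2$ and $S_x^2$ are the sample variances of Y and X, respectively. Since $\hat{\beta}_2 = \sum x_i y_i / \sum x_i^2$, Eq. (3.5.6) can also be expressed as

$$r^2 = \frac{\left( \sum x_i y_i \right)^2}{\sum x_i^2 \sum y_i^2}$$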

an expression that may be computationally easy to obtain.
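As a quick numerical check that the alternative expressions for $r^2$ agree, here is a small sketch under stated assumptions: lowercase $x_i$ and $y_i$ are taken to be deviations from the sample means, the variances use the $n - 1$ divisor, the function name `r2_three_ways` is hypothetical, and the five data points are made up for illustration.

```python
def r2_three_ways(xs, ys):
    """Return r2 computed from three algebraically equivalent expressions."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviations from the sample means.
    x = [xi - x_bar for xi in xs]
    y = [yi - y_bar for yi in ys]
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))

    beta2 = sum_xy / sum_x2  # OLS slope estimate

    # Slope form: beta2^2 * (sum x^2 / sum y^2).
    via_slope = beta2 ** 2 * sum_x2 / sum_y2

    # Variance form: beta2^2 * (S_x^2 / S_y^2), with the n - 1 divisor.
    var_x = sum_x2 / (n - 1)
    var_y = sum_y2 / (n - 1)
    via_variances = beta2 ** 2 * var_x / var_y

    # Cross-product form: (sum xy)^2 / (sum x^2 * sum y^2).
    via_cross_products = sum_xy ** 2 / (sum_x2 * sum_y2)

    return via_slope, via_variances, via_cross_products


# Made-up illustrative data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
results = r2_three_ways(X, Y)
print(results)
```

For these data $\sum x_i^2 = 10$, $\sum y_i^2 = 6$, and $\sum x_i y_i = 6$, so each expression gives $r^2 = 36/60 = 0.6$ up to floating-point rounding. The cross-product form needs no prior estimate of the slope, which is why it can be the most convenient one to compute.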

Given the definition of $r^2$, we can express ESS and RSS discussed earlier as follows:

$$\text{ESS} = r^2 \cdot \text{TSS} = r^2 \sum y_i^2$$