## Summary and Conclusions

1. Estimation and hypothesis testing constitute the two main branches of classical statistics. Having discussed the problem of estimation in Chapters 3 and 4, we have taken up the problem of hypothesis testing in this chapter.

2. Hypothesis testing answers this question: Is a given finding compatible with a stated hypothesis or not?

3. There are two mutually complementary approaches to answering the preceding question: confidence interval and test of significance.

4. Underlying the confidence-interval approach is the concept of interval estimation. An interval estimator is an interval or range constructed in such a manner that it has a specified probability of including within its limits the true value of the unknown parameter. The interval thus constructed is known as a confidence interval, which is often stated in percent form, such as 90% or 95%. The confidence interval provides a set of plausible hypotheses about the value of the unknown parameter. If the null-hypothesized value lies in the confidence interval, the hypothesis is not rejected, whereas if it lies outside this interval, the null hypothesis can be rejected.
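The confidence-interval decision rule in point 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming the estimate, its standard error, and the critical value $t_{\alpha/2}$ (with $n-2$ df in the two-variable model, read from a t table) are already known; the function names and the numbers are hypothetical.

```python
def confidence_interval(estimate: float, se: float, t_crit: float) -> tuple[float, float]:
    """Interval estimate: estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


def reject_null(null_value: float, estimate: float, se: float, t_crit: float) -> bool:
    """Reject H0 if the hypothesized value falls outside the confidence interval."""
    lower, upper = confidence_interval(estimate, se, t_crit)
    return not (lower <= null_value <= upper)


# Hypothetical numbers: slope estimate 0.5, standard error 0.1, and an
# illustrative critical value of 2.0 (not taken from a real t table).
print(reject_null(0.0, 0.5, 0.1, 2.0))  # True: 0 lies outside the interval
print(reject_null(0.4, 0.5, 0.1, 2.0))  # False: 0.4 lies inside the interval
```

Because the interval is built from the same estimate, standard error, and critical value as the t test, the two approaches lead to the same decision at the same level of significance.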

Gujarati: Basic Econometrics, Fourth Edition. I. Single-Equation Regression Models. 5. Two-Variable Regression: Interval Estimation and Hypothesis Testing. © The McGraw-Hill Companies, 2004.

5. In the significance test procedure, one develops a test statistic and examines its sampling distribution under the null hypothesis. The test statistic usually follows a well-defined probability distribution such as the normal, t, F, or chi-square. Once a test statistic (e.g., the t statistic) is computed from the data at hand, its p value can be easily obtained. The p value gives the exact probability, under the null hypothesis, of obtaining a value of the test statistic at least as extreme as the one actually computed. If this p value is small, one can reject the null hypothesis, but if it is large one may not reject it. What constitutes a small or large p value is up to the investigator. In drawing that line, the investigator has to bear in mind the probabilities of committing Type I and Type II errors.
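To make the p value concrete, here is a hedged sketch that computes the two-sided p value of a t statistic with only the Python standard library. Instead of calling a statistics package, it integrates Student's t density numerically with Simpson's rule; this is an illustrative choice, and in practice a library routine such as SciPy's `scipy.stats.t.sf` would normally be used. The function names and the example t value are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work with log-gamma so the normalizing constant stays finite for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t_stat|) for T ~ t(df), i.e. the two-sided p value under H0.

    Simpson's rule integrates the density over [0, |t_stat|]; the symmetry
    of the t distribution then gives both tails at once.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    a = abs(t_stat)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t_stat|)
    return max(0.0, 1.0 - 2.0 * central_half)


# Hypothetical example: a t statistic of 2.5 with 11 degrees of freedom.
p = two_sided_p_value(2.5, 11)
print(f"two-sided p value = {p:.4f}")
```

The function returns the p value itself rather than a reject/accept verdict, leaving the choice of what counts as "small" to the investigator.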

6. In practice, one should be careful in fixing $\alpha$, the probability of committing a Type I error, at arbitrary values such as 1, 5, or 10 percent. It is better to quote the p value of the test statistic. Also, the statistical significance of an estimate should not be confused with its practical significance.

7. Of course, hypothesis testing presumes that the model chosen for empirical analysis is adequate in the sense that it does not violate one or more assumptions underlying the classical normal linear regression model. Therefore, tests of model adequacy should precede tests of hypothesis. This chapter introduced one such test, the normality test, to find out whether the error term follows the normal distribution. Since in small, or finite, samples, the t, F, and chi-square tests require the normality assumption, it is important that this assumption be checked formally.
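One common formal check of normality is the Jarque-Bera (JB) test, computed from the OLS residuals as $JB = n\left[\frac{S^2}{6} + \frac{(K-3)^2}{24}\right]$, where $S$ is the skewness and $K$ the kurtosis of the residuals. Under the null hypothesis of normality, JB is asymptotically chi-square with 2 df, and that distribution has the closed-form upper tail $e^{-x/2}$, so the sketch below needs only the standard library. It assumes the residuals are already available; the function name and the residuals are hypothetical, and since the test is a large-sample one, its p value should be read cautiously in small samples.

```python
import math


def jarque_bera(residuals: list[float]) -> tuple[float, float]:
    """Return (JB statistic, asymptotic p value) for a list of residuals.

    JB = n * (S**2 / 6 + (K - 3)**2 / 24), with S the sample skewness and
    K the sample kurtosis (moment estimators that divide by n). Under
    normality JB is asymptotically chi-square(2), whose upper tail
    probability is exp(-x / 2).
    """
    n = len(residuals)
    mean = sum(residuals) / n  # zero for OLS with an intercept, but be safe
    dev = [e - mean for e in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
    return jb, math.exp(-jb / 2)


# Hypothetical residuals, far too few for the asymptotic test to be reliable.
jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(f"JB = {jb:.4f}, p = {p:.4f}")
```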

8. If the model is deemed practically adequate, it may be used for forecasting purposes. But in forecasting the future values of the regressand, one should not go too far out of the sample range of the regressor values. Otherwise, forecasting errors can increase dramatically.
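Why forecast errors grow away from the sample range can be seen from the variance of the mean prediction at a regressor value $X_0$ in the two-variable model, $\operatorname{var}(\hat Y_0) = \sigma^2\left[\frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right]$ with $x_i = X_i - \bar X$ (for an individual prediction, a further 1 is added inside the brackets). The second term grows with the squared distance of $X_0$ from $\bar X$. The sketch below evaluates the standard error for a few values of $X_0$, assuming hypothetical sample X values and an assumed residual variance; the function name is also hypothetical.

```python
import math


def se_mean_prediction(x: list[float], sigma2_hat: float, x0: float) -> float:
    """Standard error of the estimated mean of Y at X = x0 in Y = b1 + b2*X + u.

    var(Y0_hat) = sigma^2 * (1/n + (x0 - xbar)^2 / sum((Xi - xbar)^2))
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return math.sqrt(sigma2_hat * (1 / n + (x0 - xbar) ** 2 / sxx))


# Hypothetical regressor values observed in the sample (range 1 to 5)
# and an assumed residual variance of 1.0.
sample_x = [1.0, 2.0, 3.0, 4.0, 5.0]
for x0 in (3.0, 5.0, 10.0, 20.0):
    print(x0, round(se_mean_prediction(sample_x, 1.0, x0), 3))
```

The standard error is smallest at $X_0 = \bar X$ and rises steeply once $X_0$ leaves the range of the sample, which is the point made in item 8.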

## Exercises

### Questions

5.1. State with reason whether the following statements are true, false, or uncertain. Be precise.

a. The t test of significance discussed in this chapter requires that the sampling distributions of estimators $\hat\beta_1$ and $\hat\beta_2$ follow the normal distribution.

b. Even though the disturbance term in the CLRM is not normally distributed, the OLS estimators are still unbiased.

c. If there is no intercept in the regression model, the estimated $u_i$ ($= \hat u_i$) will not sum to zero.

d. The p value and the size of a test statistic mean the same thing.

e. In a regression model that contains the intercept, the sum of the residuals is always zero.

f. If a null hypothesis is not rejected, it is true.

g. The higher the value of $\sigma^2$, the larger is the variance of $\hat\beta_2$ given in (3.3.1).

h. The conditional and unconditional means of a random variable are the same things.

i. In the two-variable PRF, if the slope coefficient $\beta_2$ is zero, the intercept $\beta_1$ is estimated by the sample mean $\bar Y$.

j. The conditional variance, $\operatorname{var}(Y_i \mid X_i) = \sigma^2$, and the unconditional variance of Y, $\operatorname{var}(Y) = \sigma_Y^2$, will be the same if X had no influence on Y.


5.2. Set up the ANOVA table in the manner of Table 5.4 for the regression model given in (3.7.2) and test the hypothesis that there is no relationship between food expenditure and total expenditure in India.

5.3. From the data given in Table 2.6 on earnings and education, we obtained the following regression [see Eq. (3.7.3)]:

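$$
\begin{aligned}
\text{Meanwage} &= 0.7437 + 0.6416\,\text{Education}\\
\text{se} &= (0.8355) \qquad\;\; (\phantom{0.0000})\\
t &= (\phantom{0.0000}) \qquad\;\; (9.6536) \qquad r^2 = 0.8944 \qquad n = 13
\end{aligned}
$$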

a. Fill in the missing numbers.

b. How do you interpret the coefficient 0.6416?

c. Would you reject the hypothesis that education has no effect whatsoever on wages? Which test do you use? And why? What is the p value of your test statistic?

d. Set up the ANOVA table for this example and test the hypothesis that the slope coefficient is zero. Which test do you use and why?

e. Suppose in the regression given above the r2 value was not given to you. Could you have obtained it from the other information given in the regression?

5.4. Let $\rho^2$ represent the true population coefficient of correlation. Suppose you want to test the hypothesis that $\rho^2 = 0$. Verbally explain how you would test this hypothesis. Hint: Use Eq. (3.5.11). See also exercise 5.7.

5.5. What is known as the characteristic line of modern investment analysis is simply the regression line obtained from the following model:

$$r_{it} = \alpha_i + \beta_i r_{mt} + u_t$$

where $r_{it}$ is the rate of return on the $i$th security in time $t$, $r_{mt}$ is the rate of return on the market portfolio in time $t$, and $u_t$ is the stochastic disturbance term.

In this model $\beta_i$ is known as the beta coefficient of the $i$th security, a measure of market (or systematic) risk of a security.*

On the basis of 240 monthly rates of return for the period 1956-1976, Fogler and Ganapathy obtained the following characteristic line for IBM stock in relation to the market portfolio index developed at the University of Chicago†:

a. A security whose beta coefficient is greater than one is said to be a volatile or aggressive security. Was IBM a volatile security in the time period under study?

b. Is the intercept coefficient significantly different from zero? If it is, what is its practical meaning?

*See Haim Levy and Marshall Sarnat, Portfolio and Investment Selection: Theory and Practice, Prentice-Hall International, Englewood Cliffs, N.J., 1984, Chap. 12.

†H. Russell Fogler and Sundaram Ganapathy, Financial Econometrics, Prentice Hall, Englewood Cliffs, N.J., 1982, p. 13.

5.6. Equation (5.3.5) can also be written as

$$\Pr\left[\hat\beta_2 - t_{\alpha/2}\,\text{se}(\hat\beta_2) < \beta_2 < \hat\beta_2 + t_{\alpha/2}\,\text{se}(\hat\beta_2)\right] = 1 - \alpha$$

That is, the weak inequality ($\le$) can be replaced by the strong inequality ($<$). Why?

5.7. R. A. Fisher has derived the sampling distribution of the correlation coefficient defined in (3.5.13). If it is assumed that the variables X and Y are jointly normally distributed, that is, if they come from a bivariate normal distribution (see Appendix 4A, exercise 4.1), then under the assumption that the population correlation coefficient $\rho$ is zero, it can be shown that $t = r\sqrt{n-2}/\sqrt{1-r^2}$ follows Student's t distribution with $n - 2$ df.* Show that this t value is identical with the t value given in (5.3.2) under the null hypothesis that $\beta_2 = 0$. Hence establish that under the same null hypothesis $F = t^2$. (See Section 5.9.)

*If $\rho$ is in fact zero, Fisher has shown that r follows the same t distribution provided either X or Y is normally distributed. But if $\rho$ is not equal to zero, both variables must be normally distributed. See R. L. Anderson and T. A. Bancroft, Statistical Theory in Research, McGraw-Hill, New York, 1952, pp. 87-88.

### Problems

5.8. Consider the following regression output†:

where Y = labor force participation rate (LFPR) of women in 1972 and X = LFPR of women in 1968. The regression results were obtained from a sample of 19 cities in the United States.

a. How do you interpret this regression?

b. Test the hypothesis $H_0\colon \beta_2 = 1$ against $H_1\colon \beta_2 > 1$. Which test do you use? And why? What are the underlying assumptions of the test(s) you use?

c. Suppose that the LFPR in 1968 was 0.58 (or 58 percent). On the basis of the regression results given above, what is the mean LFPR in 1972? Establish a 95% confidence interval for the mean prediction.

d. How would you test the hypothesis that the error term in the population regression is normally distributed? Show the necessary calculations.

†Adapted from Samprit Chatterjee, Ali S. Hadi, and Bertram Price, Regression Analysis by Example, 3d ed., Wiley Interscience, New York, 2000, pp. 46-47.

5.9. Table 5.5 gives data on average public teacher pay (annual salary in dollars) and spending on public schools per pupil (dollars) in 1985 for 50 states and the District of Columbia.


 Observation Salary Spending Observation Salary Spending