## $\sqrt{n}\,\bar{r} \xrightarrow{d} N[0, \Sigma]$

For some covariance matrix $\Sigma$ that we have yet to estimate, it follows that the Wald statistic satisfies $n\,\bar{r}'\hat{\Sigma}^{-1}\bar{r} \xrightarrow{d} \chi^2[J]$, where the degrees of freedom $J$ is the number of moment restrictions being tested and $\hat{\Sigma}$ is an estimate of $\Sigma$. Thus, the statistic can be referred to the chi-squared table. It remains to determine the estimator of $\Sigma$. The full derivation of $\Sigma$ is fairly complicated; see Pagan and Vella (1989, pp. S32-S33). But when the vector of parameter estimators is a maximum likelihood estimator, as it would be for the...
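As a rough numerical sketch of the quadratic form $n\,\bar{r}'\hat{\Sigma}^{-1}\bar{r}$: the function below takes an $n \times J$ array of moment contributions (a hypothetical input named `r`) and uses the plain sample covariance of its rows as a stand-in for the estimator of the covariance matrix. That stand-in is an assumption made only for illustration. It ignores the sampling variation that comes from estimating the parameters, which is the complication the text attributes to the full derivation.

```python
import numpy as np

def wald_moment_statistic(r):
    """Return (n * rbar' S^{-1} rbar, J) for an (n, J) array of moment contributions.

    Assumption: S is the simple sample covariance of the rows of r, used here
    as a placeholder for the covariance estimator discussed in the text.
    """
    r = np.asarray(r, dtype=float)
    n, J = r.shape
    rbar = r.mean(axis=0)                  # sample mean of each moment
    centered = r - rbar
    sigma_hat = centered.T @ centered / n  # placeholder covariance estimate
    stat = float(n * rbar @ np.linalg.solve(sigma_hat, rbar))
    return stat, J
```

The returned statistic would be compared with a chi-squared critical value with `J` degrees of freedom.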

## Multivariate Probit Model

In principle, a multivariate model would extend (21-41) to more than two outcome variables just by adding equations. The practical obstacle to such an extension is primarily the evaluation of higher-order multivariate normal integrals. Some progress has been made on using quadrature for trivariate integration, but existing results are not sufficient to allow accurate and efficient evaluation for more than two variables in a sample of even moderate size. An altogether different approach has been...

## An Encompassing Model

The encompassing approach is one in which the ability of one model to explain features of another is tested. Model 0 encompasses Model 1 if the features of Model 1 can be explained by Model 0 but the reverse is not true. Since $H_0$ cannot be written as a restriction on $H_1$, none of the procedures we have considered thus far is appropriate. One possibility is an artificial nesting of the two models. Let $\bar{X}$ be the set of variables in $X$ that are not in $Z$, define $\bar{Z}$ likewise with respect to $X$, and let...

## Estimation When $\Omega$ Is Unknown

For an unknown $\Omega$, there are a variety of approaches. Any consistent estimator of $\Omega(\rho)$ will suffice; recall from Theorem 10.8 in Section 10.5.2 that all that is needed for efficient estimation of $\beta$ is a consistent estimator of $\Omega(\rho)$. The complication arises, as might be expected, in estimating the autocorrelation parameter(s). The AR(1) model is the one most widely used and studied. The most common procedure is to begin FGLS with a natural estimator of $\rho$, the autocorrelation of the residuals. Since...
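The first step described above, estimating the autocorrelation parameter from the residuals, can be sketched as follows. The function names are hypothetical. The second function applies a Prais-Winsten style transformation, a common way to carry out the FGLS step for an AR(1) disturbance; it is an assumption here, since the passage does not spell out which transformation is used.

```python
def residual_autocorrelation(e):
    """First-order autocorrelation of residuals: sum(e_t * e_{t-1}) / sum(e_t^2)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den

def prais_winsten_transform(z, rho):
    """Quasi-difference a series for an AR(1) disturbance (assumed transformation).

    The first observation is scaled by sqrt(1 - rho^2); later observations
    become z_t - rho * z_{t-1}.
    """
    first = (1.0 - rho * rho) ** 0.5 * z[0]
    rest = [z[t] - rho * z[t - 1] for t in range(1, len(z))]
    return [first] + rest
```

In use, the same transformation would be applied to the dependent variable and to every column of regressors, with ordinary least squares then run on the transformed data.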

## The Rank And Order Conditions For Identification

It is useful to summarize what we have determined thus far. The unknown structural parameters consist of $\Gamma$, an $M \times M$ nonsingular matrix; $B$, a $K \times M$ parameter matrix; and $\Sigma$, an $M \times M$ symmetric positive definite matrix. The known, reduced-form parameters are $\Pi$, a $K \times M$ reduced-form coefficients matrix, and $\Omega$, an $M \times M$ reduced-form covariance matrix. Simply counting parameters in the structure and reduced forms yields an excess of

$$l = M^2 + KM + \tfrac{1}{2}M(M+1) - KM - \tfrac{1}{2}M(M+1) = M^2,$$

which is, as might be expected from the earlier results, the number of unknown elements in $\Gamma$....

## Panel Data Models

Many recent studies have analyzed panel, or longitudinal, data sets. Two very famous ones are the National Longitudinal Survey of Labor Market Experience (NLS) and the Michigan Panel Study of Income Dynamics (PSID). In these data sets, very large cross sections, consisting of thousands of microunits, are followed through time, but the number of periods is often quite small. The PSID, for example, is a study of roughly 6,000 families and 15,000 individuals who have been interviewed periodically...

## Summary And Conclusions

This chapter has analyzed one form of the generalized regression model, the model of heteroscedasticity. We first considered least squares estimation. The primary result for...

Footnote 34: White (1982a) gives some additional requirements for the true underlying density of $\varepsilon_t$. Gourieroux, Monfort, and Trognon (1984) also consider the issue. Under the assumptions given, the expectations of the matrices in (18-27) and (18-32) remain the same as under normality. The consistency and asymptotic normality of the...

## Prob($y_i = j \mid \mathbf{x}_i$)

Footnote 74: For a similar treatment in a continuous data application, see Cragg (1971).

This formulation changes the probability of the zero outcome and scales the remaining probabilities so that they sum to one. It adds a new restriction, however: Prob($y_i = 0 \mid \mathbf{x}_i$) no longer depends on the covariates. Therefore, a natural next step is to parameterize this probability. Mullahey suggests some formulations and applies the model to a sample of observations on daily beverage consumption. Mullahey (1986),...

## Recursive Residuals And The CUSUMS Test

Example 7.6 shows a test of structural change based essentially on the model's ability to predict correctly outside the range of the observations used to estimate it. A similar logic underlies an alternative test of model stability proposed by Brown, Durbin, and Evans (1975) based on recursive residuals. The technique is appropriate for time-series data and might be used if one is uncertain about when a structural change might have taken place. The null hypothesis is that the coefficient vector...

## The F Statistic And The Least Squares Discrepancy

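We now consider testing a set of $J$ linear restrictions stated in the null hypothesis, $H_0\colon \mathbf{R}\boldsymbol{\beta} = \mathbf{q}$. Each row of $\mathbf{R}$ contains the coefficients in one linear restriction on the coefficient vector. Typically, $\mathbf{R}$ will have only a few rows and numerous zeros in each row. Some examples would be as follows:

1. One of the coefficients is zero, $\beta_j = 0$: $\mathbf{R} = [0 \;\; 0 \;\cdots\; 1 \;\; 0 \;\cdots\; 0]$ and $q = 0$.
2. Two of the coefficients are equal, $\beta_k = \beta_j$: $\mathbf{R} = [0 \;\; 0 \;\; 1 \;\cdots\; -1 \;\cdots\; 0]$ and $q = 0$.
3. A set of the coefficients sum to one, $\beta_2 + \beta_3 + \beta_4 = 1$: $\mathbf{R} = [0 \;\; 1 \;\; 1 \;\; 1 \;\; 0 \;\cdots]$ and $q = 1$.
4. A subset of...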
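To make the restriction matrices concrete, here is a minimal sketch of an F statistic built from the least squares discrepancy $\mathbf{m} = \mathbf{R}\mathbf{b} - \mathbf{q}$. The function name and argument layout are hypothetical, and the formula used, $F = \mathbf{m}'[\mathbf{R}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}']^{-1}\mathbf{m} / (J s^2)$, is the standard textbook form rather than something quoted from the passage.

```python
import numpy as np

def f_statistic(X, y, R, q):
    """F statistic for H0: R beta = q in the linear regression of y on X.

    Returns (F, J, n - K). R must be a (J, K) array and q a length-J vector.
    """
    X, y, R, q = (np.asarray(a, dtype=float) for a in (X, y, R, q))
    n, K = X.shape
    J = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y              # unrestricted least squares estimate
    e = y - X @ b
    s2 = e @ e / (n - K)               # usual estimator of the disturbance variance
    m = R @ b - q                      # discrepancy vector
    F = float(m @ np.linalg.solve(R @ XtX_inv @ R.T, m) / (J * s2))
    return F, J, n - K
```

For instance, the first example in the list (a single zero restriction on the third of three coefficients) would be passed as `R=[[0.0, 0.0, 1.0]]` and `q=[0.0]`.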

## Instrumental Variables Estimation Of The Random Effects Model

Recall the original specification of the linear model for panel data in (13-1),

$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \mathbf{z}_i'\boldsymbol{\alpha} + \varepsilon_{it}. \qquad \text{(13-35)}$$

The random effects model is based on the assumption that the unobserved person-specific effects, $\mathbf{z}_i$, are uncorrelated with the included variables, $\mathbf{x}_{it}$. This assumption is a major shortcoming of the model. However, the random effects treatment does allow the model to contain observed time-invariant characteristics, such as demographic characteristics, while the fixed effects model does...

## The Independence From Irrelevant Alternatives

We noted earlier that the odds ratios in the multinomial logit or conditional logit models are independent of the other alternatives. This property is convenient as regards estimation, but it is not a particularly appealing restriction to place on consumer behavior. The property of the logit model whereby $P_j/P_k$ is independent of the remaining probabilities is called the independence from irrelevant alternatives (IIA). The independence assumption follows from the initial assumption that the...
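The IIA property can be checked numerically. The sketch below computes logit choice probabilities from a list of systematic utilities (the utility values are made up for illustration) and compares the odds of the first two alternatives with and without a third alternative in the choice set.

```python
import math

def logit_probs(utilities):
    """Multinomial logit probabilities exp(u_j) / sum_k exp(u_k)."""
    m = max(utilities)                       # subtract the max for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [v / total for v in expu]

# Hypothetical utilities for three alternatives, then the same first two alone.
three = logit_probs([1.0, 0.5, 0.2])
two = logit_probs([1.0, 0.5])

ratio_three = three[0] / three[1]   # odds of alternative 1 vs 2 with a third option present
ratio_two = two[0] / two[1]         # same odds when the third option is removed
```

Both ratios equal `exp(1.0 - 0.5)`: adding or removing the third alternative changes the individual probabilities but not their ratio.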

## Common Factor Restrictions

The preceding discussion suggests that evidence of autocorrelation in a time-series regression model might signal more than merely a need to use generalized least squares to make efficient use of the data. See Hendry (1993). If we find evidence of autocorrelation based, say, on the Durbin-Watson statistic or on Durbin's h statistic, then it would make sense to test the hypothesis of the AR(1) model that might normally be the next step against the alternative possibility that the model is merely...

## Semiparametric Estimation

The fully parametric probit and logit models remain by far the mainstays of empirical research on binary choice. Fully nonparametric discrete choice models are fairly exotic and have made only limited inroads in the literature, and much of that literature is theoretical [e.g., Matzkin (1993)]. The primary obstacle to application is their paucity of interpretable results. (See Example 21.9.) Of course, one could argue on this basis that the firm results produced by the fully parametric models are...

## Info

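The solution is

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = (-0.50907, -0.01658, 0.67038, -0.002326, -0.00009401)'.$$

## Algebraic Aspects Of The Least Squares Solution

The normal equations are

$$\mathbf{X}'\mathbf{X}\mathbf{b} - \mathbf{X}'\mathbf{y} = -\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b}) = -\mathbf{X}'\mathbf{e} = \mathbf{0}. \qquad \text{(3-12)}$$

Hence, for every column $\mathbf{x}_k$ of $\mathbf{X}$, $\mathbf{x}_k'\mathbf{e} = 0$. If the first column of $\mathbf{X}$ is a column of 1s, then there are three implications.

1. The least squares residuals sum to zero. This implication follows from $\mathbf{x}_1'\mathbf{e} = \mathbf{i}'\mathbf{e} = \sum_i e_i = 0$.
2. The regression hyperplane passes through the point of means of the data. The first normal...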
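The first two implications can be verified on simulated data. The sketch below uses made-up coefficients and a seeded random generator; only the algebra (residuals orthogonal to every column of the regressor matrix, including the constant) comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Regressor matrix whose first column is a column of 1s (hypothetical data).
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations X'Xb = X'y
e = y - X @ b                           # least squares residuals

normal_eq = X.T @ e                     # should be a vector of (numerical) zeros
resid_sum = e.sum()                     # implication 1: residuals sum to zero
mean_fit_gap = y.mean() - X.mean(axis=0) @ b   # implication 2: fit passes through the means
```

All three quantities are zero up to floating-point rounding.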

## Treatment Effects

The basic model of selectivity outlined earlier has been extended in an impressive variety of directions. An interesting application that has found wide use is the measurement of treatment effects and program effectiveness. An earnings equation that accounts for the value of a college education is

$$\text{earnings}_i = \mathbf{x}_i'\boldsymbol{\beta} + \delta C_i + \varepsilon_i,$$

where $C_i$ is a dummy variable indicating whether or not the individual attended college. The same format has been used in any number of other analyses of programs, experiments, and treatments. The...

## Integrated Processes And Differencing

A process that figures prominently in recent work is the random walk with drift,

$$y_t = \mu + y_{t-1} + \varepsilon_t.$$

By direct substitution,

$$y_t = \sum_{i=0}^{\infty} (\mu + \varepsilon_{t-i}).$$

That is, $y_t$ is the simple sum of what will eventually be an infinite number of random variables, possibly with nonzero mean. If the innovations are being generated by the same zero-mean, constant-variance distribution, then the variance of $y_t$ would obviously be infinite. As such, the random walk is clearly a nonstationary process, even if $\mu$ equals zero. On the other hand, the first difference of $y_t$, $z_t = y_t - y_{t-1} = \mu + \varepsilon_t$, is simply the...
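A small simulation illustrates the point about differencing. The sketch assumes the recursion $y_t = \mu + y_{t-1} + \varepsilon_t$ with Gaussian innovations; the drift value, starting value, and sample length are arbitrary choices for illustration.

```python
import random

def random_walk_with_drift(mu, innovations, y0=0.0):
    """Build y_t = mu + y_{t-1} + eps_t from a list of innovations."""
    path, y = [], y0
    for eps in innovations:
        y = mu + y + eps
        path.append(y)
    return path

def first_difference(series):
    """Return y_t - y_{t-1} for t = 2, ..., T."""
    return [b - a for a, b in zip(series, series[1:])]

random.seed(0)
mu = 0.1
eps = [random.gauss(0.0, 1.0) for _ in range(200)]
y = random_walk_with_drift(mu, eps)
dy = first_difference(y)   # each element equals mu plus the matching innovation
```

The level series `y` wanders without settling around a fixed mean, while `dy` recovers the drift plus the innovation period by period.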

## Exercises

For the regression model $y = \alpha + \beta x + \varepsilon$,

a. Show that the least squares normal equations imply $\sum_i e_i = 0$ and $\sum_i x_i e_i = 0$.

b. Show that the solution for the constant term is $a = \bar{y} - b\bar{x}$.

c. Show that the solution for $b$ is $b = \left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right] / \left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]$.

d. Prove that these two values uniquely minimize the sum of squares by showing that the diagonal elements of the second derivatives matrix of the sum of squares with respect to the parameters are both positive and that the...

## Censoring And Truncation In Models For Counts

Truncation and censoring are relatively common in applications of models for counts (see Section 21.9). Truncation often arises as a consequence of discarding what appear to be unusable data, such as the zero values in survey data on the number of uses of recreation facilities [Shaw (1988) and Bockstael et al. (1990)]. The zero values in this setting might represent a discrete decision not to visit the site, which is a qualitatively different decision from the positive number for someone who had...
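As an illustration of truncation at zero, the sketch below rescales Poisson probabilities so that the positive counts sum to one, $\text{Prob}(y \mid y > 0) = e^{-\lambda}\lambda^y / [y!\,(1 - e^{-\lambda})]$. The Poisson base model and the function names are assumptions made for this example; the passage itself only describes the data situation.

```python
import math

def poisson_pmf(y, lam):
    """Untruncated Poisson probability of the count y."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def truncated_at_zero_pmf(y, lam):
    """Probability of y given y > 0: the Poisson pmf rescaled by 1 - Prob(0)."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, lam) / (1.0 - poisson_pmf(0, lam))

def truncated_at_zero_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / (1.0 - math.exp(-lam))
```

The truncated mean exceeds `lam`, which is why fitting an untruncated model to data with the zeros discarded would misstate the underlying rate.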

## Testing For Overdispersion

The Poisson model has been criticized because of its implicit assumption that the variance of $y_i$ equals its mean. Many extensions of the Poisson model that relax this assumption have been proposed by Hausman, Hall, and Griliches (1984), McCullagh and Nelder (1983), and Cameron and Trivedi (1986), to name but a few. The first step in this extended analysis is usually a test for overdispersion in the context of the simple model. A number of authors have devised tests for overdispersion within the...

## Incidental Truncation In A Bivariate Distribution

Suppose that $y$ and $z$ have a bivariate distribution with correlation $\rho$. We are interested in the distribution of $y$ given that $z$ exceeds a particular value. Intuition suggests that if $y$ and $z$ are positively correlated, then the truncation of $z$ should push the distribution of $y$ to the right. As before, we are interested in (1) the form of the incidentally truncated distribution and (2) the mean and variance of the incidentally truncated random variable. Since it has dominated the empirical literature,...
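The intuition about the rightward shift can be made concrete under an assumed bivariate normal distribution. The sketch uses the standard moments for that case, $E[y \mid z > a] = \mu_y + \rho\sigma_y\lambda(\alpha_z)$ and $\text{Var}[y \mid z > a] = \sigma_y^2[1 - \rho^2\delta(\alpha_z)]$, with $\alpha_z = (a - \mu_z)/\sigma_z$, $\lambda(\alpha) = \phi(\alpha)/[1 - \Phi(\alpha)]$, and $\delta(\alpha) = \lambda(\alpha)[\lambda(\alpha) - \alpha]$. The function names are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inverse_mills(alpha):
    """lambda(alpha) = phi(alpha) / (1 - Phi(alpha)), for truncation from below."""
    return norm_pdf(alpha) / (1.0 - norm_cdf(alpha))

def incidentally_truncated_moments(mu_y, sigma_y, rho, mu_z, sigma_z, a):
    """Mean and variance of y given z > a under bivariate normality (assumed)."""
    alpha = (a - mu_z) / sigma_z
    lam = inverse_mills(alpha)
    delta = lam * (lam - alpha)
    mean = mu_y + rho * sigma_y * lam
    var = sigma_y ** 2 * (1.0 - rho ** 2 * delta)
    return mean, var
```

With a positive correlation the conditional mean lies above $\mu_y$, with zero correlation it is unchanged, and the conditional variance is never larger than $\sigma_y^2$.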

## Some Issues In Specification

Two issues that commonly arise in microeconomic data, heteroscedasticity and nonnormality, have been analyzed at length in the tobit setting. Maddala and Nelson (1975), Hurd (1979), Arabmazar and Schmidt (1982a,b), and Brown and Moffitt (1982) all have varying degrees of pessimism regarding how inconsistent the maximum likelihood estimator will be when heteroscedasticity occurs. Not surprisingly, the degree of censoring is the primary determinant. Unfortunately, all the analyses have been carried...

## The Censored Normal Distribution

The relevant distribution theory for a censored variable is similar to that for a truncated one. Once again, we begin with the normal distribution, as much of the received work has been based on an assumption of normality. We also assume that the censoring point is zero, although this is only a convenient normalization. In a truncated distribution, only the part of the distribution above $y = 0$ is relevant to our computations. To make the distribution integrate to one, we scale it up by the...
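For a normal variable censored at zero, $y = \max(0, y^*)$ with $y^* \sim N[\mu, \sigma^2]$, the mean has the closed form $E[y] = \Phi(\mu/\sigma)(\mu + \sigma\lambda)$ with $\lambda = \phi(\mu/\sigma)/\Phi(\mu/\sigma)$. The sketch below evaluates that expression and compares it with a seeded simulation. The function names and the number of draws are arbitrary choices for illustration.

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def censored_mean(mu, sigma):
    """E[max(0, y*)] for y* ~ N(mu, sigma^2).

    Written as Phi * mu + sigma * phi, which equals Phi * (mu + sigma * lambda)
    with lambda = phi / Phi, but avoids dividing by a tiny Phi.
    """
    z = mu / sigma
    return norm_cdf(z) * mu + sigma * norm_pdf(z)

def simulated_censored_mean(mu, sigma, draws=50000, seed=0):
    """Monte Carlo average of max(0, y*) as a rough check on the formula."""
    rng = random.Random(seed)
    return sum(max(0.0, rng.gauss(mu, sigma)) for _ in range(draws)) / draws
```

Unlike truncation, censoring keeps the mass at zero in the distribution, so the censored mean is smaller than the corresponding truncated mean.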

## Restrictions And Nested Models

One common approach to testing a hypothesis is to formulate a statistical model that contains the hypothesis as a restriction on its parameters. A theory is said to have testable implications if it implies some testable restrictions on the model. Consider, for example, a simple model of investment, $I_t$, suggested by Section 3.3.2,

$$\ln I_t = \beta_1 + \beta_2 i_t + \beta_3 \Delta p_t + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t, \qquad \text{(6-1)}$$

which states that investors are sensitive to nominal interest rates, $i_t$, the rate of inflation, $\Delta p_t$, the log of real output,...

## Minimum Mean Squared Error Predictor

As an alternative approach, consider the problem of finding an optimal linear predictor for $y$. Once again, ignore Assumption A6 and, in addition, drop Assumption A1 that the conditional mean function, $E[y \mid \mathbf{x}]$, is linear. For the criterion, we will use the mean squared error rule, so we seek the minimum mean squared error linear predictor of $y$, which we'll denote $\mathbf{x}'\boldsymbol{\gamma}$. The expected squared error of this predictor is

$$\text{MSE} = E_{y,\mathbf{x}}[y - \mathbf{x}'\boldsymbol{\gamma}]^2 = E_{y,\mathbf{x}}\{y - E[y \mid \mathbf{x}]\}^2 + E_{y,\mathbf{x}}\{E[y \mid \mathbf{x}] - \mathbf{x}'\boldsymbol{\gamma}\}^2.$$

We seek the $\boldsymbol{\gamma}$ that minimizes this...

## Nonnormal Disturbances And Large Sample Tests

The distributions of the F, t, and chi-squared statistics that we used in the previous section rely on the assumption of normally distributed disturbances. Without this assumption, the exact distributions of these statistics depend on the data and the parameters and are not F, t, and chi-squared. At least at first blush, it would seem that we need either a new set of critical values for the tests or...

Footnote 7: This case is not true when the restrictions are nonlinear. We consider this issue in Chapter 9.

## The Goldfeld-Quandt Test

By narrowing our focus somewhat, we can obtain a more powerful test. Two tests that are relatively general are the Goldfeld-Quandt (1965) test and the Breusch-Pagan (1979) Lagrange multiplier test. For the Goldfeld-Quandt test, we assume that the observations can be divided into two groups in such a way that under the hypothesis of homoscedasticity, the disturbance variances would be the same in the two groups, whereas under the alternative, the disturbance variances would differ systematically....
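A minimal sketch of the two-group comparison follows. It assumes the usual form of the statistic, the ratio of the two groups' residual variance estimates, $F[n_1 - K, n_2 - K] = [\mathbf{e}_1'\mathbf{e}_1/(n_1 - K)] / [\mathbf{e}_2'\mathbf{e}_2/(n_2 - K)]$, with group 1 taken to be the group suspected of having the larger variance. The function names are hypothetical.

```python
import numpy as np

def sum_squared_residuals(X, y):
    """Residual sum of squares from a least squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return float(e @ e)

def goldfeld_quandt(X1, y1, X2, y2):
    """Return (F, n1 - K, n2 - K) from separate regressions on the two groups."""
    X1, y1, X2, y2 = (np.asarray(a, dtype=float) for a in (X1, y1, X2, y2))
    n1, K = X1.shape
    n2 = X2.shape[0]
    s1 = sum_squared_residuals(X1, y1) / (n1 - K)   # variance estimate, group 1
    s2 = sum_squared_residuals(X2, y2) / (n2 - K)   # variance estimate, group 2
    return s1 / s2, n1 - K, n2 - K
```

Large values of the ratio, relative to the F table with the returned degrees of freedom, would count against homoscedasticity.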

## Partitioned Regression And Partial Regression

It is common to specify a multiple regression model when, in fact, interest centers on only one or a subset of the full set of variables. Consider the earnings equation discussed in Example 2.2. Although we are primarily interested in the association of earnings and education, age is, of necessity, included in the model. The question we consider here is what computations are involved in obtaining, in isolation, the coefficients of a subset of the variables in a multiple regression, for example,...
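One standard answer to this question is the partialling-out (Frisch-Waugh) result: the coefficients on a subset of variables can be obtained by first removing the linear influence of the other variables from both the dependent variable and the subset, then regressing residuals on residuals. The sketch below checks this on simulated data. The data-generating values and helper names are invented for illustration, and the passage as excerpted does not state the result itself.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients of y on X (y may have several columns)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def residualize(A, X1):
    """Residuals from regressing A (vector or matrix) on X1."""
    return A - X1 @ ols(X1, A)

rng = np.random.default_rng(1)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # constant plus one control
X2 = rng.normal(size=(n, 1)) + 0.5 * X1[:, [1]]          # variable of interest, correlated with the control
y = X1 @ np.array([1.0, 0.3]) + 0.8 * X2[:, 0] + rng.normal(size=n)

b_full = ols(np.column_stack([X1, X2]), y)               # full multiple regression
b2_partial = ols(residualize(X2, X1), residualize(y, X1))  # residuals-on-residuals regression
```

The last element of `b_full` and the single element of `b2_partial` agree up to rounding.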

## The Least Squares Coefficient Vector

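The least squares coefficient vector minimizes the sum of squared residuals

$$\sum_{i=1}^{n} e_{i0}^2 = \sum_{i=1}^{n} (y_i - \mathbf{x}_i'\mathbf{b}_0)^2, \qquad \text{(3-1)}$$

where $\mathbf{b}_0$ denotes the choice for the coefficient vector. In matrix terms, minimizing the sum of squares in (3-1) requires us to choose $\mathbf{b}_0$ to

$$\text{Minimize}_{\mathbf{b}_0}\; S(\mathbf{b}_0) = \mathbf{e}_0'\mathbf{e}_0 = (\mathbf{y} - \mathbf{X}\mathbf{b}_0)'(\mathbf{y} - \mathbf{X}\mathbf{b}_0). \qquad \text{(3-2)}$$

Expanding this gives

$$\mathbf{e}_0'\mathbf{e}_0 = \mathbf{y}'\mathbf{y} - \mathbf{b}_0'\mathbf{X}'\mathbf{y} - \mathbf{y}'\mathbf{X}\mathbf{b}_0 + \mathbf{b}_0'\mathbf{X}'\mathbf{X}\mathbf{b}_0 \qquad \text{(3-3)}$$

or

$$S(\mathbf{b}_0) = \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{X}\mathbf{b}_0 + \mathbf{b}_0'\mathbf{X}'\mathbf{X}\mathbf{b}_0.$$

The necessary condition for a minimum is

$$\frac{\partial S(\mathbf{b}_0)}{\partial \mathbf{b}_0} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b}_0 = \mathbf{0}.$$

Footnote 1: We shall have to establish that the practical approach of fitting the line as closely as possible to the data by least squares leads...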

## Testing Nonlinear Restrictions

The preceding discussion has relied heavily on the linearity of the regression model. When we analyze nonlinear functions of the parameters and nonlinear regression models, most of these exact distributional results no longer hold. The general problem is that of testing a hypothesis that involves a nonlinear function of the regression coefficients, $H_0\colon c(\boldsymbol{\beta}) = q$. We shall look first at the case of a single restriction. The more general one, in which $\mathbf{c}(\boldsymbol{\beta}) = \mathbf{q}$ is a set of restrictions, is a simple extension. The...

## Theorem 4.4 Independence of b and $s^2$

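If $\boldsymbol{\varepsilon}$ is normally distributed, then the least squares coefficient estimator $\mathbf{b}$ is statistically independent of the residual vector $\mathbf{e}$ and therefore, all functions of $\mathbf{e}$, including $s^2$. The ratio

$$t_k = \frac{(b_k - \beta_k)/\sqrt{\sigma^2 S^{kk}}}{\sqrt{[(n-K)s^2/\sigma^2]/(n-K)}} = \frac{b_k - \beta_k}{\sqrt{s^2 S^{kk}}},$$

where $S^{kk}$ is the $k$th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$, has a $t$ distribution with $n - K$ degrees of freedom. We can use $t_k$ to test hypotheses or form confidence intervals about the individual elements of $\boldsymbol{\beta}$. A common test is whether a parameter $\beta_k$ is significantly different from zero. The appropriate test statistic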

## The Population Orthogonality Conditions

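Let $\mathbf{x}$ denote the vector of independent variables in the population regression model and for the moment, based on assumption A5, the data may be stochastic or nonstochastic. Assumption A3 states that the disturbances in the population are stochastically orthogonal to the independent variables in the model; that is, $E[\varepsilon \mid \mathbf{x}] = 0$. It follows that $\text{Cov}[\mathbf{x}, \varepsilon] = \mathbf{0}$. Since (by the law of iterated expectations, Theorem B.1) $E_{\mathbf{x}}\{E[\varepsilon \mid \mathbf{x}]\} = E[\varepsilon] = 0$, we may write this as

$$E_{\mathbf{x}} E_{\varepsilon}[\mathbf{x}\varepsilon] = E_{\mathbf{x}} E_y[\mathbf{x}(y - \mathbf{x}'\boldsymbol{\beta})] = \mathbf{0}$$

or

$$E_{\mathbf{x}} E_y[\mathbf{x}y] = E_{\mathbf{x}}[\mathbf{x}\mathbf{x}']\boldsymbol{\beta}.$$

The right-hand side is not a function of $y$ so the...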

## plim $\frac{1}{n}\mathbf{X}^{0\prime}\mathbf{X}^{0} = \mathbf{Q}^0$

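where $\mathbf{Q}^0$ is a positive definite matrix. To establish consistency of $\mathbf{b}$ in the linear model, we required plim $(1/n)\mathbf{X}'\boldsymbol{\varepsilon} = \mathbf{0}$. We will use the counterpart to this for the pseudoregressors,

$$\text{plim}\, \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i^0 \varepsilon_i = \mathbf{0}.$$

This is the orthogonality condition noted earlier in (5-4). In particular, note that orthogonality of the disturbances and the data is not the same condition. Finally, asymptotic normality can be established under general conditions if

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \mathbf{x}_i^0 \varepsilon_i \xrightarrow{d} N[\mathbf{0}, \sigma^2\mathbf{Q}^0].$$

With these in hand, the asymptotic properties of the nonlinear least squares...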

## Partial Regression And Partial Correlation Coefficients

The use of multiple regression involves a conceptual experiment that we might not be able to carry out in practice, the ceteris paribus analysis familiar in economics. To pursue Example 2.2, a regression equation relating earnings to age and education enables us to do the conceptual experiment of comparing the earnings of two individuals of the same age with different education levels, even if the sample contains no such pair of individuals. It is this characteristic of the regression that is...