Even in the absence of multicollinearity or other data problems, it is worthwhile to examine one's data closely for two reasons. First, the identification of outliers in the data is useful, particularly in relatively small cross sections in which the identity and perhaps even the ultimate source of the data point may be known. Second, it may be possible to ascertain which, if any, particular observations are especially influential in the results obtained. As such, the identification of these data points may call for further study. It is worth emphasizing, though, that there is a certain danger in singling out particular observations for scrutiny or even elimination from the sample on the basis of statistical results that are based on those data. At the extreme, this step may invalidate the usual inference procedures.
Of particular importance in this analysis is the projection matrix, or hat matrix,

P = X(X'X)⁻¹X'.
This matrix appeared earlier as the matrix that projects any n × 1 vector into the column space of X. For any vector y, Py is the set of fitted values in the least squares regression of y on X. The least squares residuals are

e = My = Mε = (I − P)ε,

so the covariance matrix of the least squares residual vector is

Var[e] = σ²M = σ²(I − P).
To identify which residuals are significantly large, we first standardize them by dividing each one by an estimate of its standard deviation. The ith standardized residual is e_i / (s√(1 − p_ii)), where p_ii is the ith diagonal element of P and s² is the usual estimator of σ².
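The quantities above can be computed directly. Below is a minimal NumPy sketch on synthetic data (the variable names and the simulated design matrix are illustrative, not from the text): it forms P = X(X'X)⁻¹X', the residuals e = (I − P)y, the leverages p_ii, and the standardized residuals, and checks the defining properties of P.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 25, 3

# Synthetic data: a constant plus two regressors (illustrative only)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

# Projection (hat) matrix P = X(X'X)^{-1}X'
P = X @ np.linalg.solve(X.T @ X, X.T)

e = y - P @ y                  # least squares residuals, e = My = (I - P)y
s2 = e @ e / (n - k)           # s^2, the usual estimator of sigma^2
p_ii = np.diag(P)              # leverages, the diagonal of P

# Standardized residuals: e_i / (s * sqrt(1 - p_ii))
std_resid = e / np.sqrt(s2 * (1 - p_ii))

# P is symmetric and idempotent, and its trace equals the column rank of X
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
assert np.isclose(np.trace(P), k)
```

Observations whose standardized residuals are large in absolute value (for example, greater than 2) are candidates for the closer scrutiny described above; large values of p_ii similarly flag high-leverage points.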