A regression like (2.34), in which the regressors are broken up into two groups, can arise in many situations. In this section, we will study three of these. The first two, seasonal dummy variables and time trends, are obvious applications of the FWL Theorem. The third, measures of goodness of fit that take the constant term into account, is somewhat less obvious. In all cases, the FWL Theorem allows us to obtain explicit expressions based on (2.42) for subsets of the parameter estimates of a linear regression.

For a variety of reasons, it is sometimes desirable to include among the explanatory variables of a regression model variables that can take on only two possible values, which are usually 0 and 1. Such variables are called indicator variables, because they indicate a subset of the observations, namely, those for which the value of the variable is 1. Indicator variables are a special case of dummy variables, which can take on more than two possible values.

Seasonal variation provides a good reason to employ dummy variables. It is common for economic data that are indexed by time to take the form of quarterly data, where each year in the sample period is represented by four observations, one for each quarter, or season, of the year. Many economic activities are strongly affected by the season, for obvious reasons like Christmas shopping, or summer holidays, or the difficulty of doing outdoor work during very cold weather. This seasonal variation, or seasonality, in economic activity is likely to be reflected in the economic time series that are used in regression models. The term "time series" is used to refer to any variable the observations of which are indexed by time. Of course, time-series data are sometimes annual, in which case there is no seasonal variation to worry about, and sometimes monthly, in which case there are twelve "seasons" instead of four. For simplicity, we consider only the case of quarterly data.

Since there are four seasons, there may be four seasonal dummy variables, each taking the value 1 for just one of the four seasons. Let us denote these variables as $s_1$, $s_2$, $s_3$, and $s_4$. If we consider a sample the first observation of which corresponds to the first quarter of some year, these variables look like

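$$
s_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ \vdots \end{bmatrix}, \quad
s_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \end{bmatrix}, \quad
s_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ \vdots \end{bmatrix}, \quad
s_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ \vdots \end{bmatrix}. \tag{2.47}
$$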
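To make the layout of (2.47) concrete, here is a minimal NumPy sketch that builds the four quarterly dummies as the columns of an $n \times 4$ array. It also checks numerically the property discussed next: the dummies sum to the constant vector, so adding a constant column does not raise the rank. The helper `seasonal_dummies` and its `first_quarter` argument are illustrative assumptions, not part of the text.

```python
import numpy as np

def seasonal_dummies(n: int, first_quarter: int = 1) -> np.ndarray:
    """Hypothetical helper: n x 4 matrix of quarterly indicators s_1..s_4.

    Column j (0-based) equals 1 for observations falling in quarter j + 1.
    `first_quarter` (1 to 4) is the quarter of the first observation.
    """
    quarters = (np.arange(n) + first_quarter - 1) % 4   # quarter index 0..3
    return (quarters[:, None] == np.arange(4)).astype(float)

S = seasonal_dummies(8)          # two years, starting in the first quarter
iota = np.ones(8)                # the constant vector

# Every observation belongs to exactly one season, so each row sums to 1
print(np.allclose(S.sum(axis=1), iota))                    # True
# Hence [iota, s_1, ..., s_4] has only 4 linearly independent columns
print(np.linalg.matrix_rank(np.column_stack([iota, S])))   # 4
```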
An important property of these variables is that, since every observation must correspond to some season, the sum of the seasonal dummies must indicate every season. This means that this sum is a vector every component of which equals 1. Algebraically,

$$
s_1 + s_2 + s_3 + s_4 = \iota, \tag{2.48}
$$

as is clear from (2.47). Since $\iota$ represents the constant in a regression, (2.48) means that the five-variable set consisting of all four seasonal dummies plus the constant is linearly dependent. Consequently, one of the five variables must be dropped if all the regressors are to be linearly independent. Just which one of the five is dropped makes no difference to the fitted values and residuals of a regression, because it is easy to check that

$$
\mathcal{S}(s_1, s_2, s_3, s_4) = \mathcal{S}(\iota, s_2, s_3, s_4) = \mathcal{S}(\iota, s_1, s_3, s_4),
$$

and so on. However, the parameter estimates associated with the set of four variables that we choose to keep have different interpretations depending on that choice.

Suppose first that we drop the constant and run the regression

$$
y = \alpha_1 s_1 + \alpha_2 s_2 + \alpha_3 s_3 + \alpha_4 s_4 + X\beta + u, \tag{2.49}
$$

where the $n \times k$ matrix $X$ contains other explanatory variables. Consider a single observation, indexed by $t$, that corresponds to the first season. The $t^{\text{th}}$ observations of $s_2$, $s_3$, and $s_4$ are all 0, and that of $s_1$ is 1. Thus, if we write out the $t^{\text{th}}$ observation of (2.49), we get

$$
y_t = \alpha_1 + X_t\beta + u_t.
$$

From this it is clear that, for all $t$ belonging to the first season, the constant term in the regression is $\alpha_1$. If we repeat this exercise for $t$ in the second, third, or fourth season, we see at once that $\alpha_i$ is the constant for season $i$. Thus the introduction of the seasonal dummies gives us a different constant for every season.

An alternative is to retain the constant and drop $s_1$. This yields

$$
y = \alpha_0\iota + \gamma_2 s_2 + \gamma_3 s_3 + \gamma_4 s_4 + X\beta + u.
$$

It is clear that, in this specification, the overall constant $\alpha_0$ is really the constant for season 1.
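The claim that the choice of which variable to drop does not affect fitted values, but does change what the coefficients mean, can be checked on simulated data. The sketch below assumes NumPy least squares, an arbitrary data-generating process, and two hypothetical helpers (`seasonal_dummies`, `ols`). It fits (2.49), then the version that keeps $\iota$ and drops $s_1$, then one that keeps $\iota$ and drops $s_2$.

```python
import numpy as np

def seasonal_dummies(n, first_quarter=1):
    """Hypothetical helper: n x 4 matrix of quarterly indicators s_1..s_4."""
    quarters = (np.arange(n) + first_quarter - 1) % 4
    return (quarters[:, None] == np.arange(4)).astype(float)

def ols(Z, y):
    """Hypothetical helper: OLS coefficients and fitted values."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, Z @ coef

rng = np.random.default_rng(0)
n = 40                                   # ten years of simulated quarterly data
S = seasonal_dummies(n)
iota = np.ones(n)
X = rng.normal(size=(n, 2))              # stand-in for the "other" regressors
y = (S @ np.array([1.0, 3.0, 2.0, -1.0]) + X @ np.array([0.5, -0.2])
     + rng.normal(scale=0.1, size=n))

c_a, fit_a = ols(np.column_stack([S, X]), y)                        # (2.49): no constant
c_b, fit_b = ols(np.column_stack([iota, S[:, 1:], X]), y)           # constant, drop s_1
c_c, fit_c = ols(np.column_stack([iota, S[:, [0, 2, 3]], X]), y)    # constant, drop s_2

# Same column span, so identical fitted values (and residuals)
print(np.allclose(fit_a, fit_b), np.allclose(fit_a, fit_c))   # True True
# With s_1 dropped, the constant estimates the season-1 constant alpha_1
print(np.isclose(c_a[0], c_b[0]))                             # True
```

Only the interpretation of the seasonal coefficients shifts; the coefficients on `X` (the last two entries of each coefficient vector) agree across all three fits.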
For an observation belonging to season 2, the constant is $\alpha_0 + \gamma_2$; for an observation belonging to season 3, it is $\alpha_0 + \gamma_3$; and so on. The easiest way to interpret this is to think of season 1 as the reference season. The coefficients $\gamma_i$, $i = 2, 3, 4$, measure the difference between $\alpha_0$, the constant for the reference season, and the constant for season $i$. Since we could have dropped any of the seasonal dummies, the reference season is, of course, entirely arbitrary.

Another alternative is to retain the constant and use the three dummy variables defined by

$$
s_1' = s_1 - s_4, \quad s_2' = s_2 - s_4, \quad s_3' = s_3 - s_4. \tag{2.50}
$$

These new dummy variables are not actually indicator variables, because their components for season 4 are equal to $-1$, but they have the advantage that, for each complete year, the sum of their components for that year is 0. Thus, for any sample whose size is a multiple of 4, each of the $s_i'$, $i = 1, 2, 3$, is orthogonal to the constant. We can write the regression as

$$
y = \delta_0\iota + \delta_1 s_1' + \delta_2 s_2' + \delta_3 s_3' + X\beta + u. \tag{2.51}
$$

It is easy to see that, for $t$ in season $i$, $i = 1, 2, 3$, the constant term is $\delta_0 + \delta_i$. For $t$ belonging to season 4, it is $\delta_0 - \delta_1 - \delta_2 - \delta_3$. Since the $\delta_i$ cancel when these four season-specific constants are added, the average of the constants for all four seasons is just $\delta_0$, the coefficient of the constant, $\iota$. Accordingly, the $\delta_i$, $i = 1, 2, 3$, measure the difference between the average constant $\delta_0$ and the constant specific to season $i$. Season 4 is a bit of a mess, because of the arithmetic needed to ensure that the average does indeed work out to $\delta_0$.

Let $S$ denote whatever $n \times 4$ matrix we choose to use in order to span the constant and the four seasonal variables $s_i$. Then any of the regressions we have considered so far can be written as

$$
y = S\alpha + X\beta + u. \tag{2.52}
$$

This regression has two groups of regressors, as required for the application of the FWL Theorem. That theorem implies that the estimates $\hat\beta$ and the residuals $\hat u$ can also be obtained by running the FWL regression

$$
M_S y = M_S X\beta + \text{residuals}, \tag{2.53}
$$

where, as the notation suggests, $M_S = I - S(S^\top S)^{-1}S^\top$.
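The parameterizations above, (2.49) and the reference-season and centered forms (2.50)–(2.51), can be cross-checked on simulated data. The sketch below is a minimal example that assumes NumPy, an arbitrary simulated model, and the hypothetical helpers `seasonal_dummies` and `ols_coef`. It verifies that $\gamma_i = \alpha_i - \alpha_0$, that the $s_i'$ are orthogonal to $\iota$ when $n$ is a multiple of 4, and that $\delta_0$ is the average of the four seasonal constants.

```python
import numpy as np

def seasonal_dummies(n, first_quarter=1):
    """Hypothetical helper: n x 4 matrix of quarterly indicators s_1..s_4."""
    quarters = (np.arange(n) + first_quarter - 1) % 4
    return (quarters[:, None] == np.arange(4)).astype(float)

def ols_coef(Z, y):
    """Hypothetical helper: OLS coefficient vector."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 40                                   # a multiple of 4, as (2.50) requires
S = seasonal_dummies(n)
iota = np.ones(n)
X = rng.normal(size=(n, 2))
y = (S @ np.array([1.0, 3.0, 2.0, -1.0]) + X @ np.array([0.5, -0.2])
     + rng.normal(scale=0.1, size=n))

alpha = ols_coef(np.column_stack([S, X]), y)[:4]            # season constants, (2.49)
c_ref = ols_coef(np.column_stack([iota, S[:, 1:], X]), y)   # season 1 as reference
alpha0, gamma = c_ref[0], c_ref[1:4]

S_prime = S[:, :3] - S[:, [3]]                              # (2.50): s'_i = s_i - s_4
c_ctr = ols_coef(np.column_stack([iota, S_prime, X]), y)    # (2.51)
delta0, delta = c_ctr[0], c_ctr[1:4]

print(np.allclose(iota @ S_prime, 0))            # s'_i orthogonal to the constant
print(np.allclose(alpha[1:], alpha0 + gamma))    # gamma_i = alpha_i - alpha_0
print(np.isclose(delta0, alpha.mean()))          # delta_0 = average constant
print(np.allclose(alpha[:3], delta0 + delta))    # seasons 1 to 3
print(np.isclose(alpha[3], delta0 - delta.sum()))  # season 4
```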
The effect of the projection $M_S$ on $y$ and on the explanatory variables in the matrix $X$ can be considered as a form of seasonal adjustment. By making $M_S y$ orthogonal to all the seasonal variables, we are, in effect, purging it of its seasonal variation. Consequently, $M_S y$ can be called a seasonally adjusted, or deseasonalized, version of $y$, and similarly for the explanatory variables. In practice, such seasonally adjusted variables can be conveniently obtained as the residuals from regressing $y$ and each of the columns of $X$ on the variables in $S$. The FWL Theorem tells us that we get the same results in terms of estimates of $\beta$ and residuals whether we run (2.52), in which the variables are unadjusted and seasonality is explicitly accounted for, or run (2.53), in which all the variables are seasonally adjusted by regression. This was, in fact, the subject of the famous paper by Lovell (1963).

The equivalence of (2.52) and (2.53) is sometimes used to claim that, in estimating a regression model with time-series data, it does not matter whether one uses "raw" data, along with seasonal dummies, or seasonally adjusted data. Such a conclusion is completely unwarranted. Official seasonal adjustment procedures are almost never based on regression; using official seasonally adjusted data is therefore not equivalent to using residuals from regression on a set of seasonal variables. Moreover, if (2.52) is not a sensible model (and it would not be if, for example, the seasonal pattern were more complicated than that given by $S\alpha$), then (2.53) is not a sensible specification either.

Seasonality is actually an important practical problem in applied work with time-series data. We will discuss it further in Chapter 13. For more detailed treatments, see Hylleberg (1986, 1992) and Ghysels and Osborn (2001).

The deseasonalization performed by the projection $M_S$ makes all variables orthogonal to the constant as well as to the seasonal dummies.
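The Lovell-style equivalence of (2.52) and (2.53) can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: NumPy, simulated data whose regressors have seasonal means, and a hypothetical helper `residualize` that seasonally adjusts by regression, exactly as the text describes. It also forms $M_S = I - S(S^\top S)^{-1}S^\top$ explicitly as a cross-check; on real data one would normally avoid building an $n \times n$ matrix.

```python
import numpy as np

def seasonal_dummies(n, first_quarter=1):
    """Hypothetical helper: n x 4 matrix of quarterly indicators s_1..s_4."""
    quarters = (np.arange(n) + first_quarter - 1) % 4
    return (quarters[:, None] == np.arange(4)).astype(float)

def residualize(S, Z):
    """Hypothetical helper: M_S Z, the residuals from regressing Z on S."""
    coef = np.linalg.lstsq(S, Z, rcond=None)[0]
    return Z - S @ coef

rng = np.random.default_rng(2)
n = 40
S = seasonal_dummies(n)                  # spans the constant and the seasonals
X = rng.normal(size=(n, 2)) + S @ rng.normal(size=(4, 2))   # seasonal regressors
y = (S @ np.array([1.0, 3.0, 2.0, -1.0]) + X @ np.array([0.5, -0.2])
     + rng.normal(scale=0.1, size=n))

# (2.52): unadjusted data with seasonality accounted for explicitly
Z = np.column_stack([S, X])
coef_full = np.linalg.lstsq(Z, y, rcond=None)[0]
beta_full, u_full = coef_full[4:], y - Z @ coef_full

# (2.53): seasonally adjust y and each column of X by regression on S
y_adj, X_adj = residualize(S, y), residualize(S, X)
beta_fwl = np.linalg.lstsq(X_adj, y_adj, rcond=None)[0]
u_fwl = y_adj - X_adj @ beta_fwl

# Explicit projection matrix, for comparison only
M_S = np.eye(n) - S @ np.linalg.solve(S.T @ S, S.T)

print(np.allclose(beta_full, beta_fwl), np.allclose(u_full, u_fwl))  # True True
print(np.allclose(M_S @ y, y_adj))                                   # True
print(np.allclose(X_adj.mean(axis=0), 0))    # adjusted variables have mean 0
```

The last check reflects the point made next: because $\iota$ lies in the span of $S$, the projection $M_S$ centers every variable it acts on.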
Thus the effect of $M_S$ is not only to deseasonalize, but also to center, the variables on which it acts. Sometimes this is undesirable; if so, we may use the three variables $s_i'$ given in (2.50). Since they are themselves orthogonal to the constant, no centering takes place if only these three variables are used for seasonal adjustment. An explicit constant should normally be included in any regression that uses variables seasonally adjusted in this way.
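The following sketch illustrates adjustment with only the three variables $s_i'$ of (2.50). It assumes NumPy, simulated data, a sample size that is a multiple of 4, and the hypothetical helper `residualize`. Because $\iota$ is orthogonal to every $s_i'$, the adjusted variables keep their means. By the FWL Theorem, regressing the adjusted $y$ on an explicit constant and the adjusted $X$ then reproduces the $\beta$ estimates from the full regression of $y$ on $\iota$, the $s_i'$, and $X$.

```python
import numpy as np

def seasonal_dummies(n, first_quarter=1):
    """Hypothetical helper: n x 4 matrix of quarterly indicators s_1..s_4."""
    quarters = (np.arange(n) + first_quarter - 1) % 4
    return (quarters[:, None] == np.arange(4)).astype(float)

def residualize(A, Z):
    """Hypothetical helper: residuals from regressing Z on the columns of A."""
    coef = np.linalg.lstsq(A, Z, rcond=None)[0]
    return Z - A @ coef

rng = np.random.default_rng(3)
n = 40                                   # multiple of 4, so iota' s'_i = 0
S = seasonal_dummies(n)
S_prime = S[:, :3] - S[:, [3]]           # (2.50): s'_i = s_i - s_4
iota = np.ones(n)
X = rng.normal(size=(n, 2)) + S @ rng.normal(size=(4, 2))
y = (5.0 + S @ np.array([1.0, -1.0, 0.5, -0.5]) + X @ np.array([0.5, -0.2])
     + rng.normal(scale=0.1, size=n))

# Deseasonalize with s'_1, s'_2, s'_3 only: means are left unchanged
y_sa, X_sa = residualize(S_prime, y), residualize(S_prime, X)
print(np.isclose(y_sa.mean(), y.mean()),
      np.allclose(X_sa.mean(axis=0), X.mean(axis=0)))      # True True

# Include an explicit constant when using the adjusted variables
beta_full = np.linalg.lstsq(np.column_stack([iota, S_prime, X]), y, rcond=None)[0][4:]
beta_sa = np.linalg.lstsq(np.column_stack([iota, X_sa]), y_sa, rcond=None)[0][1:]
print(np.allclose(beta_full, beta_sa))                     # True
```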
