$$\text{Sales}_t = Y_t = a + bP_t + cPSE_t + dAD_t + u_t \qquad (3.22)$$

where $Y_t$ is the number of contracts sold, $P_t$ is the average contract price per month, $PSE_t$ is personal selling expenses, $AD_t$ is advertising expenditures, and $u_t$ is a random disturbance term, all measured on a monthly basis over the past year.
Estimating this linear regression model over the EDP data yields the following regression equation:
$$\text{Sales}_t = \underset{(3.97)}{169.0} - \underset{(-6.77)}{0.046}\,P_t + \underset{(5.69)}{0.005}\,PSE_t + \underset{(2.72)}{0.002}\,AD_t$$
where $P_t$ is price, $PSE_t$ is selling expense, $AD_t$ is advertising, and t statistics are indicated within parentheses. The standard error of the estimate, or SEE, is 11.2 units; the coefficient of determination is $R^2$ = 96.6 percent; the adjusted coefficient of determination is $\bar{R}^2$ = 95.3 percent; and the relevant F statistic is 76.17.
How might the values of these coefficient estimates be interpreted? To begin, the intercept term a = 169.0 has no economic meaning. Caution must always be exercised when interpreting points outside the range of observed data and this intercept, like most, lies far from typical values. This intercept cannot be interpreted as the expected level of sales at a zero price and assuming both personal selling expenses and advertising are completely eliminated. Similarly, it would be hazardous to use this regression model to predict sales at prices, selling expenses, or advertising levels well in excess of sample norms.
Slope coefficients provide estimates of the change in sales that might be expected following a one-unit increase in price, selling expenses, or advertising expenditures. In this example, sales are measured in units, and each independent variable is measured in dollars. Therefore, a $1 increase in price can be expected to lead to a 0.046-unit reduction in sales volume per month. Similarly, a $1 increase in selling expenses can be expected to lead to a 0.005-unit increase in sales; a $1 increase in advertising can be expected to lead to a 0.002-unit increase in sales. In each instance, the effect of independent X variables appears quite consistent over the entire sample. The t statistics for both price and selling expenses exceed a value of three (see footnote 3). The chance of observing such high t statistics when in fact no relation exists between sales and these X variables is less than 1 percent. Though less strong, the link between sales and advertising expenditures is also noteworthy. The t statistic for advertising exceeds the value of two, meaning that there can be 95 percent confidence that advertising has an effect on sales. The chance of observing such a high t statistic for advertising expenditures when in fact advertising has no effect on sales is less than 5 percent. Again, caution must be used when interpreting these individual regression coefficients. It is important not to extend the analysis beyond the range of data used to estimate the regression coefficients.
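The interpretation above can be sketched in a few lines of Python. The coefficients and t statistics are those of the estimated equation, and the critical values 3.355 and 2.306 are the ones cited in footnote 3 for 8 degrees of freedom. The function names and the $100 price change are illustrative assumptions, not part of the original analysis.

```python
# Estimated slope coefficients and t statistics from the EDP regression.
COEFFICIENTS = {"price": -0.046, "selling_expense": 0.005, "advertising": 0.002}
T_STATISTICS = {"price": -6.77, "selling_expense": 5.69, "advertising": 2.72}

# Critical t values for 8 degrees of freedom, as cited in footnote 3.
CRITICAL_T = {0.01: 3.355, 0.05: 2.306}


def expected_sales_change(variable, dollar_change):
    """Expected change in monthly unit sales for a dollar change in one X variable."""
    return COEFFICIENTS[variable] * dollar_change


def strictest_significance_level(t_stat):
    """Smallest tabulated alpha at which |t| exceeds the critical value, else None."""
    for alpha in sorted(CRITICAL_T):
        if abs(t_stat) > CRITICAL_T[alpha]:
            return alpha
    return None


if __name__ == "__main__":
    # Hypothetical scenario: a $100 increase in the contract price.
    print(f"Sales change: {expected_sales_change('price', 100):.1f} units")
    for name, t_stat in T_STATISTICS.items():
        print(name, strictest_significance_level(t_stat))
```

Because the model is linear, the expected effect scales in proportion to the size of the change, which is only reasonable within the range of the sample data.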
The standard error of the estimate or SEE of 11.2 units can be used to construct a confidence interval within which actual values are likely to be found based on the size of individual regression coefficients and various values for the X variables. For example, given this regression model and values of Pt = $3,200, PSEt = $18,750, and ADt = $22,500 for the independent X variables, the fitted value Yt = 170.76 can be calculated (see Table 3.5). Given these values for the independent X variables, 95 percent of the time actual observations will lie within roughly two standard errors of the estimate; 99 percent of the time actual observations will lie within roughly three standard errors of the estimate. Thus, the bounds for the 95 percent confidence interval are given by the expression 170.76 ± (2 × 11.2), or from 148.36 to 193.16 units. Bounds for the 99 percent confidence interval are given by the expression 170.76 ± (3 × 11.2), or from 137.16 to 204.36 units.
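The interval arithmetic can be checked with a short Python sketch. It uses the rule-of-thumb multipliers of two and three standard errors described in the text. The fitted value of 170.76 is taken as reported rather than recomputed, since the rounded coefficients shown in the estimated equation would give a somewhat different figure. The function name is a hypothetical helper.

```python
def interval_bounds(fitted_value, see, multiplier):
    """Return (lower, upper) bounds: fitted value plus or minus multiplier * SEE."""
    margin = multiplier * see
    return fitted_value - margin, fitted_value + margin


FITTED_SALES = 170.76  # fitted value reported in the text (Table 3.5)
SEE = 11.2             # standard error of the estimate, in units

low_95, high_95 = interval_bounds(FITTED_SALES, SEE, 2)
low_99, high_99 = interval_bounds(FITTED_SALES, SEE, 3)

print(f"95 percent: {low_95:.2f} to {high_95:.2f}")  # 148.36 to 193.16
print(f"99 percent: {low_99:.2f} to {high_99:.2f}")  # 137.16 to 204.36
```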
Finally, the coefficient of determination, $R^2$ = 96.6 percent, indicates the share of variation in EDP demand explained by the regression model. Only 3.4 percent is left unexplained. Moreover, the adjusted coefficient of determination, $\bar{R}^2$ = 95.3 percent, reflects only a modest downward adjustment to $R^2$ based on the size of the sample analyzed relative to the number of estimated coefficients. This suggests that the regression model explains a significant share of demand variation, a suggestion that is supported by the F statistic. With 3 and 8 degrees of freedom, $F_{3,8}$ = 76.17 is far greater than five, meaning that the hypothesis of no relation between sales and this group of independent X variables can be rejected with 99 percent confidence. There is less than a 1 percent chance of encountering such a large F statistic when in fact there is no relation between sales and these X variables as a group.
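As a rough check, the adjusted coefficient of determination and the F statistic can be recovered from $R^2$ using the standard textbook formulas, with n = 12 monthly observations and k = 4 estimated coefficients (the values given in footnote 3). The Python sketch below is illustrative: the function names are assumptions, and because it starts from the rounded $R^2$ of 0.966, its F value is close to, but not exactly, the reported 76.17.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjust R^2 downward for k estimated coefficients (including the intercept)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k)


def f_statistic(r_squared, n, k):
    """F statistic with (k - 1, n - k) degrees of freedom, computed from R^2."""
    explained_per_df = r_squared / (k - 1)
    unexplained_per_df = (1 - r_squared) / (n - k)
    return explained_per_df / unexplained_per_df


R_SQUARED = 0.966   # reported coefficient of determination
N_OBS = 12          # monthly observations over one year
N_COEF = 4          # intercept plus three slope coefficients
CRITICAL_F_01 = 7.58  # critical F value at the 0.01 level, as cited in footnote 3

adj_r2 = adjusted_r_squared(R_SQUARED, N_OBS, N_COEF)
f_value = f_statistic(R_SQUARED, N_OBS, N_COEF)

print(f"Adjusted R^2: {adj_r2:.3f}")
print(f"F statistic from rounded R^2: {f_value:.2f}")
print("Reject the no-relation hypothesis at 0.01:", f_value > CRITICAL_F_01)
```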
This chapter introduces various methods for characterizing central tendency and dispersion throughout samples and populations of data. An understanding of these statistics is a necessary prelude to the detailed examination of the highly useful regression analysis technique for the study of statistical relations.
• Summary and descriptive measures of the overall population, called population parameters, are seldom known and must typically be estimated. The most effective means for doing so is to rely on sample statistics, or summary and descriptive measures that describe a representative sample.
• Useful measures of central tendency include the arithmetic mean or average, median or "middle" observation, and mode or most frequently encountered value in the sample. If the data are perfectly balanced or symmetrical, then measures of central tendency will converge on a single typical value. Otherwise, skewness and a lack of symmetry in sample dispersion are implied.
3 The t statistics for both price and selling expenses exceed 3.355, the precise critical t value for the α = 0.01 level and n - k = 12 - 4 = 8 degrees of freedom. The t statistic for advertising exceeds 2.306, the critical t value for the α = 0.05 level and 8 degrees of freedom, meaning that there can be 95 percent confidence that advertising has an effect on sales. Note also that $F_{3,8}$ = 76.17 > 7.58, the precise critical F value for the α = 0.01 significance level.
• Commonly employed measures of dispersion include the range, or the difference between the largest and smallest sample observations; variance, or average squared deviation from the mean; and standard deviation, or square root of the variance. The standard deviation measures dispersion in the same units as the underlying data. The coefficient of variation compares the standard deviation to the mean in an attractive relative measure of dispersion. The coefficient of determination shows the share of variation in Y that is explained by the regression model.
• A hypothesis test is a statistical experiment used to measure the reasonableness of a given theory or premise. Type I error is the incorrect rejection of a true hypothesis; Type II error is the failure to reject a false hypothesis. The z statistic is a test statistic that is normally distributed with a mean of zero and a standard deviation of one. A t statistic has approximately the same distribution in large samples, but a wider, heavier-tailed distribution in small samples. Critical t values are therefore adjusted upward as sample size is reduced, depending on degrees of freedom, or the number of observations beyond the absolute minimum required to calculate the statistic.
• A deterministic relation is one that is known with certainty. A statistical relation exists if the average of one variable is related to another, but it is impossible to predict with certainty the value of one based on the value of another.
• A time series of data is a daily, weekly, monthly, or annual sequence of economic data. A cross section of data is a group of observations on an important economic variable at any given point in time.
• A scatter diagram is a plot of data where the dependent variable is plotted on the vertical or Y-axis, and the independent variable is plotted on the horizontal or X-axis.
• The most common specification for economic relations is a linear model, or straight-line relation, where the marginal effect of each X variable on Y is constant. Another common regression model form is the multiplicative model, or log-linear relation, used when the marginal effect of each independent variable is thought to depend on the value of all independent variables in the regression equation.
• A simple regression model involves only one dependent Y variable and one independent X variable. A multiple regression model also entails one Y variable, but includes two or more X variables.
• The standard error of the estimate, or SEE, measures the standard deviation of the dependent Y variable after controlling for the influence of all X variables.
• In a simple regression model with only one independent variable, the correlation coefficient, r, measures goodness of fit. The coefficient of determination, or R2, shows how well a multiple regression model explains changes in the value of the dependent Y variable.
• The F statistic provides evidence on whether or not a statistically significant share of variation in the dependent Y variable has been explained by all the X variables. The t statistics are used to measure the significance of the relation between a dependent Y variable and a given X variable.
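The measures of central tendency and dispersion summarized above can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical sample of monthly unit sales; the data are illustrative only and are not taken from the chapter. Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor.

```python
import statistics

# Hypothetical sample of monthly unit sales (illustrative data only).
sample = [150, 162, 162, 171, 180, 195]

mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

data_range = max(sample) - min(sample)
variance = statistics.variance(sample)  # sample variance, n - 1 divisor
std_dev = statistics.stdev(sample)      # same units as the data
coefficient_of_variation = std_dev / mean

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.1f}, std dev={std_dev:.2f}")
print(f"coefficient of variation={coefficient_of_variation:.3f}")
```

In this sample the mean exceeds the median, which in turn exceeds the mode, so the measures of central tendency do not converge and the data are skewed toward higher values.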
Methods examined in this chapter are commonly employed by both large and small corporations and other organizations in their ongoing statistical analysis of economic relations.
Given the continuing rise in both the diversity and complexity of the economic environment, the use of such tools is certain to grow in the years ahead.
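As a closing illustration, the simple regression measures summarized in this chapter (slope, intercept, correlation coefficient r, coefficient of determination, and SEE) can be computed by ordinary least squares in plain Python. This is a minimal sketch under stated assumptions: the data are hypothetical, and `simple_ols` is an illustrative helper rather than a routine from any statistical package.

```python
import math

# Hypothetical data: X = advertising (thousands of dollars), Y = unit sales.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.0, 4.1, 5.9, 8.2, 9.8]


def simple_ols(x, y):
    """Fit Y = a + bX by least squares and return fit statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

    return {
        "slope": slope,
        "intercept": intercept,
        "r": sxy / math.sqrt(sxx * syy),      # correlation coefficient
        "r_squared": 1 - residual_ss / syy,   # share of variation explained
        "see": math.sqrt(residual_ss / (n - 2)),  # two coefficients estimated
    }


fit = simple_ols(x_values, y_values)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

In a simple regression with one X variable, the squared correlation coefficient equals the coefficient of determination, which the returned values can be used to confirm.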