Nonconstant Error Variance
In: Regression Diagnostics
By: John Fox
Pub. Date: 2011
Access Date: April 26, 2021
Publishing Company: SAGE Publications, Inc.
City: Thousand Oaks
Print ISBN: 9780803939714
Online ISBN: 9781412985604
DOI: https://dx.doi.org/10.4135/9781412985604
Print pages: 49-53
© 1991 SAGE Publications, Inc. All Rights Reserved.
Nonconstant Error Variance
Detecting Nonconstant Error Variance
One of the assumptions of the regression model is that the variation of the dependent variable around the
regression surface (the error variance) is everywhere the same: $V(\varepsilon) = V(y \mid x_1, \ldots, x_k) = \sigma^2$. Nonconstant
error variance is often termed “heteroscedasticity.” Although the least-squares estimator is unbiased and
consistent even when the error variance is not constant, its efficiency is impaired and the usual formulas for
coefficient standard errors are inaccurate, the degree of the problem depending on the degree to which error
variances differ. I describe graphical methods for detecting nonconstant error variance in this chapter. Tests
for heteroscedasticity are discussed in Chapter 8 on discrete data and in Chapter 9 on maximum-likelihood
methods.
Because the regression surface is k-dimensional and embedded in a space of k + 1 dimensions, it is generally
impractical to assess the assumption of constant error variance by direct graphical examination of the data
for k larger than 1 or 2. Nevertheless, it is common for error variance to increase as the expectation of y
grows larger, or there may be a systematic relationship between error variance and a particular x. The former
situation can be detected by plotting residuals against fitted values, and the latter by plotting residuals against
each x. It is worth noting that plotting residuals against y (as opposed to ŷ) is generally unsatisfactory. The plot
will be tilted: There is a built-in linear correlation between e and y, because y = ŷ + e; in fact, the correlation
between y and e is $r(y, e) = \sqrt{1 - R^2}$. In contrast, the least-squares fit ensures that r(ŷ, e) = 0, producing a plot
that is much easier to examine for evidence of nonconstant spread.
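The two correlations above can be checked numerically. The following is a minimal sketch in Python with NumPy on simulated data (the data, seed, and variable names are illustrative assumptions, not Ornstein's data). It fits a least-squares regression with an intercept, then compares r(y, e) with √(1 − R²) and r(ŷ, e) with 0.

```python
import numpy as np

# Simulated data for illustration only (not from the source).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(scale=1.5, size=n)

# Least-squares fit; the model matrix includes a column of ones.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
e = y - y_hat

r_squared = 1 - np.sum(e**2) / np.sum((y - y.mean()) ** 2)
r_y_e = np.corrcoef(y, e)[0, 1]          # built-in positive correlation
r_yhat_e = np.corrcoef(y_hat, e)[0, 1]   # zero up to rounding error

print(f"r(y, e)       = {r_y_e:.4f}")
print(f"sqrt(1 - R^2) = {np.sqrt(1 - r_squared):.4f}")
print(f"r(y_hat, e)   = {r_yhat_e:.2e}")
```

The identity is exact: because e has mean zero and is orthogonal to ŷ, cov(y, e) = V(e), so r(y, e) = SD(e)/SD(y) = √(RSS/TSS).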
Because the least-squares residuals have unequal variances even when the errors have constant variance,
I prefer plotting the studentized residuals against fitted values. Finally, a pattern of changing spread is often
more easily discerned in a plot of $|t_i|$ or $t_i^2$ versus ŷ, perhaps augmented by a lowess scatterplot smooth (see
Appendix A6.1); smoothing this plot is particularly useful when the sample size is very large or the distribution
of ŷ is very uneven. An example appears in Figure 6.2.
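The quantities in such a plot can be computed directly. The sketch below, in Python with NumPy, assumes the standard definition of the studentized residual, $t_i = e_i / (s_{(i)} \sqrt{1 - h_i})$, where $s_{(i)}$ is the residual standard error with observation i deleted and $h_i$ is the hat value. The function name, the simulated data, and the crude comparison of mean $|t_i|$ across the lower and upper halves of ŷ (a stand-in for looking at the plot) are illustrative assumptions.

```python
import numpy as np

def studentized_residuals(X, y):
    """Return (t, y_hat): studentized residuals and fitted values of an OLS fit.

    X is the n x p model matrix (including the column of ones); y is the response.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # hat values h_i
    b = XtX_inv @ (X.T @ y)
    y_hat = X @ b
    e = y - y_hat
    # Deleted residual variance via RSS_(i) = RSS - e_i^2 / (1 - h_i).
    s2_del = (e @ e - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_del * (1 - h)), y_hat

# Simulated heteroscedastic data (illustrative only): error SD grows with x.
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 10, size=n)
y = 2 + 3 * x + rng.normal(scale=0.5 * x)
X = np.column_stack([np.ones(n), x])

t, y_hat = studentized_residuals(X, y)
order = np.argsort(y_hat)
low, high = np.abs(t[order[: n // 2]]), np.abs(t[order[n // 2 :]])
print(f"mean |t|, lower half of fitted values: {low.mean():.2f}")
print(f"mean |t|, upper half of fitted values: {high.mean():.2f}")
```

Using the updating formula for $s_{(i)}$ avoids refitting the regression n times.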
An illustrative plot of studentized residuals against fitted values is shown in Figure 6.1a. In Figure 6.1b,
studentized residuals are plotted against log₂(3 + ŷ); by correcting the positive skew in these ŷ values,
the second plot makes it easier to discern the tendency of the residual spread to increase with ŷ. The
data for this example are drawn from work by Ornstein (1976) on interlocking directorates among the 248
largest Canadian firms. The number of interlocking directorate and executive positions maintained by the
corporations is regressed on corporate assets (square-root transformed to make the relationship linear; see
Chapter 7); 9 dummy variables representing 10 industrial classes, with heavy manufacturing serving as the
baseline category; and 3 dummy variables representing 4 nations of control, with Canada as the baseline
category. The results of the regression are given in the left-hand columns of Table 6.1; the results shown on
the right of the table are discussed below. Note that part of the tendency for the residual scatter to increase
with ŷ is due to the lower bound of 0 for y: Because e = y - ŷ, the smallest possible residual corresponding to
a particular ŷ value is e = 0 - ŷ = -ŷ.
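The dummy coding described above (9 regressors for 10 sectors, 3 for 4 nations) can be sketched as follows. This Python/NumPy function is a hypothetical helper, not code from the source, and the nation labels other than Canada are placeholders. Each non-baseline level gets a 0/1 column, and the baseline level is coded 0 on all of them.

```python
import numpy as np

def dummy_regressors(values, levels, baseline):
    """Code a categorical variable with m levels as an n x (m - 1) 0/1 matrix.

    The baseline level gets no column: its rows are all zeros, so its
    coefficient is implicitly zero and each other coefficient is a contrast
    with the baseline.
    """
    others = [lev for lev in levels if lev != baseline]
    values = np.asarray(values)
    D = np.column_stack([(values == lev).astype(float) for lev in others])
    return D, others

# Hypothetical placeholder labels for the 4 nations of control.
nations = ["Canada", "N2", "N3", "N4"]
sample = ["N3", "Canada", "N2", "N4", "Canada"]
D, cols = dummy_regressors(sample, nations, baseline="Canada")
print(cols)   # ['N2', 'N3', 'N4']
print(D)
```

The same helper applied to 10 sector labels with heavy manufacturing as the baseline would return 9 columns.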
Figure 6.1. Plots of studentized residuals versus fitted values for Ornstein's interlocking-directorate
regression. (a) t versus ŷ. (b) t versus log₂(3 + ŷ). The log transformation serves to reduce the skew
of the fitted values, making the increasing residual spread easier to discern.
Correcting Nonconstant Error Variance
Transformations frequently serve to correct a tendency of the error variance to increase or, less commonly,
decrease with the magnitude of the dependent variable: Move y down the ladder of powers and roots if
the residual scatter broadens with the fitted values; move y up the ladder if the residual scatter narrows.
An effective transformation may be selected by trial and error (but see Chapter 9 for an analytic method of
selecting a variance-stabilizing transformation).
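Trial-and-error selection along the ladder can be sketched by refitting the regression for several powers and checking which one leaves the least systematic trend in residual spread. The Python/NumPy sketch below is a hypothetical illustration. The spread index |corr(|e|, ŷ)| is an ad hoc stand-in for inspecting residual plots, not the analytic method of Chapter 9. The data are simulated so that √y is linear in x with constant error variance. Because negative powers reverse the order of y, absolute correlations are compared.

```python
import numpy as np

def ladder(y, p):
    """Ladder of powers and roots: y**p, with log(y) standing in for p = 0 (y > 0)."""
    return np.log(y) if p == 0 else y ** p

def spread_trend(X, z):
    """|corr(|e|, fitted)| for the least-squares fit of z on X (0 = no trend)."""
    b, *_ = np.linalg.lstsq(X, z, rcond=None)
    z_hat = X @ b
    return abs(np.corrcoef(np.abs(z - z_hat), z_hat)[0, 1])

# Simulated data (illustration only): sqrt(y) is linear in x with constant
# error SD, so on the original scale the spread of y grows with its level.
rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 10, size=n)
y = (5 + 0.5 * x + rng.normal(scale=1.0, size=n)) ** 2
X = np.column_stack([np.ones(n), x])

powers = [1, 0.5, 0, -0.5, -1]   # from no transformation down the ladder
trend = {p: spread_trend(X, ladder(y, p)) for p in powers}
best_p = min(trend, key=trend.get)
for p in powers:
    print(f"p = {p:>4}: |corr(|e|, fitted)| = {trend[p]:.3f}")
print("least trend in spread at p =", best_p)
```

In practice the residual plot for each candidate power should still be examined; a single summary number can miss curvature or outliers.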
TABLE 6.1 Regression of Number of Interlocking Directorate and Executive Positions Maintained by
248 Major Canadian Corporations on Corporate Assets, Sector, and Nation of Control
If the error variance is proportional to a particular x, or if the pattern of V(∊i) is otherwise known up to
a constant of proportionality, then an alternative to transformation of y is weighted-least-squares (WLS)
estimation (see Appendix A6.2). It also is possible to correct the estimated standard errors of the least-
squares coefficients for heteroscedasticity: A method proposed by White (1980) is described in Appendix
A6.3. An advantage of this approach is that knowledge of the pattern of nonconstant error variance (e.g.,
increased variance with the level of y or with an x) is not required. If, however, the heteroscedasticity problem
is severe, and the corrected standard errors therefore are substantially larger than those produced by the
usual formula, then discovering the pattern of nonconstant variance and correcting for it—by a transformation
or WLS estimation—offers the possibility of more efficient estimation. In any event, unequal error variances
are worth correcting only when the problem is extreme—where, for example, the spread of the errors varies
by a factor of about three or more (i.e., the error variance varies by a factor of about 10 or more; see Appendix
A6.4).
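Both alternatives to transforming y can be sketched compactly. The Python/NumPy functions below are illustrative assumptions, not the procedures given in Appendices A6.2 and A6.3. `wls` rescales each row by √wᵢ and then applies ordinary least squares, with wᵢ = 1/xᵢ when V(εᵢ) is proportional to xᵢ. `ols_with_white_se` computes the basic (HC0) form of White's heteroscedasticity-consistent covariance matrix, (X'X)⁻¹ X' diag(eᵢ²) X (X'X)⁻¹, without small-sample adjustments.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimize sum(w_i * (y_i - x_i'b)^2).

    Equivalent to OLS after multiplying each row of X and y by sqrt(w_i).
    """
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b

def ols_with_white_se(X, y):
    """OLS coefficients with usual and White (HC0) standard errors."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    e = y - X @ b
    s2 = e @ e / (n - p)
    usual_se = np.sqrt(np.diag(s2 * XtX_inv))
    meat = X.T @ (X * e[:, None] ** 2)           # sum of e_i^2 x_i x_i'
    white_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, usual_se, white_se

# Simulated data (illustrative only): error variance proportional to x.
rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 20, size=n)
y = 1 + 2 * x + rng.normal(scale=np.sqrt(x))
X = np.column_stack([np.ones(n), x])

print("WLS coefficients:", wls(X, y, 1 / x))
b, se, white = ols_with_white_se(X, y)
print("OLS coefficients:", b)
print("usual SEs:", se, "White SEs:", white)
```

The White correction needs no model for the variance, which is its advantage; WLS uses the known pattern and can therefore be more efficient when that pattern is right.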
Figure 6.2. Plot of absolute studentized residuals versus fitted values for the square-root transformed
interlocking-directorate data. The line on the graph is a lowess smooth using f = 0.5 and 2 robustness
iterations.
For Ornstein's interlocking-directorate regression, for example, a square-root transformation appears to
correct the dependence of the residual spread on the level of the dependent variable. A plot of $|t_i|$ versus $\hat{y}_i$
for the transformed data is given in Figure 6.2, and the regression results appear in the right-hand columns of
Table 6.1. The lowess smooth in Figure 6.2 (see Appendix A6.1) shows little change in the average absolute
studentized residuals as the fitted values increase.
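Figure 6.2 specifies the smooth only by its span (f = 0.5) and its number of robustness iterations (2). The following is a minimal from-scratch sketch of a Cleveland-style lowess in Python/NumPy. It assumes local linear fits with tricube neighborhood weights and bisquare robustness weights, and it is not necessarily the implementation used to draw the figure. The usage lines apply it to simulated absolute residuals with constant spread, standing in for $|t_i|$ versus $\hat{y}_i$.

```python
import numpy as np

def lowess(x, y, f=0.5, iters=2):
    """Locally weighted linear regression smoother (Cleveland-style lowess).

    f is the fraction of the data in each local fit; iters is the number of
    robustness iterations. Returns smoothed values at each x, in input order.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(x)
    r = int(np.ceil(f * n))              # points in each neighborhood
    robust = np.ones(n)                  # robustness weights
    fitted = np.zeros(n)
    for _ in range(iters + 1):           # initial fit plus `iters` robust refits
        for i in range(n):
            d = np.abs(x - x[i])
            h = max(np.sort(d)[r - 1], 1e-12)             # r-th nearest distance
            w = np.clip(1 - (d / h) ** 3, 0, None) ** 3   # tricube weights
            sw = np.sqrt(w * robust)
            A = np.column_stack([np.ones(n), x - x[i]])   # line centered at x[i]
            coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
            fitted[i] = coef[0]          # local line evaluated at x[i]
        resid = y - fitted
        s = np.median(np.abs(resid))
        u = resid / (6 * s) if s > 0 else np.zeros(n)
        robust = np.clip(1 - u**2, 0, None) ** 2          # bisquare weights
    return fitted

# Usage on simulated data with constant spread: the smooth should be roughly flat.
rng = np.random.default_rng(5)
fit = rng.uniform(0, 10, size=200)
abs_t = np.abs(rng.normal(size=200))
smooth = lowess(fit, abs_t, f=0.5, iters=2)
print(f"smooth ranges from {smooth.min():.2f} to {smooth.max():.2f}")
```

The robustness step downweights points with large residuals from the previous pass, so a few extreme $|t_i|$ values do not dominate the smooth.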
The coefficients for the original and transformed regressions in Table 6.1 cannot be compared directly,
because the scale of the dependent variable has been altered. It is clear, however, that assets retain their
positive effect and that the nations of control maintain their ranks. The sectoral ranks are also similar
across the two analyses, although not identical. In comparing the two sets of results, recall that the baseline
categories for the sets of dummy regressors—Canada and heavy manufacturing—implicitly have coefficients
of zero.
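When ranks are compared across the two fits, each baseline category must be placed in the ordering at 0. A small hypothetical Python helper makes this explicit. The coefficient values below are invented placeholders, not the estimates in Table 6.1, and the labels other than Canada are placeholders as well.

```python
def ranks_with_baseline(coefs, baseline):
    """Return category labels from largest to smallest coefficient,
    inserting the baseline category with its implicit coefficient of 0."""
    full = dict(coefs)
    full[baseline] = 0.0
    return sorted(full, key=full.get, reverse=True)

# Placeholder coefficients for illustration only (NOT the Table 6.1 estimates).
original = {"N2": -1.2, "N3": -0.4, "N4": 0.3}
transformed = {"N2": -0.30, "N3": -0.05, "N4": 0.08}

print(ranks_with_baseline(original, "Canada"))   # ['N4', 'Canada', 'N3', 'N2']
print(ranks_with_baseline(original, "Canada")
      == ranks_with_baseline(transformed, "Canada"))   # True
```

The coefficients themselves differ in scale between the two fits, but the orderings can be compared directly once the baseline is included.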
Transforming y also changes the shape of the error distribution and alters the shape of the regression of y
on the xs. It is frequently the case that producing constant residual variation through a transformation also
makes the distribution of the residuals more symmetric. At times, eliminating nonconstant spread also makes
the relationship of y to the xs more nearly linear (see the next chapter). These by-products are not necessary
consequences of correcting nonconstant error variance, however, and it is particularly important to check data
for nonlinearity following a transformation of y. Of course, because there generally is no reason to suppose
that the regression is linear prior to transforming y, we should check for nonlinearity even when y is not
transformed.
Finally, nonconstant residual spread sometimes is evidence for the omission of important effects from the
model. Suppose, for example, that there is an omitted categorical independent variable, such as regional
location, that interacts with assets in affecting interlocks; in particular, the assets slope, although positive in
every region, is steeper in some regions than in others. Then the omission of region and its interactions with
assets could produce a fan-shaped residual plot even if the errors from the correct model have constant
spread. The detection of this type of specification error therefore requires substantive insight into the process
generating the data and cannot rely on diagnostics alone.
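A small simulation can illustrate how an omitted interaction produces a fan-shaped residual pattern. In the Python/NumPy sketch below, the two-level `region` factor, the slopes, and all data are hypothetical. The true model has constant error variance but a steeper assets slope in one region. Fitting assets alone yields residual spread that grows with ŷ, whereas adding region and its interaction with assets removes the pattern.

```python
import numpy as np

def spread_ratio(e, fitted):
    """SD of residuals above the median fitted value divided by SD below it."""
    high = fitted >= np.median(fitted)
    return e[high].std() / e[~high].std()

rng = np.random.default_rng(4)
n = 1000
assets = rng.uniform(0, 10, size=n)
region = rng.integers(0, 2, size=n)          # hypothetical two-region factor
slope = np.where(region == 1, 3.0, 1.0)      # positive in both, steeper in one
y = 2 + slope * assets + rng.normal(scale=1.0, size=n)   # constant error SD

# Misspecified fit: region and its interaction with assets omitted.
X = np.column_stack([np.ones(n), assets])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
ratio_omitted = spread_ratio(y - X @ b, X @ b)

# Correct fit: region and the region-by-assets interaction included.
Xc = np.column_stack([np.ones(n), assets, region, region * assets])
bc, *_ = np.linalg.lstsq(Xc, y, rcond=None)
ratio_correct = spread_ratio(y - Xc @ bc, Xc @ bc)

print(f"spread ratio, region omitted:  {ratio_omitted:.2f}")
print(f"spread ratio, region included: {ratio_correct:.2f}")
```

Nothing in the misspecified residuals alone identifies region as the cause; the simulation only shows that such a cause can mimic heteroscedasticity.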
http://dx.doi.org/10.4135/9781412985604.n6