Discussion: Estimating Models Using Dummy Variables

Nonconstant Error Variance

In: Regression Diagnostics

By: John Fox

Pub. Date: 2011

Access Date: April 26, 2021

Publishing Company: SAGE Publications, Inc.

City: Thousand Oaks

Print ISBN: 9780803939714

Online ISBN: 9781412985604

DOI: https://dx.doi.org/10.4135/9781412985604

Print pages: 49-53

© 1991 SAGE Publications, Inc. All Rights Reserved.

This PDF has been generated from SAGE Research Methods. Please note that the pagination of the

online version will vary from the pagination of the print book.

Nonconstant Error Variance

Detecting Nonconstant Error Variance

One of the assumptions of the regression model is that the variation of the dependent variable around the

regression surface (the error variance) is everywhere the same: $V(\varepsilon) = V(y \mid x_1, \ldots, x_k) = \sigma^2$. Nonconstant

error variance is often termed “heteroscedasticity.” Although the least-squares estimator is unbiased and

consistent even when the error variance is not constant, its efficiency is impaired and the usual formulas for

coefficient standard errors are inaccurate, the degree of the problem depending on the degree to which error

variances differ. I describe graphical methods for detecting nonconstant error variance in this chapter. Tests

for heteroscedasticity are discussed in Chapter 8 on discrete data and in Chapter 9 on maximum-likelihood

methods.

Because the regression surface is k-dimensional and embedded in a space of k + 1 dimensions, it is generally

impractical to assess the assumption of constant error variance by direct graphical examination of the data

for k larger than 1 or 2. Nevertheless, it is common for error variance to increase as the expectation of y

grows larger, or there may be a systematic relationship between error variance and a particular x. The former

situation can be detected by plotting residuals against fitted values, and the latter by plotting residuals against

each x. It is worth noting that plotting residuals against y (as opposed to ŷ) is generally unsatisfactory. The plot

will be tilted: There is a built-in linear correlation between e and y, because y = ŷ + e; in fact, the correlation

between y and e is $r(y, e) = \sqrt{1 - R^2}$, where $R^2$ is the squared multiple correlation of the regression. In contrast, the least-squares fit ensures that r(ŷ, e) = 0, producing a plot

that is much easier to examine for evidence of nonconstant spread.
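
The tilt described above can be checked numerically. The following is a minimal sketch, assuming a single made-up predictor and simulated data (the model, seed, and function names are illustrative and not part of the original analysis). It confirms that r(ŷ, e) is zero up to rounding and that r(y, e) equals sqrt(1 - R^2), a standard identity for least-squares fits that include an intercept.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(1)  # arbitrary seed, for repeatability only
x = [random.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]  # made-up model

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

r_squared = pearson(y, fitted) ** 2  # R^2 for a fit with an intercept
r_y_e = pearson(y, resid)
r_fit_e = pearson(fitted, resid)

print(f"r(y, e)       = {r_y_e:.6f}")
print(f"sqrt(1 - R^2) = {math.sqrt(1 - r_squared):.6f}")
print(f"r(yhat, e)    = {r_fit_e:.6f}")
```

Because r(y, e) is positive whenever the fit is imperfect, a plot of e against y slopes upward even when the model is correct, which is why the plot against ŷ is the one to examine.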

Because the least-squares residuals have unequal variances even when the errors have constant variance,

I prefer plotting the studentized residuals against fitted values. Finally, a pattern of changing spread is often

more easily discerned in a plot of $|t_i|$ or $t_i^2$ versus $\hat{y}$, perhaps augmented by a lowess scatterplot smooth (see

Appendix A6.1); smoothing this plot is particularly useful when the sample size is very large or the distribution

of ŷ is very uneven. An example appears in Figure 6.2.
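
As a rough illustration of this diagnostic, the sketch below computes studentized residuals for a one-predictor regression and compares the average |t| in the lower and upper halves of the fitted values. It rests on several assumptions: the data are simulated with an error standard deviation that grows with x, the residuals are studentized using the error variance estimated with each observation deleted in turn, and the two-bin average is only a crude stand-in for a lowess smooth. The function name and all numeric settings are hypothetical.

```python
import math
import random


def studentized_residuals(x, y):
    """Return (fitted values, studentized residuals) for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    fitted = [a + b * xi for xi in x]
    e = [yi - fi for yi, fi in zip(y, fitted)]
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # leverages (hat values)
    rss = sum(ei ** 2 for ei in e)
    t = []
    for ei, hi in zip(e, h):
        # Error variance estimated with observation i deleted: n - 3 degrees
        # of freedom, because two coefficients are fitted to n - 1 cases.
        s2_deleted = (rss - ei ** 2 / (1 - hi)) / (n - 3)
        t.append(ei / math.sqrt(s2_deleted * (1 - hi)))
    return fitted, t


random.seed(2)  # arbitrary seed, for repeatability only
x = [random.uniform(1, 10) for _ in range(300)]
# Made-up model in which the error standard deviation is proportional to x.
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]

fitted, t = studentized_residuals(x, y)

# Crude substitute for a smooth: mean |t| in each half of the fitted values.
pairs = sorted(zip(fitted, t))
half = len(pairs) // 2
low_mean = sum(abs(ti) for _, ti in pairs[:half]) / half
high_mean = sum(abs(ti) for _, ti in pairs[half:]) / (len(pairs) - half)

print(f"mean |t|, lower half of fitted values: {low_mean:.3f}")
print(f"mean |t|, upper half of fitted values: {high_mean:.3f}")
```

In practice the pairs of fitted values and |t| would be plotted and smoothed rather than binned; the binning here only keeps the sketch free of plotting dependencies.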

An illustrative plot of studentized residuals against fitted values is shown in Figure 6.1a. In Figure 6.1b,

studentized residuals are plotted against $\log_2(3 + \hat{y})$; by correcting the positive skew in these $\hat{y}$ values,

the second plot makes it easier to discern the tendency of the residual spread to increase with ŷ. The

data for this example are drawn from work by Ornstein (1976) on interlocking directorates among the 248

largest Canadian firms. The number of interlocking directorate and executive positions maintained by the

corporations is regressed on corporate assets (square-root transformed to make the relationship linear; see

Chapter 7); 9 dummy variables representing 10 industrial classes, with heavy manufacturing serving as the

baseline category; and 3 dummy variables representing 4 nations of control, with Canada as the baseline

category. The results of the regression are given in the left-hand columns of Table 6.1; the results shown on

the right of the table are discussed below. Note that part of the tendency for the residual scatter to increase

with ŷ is due to the lower bound of 0 for y: Because e = y - ŷ, the smallest possible residual corresponding to

a particular ŷ value is e = 0 - ŷ = -ŷ.

Figure 6.1. Plots of studentized residuals versus fitted values for Ornstein's interlocking-directorate

regression. (a) t versus ŷ. (b) t versus $\log_2(3 + \hat{y})$. The log transformation serves to reduce the skew

of the fitted values, making the increasing residual spread easier to discern.

Correcting Nonconstant Error Variance

Transformations frequently serve to correct a tendency of the error variance to increase or, less commonly,

decrease with the magnitude of the dependent variable: Move y down the ladder of powers and roots if

the residual scatter broadens with the fitted values; move y up the ladder if the residual scatter narrows.

An effective transformation may be selected by trial and error (but see Chapter 9 for an analytic method of

selecting a variance-stabilizing transformation).
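
A small simulation may make the trial-and-error idea concrete. This is a sketch under strong assumptions: the data are generated so that the square root of y is linear in x with constant error variance, which means the square root is the right step down the ladder by construction. The spread measure (root-mean-square residual in the upper half of the fitted values divided by that in the lower half) and all names are illustrative choices, not a method taken from the text.

```python
import math
import random


def residuals_by_fit(x, y):
    """Fit y = a + b*x; return (fitted, residual) pairs sorted by fitted value."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return sorted((a + b * xi, yi - (a + b * xi)) for xi, yi in zip(x, y))


def spread_ratio(x, y):
    """RMS residual in the upper half of the fitted values over the lower half."""
    pairs = residuals_by_fit(x, y)
    half = len(pairs) // 2

    def rms(chunk):
        return math.sqrt(sum(e ** 2 for _, e in chunk) / len(chunk))

    return rms(pairs[half:]) / rms(pairs[:half])


random.seed(3)  # arbitrary seed, for repeatability only
x = [random.uniform(0, 10) for _ in range(400)]
root = [2.0 + 0.5 * xi + random.gauss(0, 0.5) for xi in x]  # made-up model
y = [r ** 2 for r in root]  # nonnegative response whose spread grows with level

ratio_raw = spread_ratio(x, y)
ratio_sqrt = spread_ratio(x, [math.sqrt(yi) for yi in y])

print(f"upper/lower residual spread, y untransformed: {ratio_raw:.2f}")
print(f"upper/lower residual spread, sqrt(y):         {ratio_sqrt:.2f}")
```

A ratio well above 1 suggests moving y down the ladder, and a ratio near 1 after transformation suggests the spread has been roughly stabilized. With real data several powers would be tried and the residual plots inspected.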

TABLE 6.1 Regression of Number of Interlocking Directorate and Executive Positions Maintained by

248 Major Canadian Corporations on Corporate Assets, Sector, and Nation of Control

If the error variance is proportional to a particular x, or if the pattern of V(∊i) is otherwise known up to

a constant of proportionality, then an alternative to transformation of y is weighted-least-squares (WLS)

estimation (see Appendix A6.2). It also is possible to correct the estimated standard errors of the least-

squares coefficients for heteroscedasticity: A method proposed by White (1980) is described in Appendix

A6.3. An advantage of this approach is that knowledge of the pattern of nonconstant error variance (e.g.,

increased variance with the level of y or with an x) is not required. If, however, the heteroscedasticity problem

is severe, and the corrected standard errors therefore are substantially larger than those produced by the

usual formula, then discovering the pattern of nonconstant variance and correcting for it—by a transformation

or WLS estimation—offers the possibility of more efficient estimation. In any event, unequal error variances

are worth correcting only when the problem is extreme—where, for example, the spread of the errors varies

by a factor of about three or more (i.e., the error variance varies by a factor of about 10 or more; see Appendix

A6.4).
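
The two alternatives to transformation can be sketched for the one-predictor case. The code below assumes simulated data whose error variance is proportional to x, so the WLS weights are taken as 1/x. The heteroscedasticity-consistent standard error uses the simple sandwich form without a small-sample adjustment, in the spirit of White (1980); whether it matches the appendix treatment in every detail is not checked here. Function names and numeric settings are hypothetical.

```python
import math
import random


def wls_line(x, y, w):
    """Weighted least squares for y = a + b*x; w[i] proportional to 1 / V(error_i)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def ols_slope_standard_errors(x, y):
    """Return (usual SE, sandwich-type SE) for the ordinary least-squares slope."""
    n = len(x)
    a, b = wls_line(x, y, [1.0] * n)  # equal weights reduce WLS to OLS
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    usual = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2) / sxx)
    # Each squared residual stands in for that observation's error variance.
    sandwich = math.sqrt(sum((xi - mx) ** 2 * ei ** 2 for xi, ei in zip(x, e))) / sxx
    return usual, sandwich


random.seed(4)  # arbitrary seed, for repeatability only
x = [random.uniform(1, 10) for _ in range(500)]
# Made-up model with V(error) = x, i.e. error SD = sqrt(x).
y = [3.0 + 1.5 * xi + random.gauss(0, math.sqrt(xi)) for xi in x]

a_ols, b_ols = wls_line(x, y, [1.0] * len(x))
a_wls, b_wls = wls_line(x, y, [1.0 / xi for xi in x])
usual_se, sandwich_se = ols_slope_standard_errors(x, y)

print(f"OLS slope: {b_ols:.3f}   WLS slope: {b_wls:.3f}")
print(f"usual SE of OLS slope:    {usual_se:.4f}")
print(f"sandwich SE of OLS slope: {sandwich_se:.4f}")
```

WLS needs the variance pattern to be specified through the weights, whereas the sandwich standard error needs no such knowledge. That mirrors the trade-off described above between efficiency and robustness to an unknown pattern.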

Figure 6.2. Plot of absolute studentized residuals versus fitted values for the square-root transformed

interlocking-directorate data. The line on the graph is a lowess smooth using f = 0.5 and 2 robustness

iterations.

For Ornstein's interlocking-directorate regression, for example, a square-root transformation appears to

correct the dependence of the residual spread on the level of the dependent variable. A plot of $|t_i|$ versus $\hat{y}_i$

for the transformed data is given in Figure 6.2, and the regression results appear in the right-hand columns of

Table 6.1. The lowess smooth in Figure 6.2 (see Appendix A6.1) shows little change in the average absolute

studentized residuals as the fitted values increase.

The coefficients for the original and transformed regressions in Table 6.1 cannot be compared directly,

because the scale of the dependent variable has been altered. It is clear, however, that assets retain their

positive effect and that the nations of control maintain their ranks. The sectoral ranks are also similar

across the two analyses, although not identical. In comparing the two sets of results, recall that the baseline

categories for the sets of dummy regressors—Canada and heavy manufacturing—implicitly have coefficients

of zero.

Transforming y also changes the shape of the error distribution and alters the shape of the regression of y

on the xs. It is frequently the case that producing constant residual variation through a transformation also

makes the distribution of the residuals more symmetric. At times, eliminating nonconstant spread also makes

the relationship of y to the xs more nearly linear (see the next chapter). These by-products are not necessary

consequences of correcting nonconstant error variance, however, and it is particularly important to check data

for nonlinearity following a transformation of y. Of course, because there generally is no reason to suppose

that the regression is linear prior to transforming y, we should check for nonlinearity even when y is not

transformed.

Finally, nonconstant residual spread sometimes is evidence for the omission of important effects from the

model. Suppose, for example, that there is an omitted categorical independent variable, such as regional

location, that interacts with assets in affecting interlocks; in particular, the assets slope, although positive in

every region, is steeper in some regions than in others. Then the omission of region and its interactions with

assets could produce a fan-shaped residual plot even if the errors from the correct model have constant

spread. The detection of this type of specification error therefore requires substantive insight into the process

generating the data and cannot rely on diagnostics alone.
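
The point can be illustrated with a simulation, offered only as a sketch under invented assumptions: two hypothetical regions, "A" and "B", share an intercept but have different positive slopes on a single x, and the errors have constant variance. Fitting one pooled line omits region and its interaction with x. Fitting a separate line per region is equivalent, for two groups, to including both. All names and numeric values are made up.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def spread_ratio(fitted, resid):
    """RMS residual in the upper half of the fitted values over the lower half."""
    pairs = sorted(zip(fitted, resid))
    half = len(pairs) // 2

    def rms(chunk):
        return math.sqrt(sum(e ** 2 for _, e in chunk) / len(chunk))

    return rms(pairs[half:]) / rms(pairs[:half])


random.seed(5)  # arbitrary seed, for repeatability only
slopes = {"A": 0.5, "B": 2.0}  # hypothetical regions, both slopes positive
region = [random.choice("AB") for _ in range(400)]
x = [random.uniform(0, 10) for _ in region]
y = [1.0 + slopes[g] * xi + random.gauss(0, 1) for g, xi in zip(region, x)]

# Misspecified model: region and its interaction with x are omitted.
a, b = fit_line(x, y)
pooled_fit = [a + b * xi for xi in x]
pooled_ratio = spread_ratio(pooled_fit, [yi - fi for yi, fi in zip(y, pooled_fit)])

# Model with region and the region-by-x interaction: one line per region.
full_fit = [0.0] * len(x)
for g in slopes:
    idx = [i for i, gi in enumerate(region) if gi == g]
    ag, bg = fit_line([x[i] for i in idx], [y[i] for i in idx])
    for i in idx:
        full_fit[i] = ag + bg * x[i]
full_ratio = spread_ratio(full_fit, [yi - fi for yi, fi in zip(y, full_fit)])

print(f"upper/lower residual spread, pooled model:      {pooled_ratio:.2f}")
print(f"upper/lower residual spread, with interactions: {full_ratio:.2f}")
```

The pooled residuals fan out because the two regional lines diverge as x grows, even though the simulated errors are homoscedastic. The simulation knows which variable was omitted; with real data that knowledge has to come from the subject matter.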

http://dx.doi.org/10.4135/9781412985604.n6
