
Nonlinearity

In: Regression Diagnostics

By: John Fox

Pub. Date: 2011

Access Date: October 10, 2019

Publishing Company: SAGE Publications, Inc.

City: Thousand Oaks

Print ISBN: 9780803939714

Online ISBN: 9781412985604

DOI: https://dx.doi.org/10.4135/9781412985604

Print pages: 54-61

© 1991 SAGE Publications, Inc. All Rights Reserved.


Nonlinearity

The assumption that $E(\varepsilon)$ is everywhere zero implies that the specified regression surface captures the dependency of y on the xs. Violating the assumption of linearity therefore implies that the model fails to capture the systematic pattern of relationship between the dependent and independent variables. For example, a partial relationship specified to be linear may be nonlinear, or two independent variables specified to have additive partial effects may interact in determining y. Nevertheless, the fitted model is frequently a useful approximation even if the regression surface $E(y)$ is not precisely captured. In other instances, however, the model can be extremely misleading.

The regression surface is generally high dimensional, even after accounting for regressors (such as polynomial terms, dummy variables, and interactions) that are functions of a smaller number of fundamental independent variables. As in the case of nonconstant error variance, therefore, it is necessary to focus on particular patterns of departure from linearity. The graphical diagnostics discussed in this chapter represent two-dimensional views of the higher-dimensional point-cloud of observations $\{y_i, x_{1i}, \ldots, x_{ki}\}$. With modern computer graphics, the ideas here can usefully be extended to three dimensions, permitting, for example, the detection of two-way interactions between independent variables (Monette, 1990).

Residual and Partial-Residual Plots

Although it is useful in multiple regression to plot y against each x, these plots do not tell the whole story (and can be misleading) because our interest centers on the partial relationship between y and each x, controlling for the other xs, not on the marginal relationship between y and a single x. Residual-based plots are consequently more relevant in this context.

Plotting residuals or studentized residuals against each x, perhaps augmented by a lowess smooth (see Appendix A6.1), is helpful for detecting departures from linearity. As Figure 7.1 illustrates, however, residual plots cannot distinguish between monotone (i.e., strictly increasing or decreasing) and nonmonotone (e.g., falling and then rising) nonlinearity. The distinction between monotone and nonmonotone nonlinearity is lost in the residual plots because the least-squares fit ensures that the residuals are linearly uncorrelated with each x. The distinction is important because, as I shall explain below, monotone nonlinearity frequently can be corrected by simple transformations. In Figure 7.1, for example, case a might be modeled by $y = \beta_0 + \beta_1 x^2 + \varepsilon$, whereas case b cannot be linearized by a power transformation of x and might instead be dealt with by a quadratic specification, $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$. (Case b could, however, be accommodated by a more complex transformation of x, $y = \beta_0 + \beta_1 (x - \alpha)^2 + \varepsilon$; I shall not pursue this approach here.)
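The claim that least squares makes the residuals linearly uncorrelated with each x can be checked numerically. The sketch below is a minimal illustration that assumes NumPy. It uses made-up curves in the spirit of Figure 7.1: a monotone curve $y = x^2$ and a nonmonotone curve $y = (x - 3)^2$, both evaluated on the same x values. The helper name `ols_residuals` is hypothetical and does not come from the text.

```python
import numpy as np


def ols_residuals(x, y):
    """Residuals from the least-squares regression of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


x = np.linspace(1.0, 5.0, 41)
y_monotone = x ** 2              # like case a: curved but always increasing
y_nonmonotone = (x - 3.0) ** 2   # like case b: falls, then rises

e_a = ols_residuals(x, y_monotone)
e_b = ols_residuals(x, y_nonmonotone)

# Least squares forces the residuals to be uncorrelated with x ...
print(np.corrcoef(e_a, x)[0, 1], np.corrcoef(e_b, x)[0, 1])
# ... and in this example the two residual patterns are the same U shape.
print(np.allclose(e_a, e_b))
```

The two residual vectors coincide because $x^2$ and $(x - 3)^2$ differ only by the linear function $6x - 9$, which the fitted line absorbs. A residual plot therefore cannot tell the two cases apart, whereas the original scatterplots can.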

SAGE

1991 SAGE Publications, Ltd. All Rights Reserved.

SAGE Research Methods

Page 2 of 8 Nonlinearity

Figure 7.1. Scatterplots (a and b) and corresponding residual plots (a' and b') in simple regression. The residual plots do not distinguish between (a) a nonlinear but monotone relationship and (b) a nonlinear, nonmonotone relationship.

In contrast to simple residual plots, partial-regression plots, introduced in Chapter 4 for detecting influential data, can reveal nonlinearity and suggest whether a relationship is monotone. These plots are not always useful for locating a transformation, however: The partial-regression plot adjusts $x_j$ for the other xs, but it is the unadjusted $x_j$ that is transformed in respecifying the model. Partial-residual plots, also called component-plus-residual plots, are often an effective alternative. Partial-residual plots are not as suitable as partial-regression plots for revealing leverage and influence.
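The sketch below computes the coordinates of a partial-regression plot to make the distinction concrete. It is a minimal NumPy version that assumes the regressors are held in an n × k array without a constant column. The function name `added_variable_coords` and the simulated data are illustrative assumptions. The horizontal coordinate is the residual from regressing $x_j$ on the other xs, which is why it is not the scale on which $x_j$ would be transformed.

```python
import numpy as np


def added_variable_coords(X, y, j):
    """Partial-regression (added-variable) plot coordinates for column j of X.

    X holds the regressors without a constant column. Returns the residuals of
    x_j and of y after each is regressed on a constant and the other columns.
    """
    n = X.shape[0]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])

    def adjust(v):
        coef, *_ = np.linalg.lstsq(others, v, rcond=None)
        return v - others @ coef

    return adjust(X[:, j]), adjust(y)


# Simulated data (an assumption for illustration): correlated regressors.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1, 10, size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 2.0 + 3.0 * np.log(x1) + 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Full multiple regression, for comparison.
X_full = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(X_full, y, rcond=None)

x1_adj, y_adj = added_variable_coords(X, y, 0)
slope = (x1_adj @ y_adj) / (x1_adj @ x1_adj)  # both coordinates have mean zero
print(slope, b[1])  # the plot's slope equals the multiple-regression b_1
```

`x1_adj` is the part of $x_1$ that is not linearly predictable from $x_2$. For that reason, a curve in this plot does not translate directly into a transformation of $x_1$ itself.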

Define the partial residual for the $j$th regressor as

$$e_i^{(j)} = e_i + b_j x_{ji}$$

In words, add back the linear component of the partial relationship between y and $x_j$ to the least-squares residuals, which may include an unmodeled nonlinear component. Then plot $e^{(j)}$ versus $x_j$. By construction, the multiple-regression coefficient $b_j$ is the slope of the simple linear regression of $e^{(j)}$ on $x_j$, but nonlinearity should be apparent in the plot as well. Again, a lowess smooth may help in interpreting the plot.
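The construction of partial residuals can be sketched in a few lines of NumPy. This is a minimal version that assumes the regressors arrive as an n × k array without a constant column. The function name `partial_residuals` and the simulated data (a log-shaped partial relationship) are illustrative assumptions. The check at the end confirms the "by construction" property: the simple-regression slope of $e^{(j)}$ on $x_j$ is exactly $b_j$.

```python
import numpy as np


def partial_residuals(X, y, j):
    """Partial (component-plus-residual) residuals e + b_j * x_j for column j.

    X holds the regressors without a constant column; returns the partial
    residuals and the multiple-regression coefficient b_j.
    """
    n = X.shape[0]
    X_full = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    e = y - X_full @ b
    b_j = b[j + 1]  # b[0] is the intercept
    return e + b_j * X[:, j], b_j


# Simulated data with a curved (log) partial relationship for x1.
rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(1, 50, size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 10.0 * np.log(x1) + 2.0 * x2 + rng.normal(size=n)

e_1, b_1 = partial_residuals(X, y, 0)
slope, intercept = np.polyfit(x1, e_1, 1)
print(slope, b_1)  # equal, because the residuals are orthogonal to x1
```

Plotting `e_1` against `x1`, ideally with a lowess smooth drawn over the points, would reveal the concave log pattern even though the fitted slope is exactly $b_1$.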

The partial-residual plots in Figure 7.2 are for a regression of the rated prestige P of 102 Canadian occupations (from Pineo and Porter, 1967) on the average education (E) in years, average income (I) in dollars, and percentage of women (W) in the occupations in 1971. (Related results appear in Fox and Suschnigg [1989]; cf. Duncan's regression for similar U.S. data reported in Chapter 4.) A lowess smooth is shown in each plot. The results of the regression are as follows:

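$$\hat{P} = \underset{(3.24)}{-6.79} + \underset{(0.39)}{4.19}\,E + \underset{(0.00028)}{0.00131}\,I - \underset{(0.0304)}{0.00891}\,W$$

$R^2 = 0.80, \quad s = 7.85$ (standard errors in parentheses)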

Note that the magnitudes of the regression coefficients should not be compared, because the independent variables are measured in different units: In particular, the unit for income is small (the dollar), and that for education is comparatively large (the year). Interpreting the regression coefficients in light of the units of the corresponding independent variables, the education and income coefficients are both substantial, whereas the coefficient for percentage of women is very small.
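One quick way to put the coefficients on a comparable footing is to multiply each one by a change that is meaningful in its own units. The increments below (one year, $1,000, and the full 0 to 100 range of percentage women) are illustrative assumptions and not part of the original analysis.

```python
# Coefficients from the prestige regression reported above.
coefficients = {
    "education": 4.19,    # prestige points per year
    "income": 0.00131,    # prestige points per dollar
    "women": -0.00891,    # prestige points per percentage point
}

# Hypothetical increments chosen to be meaningful in each variable's units.
increments = {
    "education": 1,       # one additional year of schooling
    "income": 1000,       # $1,000 more average income
    "women": 100,         # from 0% to 100% women
}

effects = {name: coefficients[name] * increments[name] for name in coefficients}
for name, change in effects.items():
    print(f"{name:>9}: {change:+.2f} prestige points")
```

Even the largest possible change in percentage of women moves predicted prestige by less than one point. That is the sense in which this coefficient is very small.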

There is apparent monotone nonlinearity in the partial-residual plots for education and, much more strongly, income (Figure 7.2, parts a and b); there also is a small apparent tendency for occupations with intermediate percentages of women to have lower prestige, controlling for income and educational levels (Figure 7.2c). To my eye, the patterns in the partial-residual plots for education and percentage of women are not easily discernible without the lowess smooth: The departure from linearity is not great. The nonlinear patterns for income and percentage of women are simple: In the first case, the lowess curve opens downward; in the second case, it opens upward. For education, however, the direction of curvature changes, producing a more complex nonlinear pattern.


Figure 7.2. Partial-residual plots for the regression of the rated prestige of 102 Canadian occupations on 1971 occupational characteristics: (a) education, (b) income, and (c) percentage of women. The observation index is plotted for each point. In each graph, the linear least-squares fit (broken line) and the lowess smooth (solid line, for f = 0.5 with 2 robustness iterations) are shown.

SOURCE: Data taken from B. Blishen, W. Carroll, and C. Moore, personal communication; Census of Canada (Statistics Canada, 1971, Part 6, pp. 19.1–19.21); Pineo and Porter (1967).

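Mallows (1986) has suggested a variation on the partial-residual plot that sometimes reveals nonlinearity more clearly. First, add a quadratic term in $x_j$ to the model, which becomes

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_j x_{ji} + \gamma_j x_{ji}^2 + \cdots + \beta_k x_{ki} + \varepsilon_i$$

Then, after fitting the model, form the “augmented” partial residual

$$e_i'^{(j)} = e_i + b_j x_{ji} + c_j x_{ji}^2$$

where $e_i$ is now the residual from the augmented model and $c_j$ is the least-squares estimate of $\gamma_j$. Note that in general $b_j$ differs from the regression coefficient for $x_j$ in the original model, which does not include the squared term. Finally, plot $e'^{(j)}$ versus $x_j$.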
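Here is a minimal NumPy sketch of Mallows's augmented partial residuals. It follows the steps just described: fit the model with $x_j^2$ added, then add back both estimated components of $x_j$. The function name `augmented_partial_residuals` and the simulated quadratic data are illustrative assumptions. The last lines show that $b_j$ from the augmented fit differs from the coefficient of $x_j$ in the original linear model.

```python
import numpy as np


def augmented_partial_residuals(X, y, j):
    """Mallows-style augmented partial residuals for column j of X.

    Fits y on a constant, all columns of X, and x_j**2, then returns
    e + b_j * x_j + c_j * x_j**2 along with b_j and c_j.
    """
    n = X.shape[0]
    xj = X[:, j]
    X_aug = np.column_stack([np.ones(n), X, xj ** 2])  # x_j^2 is the last column
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    e = y - X_aug @ coef
    b_j, c_j = coef[j + 1], coef[-1]
    return e + b_j * xj + c_j * xj ** 2, b_j, c_j


# Simulated data (illustrative assumption): quadratic partial effect of x1.
rng = np.random.default_rng(2)
n = 300
x1 = rng.uniform(0, 4, size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 - 0.5 * x1 ** 2 + x2 + rng.normal(scale=0.1, size=n)

e_aug, b_1, c_1 = augmented_partial_residuals(X, y, 0)

# Coefficient of x1 in the original model without the squared term.
b_lin, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
print(b_1, c_1, b_lin[1])  # b_1 (near 2) differs from the linear-model slope
```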

Transformations for Linearity


To consider how power transformations can serve to linearize a monotone nonlinear relationship, examine Figure 7.3. Here, I have plotted $y = \frac{1}{5}x^2$ for $x = 1, 2, 3, 4, 5$. By construction, the relationship can be linearized by taking $x' = x^2$, in which case $y = \frac{1}{5}x'$; or by taking $y' = \sqrt{5y}$, in which case $y' = x$. Figure 7.3 reveals how each transformation serves to stretch one of the axes differentially, pulling the curve into a straight line.
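The arithmetic behind Figure 7.3 can be confirmed with a few lines of plain Python. The correlation helper `pearson_r` is an illustrative assumption, used here only as a rough measure of straightness: the raw relationship has r below 1, and either transformation brings it to 1.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


xs = [1, 2, 3, 4, 5]
ys = [x ** 2 / 5 for x in xs]                 # y = (1/5) x^2

r_raw = pearson_r(xs, ys)                     # curved, so r < 1
r_x = pearson_r([x ** 2 for x in xs], ys)     # x' = x^2 straightens it
y_prime = [math.sqrt(5 * y) for y in ys]      # y' = sqrt(5y) recovers x
r_y = pearson_r(xs, y_prime)
print(round(r_raw, 3), round(r_x, 6), round(r_y, 6))
```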

As illustrated in Figure 7.4, there are four simple patterns of monotone nonlinear relationships. Each can be straightened by moving y, x, or both up or down the ladder of powers and roots: The direction of curvature determines the direction of movement on the ladder; Tukey (1977) calls this the “bulging rule.” Specific transformations to linearity can be located by trial and error (but see Chapter 9 for an analytic approach).
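The bulging rule and a trial-and-error search can be combined in a short sketch. It makes several assumptions. The ladder is limited to the powers −2 to 3, with 0 standing for the log. The bulge directions follow the usual statement of Tukey's rule. Squared correlation serves as the straightness criterion. Only x is transformed, and the data must be positive. All names here (`LADDER`, `BULGE_MOVES`, `best_power_for_x`) are hypothetical.

```python
import math

# Tukey's ladder of powers and roots; p = 0 stands for the log.
LADDER = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]

# One encoding of the bulging rule: the direction in which the curve bulges
# gives the direction to move x and y on the ladder (+1 = up, -1 = down).
BULGE_MOVES = {
    "upper right": (+1, +1),
    "upper left": (-1, +1),
    "lower left": (-1, -1),
    "lower right": (+1, -1),
}


def ladder(values, p):
    """Apply power p to positive values; negate negative powers to keep order."""
    if p == 0:
        return [math.log(v) for v in values]
    sign = -1 if p < 0 else 1
    return [sign * v ** p for v in values]


def r_squared(a, b):
    """Squared Pearson correlation: 1 means a perfectly straight relationship."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab * sab / (saa * sbb)


def best_power_for_x(x, y, bulge):
    """Trial and error over the powers the bulging rule points toward."""
    x_move, _ = BULGE_MOVES[bulge]
    candidates = [p for p in LADDER if (p - 1) * x_move > 0]
    return max(candidates, key=lambda p: r_squared(ladder(x, p), y))


xs = [1, 2, 3, 4, 5]
ys = [x ** 2 / 5 for x in xs]  # the curve of Figure 7.3 bulges toward the lower right
print(best_power_for_x(xs, ys, "lower right"))  # 2, i.e. x' = x^2
```

Negating the negative powers keeps each transformation increasing, so the direction of the relationship is preserved as one moves down the ladder.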

In multiple regression, the bulging rule may be applied to the partial-residual plots. Generally, we transform $x_j$ in preference to y, because changing the scale of y disturbs its relationship to other regressors and because transforming y changes the error distribution. An exception occurs when similar nonlinear patterns are observed in all of the partial-residual plots. Furthermore, the logit transformation often helps for dependent variables that are proportions.
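The logit maps a proportion p to its log-odds, $\log[p/(1-p)]$. This stretches the scale near 0 and 1, where proportions tend to bunch up. The sketch below assumes one simple way of handling proportions of exactly 0 or 1: clamping them to $[\epsilon, 1-\epsilon]$. That choice, and the value of `eps`, are illustrative assumptions rather than something the text prescribes.

```python
import math


def logit(p, eps=0.005):
    """Log-odds of a proportion.

    Proportions of exactly 0 or 1 are clamped to [eps, 1 - eps] first so the
    result stays finite; eps is an arbitrary illustrative choice.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


for p in [0.0, 0.1, 0.5, 0.9, 1.0]:
    print(p, round(logit(p), 3))
```

The transformation is symmetric about p = 0.5, where it equals 0.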

Figure 7.3. How a transformation of y (a to b) or x (a to c) can make a simple monotone nonlinear relationship linear.

As suggested in connection with Figure 7.1b, nonmonotone nonlinearity (and some complex monotone patterns) frequently can be accommodated by fitting polynomial functions in an x; quadratic specifications are often useful in applications. As long as the model remains linear in its parameters, it may be fit by linear least-squares regression.
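Because a quadratic in x is still linear in its parameters, ordinary least squares on the columns $1, x, x^2$ fits it directly. The sketch below assumes NumPy and uses simulated nonmonotone data whose true minimum is at x = 5. The data-generating values are illustrative assumptions.

```python
import numpy as np

# Simulated nonmonotone data (an illustrative assumption): falls, then rises.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)
y = 5.0 - 2.0 * x + 0.2 * x ** 2 + rng.normal(scale=0.5, size=200)

# Quadratic in x but linear in the parameters, so ordinary least squares
# on the columns (1, x, x^2) fits it directly.
X = np.column_stack([np.ones_like(x), x, x ** 2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

turning_point = -beta[1] / (2 * beta[2])  # x at which the fitted curve bottoms out
print(beta, turning_point)
```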

Trial-and-error experimentation with the Canadian occupational prestige data leads to the log transformation of income. The possibly curvilinear partial relationship of prestige to the percentage of women in the occupations suggests the inclusion of linear and quadratic terms for this independent variable. These changes produce a modest, though discernible, improvement in the fit of the model:

Figure 7.4. Determining a transformation to linearity by the “bulging rule.”

SOURCE: Adapted from Tukey, Exploratory Data Analysis, © 1977 by Addison-Wesley Publishing Co. Adapted and reprinted by permission of Addison-Wesley Publishing Co., Inc., Reading, MA.

$$\hat{P} = \underset{(15)}{-111} + \underset{(0.35)}{3.77}\,E + \underset{(1.30)}{9.36}\,\log_2 I - \underset{(0.087)}{0.139}\,W + \underset{(0.00094)}{0.00215}\,W^2$$

$R^2 = 0.84, \quad s = 6.95$

Note the statistically significant quadratic term for percentage of women. The partial effect of this variable is relatively small, however, ranging from a minimum of -2.2 prestige points for an occupation with 32% women to 7.6 points for a hypothetical occupation consisting entirely of women. Because the nonlinear pattern in the partial-residual plot for education is complex, a power transformation of this independent variable is not promising: Trial and error suggests that the best that we can do is to increase $R^2$ to 0.85 by squaring education.
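The figures quoted for percentage of women follow from the two W coefficients. The vertex of the parabola $b_W W + b_{W^2} W^2$ is at $W^* = -b_W / (2 b_{W^2})$. The short check below reproduces the minimum near 32% women and the value at 100%. Both effects are measured relative to an occupation with no women.

```python
# Linear and quadratic coefficients for W in the respecified model above.
b_w, b_w2 = -0.139, 0.00215


def women_effect(w):
    """Partial effect of W on fitted prestige, relative to W = 0."""
    return b_w * w + b_w2 * w ** 2


w_min = -b_w / (2 * b_w2)  # vertex of the parabola (a minimum, since b_w2 > 0)
print(round(w_min, 1), round(women_effect(w_min), 2), round(women_effect(100), 1))
```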

In transforming data or respecifying the functional form of the model, there should be an interplay between substantive and modeling considerations. We must recognize, however, that social theories are almost never mathematically concrete: Theory may tell us that prestige should increase with income, but it does not specify the functional form of the relationship.


Still, in certain contexts, specific transformations may have advantages of interpretability. For example, log transformations often can be given meaningful substantive interpretation: To increase $\log_2 x$ by 1, for instance, represents a doubling of x. In the respecified Canadian occupational prestige regression, therefore, doubling income is associated on average with a 9-point increment in prestige, holding education and gender composition constant.
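Because the income term is $9.36 \log_2 I$, the change in fitted prestige depends only on the ratio of two incomes, not on their level. The sketch below assumes only the reported coefficient. The function name `prestige_change` and the example ratios are illustrative.

```python
import math

B_LOG2_INCOME = 9.36  # coefficient of log2(I) in the respecified model


def prestige_change(income_ratio):
    """Change in fitted prestige when income is multiplied by income_ratio,
    holding education and percentage of women constant."""
    return B_LOG2_INCOME * math.log2(income_ratio)


print(prestige_change(2.0))   # doubling income
print(prestige_change(0.5))   # halving income
print(prestige_change(1.10))  # a 10% increase
```

For example, the increment is the same 9.36 points whether income rises from a hypothetical $2,000 to $4,000 or from $10,000 to $20,000.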

Likewise, the square root of an area or cube root of a volume can be interpreted as a linear measure of distance or length, the inverse of the amount of time required to traverse a particular distance is speed, and so on. If both y and $x_j$ are log-transformed, then the regression coefficient for $x'_j$ is interpretable as the “elasticity” of y with respect to $x_j$, that is, the approximate percentage of change in y corresponding to a 1% change in $x_j$. In many contexts, a quadratic relationship will have a clear substantive interpretation (in the example, occupations with a gender mix appear to pay a small penalty in prestige), but a fourth-degree polynomial may not.
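The elasticity interpretation can be illustrated with simulated constant-elasticity data, $y = A x^{\beta}$ times a small multiplicative error. Regressing $\log y$ on $\log x$ recovers β, and a 1% increase in x then changes y by about β%. This is a sketch that assumes NumPy. The values A = 3 and β = 0.7 are illustrative assumptions.

```python
import numpy as np

# Simulated constant-elasticity data (an illustrative assumption):
# y = A * x**beta times a small multiplicative error, with beta = 0.7.
rng = np.random.default_rng(4)
x = rng.uniform(1, 100, size=500)
beta_true = 0.7
y = 3.0 * x ** beta_true * np.exp(rng.normal(scale=0.05, size=500))

# Regress log y on log x; the slope estimates the elasticity.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)

# Implied percentage change in y for a 1% increase in x.
pct_change_y = (1.01 ** slope - 1) * 100
print(round(slope, 3), round(pct_change_y, 3))
```

The base of the logarithm does not affect the slope, provided the same base is used for both y and x.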

Finally, although it is desirable to maintain simplicity and interpretability, it is not reasonable to distort the data by insisting on a functional form that is clearly inadequate. It is possible, in any event, to display the fitted relationship between y and an x graphically or in a table, using the original scales of the variables if they have been transformed, or to describe the effect at a few strategic x values (see, e.g., the brief description above of the partial effect of percentage of women on occupational prestige).

http://dx.doi.org/10.4135/9781412985604.n7
