Nonlinearity
In: Regression Diagnostics
By: John Fox
Pub. Date: 2011
Access Date: October 10, 2019
Publishing Company: SAGE Publications, Inc.
City: Thousand Oaks
Print ISBN: 9780803939714
Online ISBN: 9781412985604
DOI: https://dx.doi.org/10.4135/9781412985604
Print pages: 54-61
© 1991 SAGE Publications, Inc. All Rights Reserved.
Nonlinearity
The assumption that E(ε) is everywhere zero implies that the specified regression surface captures the
dependency of y on the xs. Violating the assumption of linearity therefore implies that the model fails
to capture the systematic pattern of relationship between the dependent and independent variables. For
example, a partial relationship specified to be linear may be nonlinear, or two independent variables specified
to have additive partial effects may interact in determining y. Nevertheless, the fitted model is frequently
a useful approximation even if the regression surface E(y) is not precisely captured. In other instances,
however, the model can be extremely misleading.
The regression surface is generally high dimensional, even after accounting for regressors (such as
polynomial terms, dummy variables, and interactions) that are functions of a smaller number of fundamental
independent variables. As in the case of nonconstant error variance, therefore, it is necessary to focus on
particular patterns of departure from linearity. The graphical diagnostics discussed in this chapter represent
two-dimensional views of the higher-dimensional point-cloud of observations {yi, x1i, …, xki}. With modern
computer graphics, the ideas here can usefully be extended to three dimensions, permitting, for example, the
detection of two-way interactions between independent variables (Monette, 1990).
Residual and Partial-Residual Plots
Although it is useful in multiple regression to plot y against each x, these plots do not tell the whole story—and
can be misleading—because our interest centers on the partial relationship between y and each x, controlling
for the other xs, not on the marginal relationship between y and a single x. Residual-based plots are
consequently more relevant in this context.
Plotting residuals or studentized residuals against each x, perhaps augmented by a lowess smooth (see
Appendix A6.1), is helpful for detecting departures from linearity. As Figure 7.1 illustrates, however, residual
plots cannot distinguish between monotone (i.e., strictly increasing or decreasing) and nonmonotone (e.g.,
falling and then rising) nonlinearity. The distinction between monotone and non-monotone nonlinearity is lost
in the residual plots because the least-squares fit ensures that the residuals are linearly uncorrelated with
each x. The distinction is important, because, as I shall explain below, monotone nonlinearity frequently can
be corrected by simple transformations. In Figure 7.1, for example, case a might be modeled by y = β0 +
β1x² + ε, whereas case b cannot be linearized by a power transformation of x and might instead be dealt with
by a quadratic specification, y = β0 + β1x + β2x² + ε. (Case b could, however, be accommodated by a more
complex transformation of x: y = β0 + β1(x - α)² + ε; I shall not pursue this approach here.)
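The reason the residual plots look alike can be checked numerically. The following is a minimal sketch, assuming two made-up noise-free curves (one monotone, one not) that differ only by a linear function of x; the function name and data are illustrative and are not taken from Figure 7.1.

```python
import numpy as np

def ols_residuals(x, y):
    """Residuals from the least-squares line of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

x = np.linspace(1.0, 10.0, 50)
y_monotone = x ** 2               # hypothetical case (a): curved but always rising
y_nonmonotone = (x - 5.5) ** 2    # hypothetical case (b): falls, then rises

res_a = ols_residuals(x, y_monotone)
res_b = ols_residuals(x, y_nonmonotone)

# Least squares makes the residuals linearly uncorrelated with x in both cases.
corr_a = np.corrcoef(x, res_a)[0, 1]
corr_b = np.corrcoef(x, res_b)[0, 1]

# Because the two curves differ only by a linear function of x,
# their residuals coincide, so the residual plots are indistinguishable.
same_residuals = np.allclose(res_a, res_b)
```

In this constructed example the two sets of residuals are numerically identical, even though one relationship is monotone and the other is not.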
Figure 7.1. Scatterplots (a and b) and corresponding residual plots (a' and b') in simple regression.
The residual plots do not distinguish between (a) a nonlinear but monotone relationship, and (b) a
nonlinear, nonmonotone relationship.
In contrast to simple residual plots, partial-regression plots, introduced in Chapter 4 for detecting influential
data, can reveal nonlinearity and suggest whether a relationship is monotone. These plots are not always
useful for locating a transformation, however: The partial-regression plot adjusts xj for the other xs, but it is the
unadjusted xj that is transformed in respecifying the model. Partial-residual plots, also called component-plus-
residual plots, are often an effective alternative. Partial-residual plots are not as suitable as partial-regression
plots for revealing leverage and influence.
Define the partial residual for the jth regressor as

$$e_i^{(j)} = e_i + b_j x_{ji}$$

In words, add back the linear component of the partial relationship between y and xj to the least-squares
residuals, which may include an unmodeled nonlinear component. Then plot e(j) versus xj. By construction,
the multiple-regression coefficient bj is the slope of the simple linear regression of e(j) on xj, but nonlinearity
should be apparent in the plot as well. Again, a lowess smooth may help in interpreting the plot.
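The construction of partial residuals can be sketched in a few lines. The example below assumes simulated data with a logarithmic partial relationship in the first regressor; the variable names and the data-generating model are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1.0, 10.0, n)
x2 = rng.normal(0.0, 1.0, n)
# Hypothetical data: y depends nonlinearly (logarithmically) on x1.
y = 2.0 + 3.0 * np.log(x1) + 1.5 * x2 + rng.normal(0.0, 0.3, n)

# Fit the (misspecified) linear model by least squares.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                      # ordinary least-squares residuals

# Partial residuals for regressor j: add back its linear component.
j = 1
partial_resid = e + b[j] * X[:, j]

# Simple regression of the partial residuals on x_j recovers b_j as the slope.
slope, intercept = np.polyfit(X[:, j], partial_resid, 1)
```

Plotting `partial_resid` against `x1` (with any plotting library) would show the curvature that the straight line of slope `b[1]` misses.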
The partial-residual plots in Figure 7.2 are for a regression of the rated prestige (P) of 102 Canadian
occupations (from Pineo and Porter, 1967) on the average education (E) in years, average income (I) in
dollars, and percentage of women (W) in the occupations in 1971. (Related results appear in Fox and
Suschnigg [1989]; cf. Duncan's regression for similar U.S. data reported in Chapter 4.) A lowess smooth is
shown in each plot. The results of the regression are as follows:
$$\hat{P} = \underset{(3.24)}{-6.79} + \underset{(0.39)}{4.19}\,E + \underset{(0.00028)}{0.00131}\,I - \underset{(0.0304)}{0.00891}\,W$$
$$R^2 = 0.80 \qquad s = 7.85$$
Note that the magnitudes of the regression coefficients should not be compared, because the independent
variables are measured in different units: In particular, the unit for income is small—the dollar—and that for
education is comparatively large—the year. Interpreting the regression coefficients in light of the units of the
corresponding independent variables, the education and income coefficients are both substantial, whereas
the coefficient for percentage of women is very small.
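To illustrate the point about units, the short sketch below re-expresses the reported income coefficient per $1,000 rather than per dollar. The two coefficients are those reported above; the choice of $1,000 as the unit is an assumption made only for illustration.

```python
# Coefficients from the fitted regression reported above.
b_education = 4.19    # prestige points per year of education
b_income = 0.00131    # prestige points per dollar of income

# Hypothetical change of unit: express income in thousands of dollars.
b_income_per_thousand = b_income * 1000

print(f"education: {b_education:.2f} points per year")
print(f"income:    {b_income_per_thousand:.2f} points per $1,000")
```

The fit itself is unchanged by such a rescaling; only the numerical size of the coefficient changes.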
There is apparent monotone nonlinearity in the partial-residual plots for education and, much more strongly,
income (Figure 7.2, parts a and b); there also is a small apparent tendency for occupations with intermediate
percentages of women to have lower prestige, controlling for income and educational levels (Figure 7.2c).
To my eye, the patterns in the partial-residual plots for education and percentage of women are not easily
discernible without the lowess smooth: The departure from linearity is not great. The nonlinear patterns for
income and percentage of women are simple: In the first case, the lowess curve opens downwards; in the
second case, it opens upwards. For education, however, the direction of curvature changes, producing a more
complex nonlinear pattern.
Figure 7.2. Partial-residual plots for the regression of the rated prestige of 102 Canadian occupations
on 1971 occupational characteristics: (a) education, (b) income, and (c) percentage of women. The
observation index is plotted for each point. In each graph, the linear least-squares fit (broken line) and
the lowess smooth (solid line for f = 0.5 with 2 robustness iterations) are shown.
SOURCE: Data taken from B. Blishen, W. Carroll, and C. Moore, personal communication; Census of Canada
(Statistics Canada, 1971, Part 6, pp. 19.1–19.21); Pineo and Porter (1967).
Mallows (1986) has suggested a variation on the partial-residual plot that sometimes reveals nonlinearity
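more clearly: First, add a quadratic term in xj to the model, which becomes

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_j x_{ji} + \gamma x_{ji}^2 + \cdots + \beta_k x_{ki} + \varepsilon_i$$

Then, after fitting the model, form the “augmented” partial residual

$$e_i^{\prime (j)} = e_i + b_j x_{ji} + c\,x_{ji}^2$$

where c is the least-squares estimate of γ, and the residuals e_i and the coefficients bj and c all come from
the augmented fit. Note that in general bj differs from the regression coefficient for xj in the original model,
which does not include the squared term. Finally, plot e'(j) versus xj.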
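A minimal sketch of the augmented partial residual follows, assuming simulated data with a nonmonotone partial relationship; the variable names and the data-generating model are hypothetical, and the residuals are taken from the augmented fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0.0, 4.0, n)
x2 = rng.normal(0.0, 1.0, n)
# Hypothetical data: a U-shaped partial relationship between y and x1.
y = 1.0 + (x1 - 2.0) ** 2 + 0.5 * x2 + rng.normal(0.0, 0.2, n)

# Augmented model: add the squared term in x1.
X_aug = np.column_stack([np.ones(n), x1, x1 ** 2, x2])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
e_aug = y - X_aug @ coef
b_j, c = coef[1], coef[2]

# Augmented partial residuals: residuals plus linear and quadratic components.
aug_partial_resid = e_aug + b_j * x1 + c * x1 ** 2

# For comparison, the coefficient of x1 in the original model without the square.
X_orig = np.column_stack([np.ones(n), x1, x2])
b_orig, *_ = np.linalg.lstsq(X_orig, y, rcond=None)
```

In this constructed case `b_j` is strongly negative while `b_orig[1]` is near zero, which illustrates why the two coefficients should not be confused.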
Transformations for Linearity
To consider how power transformations can serve to linearize a monotone nonlinear relationship, examine
Figure 7.3. Here, I have plotted y = (1/5)x² for x = 1, 2, 3, 4, 5. By construction, the relationship can be
linearized by taking x' = x², in which case y = (1/5)x'; or by taking y' = √(5y), in which case y' = x. Figure
7.3 reveals how each transformation serves to stretch one of the axes differentially, pulling the curve into a
straight line.
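The two linearizing transformations for y = (1/5)x² can be verified directly for the five plotted points. This is a small check in plain Python; the variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 / 5 for x in xs]              # y = (1/5) x^2

# Transform x: x' = x^2, so that y = (1/5) x' is linear in x'.
x_prime = [x ** 2 for x in xs]

# Transform y: y' = sqrt(5 y), so that y' = x is linear in x.
y_prime = [math.sqrt(5 * y) for y in ys]

linear_in_x_prime = all(math.isclose(y, xp / 5) for y, xp in zip(ys, x_prime))
linear_in_x = all(math.isclose(yp, x) for yp, x in zip(y_prime, xs))
```

Either transformation removes the curvature; which one to prefer in practice depends on the considerations about transforming x versus y discussed in this section.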
As illustrated in Figure 7.4, there are four simple patterns of monotone nonlinear relationships. Each can be
straightened by moving y, x, or both up or down the ladder of powers and roots: The direction of curvature
determines the direction of movement on the ladder; Tukey (1977) calls this the “bulging rule.” Specific
transformations to linearity can be located by trial and error (but see Chapter 9 for an analytic approach).
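Trial and error along the ladder of powers can be sketched as a simple search. The example assumes made-up, noise-free data that are concave and increasing in x, so that moving x down the ladder is the natural direction to try; the helper names and the set of candidate powers are hypothetical, and the squared correlation is used here only as a convenient summary of straightness.

```python
import numpy as np

def ladder(x, p):
    """Power transformation x**p, with the log taking the place of p = 0."""
    return np.log(x) if p == 0 else x ** p

def r_squared(x, y):
    """Squared correlation, i.e., R^2 of the simple regression of y on x."""
    return np.corrcoef(x, y)[0, 1] ** 2

x = np.linspace(1.0, 50.0, 100)
y = 3.0 + 2.0 * np.log(x)            # hypothetical data, linear in log(x)

powers = [-1, -0.5, 0, 0.5, 1, 2]    # hypothetical rungs on the ladder
fits = {p: r_squared(ladder(x, p), y) for p in powers}
best_power = max(fits, key=fits.get)
```

With real data the summary statistic should be read alongside the plot itself, since a high R² does not guarantee that the curvature has been removed.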
In multiple regression, the bulging rule may be applied to the partial-residual plots. Generally, we transform
xj in preference to y, because changing the scale of y disturbs its relationship to other regressors and
because transforming y changes the error distribution. An exception occurs when similar nonlinear patterns
are observed in all of the partial-residual plots. Furthermore, the logit transformation often helps for dependent
variables that are proportions.
Figure 7.3. How a transformation of y (a to b) or x (a to c) can make a simple monotone nonlinear
relationship linear.
As suggested in connection with Figure 7.1b, nonmonotone nonlinearity (and some complex monotone
patterns) frequently can be accommodated by fitting polynomial functions in an x; quadratic specifications are
often useful in applications. As long as the model remains linear in its parameters, it may be fit by linear least-
squares regression.
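A quadratic specification is nonlinear in x but linear in its parameters, so ordinary least squares applies once x² is entered as an additional column. A minimal sketch with made-up, noise-free data (the coefficient values are hypothetical):

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 31)
y = 4.0 - 2.0 * x + 0.5 * x ** 2     # hypothetical nonmonotone relationship

# Design matrix with columns 1, x, x^2: the model is linear in (b0, b1, b2).
X = np.column_stack([np.ones_like(x), x, x ** 2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here `beta` recovers the three generating coefficients, because the data contain no error.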
Trial-and-error experimentation with the Canadian occupational prestige data leads to the log transformation
of income. The possibly curvilinear partial relationship of prestige to the percentage of women in the
occupations suggests the inclusion of linear and quadratic terms for this independent variable. These
changes produce a modest, though discernible, improvement in the fit of the model:
$$\hat{P} = \underset{(15)}{-111} + \underset{(0.35)}{3.77}\,E + \underset{(1.30)}{9.36}\,\log_2 I - \underset{(0.087)}{0.139}\,W + \underset{(0.00094)}{0.00215}\,W^2$$
$$R^2 = 0.84 \qquad s = 6.95$$
Figure 7.4. Determining a transformation to linearity by the “bulging rule.”
SOURCE: Adapted from Tukey, Exploratory Data Analysis, © 1977 by Addison-Wesley Publishing Co.
Adapted and reprinted by permission of Addison-Wesley Publishing Co., Inc., Reading, MA.
Note the statistically significant quadratic term for percentage of women. The partial effect of this variable is
relatively small, however, ranging from a minimum of -2.2 prestige points for an occupation with 32% women
to 7.6 points for a hypothetical occupation consisting entirely of women. Because the nonlinear pattern in
the partial-residual plot for education is complex, a power transformation of this independent variable is
not promising: Trial and error suggests that the best that we can do is to increase R² to 0.85 by squaring
education.
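The figures quoted for the partial effect of percentage of women follow from the two reported coefficients. The sketch below assumes that the effect is measured relative to an occupation with no women (W = 0); the function name is illustrative.

```python
# Coefficients for W and W^2 from the respecified regression.
b_w, b_w2 = -0.139, 0.00215

def partial_effect(w):
    """Partial effect of percentage of women, relative to W = 0 (an assumption)."""
    return b_w * w + b_w2 * w ** 2

# The quadratic reaches its minimum where the derivative b_w + 2*b_w2*w is zero.
w_min = -b_w / (2 * b_w2)

effect_at_min = partial_effect(w_min)   # about -2.2 prestige points
effect_at_100 = partial_effect(100)     # about 7.6 prestige points
```

The minimum falls at roughly 32% women, matching the values given in the text.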
In transforming data or respecifying the functional form of the model, there should be an interplay between
substantive and modeling considerations. We must recognize, however, that social theories are almost never
mathematically concrete: Theory may tell us that prestige should increase with income, but it does not specify
the functional form of the relationship.
Still, in certain contexts, specific transformations may have advantages of interpretability. For example,
log transformations often can be given meaningful substantive interpretation: To increase log₂x by 1, for
instance, represents a doubling of x. In the respecified Canadian occupational prestige regression, therefore,
doubling income is associated on average with a 9-point increment in prestige, holding education and gender
composition constant.
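The doubling interpretation of the log₂ coefficient can be checked by arithmetic. The income levels below are hypothetical; only the coefficient 9.36 comes from the respecified regression.

```python
import math

b_log2_income = 9.36   # coefficient of log2(income) in the respecified model

def income_component(income):
    """Contribution of income to fitted prestige, other variables held constant."""
    return b_log2_income * math.log2(income)

# Doubling income from any starting level adds the same amount.
gain_low = income_component(10_000) - income_component(5_000)
gain_high = income_component(40_000) - income_component(20_000)
```

Both gains equal the coefficient itself, about 9 prestige points, regardless of the starting income.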
Likewise, the square root of an area or cube root of a volume can be interpreted as a linear measure of
distance or length, the inverse of the amount of time required to traverse a particular distance is speed, and so
on. If both y and xj are log-transformed, then the regression coefficient for x'j is interpretable as the “elasticity”
of y with respect to xj—that is, the approximate percentage of change in y corresponding to a 1% change
in xj. In many contexts, a quadratic relationship will have a clear substantive interpretation (in the example,
occupations with a gender mix appear to pay a small penalty in prestige), but a fourth-degree polynomial may
not.
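The elasticity interpretation of a log-log coefficient can be illustrated with a made-up constant-elasticity relationship; the data, the exponent 0.7, and the variable names are all hypothetical.

```python
import numpy as np

x = np.linspace(1.0, 100.0, 50)
y = 3.0 * x ** 0.7                 # hypothetical relationship with elasticity 0.7

# Regress log(y) on log(x): the slope estimates the elasticity.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)

# Percentage change in y produced by a 1% increase in x (exact, for comparison).
pct_change = ((1.01 * x) ** 0.7 / x ** 0.7 - 1.0) * 100.0
```

The slope is 0.7, and a 1% increase in x raises y by very nearly 0.7% at every x, which is the sense in which the interpretation is approximate.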
Finally, although it is desirable to maintain simplicity and interpretability, it is not reasonable to distort the data
by insisting on a functional form that is clearly inadequate. It is possible, in any event, to display the fitted
relationship between y and an x graphically or in a table, using the original scales of the variables if they have
been transformed, or to describe the effect at a few strategic x values (see, e.g., the brief description above
of the partial effect of percentage of women on occupational prestige).
http://dx.doi.org/10.4135/9781412985604.n7