Regression Data Output and Writeup
Warner, R. M. (2021). Applied Statistics II: Multivariable and Multivariate Techniques (3rd ed.). Los Angeles, CA: Sage Publications. ISBN: 9781544398723.
CHAPTER 7 MODERATION: INTERACTION IN MULTIPLE REGRESSION

7.1 TERMINOLOGY

In factorial analysis of variance (ANOVA), when the effect of one predictor variable (Factor A) on the outcome variable Y differs within separate groups defined by Factor B, we call this an interaction between Factors A and B. Interaction can also be called nonadditivity because the presence of interaction means we need more than just additive main effects for Factors A and B to predict cell means for Y; an A × B interaction term is also needed. Most factorial ANOVA programs (including the SPSS general linear model [GLM] procedure) include all possible interactions among factors by default. For instance, if there are three factors, A, B, and C, the ANOVA source table includes SS terms for A, B, C, A × B, A × C, B × C, and A × B × C. This default can be overridden by specifying a custom model; that is, the user can delete some interaction terms from the factorial model. When this is done, the df and SS for the deleted interaction term are pooled with the error df and SS. Often, but not always, data analysts who use factorial designs have predictions about interactions.

Interactions between predictors can also be examined in multiple regression. In the context of regression, interaction is usually called moderation. Consider an example in which we want to predict interview skills (Y) from the categorical variable sex (X1) and a quantitative measure of emotional intelligence (X2). If we find that the slope that relates X2 to Y is different for men than for women, we would say that sex is a moderator variable. The effect of X2 on Y is moderated by, or changed by, or dependent upon sex. By default, most regression analyses do not include interactions between predictors (i.e., they do not test hypotheses about moderation). Because of this, regression analysts can easily miss interactions.
Analysts can add interaction terms to regression equations by creating new predictor variables that are products of predictor variables such as X1 and X2, as discussed in this chapter. If a data analyst has theoretical predictions about interactions, terms should be added to the regression model to represent these interactions. If interactions are not predicted ahead of time, adding interaction terms to regression models in search of significant outcomes can be viewed and reported as exploratory, and this is reasonable if the interactions make sense. However, if interaction terms are added to a regression model during an extended process of data torturing, in an attempt to obtain smaller p values for tests of the model, it is a form of p-hacking; this should be avoided. In a classic paper, Baron and Kenny (1986) discussed moderation and mediation analyses. These terms should not be confused with each other; mediation is a completely different kind of hypothesis about associations among variables. An example of a mediation hypothesis is that first, X1 causes X2, and then, X2 causes Y. If this model is consistent with regression results, one possible interpretation is that the effect of X1 on Y may be mediated by X2. Other interpretations are equally plausible, as discussed in Chapter 9, on mediation. Mediation is completely different from moderation.
When the term effect is used in statistics, it can imply causality if data come from an experiment. However, when data are not from an experiment, the term effect does not imply causality; it describes only the extent to which scores on an outcome variable Y can be predicted from one or more X variables. In most regression analysis situations, data come from a nonexperimental study. Results of regression analysis based on nonexperimental data should not be interpreted in causal terms. Regression results based on nonexperimental designs can be either consistent with, or inconsistent with, hypotheses about moderation and mediation. A significant correlation between predictor variables X1 and X2 is not evidence of moderation or interaction (this seems to be a common misconception). An interaction can be present when predictors are correlated; however, it can also be present when the predictors are uncorrelated. The existence of correlation between predictors tells us nothing about potential interaction. Hypotheses about moderation can be represented in path model form in two different ways. Figure 7.1 has an arrow from X2 that points toward the X1, Y path. This path diagram represents the idea that the coefficient for the X1, Y path is modified by X2. A second way to represent moderation appears in Figure 7.2. Moderation, or interaction between X1 and X2 as predictors of Y, can be assessed by including the product of X1 and X2 as an additional predictor variable in a regression model. The unidirectional arrows toward the outcome variable Y represent a regression model in which Y is predicted from X1, X2, and a product term that represents an interaction between X1 and X2. In this model, the three predictors are shown as correlated with one another (these correlations are represented by the double-headed arrows). Predictors are often correlated in regression analyses, but as noted previously, those correlations are not evidence for or against the possible existence of moderation.
Figure 7.1 Path Model That Represents Moderation of the X1, Y Relationship by X2
Figure 7.2 A Different Path Model That Represents Moderation of the X1, Y Relationship by X2
The path model in Figure 7.2 is a more convenient way to represent moderation; however, the diagram in Figure 7.1 is sometimes used. The same assumptions are required for moderation analysis as for other regression analyses that include multiple predictor variables. The dependent variable Y must be quantitative, and ideally it should be approximately normally distributed with no extreme outliers. (Formally, it is more correct to say that the residuals from regression are assumed to be normally distributed; however, the distribution of residuals often resembles the distribution of the Y variable.) All relations between variables are assumed to be linear. Other assumptions (e.g., homogeneity of variance of Y scores across levels of X1 and other predictor variables) should also be met. Sample size requirements are similar to those for multiple regression with multiple predictors (also see Aiken & West, 1991).
This chapter is limited to consideration of the case of two predictors and their interaction. The generalization of the methods to larger numbers of predictors is reasonably straightforward. When we ask about the types of variables for X1 and X2, there are three possible combinations.
1. X1 and X2 can both be categorical variables.
2. One of the predictor variables can be categorical and the other quantitative. The categorical predictor variable is usually treated as the moderator variable.
3. Both X1 and X2 can be quantitative variables.
The following sections consider each of these three situations.

7.2 INTERACTION BETWEEN TWO CATEGORICAL PREDICTORS: FACTORIAL ANOVA

When both predictor variables are categorical, the most convenient way to analyze the data is usually factorial ANOVA (see Volume I, Chapter 16 [Warner, 2020]). In this situation, we might call categorical variable X1 Factor A and categorical variable X2 Factor B. In a factorial design, the values of a factor (or levels of a factor) can correspond to different types or amounts of treatment, or they may identify naturally occurring group memberships. A two-way factorial ANOVA provides F tests for a main effect of Factor A, a main effect of Factor B, and the A × B interaction. Eta squared (η2) effect sizes can be obtained for each of these. Cell (group) means must be provided in some form to understand the nature of the interaction (and the nature of main effect group differences). The nature of an interaction can be illustrated as a line graph (in which lines connect the points that represent cell means) or as a bar graph (in which there is one bar for each group or cell, and the heights of bars represent group means). Equivalent information is provided by a table of cell means. Factorial designs are common in experiments (in which a researcher manipulates one or more variables). However, they can also be used with categorical predictor variables that are not manipulated (i.e., that represent naturally occurring group membership). As a brief review, consider results from a study conducted by Lyon and Greenberg (1991). (This data set is not provided.) They set up a 2 × 2 factorial study in which the first factor (family background of participant, i.e., whether the father was diagnosed with alcoholism) was assessed by self-report. This represents naturally occurring group membership. The second factor was experimentally manipulated.
For the manipulation, a research assistant acted out a script; he asked the female participants to volunteer their time for another study to help him out. Number of minutes volunteered was the dependent variable. The manipulated variable was the way the research assistant presented himself. In the nurturant (“Mr. Right”) condition, he talked about the way he was supportive and helpful to other people. In the exploitive (“Mr. Wrong”) condition, he talked about getting other people to write his papers and do his laundry. Women were randomly assigned to hear either the Mr. Right or Mr. Wrong performance. On the basis of theories about codependence, the researchers hypothesized that women who had fathers diagnosed with alcoholism would show a “codependent” response; that is, they would offer more time to help Mr. Wrong than Mr. Right. Conversely, women who did not have fathers diagnosed with alcoholism were expected to offer more time to Mr. Right.
In this study, there was one experimentally manipulated factor (nurturant vs. exploitive person) and one naturally occurring group membership factor (whether the woman had a father diagnosed with alcoholism or not). In such situations, the naturally occurring group membership factor is often thought of as the moderator. That is, in describing the nature of the interaction, the focus was on the difference in the way women who did or did not have fathers diagnosed with alcoholism reacted to the Mr. Right and Mr. Wrong scenarios. The F for the interaction was statistically significant; the nature of the interaction can be seen in Figure 7.3.
Figure 7.3 Plot of Cell Means for the Significant Interaction Between Type of Person Needing Help and Family Background of Female Participant as Predictors of Amount of Time Offered to Help
The nature of the interaction in Figure 7.3 was consistent with their prediction. On average, women who had fathers diagnosed with alcoholism volunteered more time to help an exploitive person than a nurturant person. On the other hand, women who did not have fathers diagnosed with alcoholism volunteered more time to help a person who was described as nurturant (and in fact, none of them volunteered any time to help the exploitive person). When an interaction is present, we can say that there are “different slopes for different folks” (as suggested by Robert Rosenthal). In this example, the “different folks” correspond to the naturally occurring family background groups (women who did vs. did not have fathers diagnosed with alcoholism). Women with fathers diagnosed with alcoholism offered more time in the Mr. Wrong condition; they offered less time to Mr. Right. On the other hand, women who did not have fathers diagnosed with alcoholism offered more time to Mr. Right and offered no time to Mr. Wrong. The effect of the manipulation (whether the research assistant was described as Mr. Right or Mr. Wrong) was different for the two types of folks (women from different family backgrounds). This interaction is disordinal (that is, the lines cross; the rank order of mean amount of time given to Mr. Right and Mr. Wrong was opposite for the two groups of women). Many interactions are not disordinal; none of the interactions in later graphs of this chapter are disordinal.
When all predictor variables are categorical and the outcome variable is quantitative, and interactions between categorical predictors are of interest, it is usually most convenient to do the analysis as a factorial ANOVA (using the SPSS GLM procedure). It is possible to do the same analysis as a regression (using dummy variables to represent group membership on Factor A and Factor B and products between dummy variables to represent interactions). A regression analysis with dummy predictor variables yields the same results as a factorial ANOVA, although the information is presented in different ways in the output. Regression output usually does not directly include group means, although these can be calculated from the coefficients of the regression equation. The results for prediction of a quantitative Y variable from two categorical variables are the same for ANOVA and regression (provided that the regression includes an interaction term) because these analyses are both special cases of the GLM.
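The claim that cell means can be recovered from the coefficients of a dummy-variable regression can be checked with a small worked sketch. The Python below assumes a hypothetical 2 × 2 design with made-up cell means (not the Lyon and Greenberg data, which are not provided) and 0/1 dummy codes for each factor; the function names are illustrative. Because the model Y′ = b0 + b1A + b2B + b3(A × B) has one coefficient per cell, least squares reproduces each cell mean exactly, so the coefficients can be written directly in terms of the cell means.

```python
# Hypothetical 2 x 2 cell means (made-up numbers, not data from the text).
# Keys are (a, b): dummy code for Factor A (0/1) and Factor B (0/1).
cell_means = {(0, 0): 20.0, (1, 0): 35.0, (0, 1): 0.0, (1, 1): 50.0}


def coefficients_from_cell_means(means):
    """Return (b0, b1, b2, b3) for Y' = b0 + b1*A + b2*B + b3*(A*B).

    With reference (dummy) coding the saturated model fits every cell mean,
    so each coefficient is a simple contrast among the four means.
    """
    b0 = means[(0, 0)]                        # mean of the reference cell
    b1 = means[(1, 0)] - means[(0, 0)]        # effect of A when B = 0
    b2 = means[(0, 1)] - means[(0, 0)]        # effect of B when A = 0
    b3 = (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])
    return b0, b1, b2, b3


def predicted_cell_mean(coefs, a, b):
    """Predicted Y for one cell from the regression coefficients."""
    b0, b1, b2, b3 = coefs
    return b0 + b1 * a + b2 * b + b3 * a * b


coefs = coefficients_from_cell_means(cell_means)
print("b0, b1, b2, b3 =", coefs)
for (a, b), mean in cell_means.items():
    print(f"cell A={a}, B={b}: observed {mean}, predicted {predicted_cell_mean(coefs, a, b)}")
```

Here b3 is a "difference of differences": the effect of Factor B at A = 1 minus the effect of Factor B at A = 0. A value of 0 would correspond to parallel lines in a plot of cell means.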
The remainder of this chapter examines interactions that involve one categorical and one quantitative predictor or two quantitative predictors. It is convenient to use multiple regression analysis for these situations.

7.3 INTERACTION BETWEEN ONE CATEGORICAL AND ONE QUANTITATIVE PREDICTOR

The data set in the file firstexample.sav is used to demonstrate analysis of potential interactions between one categorical and one quantitative predictor variable. This hypothetical data set provides information to predict annual salary (given in thousands of dollars per year) from sex, college, and years of experience. For instance, if a faculty member’s annual salary is given as 45.3, this corresponds to 45.3 × $1,000 = $45,300. The categorical variables used as predictors include sex (coded 0 = female, 1 = male) and college (coded 1 = liberal arts, 2 = business, and 3 = science and engineering). In all following examples, the dependent (Y) variable is salary. In Chapter 6, on dummy variables, you saw that adding a categorical variable (such as sex) as a predictor in a regression equation provides information about mean differences in salary between groups defined by that categorical variable (i.e., men and women). In the equation Salary′ = b0 + b1Sex + b2Years, the significance test for the b1 coefficient tells us whether mean salaries differ for men versus women (after adjusting for the effect of the other predictor, years). The sign of the b1 coefficient tells us which group had a higher mean (or higher intercept). In other words, b1 tells us about the difference between the intercepts of the regression lines that predict salary from years for the male versus female groups. This equation, Salary′ = b0 + b1Sex + b2Years, assumes that there is no interaction between sex and years as predictors of salary.
Interaction can be understood as “different slopes for different folks.” The sex, years, and salary example is a situation in which intercepts and slopes of lines to predict salary from years have unusually simple interpretations. The intercepts correspond to starting salaries (i.e., the predicted salaries for persons with 0 years of experience). Slopes within the male and female groups correspond to annual raises (i.e., for each additional year of experience, how much does salary tend to change?). Prediction of salary from years is a nice example because the quantitative variable, years, can have a value of 0 (when a person is first hired). In this situation, the intercept b0 (the predicted value of salary when years = 0) has a meaningful interpretation: It corresponds to average starting salary. In many real-world situations, it is not possible for anyone to have a score of 0 on the X predictor variable. In these cases, the intercept b0 is still necessary to locate the regression line in a plot, but it has no meaningful interpretation. Men and women may also have different average starting salaries. This can be called a “main effect” of sex on salary. In this regression analysis, men and women can have different intercepts (b0 for women and b0 + b1 for men). If b1 differs significantly from 0, they have significantly different starting salaries. (This was discussed in detail in Chapter 6.) (In later examples you will see that men and women may also receive different annual salary increases. In other words, there can be a different slope to predict salary from years for men than for women. That is called an interaction between sex and years as predictors of salary, or moderation of the effect of years on salary by sex.) In the following example, the quantitative variable salary is predicted from the categorical variable sex and the quantitative variable years. (A new predictor to test the significance of an interaction between sex and years will be introduced in a later section.) The imaginary data in the file firstexample.sav provide information to predict salary (Y, given in thousands of dollars) from sex and years. Chapter 3 showed that you can get a preliminary look at within-group regression slopes by splitting the data file into groups (e.g., male and female groups) and examining the regression lines separately for each group. That is done in Section 7.4. This provides a preview of the data within each group. The preliminary look at data in Section 7.4 does not tell us whether any difference in slopes we might see by visual examination is large enough to be judged statistically significant. Examination of separate scatterplots also makes it possible to see potential bivariate outliers within each group and to assess whether the X2, Y association is linear within each group.

7.4 PRELIMINARY DATA SCREENING: ONE CATEGORICAL AND ONE QUANTITATIVE PREDICTOR

Regressions that include interaction terms require assumptions like those for other multiple regressions. Scores on the Y dependent variable must be quantitative; a histogram should show reasonably normally distributed scores without extreme outliers. Distribution shape for quantitative predictors is less critical, but an approximately normal distribution shape without extreme outliers is desirable. For categorical predictors, numbers of cases in each group should be reasonably large. (I prefer n > 30 per group, but people sometimes settle for smaller group sizes.)
Associations among all quantitative variables (including predictor and dependent) should be linear (as assessed by scatterplots) and should not have extreme bivariate outliers. To examine the association of an X2 quantitative predictor and a Y outcome variable separately within groups, we set up scatterplots for Y and X2 (separately for men and women). When we examine these separate scatterplots, we generally hope that relations between Y and X2 are linear within both groups and that there are no bivariate outliers. (If relations within groups do not appear linear, more complicated analyses for assessment of nonlinear interactions are needed; see Aiken & West, 1991.) Evaluation of normality, outliers, and linearity is covered in Volume I (Warner, 2020) and in Chapter 2 of the present volume and is not repeated here. The following section shows how to examine associations of X2 and Y separately within groups.

7.5 SCATTERPLOT FOR PRELIMINARY ASSESSMENT OF POSSIBLE INTERACTION BETWEEN CATEGORICAL AND QUANTITATIVE PREDICTOR

When one predictor is categorical, data screening should include examination of regression lines within subgroups (such as male and female groups) in scatterplots. Scatterplots can be used to evaluate whether the within-group relations are linear and can provide a preview of possible interaction. The data used in this section are in the file firstexample.sav. To obtain these separate regression lines, use the <Scatter/Dot> command. First, make the following menu selections: <Graphs> → <Legacy Dialogs> → <Scatter/Dot>. In the first dialog box, shown in Figure 7.4, select “Simple Scatter” and click Define. In the next dialog box, shown in Figure 7.5, enter the regression dependent variable (salary) in the box for “Y Axis,” the quantitative predictor years in the box for “X Axis,” and the categorical moderator variable (sex) in the box for “Set Markers by.” The categorical moderator variable has only two groups in this example; however, this procedure can also be used when categorical variables identify more than two groups. Then click OK. A scatterplot, similar to those seen in earlier regressions, appears (not shown here). The Chart Editor is used to obtain the separate regression lines within male and female subgroups. To open the Chart Editor, double-click on the scatterplot; the Chart Editor window opens with the menu headings shown in Figure 7.6. In Figure 7.6, I have already edited the lines and case markers to make them clearly distinguishable (unfilled triangles for male cases, filled circles for female cases). To request separate regression lines for subgroups (men and women), make the following menu selections at the top-level menu bar in the Chart Editor window in Figure 7.6: <Elements> → <Fit Line at Subgroups>. The resulting regression lines appear in Figure 7.7 (this figure has been edited to improve appearance). Further chart editing procedures are not demonstrated here. There are many online tutorials (YouTube videos and PDF files) that demonstrate additional editing features. Alternatively, you may prefer to use other programs to generate figures. Most graphs in this book were created using SPSS and then extensively edited to improve appearance.
Figure 7.4 First Step in Obtaining Separate Regression Lines for Subgroups
Figure 7.5 Simple Scatterplot Dialog Box to Identify Predictor, Outcome, and Marker Variables
In Figure 7.7, the upper (solid, heavier) line represents predicted salary for men. The lower (dotted, lighter) line represents salary predictions for women. You should be able to see that, overall, women earn less on average than men in this hypothetical data set and that average predicted salary increases as a function of years. Also note that the association between years and salary appears to be fairly linear in both groups; there appear to be some bivariate outliers. If curves had appeared, the analysis of interaction would need to take nonlinear interaction into account (see Aiken & West, 1991). The lines are not perfectly parallel. Perfectly parallel lines would indicate complete absence of interaction. The slight departure from parallel seen in Figure 7.7 may be due to sampling error. A regression analysis will tell us whether the observed departure from parallel is substantial enough, relative to sampling error, to be statistically significant.
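Outside SPSS, the subgroup lines produced by <Fit Line at Subgroups> are simply separate bivariate least-squares lines fitted within each group. The minimal Python sketch below assumes small, hypothetical data (not the contents of firstexample.sav) with sex coded 0 = female and 1 = male; the helper names are illustrative. In these made-up data both groups have the same years values, so sex and years are uncorrelated, yet the slopes still differ, which echoes the earlier point that correlation between predictors says nothing about interaction.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_lines_by_group(groups, xs, ys):
    """Fit a separate regression line within each subgroup."""
    result = {}
    for g in sorted(set(groups)):
        gx = [x for x, gg in zip(xs, groups) if gg == g]
        gy = [y for y, gg in zip(ys, groups) if gg == g]
        result[g] = fit_line(gx, gy)
    return result


# Hypothetical data: sex (0 = female, 1 = male), years of experience, salary in $1,000s
sex = [0, 0, 0, 0, 1, 1, 1, 1]
years = [1, 3, 5, 7, 1, 3, 5, 7]
salary = [33, 35, 41, 43, 38, 42, 48, 52]

lines = fit_lines_by_group(sex, years, salary)
for group, (intercept, slope) in lines.items():
    label = "female" if group == 0 else "male"
    print(f"{label}: Salary' = {intercept:.2f} + {slope:.2f} * Years")
```

The slopes in these made-up data (1.8 for women, 2.4 for men) are not equal, but only the regression test of the interaction term can say whether such a difference is larger than sampling error would produce.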
Figure 7.6 Initial Scatterplot Shown in Chart Editor Dialog Box
Figure 7.7 Separate Regression Lines to Predict Salary From Years Within Female and Male Subgroups in Data File firstexample.sav

7.6 REGRESSION TO ASSESS STATISTICAL SIGNIFICANCE OF INTERACTION BETWEEN ONE CATEGORICAL AND ONE QUANTITATIVE PREDICTOR

The general equation for an analysis to examine the prediction of Y from X1 and X2 (without an interaction) is:

$$Y' = b_0 + b_1X_1 + b_2X_2 \qquad (7.1)$$

With an added term to represent an interaction between X1 and X2, the equation becomes:

$$Y' = b_0 + b_1X_1 + b_2X_2 + b_3(X_1 \times X_2) \qquad (7.2)$$

Note that when an interaction term (such as X1 × X2) is included, the variables involved in that interaction (X1, X2) must also be included in the regression equation. The interaction term is created by forming the product of the two predictors; the new variable X1byX2 = X1 × X2. In this example, we create a product variable called sexbyyears that is the product of sex and years. For the current example, the equation is: Salary′ = b0 + b1Sex + b2Years + b3(Sexbyyears). To create this new interaction variable in SPSS, use the <Transform> → <Compute Variable> command, as shown in Figure 7.8. (Later you will see that if both X1 and X2 are quantitative, scores should be centered before they are multiplied. It is not necessary to center scores when one or both predictors are categorical; Aiken & West, 1991.) Recall that in this data set (firstexample.sav), sex is coded 0 for female and 1 for male. By substituting those values of 0 and 1 into Equation 7.2, we can obtain separate prediction equations for women and men.

For women (Sex = 0):

$$\text{Salary}' = b_0 + b_1(0) + b_2\,\text{Years} + b_3(0 \times \text{Years}) = b_0 + b_2\,\text{Years}$$

For men (Sex = 1):

$$\text{Salary}' = b_0 + b_1(1) + b_2\,\text{Years} + b_3(1 \times \text{Years}) = (b_0 + b_1) + (b_2 + b_3)\,\text{Years}$$
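The substitution of 0 and 1 for sex implies that a regression including Sex, Years, and the product Sexbyyears reproduces the two within-group lines: b0 and b2 are the female intercept and slope, and b0 + b1 and b2 + b3 are the male intercept and slope. The Python sketch below checks this numerically on small, hypothetical data (not firstexample.sav), fitting ordinary least squares by solving the normal equations with plain Gaussian elimination. The helper names are illustrative; a real analysis would use SPSS or a statistics library.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, ys):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: sex (0 = female, 1 = male), years of experience, salary in $1,000s
sex = [0, 0, 0, 0, 1, 1, 1, 1]
years = [1, 3, 5, 7, 1, 3, 5, 7]
salary = [33, 35, 41, 43, 38, 42, 48, 52]

# Design matrix columns: intercept, Sex, Years, Sexbyyears (Equation 7.2 with X1 = Sex, X2 = Years)
rows = [[1, s, y, s * y] for s, y in zip(sex, years)]
b0, b1, b2, b3 = ols(rows, salary)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, b3 = {b3:.3f}")
print(f"women: intercept {b0:.3f}, slope {b2:.3f}")
print(f"men:   intercept {b0 + b1:.3f}, slope {b2 + b3:.3f}")
```

The estimates match separate within-group fits (intercept 30.8 and slope 1.8 for women, 35.4 and 2.4 for men in these made-up data) because the combined model lets each group have its own intercept and slope. The significance tests are not identical to separate-group analyses, however, because the combined model uses one pooled error term.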
Figure 7.8 Compute Variable Dialog Box to Compute Product (Interaction) Term
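Outside SPSS, the Compute Variable step shown in Figure 7.8 is just an element-wise product. This Python fragment assumes a few hypothetical records (sex coded 0 = female, 1 = male); the field names mirror the variable names used in the text, but the data are made up.

```python
# Hypothetical records (not firstexample.sav): sex coded 0 = female, 1 = male
records = [
    {"sex": 0, "years": 4},
    {"sex": 1, "years": 4},
    {"sex": 1, "years": 10},
    {"sex": 0, "years": 0},
]

# Same computation as the Compute Variable step: sexbyyears = sex * years
for row in records:
    row["sexbyyears"] = row["sex"] * row["years"]

for row in records:
    print(row)
```

Because sex is coded 0 for women, sexbyyears is 0 for every woman and equals years for every man, which is why b3 changes only the men's slope.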
For women, the intercept for the regression line is b0, and the slope is b2. For men, the intercept is (b0 + b1), and the slope is (b2 + b3). A statistically significant value for b1 indicates that the intercepts of the lines differ significantly for men versus women. If b3 is statistically significant, it tells us that the slopes differ significantly for men versus women (in other words, that moderation or interaction is present). Now let’s examine a regression analysis that uses sex, years, and sexbyyears to predict salary in the data set firstexample.sav. All three variables are entered in one step, as shown in Figure 7.9. As in other uses of the SPSS regression procedure, click the Statistics button and request part and partial correlations (these provide effect size information). Part of the results appears in Figure 7.10. Note that the coefficient b3, which corresponds to a potential interaction, is not statistically significant; b3 = .258, t(46) = .808, p = .423. This tells us that the lines in the graph in Figure 7.7 do not depart significantly from parallel. We can also say that we do not have evidence of different slopes for different folks; there is no significant interaction between sex and years; sex does not significantly moderate the effect of years on salary.
Figure 7.9 Regression to Predict Salary From Sex, Years, and Sexbyyears
Figure 7.10 Regression Results for Prediction of Salary From Sex, Years, and Sexbyyears for Data in firstexample.sav
Upon finding that an interaction is not statistically significant, a data analyst may want to rerun the regression analysis using only sex and years as predictors. However, reporting the regression that includes the nonsignificant interaction term between sex and years would dramatize the absence of an interaction. If the original research questions included a hypothesized interaction, an analysis that includes a test of that interaction should be reported.
On the other hand, if an interaction term was “thrown into” the analysis as part of an attempt to wring statistically significant results out of data, this is a form of p-hacking. In this situation it makes no sense to report a nonsignificant interaction term that was not predicted (particularly if the nature of the interaction is not interpretable).

7.7 INTERACTION ANALYSIS WITH MORE THAN TWO CATEGORIES

The file firstexample.sav includes the categorical variable college. The variable college represents college membership as 1 = liberal arts, 2 = business, and 3 = science and engineering. As discussed in Chapter 6, on dummy variables, when a categorical variable with k groups is used as a predictor in regression, (k – 1) dummy variables are needed to represent group membership. Dummy (0/1) coding is used in this example. Each dummy variable can be thought of as the answer to a yes/no question.
In this example, D1 is the yes/no question, Is this person in liberal arts (coded 1 = yes, 0 = no)? D2 is a different yes/no question: Is this person in business (coded 1 = yes, 0 = no)? Science and engineering, the group coded 0 on both D1 and D2, is the reference group. If the coefficient associated with D1 is statistically significant, it tells us that mean salary differs significantly between liberal arts and the reference group (science and engineering), and the b coefficient associated with D1 tells us the magnitude of that difference. Similarly, if the coefficient associated with D2 is statistically significant, mean salary in business differs from mean salary in science and engineering, and the b coefficient associated with D2 tells us the difference between those means. To set up a regression to assess a possible interaction between college and years, we need to form all possible products between the dummy variables that represent college (D1 and D2) and the quantitative variable years (i.e., D1 × Years and D2 × Years, called D1byyears and D2byyears). This can be done using the <Transform> → <Compute Variable> command. Note that when an interaction term is included in a regression, all variables involved in the interaction (in this example, D1, D2, and years) must also be included in the regression equation. The following equation (interaction of a categorical variable with three groups with a quantitative predictor) includes main effects for college (D1, D2) and years as predictors of salary and also the two interaction terms needed to represent their interaction (D1byyears, D2byyears):

$$\text{Salary}' = b_0 + b_1D_1 + b_2D_2 + b_3\,\text{Years} + b_4\,\text{D1byyears} + b_5\,\text{D2byyears} \qquad (7.3)$$
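The dummy variables and product terms needed for the college-by-years model can be sketched as follows. The Python below assumes the college codes given in the text (1 = liberal arts, 2 = business, 3 = science and engineering), with science and engineering as the reference group; the function names and example rows are hypothetical.

```python
def college_dummies(college):
    """Dummy codes for college; science and engineering (3) is the reference group."""
    if college not in (1, 2, 3):
        raise ValueError(f"unexpected college code: {college}")
    d1 = 1 if college == 1 else 0  # Is this person in liberal arts?
    d2 = 1 if college == 2 else 0  # Is this person in business?
    return d1, d2


def design_row(college, years):
    """Predictors for the college-by-years model: D1, D2, Years, and their products."""
    d1, d2 = college_dummies(college)
    return {
        "D1": d1,
        "D2": d2,
        "Years": years,
        "D1byyears": d1 * years,
        "D2byyears": d2 * years,
    }


# Hypothetical cases: one from each college, each with 5 years of experience
for college in (1, 2, 3):
    print(college, design_row(college, years=5))
```

For a science and engineering case, all four college-related terms are 0. Each dummy variable shifts the intercept, and its product term shifts the slope, for the group it flags relative to science and engineering.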
If we want a significance test for the overall college-by-years interaction, we need to find out how much predictive information the two interaction terms add to the analysis and whether this additional predicted variance is large enough to be statistically significant. This can be done using hierarchical regression (user-determined order of entry), which was covered in earlier chapters about multiple regression. In the following example (still using data in the file firstexample.sav), in Step 1, the variables that represent college and years are entered; in Step 2 (shown in Figure 7.11), the two interaction variables are added. In addition, request R squared change as an additional statistic (as shown in Figure 7.12).
Figure 7.11 Step 2 in Hierarchical Entry of Predictors: Interaction Terms Entered as Block 2 (Years, D1, and D2 Were Entered in Block 1)
Figure 7.12 Request R Squared Change Table
Figure 7.13 Results: R2 Change Across Two Models: Step 1, Years and College; Step 2, Interaction Between Years and College
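The R squared change test summarized in Figure 7.13 can be reproduced from the R squared values alone. The minimal Python sketch below takes .776 as the Model 1 R2 (this value reproduces the reported F change of 2.642), an R2 change of .024, and df = 2 and 44; the function names are hypothetical.

```python
def r2_change_f(r2_reduced, r2_full, df_num, df_den):
    """F statistic for the increase in R squared when df_num predictors are added."""
    r2_change = r2_full - r2_reduced
    return (r2_change / df_num) / ((1.0 - r2_full) / df_den)


def f_upper_tail_df_num_2(f_value, df_den):
    """Upper-tail p value for F(2, df_den).

    This closed form is valid ONLY when the numerator df equals 2.
    """
    return (1.0 + 2.0 * f_value / df_den) ** (-df_den / 2.0)


# Values for the firstexample.sav hierarchical regression (Model 1 R squared taken as .776)
r2_model1 = 0.776
r2_model2 = r2_model1 + 0.024   # R squared after adding D1byyears and D2byyears
df_num, df_den = 2, 44          # 2 added predictors; denominator df as reported

f_change = r2_change_f(r2_model1, r2_model2, df_num, df_den)
p_change = f_upper_tail_df_num_2(f_change, df_den)
print(f"F({df_num}, {df_den}) = {f_change:.3f}, p = {p_change:.3f}")
```

Small discrepancies from the SPSS output come from rounding in the reported R squared values. For numerator df other than 2, an F distribution function from a statistics library (for example, scipy.stats.f.sf) would replace the closed form.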
For Model 1 in Step 1, the combined predictive contribution of years, D1, and D2 (college membership) was statistically significant, R2 = .776, p < .001 (F test with df = 3, 46). The increase in R2 for Model 2, in which the interaction terms were added to the model, was not statistically significant, R2 change = .024, F(2, 44) = 2.642, p = .082. There was no evidence of an interaction between college and years as predictors of salary. Some authors (e.g., Aiken & West, 1991) refer to predictors such as years, D1, and D2 as representing “main effects” for years and college in the context of regressions that also include interaction terms between them. This is potentially confusing. In a regression without an interaction term (e.g., Y′ = b0 + b1D1 + b2D2 + b3Years), it makes sense to interpret D1 and D2 as information about a college main effect. However, in the presence of an interaction term, D1 and D2 are better interpreted as information about the intercepts of different regression lines (Crawford, Jussim, & Pilanski, 2014; Whisman & McClelland, 2005). Interpretation of coefficients associated with dummy-coded predictors such as D1 and D2 in analyses that do not include interaction effects was discussed in Chapter 6, on dummy variables, and that information is not repeated here.

7.8 EXAMPLE WITH DIFFERENT DATA: SIGNIFICANT SEX-BY-YEARS INTERACTION

Next we examine a different hypothetical data set that was constructed to show a sex-by-years interaction: secondexample.sav. A regression analysis is run to predict salary from sex, years, and sexbyyears. Results appear in Figure 7.14. The raw score regression equation, based on information in Figure 7.14, is:

$$\text{Salary}' = 34.582 - 1.5\,\text{Sex} + 2.346\,\text{Years} + 2.002\,\text{Sexbyyears}$$
The overall regression model was significantly predictive of salary, F(3, 56) = 144.586, p < .001, with an extremely large effect size, R2 = .886. (I used extremely large effect sizes in many examples to make it easy to see and interpret patterns of results. In real-world applications, in many disciplines, R2 values in excess of .20 are rare.) The effect for sex was not statistically significant, b1 = –1.5, t(56) = –.385, p = .702, and the corresponding effect size was essentially 0. This tells us that the intercepts for the male-only and female-only regression lines do not differ significantly. The intercept for women is b0, the intercept for men is b0 + b1, and the male intercept minus the female intercept is b1 (–1.5). The effect for years was statistically significant, b2 = 2.346, t(56) = 11.101, p < .001, sr2 = .25. Because the sex-by-years interaction was also statistically significant, the effect of years on salary differs by sex and should be interpreted with sex taken into account; in this model b2 is the slope for women (the reference group) only. For the sex-by-years interaction, b3 = 2.002, t(56) = 5.98, p < .001, sr2 = .073.
Figure 7.14 SPSS Output for Regression: Sex, Years, and Sex × Years Interaction as Predictors of Salary (in Tens of Thousands of Dollars) for Data in secondexample.sav

Using codes female = 0 and male = 1, the female group is the reference group. For women, the intercept is b0 and the slope is b2. For men, the intercept is (b0 + b1) and the slope is (b2 + b3). If b1 is statistically significant, the intercepts of the regression lines differ for men and women. If b3 is statistically significant, the slopes of the regression lines differ for men and women (there is evidence of interaction or moderation). Values of b1 and b3 can be negative, positive, or 0, and they answer separate questions: whether the intercepts differ tells us nothing about whether the slopes differ. Substituting values of 0 and 1 for women and men yields two separate regression equations:
$$\text{Women (Sex = 0):}\quad \text{Salary}' = 34.582 + 2.346\,\text{Years}$$
$$\text{Men (Sex = 1):}\quad \text{Salary}' = (34.582 - 1.5) + (2.346 + 2.002)\,\text{Years} = 33.082 + 4.348\,\text{Years}$$
In words, women’s average starting salary (at years = 0) is $34,582, and their average annual raise is $2,346. A female faculty member can calculate her predicted salary by substituting her years of experience; if she has 10 years of experience, the model predicts her salary = 34.582 + 2.346 × 10 = 58.042 (i.e., $58,042). Because the overall multiple R is so high for these hypothetical data, that predicted salary is probably close to the actual salary for most members of the sample. In actual data, with much lower values of R, the estimates for many individuals will be off by many thousands of dollars.
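As a quick check on the substitution above, the short Python sketch below recovers each group's intercept and slope from the four coefficients reported in Figure 7.14 (b0 = 34.582, b1 = –1.5, b2 = 2.346, b3 = 2.002) and reproduces the 10-year prediction for a woman. Python and the function names are illustrative choices, not part of the SPSS procedure; salary values are read as thousands of dollars, following the $34,582 reading in the text.

```python
# Coefficients reported in Figure 7.14; salary read as thousands of dollars
# (so 34.582 means $34,582). Sex is coded female = 0, male = 1.
B0, B1, B2, B3 = 34.582, -1.5, 2.346, 2.002


def group_line(sex):
    """Return (intercept, slope) of the salary-on-years line for sex = 0 or 1."""
    intercept = B0 + B1 * sex  # b0 for women, b0 + b1 for men
    slope = B2 + B3 * sex      # b2 for women, b2 + b3 for men
    return intercept, slope


def predicted_salary(sex, years):
    """Predicted salary from the interaction model for one person."""
    intercept, slope = group_line(sex)
    return intercept + slope * years


for label, sex in (("women", 0), ("men", 1)):
    intercept, slope = group_line(sex)
    print("%-5s intercept = %.3f, slope = %.3f" % (label, intercept, slope))
print("woman with 10 years: %.3f" % predicted_salary(0, 10))
```

The same arithmetic gives the men's line: an intercept of 34.582 – 1.5 = 33.082 and a slope of 2.346 + 2.002 = 4.348.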
Men’s average starting salary is $33,082 (b0 + b1), and their average annual raise is $4,348 (b2 + b3). A scatterplot with separate lines for the female and male subgroups makes the nature of the interaction clear. The scatterplot in Figure 7.15 was created using the same procedures as in the previous example; in the Chart Editor, you can request separate regression lines by subgroup (in this example, male and female are the subgroups). The intercepts (starting salaries at years = 0) are almost exactly equal for men and women. However, in the hypothetical data in secondexample.sav, the average annual raises received by men are much larger than those received by women, so the line to predict salary from years has a steeper slope for men than for women. The regression equations for the subgroup lines in Figure 7.15 are the same as those deduced above by substituting values of 0 and 1 for sex. (The free interaction graphing program created by Daniel Soper can also be used to generate these lines; use of that program is demonstrated later in this chapter.)

Figure 7.15 Graph of Sex-by-Years Interaction in Prediction of Salary for Data in secondexample.sav

7.9 FOLLOW-UP: ANALYSIS OF SIMPLE MAIN EFFECTS
We now have graphs of separate lines for men and women (in Figure 7.15). For each of these lines we may want to know: Is the multiple R to predict salary from years significant within each group? This is called an analysis of simple main effects (a term also used in factorial ANOVA), and it is a common follow-up to a significant interaction. It is possible, for example, that years is a significant predictor of salary for men but not for women.
To obtain separate regressions for the female and male subgroups (in the secondexample.sav data set), use the SPSS split file procedure: <Data> → <Split File>. (Note that this is different from the <Split into Files> menu option.) Within the Split File dialog box in Figure 7.16, select the radio button for “Organize output by groups”; then move the categorical moderator variable sex into the “Groups Based on” window, and click OK. Until this command is reversed (by going back into the Split File dialog box and choosing the radio button for “Analyze all cases, do not create groups”), all subsequent analyses are conducted separately for the male and female groups. The next step is to run the regression to predict salary from years separately for each group. The variables sex and sexbyyears are not included in this set of regressions because, within each group, they would be constants. When this is done, the output in Figures 7.17 and 7.18 is obtained.
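The split-file regressions reproduce exactly the intercepts and slopes that the interaction model implies, because a model with a dummy variable, a quantitative predictor, and their product lets each group have its own intercept and slope. The sketch below illustrates this on simulated data. The generating values, group sizes, seed, and the small `ols` helper (a hand-written least-squares solver, used only so the example needs nothing beyond the Python standard library) are assumptions for illustration; this is not the secondexample.sav data.

```python
import random


def ols(rows, y):
    """Least-squares coefficients: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Simulated (hypothetical) data: 30 women (sex = 0) and 30 men (sex = 1)
rng = random.Random(1)
sex, years, salary = [], [], []
for s in (0, 1):
    for _ in range(30):
        yr = rng.uniform(0, 25)
        mean = (34.5 + 2.3 * yr) if s == 0 else (33.0 + 4.3 * yr)
        sex.append(s)
        years.append(yr)
        salary.append(mean + rng.gauss(0, 5))

# Interaction model: Salary' = b0 + b1*Sex + b2*Years + b3*Sexbyyears
full_rows = [[1.0, s, yr, s * yr] for s, yr in zip(sex, years)]
b0, b1, b2, b3 = ols(full_rows, salary)


def fit_group(group):
    """'Split file' regression of salary on years within one group: [intercept, slope]."""
    idx = [i for i, s in enumerate(sex) if s == group]
    return ols([[1.0, years[i]] for i in idx], [salary[i] for i in idx])


women, men = fit_group(0), fit_group(1)
print("women, split file:", [round(v, 3) for v in women])
print("women, implied:   ", [round(b0, 3), round(b2, 3)])
print("men, split file:  ", [round(v, 3) for v in men])
print("men, implied:     ", [round(b0 + b1, 3), round(b2 + b3, 3)])
```

Only the coefficients match. The t tests differ, because the split-file regressions estimate error variance separately within each group, whereas the interaction model pools error across both groups.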
Within the female group, there was a statistically significant association between years and salary (Figure 7.17). Because there is only one predictor (years), we can report either F and R for the entire equation or t and sr2 for the single predictor. The regression coefficients obtained here are the same as those obtained earlier by substituting the dummy variable codes into the regression equation that included the interaction term. For men, the regression to predict salary from years was also statistically significant, t(28) = 17.173, p < .001 (Figure 7.18). The statistically significant interaction reported in Figure 7.14 tells us that these two groups have significantly different slopes to predict salary from years. Interactions can take other forms. For example, a Y variable may be significantly predictable from X2 for one group (such as men) but not for the other group (women). Or the slope that relates X2 to Y might be positive for one sex and negative for the other (a disordinal interaction). In this example, the nature of the interaction was that men had a significantly larger positive slope to predict salary from years than women; in other words, men received larger salary increases per year (we know this because b3 was statistically significant). However, starting salaries did not differ between groups (because b1 was not significant).

Figure 7.16 SPSS Split File Dialog Box
Figure 7.17 Simple Main Effects: Regression to Predict Salary From Years for Female Subgroup
Figure 7.18 Simple Main Effects: Prediction of Salary From Years for Male Subgroup

7.10 INTERACTION BETWEEN TWO QUANTITATIVE PREDICTORS
The statistical significance of an interaction between quantitative X1 and X2 predictors in a linear regression can be assessed by forming a new variable that is the product of the predictors and including this product term in a regression, along with the original predictor variables, as shown in Equation 7.4. A first approximation to a moderation model with quantitative X1 and X2 variables as predictors of Y could be given as:
$$Y' = b_0 + b_1X_1 + b_2X_2 + b_3(X_1 \times X_2) \tag{7.4}$$
where Y′ is the predicted score on a quantitative Y outcome variable, and X1 and X2 are both quantitative predictor variables. However, a problem arises when using the model in Equation 7.4. The product term (X1 × X2) is highly correlated with both X1 and X2, and high correlations among predictors make it difficult to distinguish their unique contributions. Aiken and West (1991) tell us that centering scores on X1 and X2 before calculating the (X1 × X2) product reduces this collinearity. Centering is done for each variable by subtracting the mean of that variable from each score (e.g., scores on X1 are centered by subtracting the mean of X1 from each individual X1 score). In practice, scores are centered for both quantitative predictors before calculating the product term that represents the interaction. The following steps preview the methods that are demonstrated in the next section.
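The collinearity problem, and the way centering addresses it, can be seen in a small simulation. The sketch below draws two hypothetical predictors whose means are well above zero (loosely like age and number of habits; the means, SDs, sample size, and seed are arbitrary assumptions) and compares how strongly each predictor correlates with the raw product versus the product of centered scores.

```python
import random


def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n = 500
# Hypothetical predictors with means well above zero
x1 = [rng.gauss(40, 10) for _ in range(n)]
x2 = [rng.gauss(3, 1.25) for _ in range(n)]

# Product of raw scores versus product of centered scores
raw_product = [a * b for a, b in zip(x1, x2)]
m1, m2 = sum(x1) / n, sum(x2) / n
centered_product = [(a - m1) * (b - m2) for a, b in zip(x1, x2)]

for name, x in (("X1", x1), ("X2", x2)):
    print("r(%s, raw product) = %.3f   r(%s, centered product) = %.3f"
          % (name, corr(x, raw_product), name, corr(x, centered_product)))
```

Exact values depend on the seed, but with these settings the raw product correlates substantially with both predictors, while the centered product is nearly uncorrelated with either.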
1. Find the mean for each predictor variable (in this example, the means of X1 and X2). This information can be obtained from the SPSS <Descriptives> command.
2. Create new variables for centered scores. I use variable names that remind me which versions of the predictor have been centered. The <Transform> → <Compute Variable> command can be used to calculate these:
$$\text{centeredX1} = X_1 - M_{X_1}$$
$$\text{centeredX2} = X_2 - M_{X_2}$$
Note that I have not centered scores on Y, the dependent variable. This is not necessary. (It isn’t wrong, but it makes interpretation of results potentially confusing.)
3. Carry out an additional computation to obtain the product of the centered scores:
$$\text{productcenteredX1X2} = \text{centeredX1} \times \text{centeredX2}$$
4. Run a multiple linear regression with these three predictors: X1, X2, and productcenteredX1X2. Notice that I use the raw scores for X1 and X2 as predictors, rather than the centered scores. (It is not wrong to use the centered scores, but using raw scores here makes interpretation of results easier.) If you center Y or use the centered scores for X1 and X2 as predictors, interpretation of your results will differ from the examples that follow. In this example, based on Equation 7.4, the raw score for number of symptoms is predicted by:
$$\text{Symptoms}' = b_0 + b_1X_1 + b_2X_2 + b_3\,\text{productcenteredX1X2}$$
5. Evaluate regression results (Figure 7.21) for the thirdexample.sav data with the following questions in mind:
a) Is the overall model statistically significant, and what proportion of variance in Y scores can be predicted using X1 and X2 and their interaction? To answer this question, examine F in the ANOVA source table and multiple R2.
b) Which of the three predictors (if any) are statistically significant when controlling for the other predictors in the model, and what proportion of variance in Y can be uniquely predicted by each variable (i.e., what are the sr2 effect sizes)? The p values (based on t values) are used to evaluate whether each raw score regression coefficient is statistically significant. Square each part correlation to obtain sr2, the proportion of variance predicted uniquely by each independent variable; this provides effect size information.
c) If the b3 coefficient for the X1 × X2 product term (productcenteredX1X2) is statistically significant, there is a statistically significant interaction between X1 and X2 as predictors of Y. If there is a statistically significant interaction, you need to be able to describe its nature. To do this, set up a graph of predicted Y values for selected values of X1 and X2, as discussed in Section 7.13. In the following example, separate lines will be drawn for selected values of X2 (age). Labels for the graph can be given in raw score units or standardized (z-score) units. On the basis of the pattern in this graph, you can say something about the nature of the interaction.
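Steps 1 through 5 can also be sketched outside SPSS. The Python sketch below uses simulated, hypothetical data (the sample size, seed, and generating coefficients are assumptions, not thirdexample.sav). It centers both predictors, forms the product of the centered scores, regresses raw Y on raw X1, raw X2, and the centered product, and obtains each predictor's sr2 as the drop in R2 when that predictor is removed, which equals the squared part correlation that SPSS reports. The helper `sr2_from_t` uses the equivalent relation sr2 = t²(1 – R2)/df_residual; applied to the values reported for Figure 7.21 (R2 = .729, df = 320), it gives about .495 for age, .060 for habits, and .022 for the interaction, matching the reported sr2 values within rounding.

```python
import random


def ols(rows, y):
    """Least-squares coefficients: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def r_squared(rows, y):
    """Proportion of variance in y explained by an OLS fit on rows."""
    b = ols(rows, y)
    pred = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def sr2_from_t(t, r2_full, df_resid):
    """Squared semipartial correlation recovered from a coefficient's t value."""
    return t ** 2 * (1 - r2_full) / df_resid


# Hypothetical data (NOT thirdexample.sav): age 21-55, habits 0-7
rng = random.Random(7)
n = 324
age = [rng.uniform(21, 55) for _ in range(n)]
habits = [rng.randint(0, 7) for _ in range(n)]
symptoms = [0.6 * a - 1.6 * h - 0.1 * (a - 41.7) * (h - 3.5) + rng.gauss(0, 2)
            for a, h in zip(age, habits)]

# Steps 1-3: means, centered scores, product of the centered scores
m_age, m_hab = sum(age) / n, sum(habits) / n
age_c = [a - m_age for a in age]
habits_c = [h - m_hab for h in habits]
agebyhabits = [a * h for a, h in zip(age_c, habits_c)]

# Step 4: raw (uncentered) age and habits plus the centered product
columns = {"age": age, "habits": habits, "agebyhabits": agebyhabits}
predictors = ["age", "habits", "agebyhabits"]


def design(names):
    """Design matrix with an intercept column and the named predictors."""
    return [[1.0] + [columns[nm][i] for nm in names] for i in range(n)]


b = ols(design(predictors), symptoms)
r2_full = r_squared(design(predictors), symptoms)

# Step 5b: sr2 = drop in R2 when one predictor is removed from the model
sr2 = {nm: r2_full - r_squared(design([p for p in predictors if p != nm]), symptoms)
       for nm in predictors}

print("b0..b3:", [round(v, 3) for v in b])
print("R2 = %.3f" % r2_full)
print("sr2:", {k: round(v, 4) for k, v in sr2.items()})

# Check against the values reported for Figure 7.21 (R2 = .729, df = 320)
for name, t in (("age", 24.166), ("habits", -8.394), ("agebyhabits", -5.148)):
    print(name, "sr2 from t = %.4f" % sr2_from_t(t, 0.729, 320))
```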
The following empirical example shows how to graph separate regression lines to predict symptoms from habits for selected values of age, using SPSS linear regression and a separate graphics program. Instructions for the by-hand computation of the predicted Y′ values needed to create an interaction graph, using procedures described by Aiken and West (1991), are provided in Appendix 7A at the end of this chapter.

7.11 SPSS EXAMPLE OF INTERACTION BETWEEN TWO QUANTITATIVE PREDICTORS
A hypothetical study assesses how age in years (X2) interacts with number of healthy habits (X1) (such as regular exercise, not smoking, and getting 8 hours of sleep per night) to predict number of physical illness symptoms (Y). The data for this hypothetical study are in the SPSS file named thirdexample.sav. Number of healthy habits ranged from 0 to 7; age ranged from 21 to 55. The equation that corresponds to this analysis is: Symptoms′ = b0 + b1Age + b2Habits + b3Agebyhabits. Centered scores on age are obtained by finding the sample mean for age and subtracting this sample mean from each score on age to create a new variable, which in this example is named age_c (centered age). Figure 7.19 shows the SPSS Compute Variable dialog box; age_c = age – 41.7253, where 41.7253 is the sample mean for age obtained by running descriptive statistics. Similarly, scores for habits_c were obtained by subtracting the mean number of habits from each individual score for number of habits. After computing centered scores for both age and habits (age_c and habits_c, respectively), the compute procedure was used to form the interaction/product term agebyhabits = age_c × habits_c, as shown in Figure 7.20.

7.12 RESULTS FOR INTERACTION OF AGE AND HABITS AS PREDICTORS OF SYMPTOMS
The SPSS regression results for prediction of symptoms from age, habits, and the interaction of age and habits appear in Figure 7.21.
The interaction was statistically significant, b3 = –.100, t(320) = –5.148, p < .001. Age and habits were also statistically significant predictors of symptoms (older persons had a higher average number of symptoms than younger persons; as the number of healthy habits increased, the number of symptoms decreased). The effect of habits on symptoms should be interpreted in light of the significant interaction between age and habits.

Figure 7.19 Creation of Centered Scores: Subtract Mean From Age to Create Age_c
Figure 7.20 SPSS Compute Variable Dialog Box to Create Product Interaction Term

The overall regression model significantly predicted number of symptoms, F(3, 320) = 286.93, p < .001, R2 = .73. Both age and number of healthy habits were significant predictors of symptoms. For age, b = .606, t(320) = 24.166, p < .001, sr2 = .49; as age increased, number of symptoms also increased. For habits, b = –1.605, t(320) = –8.394, p < .001, sr2 = .059; as number of healthy habits increased, symptoms decreased. These results should be interpreted in the context of the statistically significant interaction. For the interaction term, b = –.100, t(320) = –5.148, p < .001, sr2 = .02. To understand the nature of the interaction, we need a graph of predicted values of symptoms as a function of selected values of both age and habits.

Figure 7.21 Results for Regression That Includes Interaction Term Between Two Quantitative Variables for Data in the SPSS File Named thirdexample.sav

7.13 GRAPHING INTERACTION FOR TWO QUANTITATIVE PREDICTORS
Graphing interactions is simple in factorial ANOVA; all we need is one value (a group mean) for each cell in the ANOVA. It is also simple when one of the variables involved in the interaction is categorical; all we need is a separate regression line for each group. Graphing is more complex when both predictor variables are quantitative. The data analyst selects representative values for each predictor variable, uses the regression equation to generate predicted scores on the outcome variable for these selected values, and then graphs the resulting predicted values. This interaction graph can be generated by hand, although the procedures are somewhat tedious; the by-hand procedures used in Aiken and West (1991) are described in Appendix 7A. The following example uses a free downloadable program called Interaction, available at https://www.danielsoper.com/interaction/default.aspx. Soper asks you to participate in a brief online experiment that involves searching a webpage, then provides a free download of his program. The icon for the Soper Interaction program is an orange exclamation point. When you open the program, you see the screen view in Figure 7.22.
Select <File> → <New Interaction Analysis> from the pull-down menu. In the second screen (Figure 7.23), you browse to find the data file. In the third screen (Figure 7.24), you indicate whether you have missing values (which can be blanks or specific score values that have been designated as missing). To assign roles to your variables (e.g., dependent, predictor, moderator), enter the variables in the appropriate windows in the Assign Model Variables screen (Figure 7.25). In this example, symptoms is obviously the dependent variable. You need to decide which of the X1, X2 predictors to define as the moderator. I chose age as the moderator in this analysis because I wanted to see separate regression lines for different values of age. (You can also use this program if one or both of the predictors are categorical; if one predictor is categorical, such as sex, you will almost certainly make it the moderator, and you would obtain a separate regression line for each sex.)

Figure 7.22 Step 1 in Soper Interaction Program: Request New Interaction Analysis
Figure 7.23 Step 2 in Soper Interaction Program: Select Data Source
Figure 7.24 Step 3 in Soper Interaction Program: Identify Missing Values
Figure 7.25 Step 4 in Soper Interaction Program: Assign Model Variables

Although we talked about the need to center scores on quantitative variables before computing the product, you do not need to check the box for “Center all continuous predictors” at this step. This option does not affect the way the interaction analysis is conducted; it only changes the labels on the X axis of your scatterplot. If you check this box, tick-mark values of X will correspond to deviations from the mean of X. I prefer not to check the “Center all continuous predictors” box, so that the graph shows raw score values of X on the horizontal axis. The next step (Figure 7.26) is to specify interaction levels. Here you tell the program for which selected values of age you want to see separate regression lines to predict symptoms from habits. These values of age are specified as z scores for age (distances of age from the mean in SD units). The most common choices are to graph lines for zmoderator (zage) = –1, 0, +1 or for zmoderator (zage) = –2, –1, 0, +1, +2. In other words, you will obtain lines for persons who are far below average in age, below average, average, above average, and far above average in age. (I do not recommend including z values of –3 and +3; that far from the mean, very few data points correspond to the plotted regression lines.)

Figure 7.26 Step 5 in Soper Interaction Program: Specify Interaction Levels

The selected z values for X1 and X2 are the same.
If you decide to use z values of –2, –1, 0, +1, and +2 for age, these are also the levels of habits used to find the slope to predict symptoms from habits; for example, z = –2 is far below average in number of habits, z = –1 is below average, z = 0 (also called z = M by Soper) is average, z = +1 is above average, and z = +2 is far above average. Click Next and then Finish to draw your graph and obtain results for the analysis of simple main effects, that is, estimated coefficients and significance tests for each separate regression line. Results appear in Figures 7.27 and 7.28. This graph makes the nature of the interaction clear. For the youngest persons (z age of –2), the slope to predict symptoms from habits is very shallow (the estimated slope is slightly positive, but in the analysis of simple main effects the regression coefficient turns out not to be statistically significant). In other words, the youngest persons have very few symptoms whether they practice many healthy habits or few. For the oldest adults (z = +2), the regression line has a substantial negative slope. Among older adults, those who practice very few healthy habits have high predicted numbers of symptoms; the more healthy habits they practice, the fewer symptoms are predicted. Intermediate values of z (–1, 0, and +1) suggest that the strength of association between habits and symptoms gradually increases from younger to older age groups.
The z value labels on the X axis can be converted back into raw score values. Recall that X = M + z × SD. For age, M = 41.73 and SD = 9.505, so age in years = 41.73 + z × 9.505. The resulting ages in years (Table 7.1) could be used to label the lines in Figure 7.27. For habits, M = 3.14 and SD = 1.254; z scores can be converted to raw scores for number of habits using 3.14 + z × 1.254. Statistical results for the analysis of simple main effects, and additional regression results, are obtained by clicking the “Statistical Output” tab (seen in Figure 7.27). Results are reported for the overall model, followed by the analysis of simple main effects. Part of these results appears in Figure 7.28 (the simple main effects analysis, i.e., prediction of symptoms from habits within an age level, is shown only for the lowest age group, z = –2).

Figure 7.27 Interaction Graph for Age × Habits as Predictors of Symptoms (Generated by Soper Interaction Program)
Figure 7.28 Selected Statistical Results Generated by Soper Program: Statistical Significance Tests for Overall Model and Simple Main Effects for Persons with z Age = –2, the Youngest Group
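Because the product term is age_c × habits_c, the slope for habits at any particular age is b_habits + b3 × (age – M_age). The sketch below uses only values reported in this chapter (b = –1.605 for habits, b = –.100 for the interaction, M = 41.73 and SD = 9.505 for age, M = 3.14 and SD = 1.254 for habits) to compute the five simple slopes and the raw values that correspond to z = –2 through +2. It reproduces point estimates only; standard errors and significance tests would also need the covariance matrix of the coefficients, which is not reported here. The function names are illustrative.

```python
# Values reported in this chapter (Figure 7.21 and the descriptive statistics)
B_HABITS = -1.605       # raw-score coefficient for habits
B_INTERACTION = -0.100  # coefficient for the product age_c * habits_c
AGE_MEAN, AGE_SD = 41.73, 9.505
HABITS_MEAN, HABITS_SD = 3.14, 1.254
SELECTED_Z = (-2, -1, 0, 1, 2)


def z_to_raw(z, mean, sd):
    """Convert a z score back to raw units: X = M + z * SD."""
    return mean + z * sd


def simple_slope_habits(age_years):
    """Slope of symptoms on habits at a given age.

    The product term is age_c * habits_c, so the derivative of predicted
    symptoms with respect to habits is b_habits + b_interaction * age_c.
    """
    return B_HABITS + B_INTERACTION * (age_years - AGE_MEAN)


for z in SELECTED_Z:
    age_years = z_to_raw(z, AGE_MEAN, AGE_SD)
    print("z_age = %+d  age = %5.2f years  habits slope = %+.3f"
          % (z, age_years, simple_slope_habits(age_years)))

habit_labels = [round(z_to_raw(z, HABITS_MEAN, HABITS_SD), 2) for z in SELECTED_Z]
print("habits axis labels (raw):", habit_labels)
```

For z = –2 (about 22.7 years) the computed slope is about +0.296, close to the midpoint of the 95% CI [–.477, 1.06] reported for that line; the slopes become steadily more negative with age, reaching about –3.506 at z = +2. Note that z = +2 corresponds to about 60.7 years, older than the oldest participant (55), so that line extends beyond the observed data.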
7.14 RESULTS SECTION FOR INTERACTION OF TWO QUANTITATIVE PREDICTORS
Following is an example of a “Results” section for the interaction of two quantitative predictors.

Results
A regression analysis was performed to assess whether healthy habits interact with age to predict number of symptoms of physical illness for a sample of N = 320 adults. Age ranged from 21 to 55 years, with M = 41.73, SD = 9.505. Number of healthy habits ranged from 0 to 7, with M = 3.14, SD = 1.254. Preliminary data screening did not suggest problems with the assumptions of normality and linearity; there were no missing values or extreme outliers. Prior to forming a product term to represent an interaction between age and habits, scores on both variables were centered by subtracting sample means. The regression included age, habits, and an agebyhabits interaction term as predictors of symptoms. The overall regression was statistically significant, R = .854, R2 = .729, adjusted R2 = .726, F(3, 320) = 286.928, p < .001. Unstandardized regression coefficients are reported, unless otherwise specified. There was a significant Age × Habits interaction, b = –.100, t(320) = –5.148, p < .001, sr2 = .0225. There were also significant effects for age, b = .606, t(320) = 24.166, p < .001, sr2 = .494, and for habits, b = –1.605, t(320) = –8.394, p < .001, sr2 = .0595. The nature of the interaction was examined in a plot (Figure 7.27) and through a follow-up analysis of simple main effects. As age increased, the slope to predict symptoms from habits became more negative. Within the youngest group of persons (z score of –2), the raw score slope b to predict symptoms from habits was not statistically significant, t(320) = .745, p = .46, two-tailed, 95% confidence interval (CI) [–.477, +1.06]. Within all other age levels (z values of –1 or above), the slopes to predict symptoms from habits were statistically significant and negative. For the youngest persons in the sample, number of predicted symptoms was not associated with number of healthy habits. Among older participants, larger numbers of healthy habits predicted lower numbers of symptoms, and this association was strongest for the oldest adults.

Additional information should be included, possibly in other sections of the research report. Descriptive statistics for all variables in the analysis are often reported before other analyses, usually as a table that includes M, SD, CI for M, and so forth. If complete results are reported for all five regression lines in the analysis of simple main effects (for each line, the same information reported above for the overall regression), it would probably be more convenient to present them in a table.

7.15 ADDITIONAL ISSUES AND SUMMARY
Analysis of interactions in regression is not as difficult as it may seem at first glance, but there are several things to keep in mind to avoid confusion. The easiest way to become confused is to lose track of which variables are centered and which are not. In the examples in this chapter:
1. Scores for the Y outcome variable are not centered.
2. For categorical predictors with more than two groups, dummy variables are needed to represent group membership.
3. When either or both predictors in an interaction are categorical variables, centering scores on the two predictors is not necessary (Aiken & West, 1991).
4. When a statistically significant interaction is present, the main “story” in the “Results” section should be about the nature of the interaction.
5. When both variables involved in an interaction are quantitative, center their scores before computing the product that represents their interaction. However, when you set up the regression equation, do not center Y or X1 or X2. The predictive equation should be of this form: Y′ (raw score, not centered) = b0 + b1X1 (not centered) + b2X2 (not centered) + b3 (product of centered X1 and centered X2 scores). Your results won’t be incorrect if you center Y and/or X1 and X2, but they will be more confusing to interpret.
6. You should examine whether regression lines in your scatterplot extend into areas of the graph where there are few or no data points. You should not extrapolate a linear regression into ranges of scores where you have no data (Bodner, 2016).
7. This chapter did not include computations of statistical significance tests for the separate regression lines for subgroups, analysis of nonlinear interactions, or analysis of three-way and higher order interactions. For information about these, see Aiken and West (1991) and Jaccard and Turrisi (2003).
When an interaction in a regression analysis involves a dummy variable and a continuous variable, it is relatively easy to describe the nature of the interaction; there are “different slopes for different folks” (e.g., separate regression lines to predict salary as a function of years for men and women). When an interaction involves two quantitative predictors, slopes are obtained for selected values of each quantitative predictor, and these selected values are specified in terms of z scores. In both cases, a graph that shows the slope to predict Y from X1 for selected values of X2 can be very helpful in understanding the nature of the interaction. Keep in mind that, as with any other result in statistics, a statistically significant interaction can be an instance of Type I error. If an interaction is statistically significant but was not predicted ahead of time, be skeptical; if you want to pursue it, do replication studies. If an interaction makes no sense, don’t strain to come up with a story to explain it.

APPENDIX 7A: GRAPHING INTERACTIONS BETWEEN QUANTITATIVE VARIABLES “BY HAND”
It is easiest to work with z scores for X1 and X2 in this situation. (You can change the numerical labels in your final graph back to the original raw score units at the end of the process.) Using your entire data file, do the following to obtain the regression equation you will use to generate predicted values for your graph.
1. Convert the quantitative predictors, X1 and X2, to z scores. This can be done using <Analyze> → <Descriptive Statistics> → <Descriptives> and checking the box for “Save standardized values as variables.” The Descriptives dialog box appears in Figure 7.29. Do not convert scores on the dependent variable Y into z scores.
2. Use the SPSS <Transform> → <Compute Variable> command to create the product of zX1 and zX2, as shown in Figure 7.30.
3. Run the regression to predict Y from zX1, zX2, and the product of these z scores. Results for this regression appear in Figure 7.31.
4. On the basis of the results in Figure 7.31, write out the equation to predict raw scores on Y (symptoms, in this example) from z scores on age (zage) and habits (zhabits) as follows. When you write out this equation, use the unstandardized regression coefficients (because the dependent variable, symptoms, was not standardized).
$$\text{Symptoms}' = b_0 + b_1\,\text{zage} + b_2\,\text{zhabits} + b_3(\text{zage} \times \text{zhabits})$$
Equation with numerical values of coefficients: obtained by substituting the unstandardized coefficients from Figure 7.31 into the equation above.
5. Hold on to this equation. You can now close the SPSS data file thirdexample.sav and create a new, much smaller data file in SPSS or Excel. I will call this new file worksheet.sav.
Figure 7.29 Saving z (Standardized) Scores for Predictor Variables
Figure 7.30 Computation of Interaction Term (Using z Scores for Age and Habits)
Figure 7.31 Regression Coefficients (From thirdexample.sav File) to Predict Symptoms From Age, Habits, and Their Interaction
6. Enter the selected z scores for X1 (age) and X2 (habits) into the new file worksheet.sav. Column 1 holds the selected z scores for age; column 2 holds the selected z scores for habits (I used the variable names selectedzage and selectedzhabits). Worksheet.sav (Figure 7.32) has one row for each of the 25 possible combinations: with 5 selected values for age and 5 for habits (e.g., –2, –1, 0, +1, and +2), there are 5 × 5 = 25 combinations, so the worksheet file has 25 rows.
7. Use the SPSS <Transform> → <Compute Variable> command, as shown in Figure 7.33, to obtain the product of the variables selectedzage and selectedzhabits. This creates a new variable, productofzscores, in the third column of worksheet.sav (this step is not shown in the screenshot).
Figure 7.32 New SPSS File: worksheet.sav
Figure 7.33 Use of Transform Compute Statement to Compute Product Term for Interaction in worksheet.sav File
Figure 7.34 Computation of Predicted Symptoms Scores on the Basis of Selected z-Score Values of Age and Habits in worksheet.sav File
You now have the list of selected predictor values (in z-score terms, in columns 1, 2, and 3 of worksheet.sav) that will be substituted into the regression equation from step 4 to obtain predicted symptoms (in raw score units). These predicted values of symptoms are the data points used to graph the regression lines.
8. To generate predicted values of symptoms for all combinations of values of zX1 and zX2, use <Transform> → <Compute Variable> again, as shown in Figure 7.34 (still within the worksheet.sav file). Use the regression equation you obtained earlier to calculate a predicted Y value from the values in the first three columns of your worksheet file.
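$$\text{predictedsymptoms} = b_0 + b_1\,\text{selectedzage} + b_2\,\text{selectedzhabits} + b_3\,\text{productofzscores}$$
where b0 through b3 are the numerical coefficients from step 4.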
After the <Compute Variable> command in Figure 7.34 has been executed, there will be a fourth column in worksheet.sav that holds values of predicted symptoms (not shown).
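Steps 6 through 8 amount to evaluating the step 4 equation on a 5 × 5 grid of z values. The Python sketch below builds the same 25-row worksheet, with columns named after selectedzage, selectedzhabits, productofzscores, and predictedsymptoms. The coefficients C0 to C3 are hypothetical placeholders, not the values in Figure 7.31; substitute the unstandardized coefficients from your own output.

```python
from itertools import product

# HYPOTHETICAL placeholder coefficients for the step 4 equation
#   predicted = C0 + C1*zage + C2*zhabits + C3*(zage*zhabits)
# Replace these with the unstandardized values from Figure 7.31.
C0, C1, C2, C3 = 10.0, 5.0, -2.0, -1.0
SELECTED_Z = (-2, -1, 0, 1, 2)


def build_worksheet(c0, c1, c2, c3, levels=SELECTED_Z):
    """Return rows (selectedzage, selectedzhabits, productofzscores, predictedsymptoms)
    for every combination of the selected z values (5 x 5 = 25 rows by default)."""
    rows = []
    for zage, zhabits in product(levels, levels):
        prod = zage * zhabits  # step 7: product of the selected z scores
        predicted = c0 + c1 * zage + c2 * zhabits + c3 * prod  # step 8
        rows.append((zage, zhabits, prod, predicted))
    return rows


worksheet = build_worksheet(C0, C1, C2, C3)
for zage in SELECTED_Z:
    # One plotted line per selected age: predicted values across the habits levels
    line = [round(r[3], 2) for r in worksheet if r[0] == zage]
    print("zage = %+d:" % zage, line)
```

Each fixed value of selectedzage traces one line; its slope across selectedzhabits is C2 + C3 × selectedzage, which is why the lines fan out whenever C3 is not zero.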
9. Finally, request a line graph of predicted Y (symptom) values as a function of the other variables in worksheet.sav (the selected values of X1 and X2 and their product X1X2, all in z-score terms). In this example, X1 (age) is the moderator variable.
Use <Graphs> → <Legacy Dialogs> → <Line> to open the first line chart dialog box (Figure 7.35), and select “Multiple.” In the Define Multiple Line: Summaries for Groups of Cases dialog box in Figure 7.36, select the radio button for “Other statistic (e.g., mean),” and enter the name of the outcome variable (in this example, predictedsymptoms). Enter the name of the moderator variable (in this example, selectedzage) in the “Define Lines by” box. Enter the name of the other predictor variable (in this example, selectedzhabits) in the “Category Axis” box. Then click OK.
Figure 7.35 Initial Line Charts Window: Request Multiple Lines
The graph in Figure 7.37 was generated using SPSS and edited in Chart Editor to improve its appearance. The lines are identical to those in the interaction graph obtained from the Soper Interaction program, except that in Soper’s graph the X axis labels are in raw score units, whereas in this graph they are in z-score units. As discussed earlier, z-score labels for both the predictor (habits, on the X axis) and the moderator variable (age, the labels for the five separate lines) can be converted back into raw score units and used to label the line graphs that show interactions, as shown in Tables 7.1 and 7.2.
Figure 7.36 Define Multiple Line Selections to Obtain Plots for Age by Habits Interaction Using Scores in worksheet.sav
Figure 7.37 Graph of Age-by-Habits Interaction Generated Using Aiken and West (1991) Procedures (Edited in SPSS Chart Editor)

COMPREHENSION QUESTIONS
What preliminary graphs are needed to examine whether all associations between quantitative predictor and outcome variables are approximately linear?
What graph would provide a preliminary assessment of possible moderation?
Briefly explain how the approach to interaction analysis differs for these combinations of types of predictor variables: two categorical independent variables, one categorical and one quantitative independent variable, and two quantitative independent variables. Assume a quantitative outcome variable.
How do you center scores on a quantitative predictor variable in SPSS? Why is centering usually used when both predictors are quantitative?
The following questions involve interpretation of interaction analyses that use unpublished data in the file qualityofrelationships.sav. The main question in the study was the degree to which well-being outcomes (such as happiness, life satisfaction, and positive affect) could be predicted by social relationship variables such as social stress and social support, by personality variables such as extraversion and neuroticism, and by categorical variables such as sex (coded 1 = male, 2 = female in this data set) and dating (coded 1 = yes, 0 = no), and whether specific interactions were present. The literature on stress and social support often reports that social support has a “buffering” effect; that is, the happiness and well-being of people high in social support are not as negatively affected by stress as those of people with lower social support. It has also been suggested that social support is more strongly related to life satisfaction for women than for men. Other variables with self-explanatory names are included in the file, and you can use these if you want to run different interaction analyses.

Does neuroticism moderate the effects of social stress on life satisfaction? Specifically, does stress have a worse effect on life satisfaction for people who score high on neuroticism than for people who score low on neuroticism? All three of these variables are quantitative. Figure 7.38 shows the SPSS regression results for prediction of life satisfaction from neuroticism, social stress, and their interaction. Answer the following questions using the information in Figures 7.38, 7.39, and 7.40.

Was the overall model (including all three predictors) statistically significant, and what was the effect size?

Do you think I needed to center the scores on neuroticism and social stress before computing the product?

Was the interaction statistically significant? Report the numerical values for the interaction (e.g., b, t, df, p).
What additional information would you need to describe effect sizes (e.g., proportion of uniquely predicted variance for each predictor), and how would you obtain this information?

The interaction graph appears in Figure 7.39. Describe the nature of the interaction. Was this consistent with the prediction?

A selected part of the simple main effects analysis appears in Figure 7.40. Explain what is being examined in this analysis and interpret the results.

Figure 7.38 SPSS Output for Regression to Predict Life Satisfaction From Neuroticism, Social Stress, and Their Interaction

Figure 7.39 Graph for Interaction Between Neuroticism and Social Stress as Predictors of Life Satisfaction
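The SPSS line-chart procedure used for Figure 7.37 plots predicted scores (predictedsymptoms) against selected z-score values of the predictor (selectedzhabit), with one line per selected z-score value of the moderator (selectedzage). The Python sketch below shows the same Aiken and West (1991) predicted-value logic outside SPSS. It assumes a fitted model of the form Y′ = b0 + b1X + b2Z + b3(X × Z) with both predictors in z-score units; the coefficient values, the choice of z = −2 through +2 for the five lines, and the function names are hypothetical and chosen only for illustration, not taken from the chapter’s output. It also computes the slope of Y on X at a fixed moderator value, b1 + b3z, and converts a z-score label back to raw-score units (raw = mean + z × SD), the conversion referred to for Tables 7.1 and 7.2.

```python
"""Sketch: Aiken and West (1991)-style predicted values for an interaction plot.

Assumes a fitted model Y' = b0 + b1*X + b2*Z + b3*(X*Z), where X is the
predictor and Z is the moderator, both in z-score units. The coefficients
used below are hypothetical placeholders, not estimates from the chapter.
"""


def predicted_y(b0: float, b1: float, b2: float, b3: float,
                x: float, z: float) -> float:
    """Predicted Y for one (X, Z) combination, including the product term."""
    return b0 + b1 * x + b2 * z + b3 * x * z


def simple_slope(b1: float, b3: float, z: float) -> float:
    """Slope of Y on X when the moderator is held at z: b1 + b3*z."""
    return b1 + b3 * z


def z_to_raw(z: float, mean: float, sd: float) -> float:
    """Convert a z-score label back to raw-score units (raw = mean + z*sd)."""
    return mean + z * sd


def interaction_lines(coefs, x_values, z_values):
    """Return {z: [Y' at each x]}: one plotted line per moderator value."""
    b0, b1, b2, b3 = coefs
    return {z: [predicted_y(b0, b1, b2, b3, x, z) for x in x_values]
            for z in z_values}


if __name__ == "__main__":
    coefs = (10.0, 2.0, -1.0, 0.5)   # hypothetical b0, b1, b2, b3
    x_vals = [-2, -1, 0, 1, 2]       # predictor (e.g., habits), z units
    z_vals = [-2, -1, 0, 1, 2]       # moderator (e.g., age), z units; 5 lines
    for z, ys in interaction_lines(coefs, x_vals, z_vals).items():
        slope = simple_slope(coefs[1], coefs[3], z)
        print(f"z = {z:+d}: simple slope = {slope:.2f}, Y' = {ys}")
```

Each key in the returned dictionary corresponds to one line in the interaction graph. When b3 is not zero, the lines are not parallel: the slope of Y on X changes with the value of the moderator, which is what moderation means.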