16.1 ♦ Definition of Mediation
Chapter 10 examined research situations that involve three variables and described several possible forms of interrelationship. One of these is mediation; this involves a set of causal hypotheses. An initial causal variable X1 may influence an outcome variable Y through a mediating variable X2. (Some books and websites use different notations for the three variables; for example, on Kenny’s mediation Web page, http://www.davidakenny.net/cm/mediate.htm, the initial causal variable is denoted X, the outcome as Y, and the mediating variable as M.) Mediation occurs if the effect of X1 on Y is partly or entirely “transmitted” by X2. A mediated causal model involves a causal sequence; first, X1 causes or influences X2; then, X2 causes or influences Y. X1 may have additional direct effects on Y that are not transmitted by X2. A mediation hypothesis can be represented by a diagram of a causal model. Note that the term causal is used because the path diagram represents hypotheses about possible causal influence; however, when data come from nonexperimental designs, we can only test whether the data are consistent or inconsistent with a particular causal model. That analysis falls short of proof that any specific causal model is correct.
16.1.1 ♦ Path Model Notation
Path model notation was introduced in Chapter 10 (see Table 10.2), and it is briefly reviewed here. We begin with two variables (X and Y). Arrows are used to correspond to paths that represent different types of relations between variables. The absence of an arrow between X and Y corresponds to an assumption that these variables are not related in any way; they are not correlated or confounded, and they are not directly causally connected. A unidirectional arrow corresponds to the hypothesis that one variable has a causal influence on the other; for example, X → Y corresponds to the hypothesis that X causes or influences Y, and Y → X corresponds to the hypothesis that Y causes or influences X. A bidirectional or double-headed arrow represents a noncausal association, such as correlation or confounding of variables that does not arise from any causal connection between them. In path diagrams, these double-headed arrows may be shown as curved lines.
16 MEDIATION
If we consider only two variables, X and Y, there are four possible models: (1) X and Y are not related in any way (this is denoted in a path diagram by the absence of a path between X and Y), (2) X causes Y (X → Y), (3) Y causes X (Y → X), and (4) X and Y are correlated but not because of any causal influence (X ↔ Y).¹ When a third variable is added, the number of possible relationships among the variables X1, X2, and Y increases substantially, as discussed in Chapter 10. One theoretical model corresponds to X1 and X2 as correlated causes of Y. For this model, the appropriate analysis is a regression to predict Y from both X1 and X2 (as discussed in Chapter 11). Another possible hypothesis is that X2 may be a moderator of the relationship between X1 and Y; this is also described as an interaction between X2 and X1 as predictors of Y. The statistical significance and nature of an interaction can be assessed using the procedures described in Chapter 15. Chapter 10 outlined procedures for preliminary exploratory data analyses that can help a data analyst decide which of many possible patterns of relationship need to be examined in further analysis.
16.1.2 ♦ Circumstances When Mediation May Be a Reasonable Hypothesis
Because a mediated causal model includes the hypothesis that X1 causes or influences X2 and the hypothesis that X2 causes or influences Y, it does not make sense to consider mediation analysis in situations where one or both of these hypotheses would be nonsense. For X1 to be hypothesized as a cause of X2, X1 should occur before X2, and there should be a plausible mechanism through which X1 could influence X2. For example, suppose we are interested in a possible association between height and salary (a few studies suggest that taller people earn higher salaries). It is conceivable that height influences salary (perhaps employers have a bias that leads them to pay tall people more money). It is not conceivable that a person’s salary changes his or her height.
16.2 ♦ A Hypothetical Research Example Involving One Mediating Variable
The hypothetical data introduced in Chapter 10 as an illustration of a mediation hypothesis involved three variables: X1, age; X2, body weight; and Y, systolic blood pressure (BloodPressure or SBP). The data are in an SPSS file named ageweightbp.sav; the scores also appear in Table 10.3. The hypothetical dataset has N = 30 to make it easy to carry out the same analyses using the data in Table 10.3. Note that for research applications of mediation analysis, much larger sample sizes should be used.
For these variables, it is plausible to hypothesize the following causal connections. Blood pressure tends to increase as people age. As people age, body weight tends to increase (this could be due to lower metabolic rate, reduced activity level, or other factors). Other factors being equal, increased body weight makes the cardiovascular system work harder, and this can increase blood pressure. It is possible that at least part of the age-related increase in blood pressure might be mediated by age-related weight gain. Figure 16.1 is a path model that represents this mediation hypothesis for this set of three variables.
To estimate the strength of association that corresponds to each path in Figure 16.1, a series of three ordinary least squares (OLS) linear regression analyses can be run. Note that a variable is dependent if it has one or more unidirectional arrows pointing toward it. We run a regression analysis for each dependent variable (such as Y), using all variables that have unidirectional arrows that point toward Y as predictors. For the model in Figure 16.1, the first regression predicts Y from X1 (blood pressure from age). The second regression predicts X2 from X1 (weight from age). The third regression predicts Y from both X1 and X2 (blood pressure predicted from both age and weight).
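The three regressions just described can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the scores are synthetic stand-ins generated from a seeded random number generator (they are not the ageweightbp.sav data), and the helper name `ols` is hypothetical.

```python
import numpy as np

def ols(y, *predictors):
    """Return OLS coefficients [intercept, slope_1, ...] for y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Synthetic stand-in scores (assumption: NOT the data in Table 10.3).
rng = np.random.default_rng(0)
n = 30
age = rng.uniform(25, 80, n)
weight = 80 + 1.4 * age + rng.normal(0, 25, n)
sbp = -30 + 2.0 * age + 0.5 * weight + rng.normal(0, 35, n)

c = ols(sbp, age)[1]                    # regression 1: Y from X1 (path c)
a = ols(weight, age)[1]                 # regression 2: X2 from X1 (path a)
_, c_prime, b = ols(sbp, age, weight)   # regression 3: Y from X1 and X2 (paths c', b)

print(f"c = {c:.3f}, a = {a:.3f}, b = {b:.3f}, c' = {c_prime:.3f}")
print(f"a*b + c' = {a * b + c_prime:.3f}")
```

Because all three regressions use the same cases, the last printed value should match c up to floating-point error, which is the c = (a × b) + c′ relation stated in the notes to Figure 16.1.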
16.3 ♦ Limitations of Causal Models
Path models similar to the one in Figure 16.1 are called “causal” models because each unidirectional arrow represents a hypothesis about a possible causal connection between two variables. However, the data used to estimate the strength of relationship for the paths are almost always from nonexperimental studies, and nonexperimental data cannot prove causal hypotheses. If the path coefficient between two variables such as X2 and Y (this coefficient is denoted b in Figure 16.1) is statistically significant and large enough in magnitude to indicate a change in the outcome variable that is clinically or practically important, this result is consistent with the possibility that X2 might cause Y, but it is not proof of a causal connection. Numerous other situations could yield a large path coefficient between X2 and Y. For example, Y may cause X2; both Y and X2 may be caused by some third variable, X3; X2 and Y may actually be measures of the same variable; the relationship between X2 and Y may be mediated by other variables, X4, X5, and so on; or a large value for the b path coefficient may be due to sampling error.
Figure 16.1 ♦ Hypothetical Mediation Example: Effects of Age on Systolic Blood Pressure (SBP)
[Path diagrams. Top panel: X1 (Age) → Y (SBP), path c. Bottom panel: X1 (Age) → X2 (Weight), path a; X2 (Weight) → Y (SBP), path b; X1 (Age) → Y (SBP), path c′.]
NOTES: Top panel: The total effect of age on SBP is denoted by c. Bottom panel: The path coefficients (a, b, c′) that estimate the strength of hypothesized causal associations are estimated by unstandardized regression coefficients. The product a × b estimates the strength of the mediated or indirect effect of age on SBP, that is, how much of the increase in SBP that occurs as people age is due to weight gain. The c′ coefficient estimates the strength of the direct (also called partial) effect of age on SBP, that is, any effect of age on SBP that is not mediated by weight. The coefficients in this bottom panel decompose the total effect (c) into a direct effect (c′) and an indirect effect (a × b). When ordinary least squares regression is used to estimate unstandardized path coefficients, c = (a × b) + c′; the total relationship between age and SBP is the sum of the direct relationship between age and SBP and the indirect or mediated effect of age on SBP through weight.
16.3.1 ♦ Reasons Why Some Path Coefficients May Not Be Statistically Significant
If the path coefficient between two variables is not statistically significantly different from zero, there are also several possible reasons. If the b path coefficient in Figure 16.1 is close to zero, this could be because there is no causal or noncausal association between X2 and Y. However, a small path coefficient could also occur because of sampling error or because assumptions required for regression are severely violated.
16.3.2 ♦ Possible Interpretations for a Statistically Significant Path
A large and statistically significant b path coefficient is consistent with the hypothesis that X2 causes Y, but it is not proof of that causal hypothesis. Replication of results (such as values of a, b, and c′ path coefficients in Figure 16.1) across samples increases confidence that findings are not due to sampling error. For predictor variables and/or hypothesized mediating variables that can be experimentally manipulated, experimental studies can be done to provide stronger evidence about whether associations between variables are causal (MacKinnon, 2008). By itself, a single mediation analysis only provides preliminary nonexperimental evidence to evaluate whether the proposed causal model is plausible (i.e., consistent with the data).
16.4 ♦ Questions in a Mediation Analysis
Researchers typically ask two questions in a mediation analysis. The first question is whether there is a statistically significant mediated path from X1 to Y via X2 (and whether the part of the Y outcome variable score that is predictable from this path is large enough to be of practical importance). Recall from the discussion of the tracing rule in Chapter 10 that when a path from X to Y includes more than one arrow, the strength of the relationship for this multiple-step path is obtained by multiplying the coefficients for each included path. Thus, the strength of the mediated relationship (the path from X1 to Y through X2 in Figure 16.1) is estimated by the product of the a × b (ab) coefficients. The null hypothesis of interest is H0: ab = 0. Note that the unstandardized regression coefficients are used for this significance test. Later sections in this chapter describe test statistics for this null hypothesis. If this mediated path is judged to be nonsignificant, the mediation hypothesis is not supported, and the data analyst would need to consider other explanations.
If there is a significant mediated path (i.e., the ab product differs significantly from zero), then the second question in the mediation analysis is whether there is also a significant direct path from X1 to Y; this path is denoted c′ in Figure 16.1. If c′ is not statistically significant (or too small to be of any practical importance), a possible inference is that the effect of X1 on Y is completely mediated by X2. If c′ is statistically significant and large enough to be of practical importance, a possible inference is that the influence of X1 on Y is only partially mediated by X2 and that X1 has some additional effect on Y that is not mediated by X2. In the hypothetical data used for the example in this chapter (in the SPSS file ageweightbp.sav), we will see that the effects of age on blood pressure are only partially mediated by body weight.
Of course, there could be additional mediators of the effect of age on blood pressure; for example, age-related changes in the condition of arteries might also influence blood pressure. Models with multiple mediating variables are discussed briefly later in the chapter.
16.5 ♦ Issues in Designing a Mediation Analysis Study
A mediation analysis begins with a minimum of three variables. Every unidirectional arrow that appears in Figure 16.1 represents a hypothesized causal connection and must correspond to a plausible theoretical mechanism. A model such as age → body weight → blood pressure seems reasonable; processes that occur with advancing age, such as slowing metabolic rate, can lead to weight gain, and weight gain increases the demands on the cardiovascular system, which can cause an increase in blood pressure. However, it would be nonsense to propose a model such as blood pressure → body weight → age; there is no reasonable mechanism through which blood pressure could influence body weight, and weight cannot influence age in years.
16.5.1 ♦ Type and Measurement of Variables in Mediation Analysis
Usually all three variables (X1, X2, and Y) in a mediation analysis are quantitative. A dichotomous variable can be used as a predictor in regression (Chapter 12), and therefore it is acceptable to include an X1 variable that is dichotomous (e.g., treatment vs. control) as the initial causal variable in a mediation analysis; OLS regression methods can still be used in this situation. However, both X2 and Y are dependent variables in mediation analysis; if one or both of these variables are categorical, then logistic regression is needed to estimate regression coefficients, and this complicates the interpretation of outcomes (see MacKinnon, 2008, chap. 11).
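To illustrate why a dichotomous X1 poses no problem for OLS, the sketch below uses a handful of made-up scores (an assumption for illustration only). When X1 is coded 0 = control and 1 = treatment, the OLS slope for the a path equals the difference between the two group means on X2, and the intercept equals the control group mean.

```python
import numpy as np

# Made-up scores: x1 codes group membership (0 = control, 1 = treatment).
x1 = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
x2 = np.array([3.0, 4.0, 5.0, 4.0, 6.0, 7.0, 8.0, 7.0])

# OLS regression of X2 on X1 (with an intercept column).
X = np.column_stack([np.ones_like(x1), x1])
coefs = np.linalg.lstsq(X, x2, rcond=None)[0]
intercept, a = coefs

mean_difference = x2[x1 == 1].mean() - x2[x1 == 0].mean()
print(f"a = {a:.3f}, difference between group means = {mean_difference:.3f}")
```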
It is helpful if scores on the variables can be measured in meaningful units because this makes it easier to evaluate whether the strength of influence indicated by path coefficients is large enough to be clinically or practically significant. For example, suppose that we want to predict annual salary in dollars (Y) from years of education (X1). An unstandardized regression coefficient is easy to interpret. A student who is told that each additional year of education predicts a $50 increase in annual salary will understand that the effect is too weak to be of any practical value, while a student who is told that each additional year of education predicts a $5,000 increase in annual salary will understand that this is enough money to be worth the effort. Often, however, measures are given in arbitrary units (e.g., happiness rated on a scale from 1 = not happy at all to 7 = extremely happy). In this kind of situation, it may be difficult to judge the practical significance of a half-point increase in happiness.
As in other applications of regression, measurements of variables are assumed to be reliable and valid. If they are not, regression results can be misleading.
16.5.2 ♦ Temporal Precedence or Sequence of Variables in Mediation Studies
Hypothesized causes must occur earlier in time than hypothesized outcomes (temporal precedence, as discussed in Chapter 1). It seems reasonable to hypothesize that “being abused as a child” might predict “becoming an abuser as an adult”; it would not make sense to suggest that being an abusive adult causes a person to have experiences of abuse in childhood. Sometimes measurements of the three variables X1, X2, and Y are all obtained at the same time (e.g., in a one-time survey). If X1 is a retrospective report of experiencing abuse as a child, and Y is a report of current abusive behaviors, then the requirement for temporal precedence (X1 happened before Y) may be satisfied. In some studies, measures are obtained at more than one point in time; in these situations, it would be preferable to measure X1 first, then X2, and then Y; this may help to establish temporal precedence. When all three variables are measured at the same point in time and there is no logical reason to believe one of them occurs earlier in time than the others, it may not be possible to establish temporal precedence.
16.5.3 ♦ Time Lags Between Variables
When measures are obtained at different points in time, it is important to consider the time lag between measures. If this time lag is too brief, the effects of X1 may not be apparent yet when Y is measured (e.g., if X1 is initiation of treatment with either placebo or Prozac, a drug that typically does not have full antidepressant effects until about 6 weeks, and Y is a measure of depression and is measured one day after X1, then the full effect of the drug will not be apparent). Conversely, if the time lag is too long, the effects of X1 may have worn off by the time Y is measured. Suppose that X1 is receiving positive feedback from a relationship partner and Y is relationship satisfaction, and Y is measured 2 months after X1. The effects of the positive feedback (X1) may have dissipated over this period of time. The optimal time lag will vary depending on the variables involved; some X1 interventions or measured variables may have immediate but not long-lasting effects, while others may require a substantial time before effects are apparent.
16.6 ♦ Assumptions in Mediation Analysis and Preliminary Data Screening
Unless the types of variables involved require different estimation methods (e.g., if a dependent variable is categorical, logistic regression methods are required), the coefficients (a, b, and c′) associated with the paths in Figure 16.1 can be estimated using OLS regression. All of the assumptions required for regression (see Chapters 9 and 11) are also required for mediation analysis. Because preliminary data screening has been presented in greater detail earlier, data screening procedures are reviewed here only briefly. For each variable, histograms or other graphic methods can be used to assess whether scores on all quantitative variables are reasonably normally distributed, without extreme outliers. If the X1 variable is dichotomous, both groups should have a reasonably large number of cases. Scatter plots can be used to evaluate whether relationships between each pair of variables appear to be linear (X1 with Y, X1 with X2, and X2 with Y) and to identify bivariate outliers. Decisions about handling any identified outliers should be made at an early stage in the analysis, as discussed in Chapter 4.
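As a numeric complement to the histograms and scatter plots described above (not a replacement for them), a rough screening pass might look like the following sketch. The function name `screen`, the |z| > 3 cutoff, and the made-up scores are all assumptions chosen for illustration; the chapter does not prescribe a specific outlier rule. Note that a Pearson r summarizes only the linear part of an association, so plots are still needed to judge linearity.

```python
import numpy as np

def screen(data, z_cutoff=3.0):
    """Flag univariate outliers and report Pearson r for each pair of variables.

    data maps variable names to 1-D NumPy arrays of equal length.
    z_cutoff is an illustrative threshold, not a rule from the chapter.
    """
    outliers = {}
    for name, scores in data.items():
        z = (scores - scores.mean()) / scores.std(ddof=1)
        outliers[name] = np.flatnonzero(np.abs(z) > z_cutoff).tolist()
    names = list(data)
    r = np.corrcoef([data[name] for name in names])
    correlations = {
        (names[i], names[j]): float(r[i, j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    return outliers, correlations

# Made-up scores with one deliberately extreme weight at index 5.
age = np.arange(30, 60, dtype=float)
weight = 100 + 2 * age
weight[5] = 900.0
outliers, correlations = screen({"age": age, "weight": weight})
print(outliers)
print(correlations)
```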
Baron and Kenny (1986) suggested that a mediation model should not be tested unless there is a significant relationship between X1 and Y. In more recent treatments of mediation, it has been pointed out that in situations where one of the path coefficients is negative, there can be significant mediated effects even when X1 and Y are not significantly correlated (A. F. Hayes, 2009). This can be understood as a form of suppression; see Section 10.12.5.3 for further discussion with examples. If none of the pairs of variables in the model are significantly related to each other in bivariate analyses, however, there is not much point in testing mediated models.
16.7 ♦ Path Coefficient Estimation
The most common way to obtain estimates of the path coefficients that appear in Figure 16.1 is to run the following series of regression analyses. These steps are similar to those recommended by Baron and Kenny (1986), except that, as suggested in recent treatments of mediation (MacKinnon, 2008), a statistically significant outcome on the first step is not considered a requirement before going on to subsequent steps.
Step 1. First, a regression is run to predict Y (blood pressure or SBP) from X1 (age). (SPSS procedures for this type of regression were provided in Chapter 9.) The raw or unstandardized regression coefficient from this regression corresponds to path c. This step is sometimes omitted; however, it provides information that can help evaluate how much controlling for the X2 mediating variable reduces the strength of association between X1 and Y. Figure 16.2 shows the regression coefficients part of the output. The unstandardized regression coefficient for the prediction of Y (BloodPressure; note that there is no blank within the SPSS variable name) from X1 (age) is c = 2.862; this is statistically significant, t(28) = 6.631, p < .001. (The N for this dataset is 30; therefore, the df for this t ratio is N − 2 = 28.) Thus, the overall effect of age on blood pressure is statistically significant.
Step 2. Next a regression is performed to predict the mediating variable (X2, weight) from the causal variable (X1, age). The results of this regression provide the path coefficient for the path denoted a in Figure 16.1 and also the standard error of a (sa) and the t test for the statistical significance of the a path coefficient (ta). The coefficient table for this regression appears in Figure 16.3. For the hypothetical data, the unstandardized a path coefficient was 1.432, with t(28) = 3.605, p = .001.
Figure 16.2 ♦ Regression Coefficient to Predict Blood Pressure (Y) From Age (X1)
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1  (Constant)    10.398     26.222                                         .397     .695
   Age           2.862      .432               .782                        6.631    .000
a. Dependent Variable: BloodPressure
NOTE: The raw score slope in this equation, 2.862, corresponds to coefficient c in the path diagram in Figure 16.1.
Step 3. Finally, a regression is performed to predict the outcome variable Y (blood pressure) from both X1 (age) and X2 (weight). (Detailed examples of regression with two predictor variables appeared in Chapter 11.) This regression provides estimates of the unstandardized coefficients for path b (and sb and tb) and also path c′ (the direct or remaining effect of X1 on Y when the mediating variable has been included in the analysis). See Figure 16.1 for the corresponding path diagram. From Figure 16.4, path b = .49, t(27) = 2.623, p = .014; path c′ = 2.161, t(27) = 4.551, p < .001. These unstandardized path coefficients are used to label the paths in a diagram of the causal model (top panel of Figure 16.5). These values are also used later to test the null hypothesis H0: ab = 0. In many research reports, particularly when the units in which the variables are measured are not meaningful or not easy to interpret, researchers report the standardized path coefficients (these are called beta coefficients on the SPSS output); the lower panel of Figure 16.5 shows the standardized path coefficients. Sometimes the estimate of the c coefficient appears in parentheses, next to or below the c′ coefficient, in these diagrams.
In addition to examining the path coefficients from these regressions, the data analyst should pay some attention to how well the X1 and X2 variables predict Y. From Figure 16.4, R² = .69, adjusted R² = .667, and this is statistically significant, F(2, 27) = 30.039, p < .001. These two variables do a good job of predicting variance in blood pressure.
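As a check on how these summary statistics fit together, the sketch below recomputes them from the sums of squares, coefficients, and standard errors printed in Figures 16.2 through 16.4. The formulas are the standard ones for OLS regression; the variable names are hypothetical. Because the inputs are the rounded values from the printed output, the recomputed t ratios can differ from the SPSS values in the third decimal place.

```python
# Values transcribed from the SPSS output in Figures 16.2 to 16.4.
n, k = 30, 2  # number of cases, number of predictors in the Step 3 regression
ss_regression, ss_residual, ss_total = 80882.132, 36349.735, 117231.867

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_ratio = (ss_regression / k) / (ss_residual / (n - k - 1))

# (estimate, standard error) for each path coefficient.
coefficients = {
    "c": (2.862, 0.432),        # Step 1, df = n - 2 = 28
    "a": (1.432, 0.397),        # Step 2, df = n - 2 = 28
    "b": (0.490, 0.187),        # Step 3, df = n - 3 = 27
    "c_prime": (2.161, 0.475),  # Step 3, df = n - 3 = 27
}
t_ratios = {name: est / se for name, (est, se) in coefficients.items()}

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}, F = {f_ratio:.3f}")
for name, t in t_ratios.items():
    print(f"t for {name}: {t:.3f}")
```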
16.8 ♦ Conceptual Issues: Assessment of Direct Versus Indirect Paths
When a path that leads from a predictor variable X to a dependent variable Y involves other variables and multiple arrows, the overall strength of the path is estimated by multiplying the coefficients for each leg of the path (as discussed in the introduction to the tracing rule in Section 11.10).
16.8.1 ♦ The Mediated or Indirect Path: ab
The strength of the indirect or mediated effect of age on blood pressure through weight is estimated by multiplying the a and b path coefficients. In many applications, one or more of the variables are measured in arbitrary units (e.g., happiness may be rated on a scale from 1 to 7). In such situations, the unstandardized regression coefficients may not be very informative, and research reports often focus on standardized coefficients.² The standardized (β) coefficients for the paths in the age/weight/blood pressure hypothetical data appear in the bottom panel of Figure 16.5. Throughout the remainder of this section, all path coefficients are given in standardized (β coefficient) form.
Figure 16.3 ♦ Regression Coefficient to Predict Weight (Mediating Variable X2) From Age (X1)
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1  (Constant)    78.508     24.130                                         3.254    .003
   Age           1.432      .397               .563                        3.605    .001
a. Dependent Variable: Weight
NOTE: The raw score slope from this equation, 1.432, corresponds to the path labeled a in Figure 16.1.
Figure 16.4 ♦ Regression Coefficient to Predict Blood Pressure (Y) From Age (X1) and Mediating Variable Weight (X2)
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .831a    .690       .667                36.692
a. Predictors: (Constant), Weight, Age
ANOVA (b)
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   80882.132        2    40441.066     30.039   .000a
   Residual     36349.735        27   1346.286
   Total        117231.867       29
a. Predictors: (Constant), Weight, Age
b. Dependent Variable: BloodPressure
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1  (Constant)    -28.046    27.985                                         -1.002   .325
   Age           2.161      .475               .590                        4.551    .000
   Weight        .490       .187               .340                        2.623    .014
a. Dependent Variable: BloodPressure
NOTE: The raw score slope for age in this equation, 2.161, corresponds to the path labeled c′ in Figure 16.1; the raw score slope for weight in this equation, .490, corresponds to the path labeled b.
Figure 16.5 ♦ Path Coefficients for the Age/Weight/Systolic Blood Pressure (SBP) Mediation Analysis
[Path diagrams, both panels: Age → Weight (path a), Weight → SBP (path b), Age → SBP (path c′).]
Top panel, Unstandardized Path Coefficients: a = 1.432**, b = .490*, c′ = 2.161*** (c = 2.862***)
Bottom panel, Standardized Path Coefficients: a = .563**, b = .340*, c′ = .590*** (c = .782***)
*p < .05, **p < .01, ***p < .001, all two-tailed.
Recall from Chapter 10 that, when the path from X to Y has multiple parts or arrows, the overall strength of the association for the entire path is estimated by multiplying the coefficients for each part of the path. Thus, the unit-free index of strength of the mediated effect (the effect of age on blood pressure, through the mediating variable weight) is given by the product of the standardized estimates of the path coefficients, ab. For the standardized coefficients, this product = (.563 × .340) = .191. The strength of the direct or nonmediated path from age to blood pressure (SBP) corresponds to c′; the standardized coefficient for this path is .590. In other words, for a one standard deviation increase in zAge, we predict a .191 increase in zSBP through the mediating variable zWeight. In addition, we predict a .590 increase in zSBP due to direct effects of zAge (effects that are not mediated by zWeight); this corresponds to the c′ path. The total effect of zAge on zSBP corresponds to path c, and the standardized coefficient for path c is .782 (the beta coefficient to predict zSBP from zAge in Figure 16.5).
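The multiplication step of the tracing rule can be written as a one-line helper. This is a minimal sketch; the function name `path_strength` is hypothetical, and the inputs are the standardized coefficients quoted above.

```python
import math

def path_strength(*coefficients):
    """Multiply the coefficients for each arrow along a single path."""
    return math.prod(coefficients)

# Standardized coefficients from the bottom panel of Figure 16.5.
indirect = path_strength(0.563, 0.340)  # Age -> Weight -> SBP (a times b)
direct = path_strength(0.590)           # Age -> SBP (c' alone)

print(f"indirect (ab) = {indirect:.3f}, direct (c') = {direct:.3f}")
```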
16.8.2 ♦ Mediated and Direct Path as Partition of Total Effect
The mediation analysis has partitioned the total effect of age on blood pressure (c = .782) into a direct effect (c′ = .590) and a mediated effect (ab = .191). (Both of these are given in terms of standardized/unit-free path coefficients.) It appears that mediation through weight, while statistically significant, explains only a small part of the total effect of age on blood pressure in this hypothetical example. Within rounding error, c = c′ + ab, that is, the total effect is the sum of the direct and mediated effects. These terms are additive when OLS regression is used to obtain estimates of coefficients; when other estimation methods such as maximum likelihood are used (as in structural equation modeling programs), these equalities may not hold. Also note that if there are missing data, each regression must be performed on the same set of cases in order for this additive association to work.
Note that even if the researcher prefers to label and discuss paths using standardized regression coefficients, information about the unstandardized coefficients is required to carry out additional statistical significance tests (to find out whether the product ab differs significantly from zero, for example).
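A short derivation may help show why the additive relation holds exactly for OLS estimates computed on the same cases. The intercept symbols (i) and residual symbols (e) below are notation assumed for this sketch; they do not appear in the chapter.

```latex
\begin{align*}
Y   &= i_1 + c\,X_1 + e_1              && \text{(Step 1)}\\
X_2 &= i_2 + a\,X_1 + e_2              && \text{(Step 2)}\\
Y   &= i_3 + c'\,X_1 + b\,X_2 + e_3    && \text{(Step 3)}
\end{align*}
% OLS slopes in the one-predictor regressions:
%   c = Cov(X_1, Y) / Var(X_1),   a = Cov(X_1, X_2) / Var(X_1).
% The OLS residual e_3 has zero sample covariance with X_1, so taking the
% sample covariance of X_1 with both sides of the Step 3 equation gives
\begin{align*}
\operatorname{Cov}(X_1, Y) &= c'\,\operatorname{Var}(X_1) + b\,\operatorname{Cov}(X_1, X_2)\\
\frac{\operatorname{Cov}(X_1, Y)}{\operatorname{Var}(X_1)}
  &= c' + b\,\frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}\\
c &= c' + ab
\end{align*}
```

The argument relies on all three regressions using the same sample variances and covariances, which is why the equality can fail when different cases enter different regressions.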
16.8.3 ♦ Magnitude of Mediated Effect
When variables are measured in meaningful units, it is helpful to think through the magnitude of the effects in real units, as discussed in this paragraph. (The discussion in this paragraph is helpful primarily in research situations in which units of measurement have some real-world practical interpretation.) All of the path coefficients in the rest of this paragraph are unstandardized regression coefficients. From the first regression analysis, the c coefficient for the total effect of age on blood pressure was c = 2.862. In simple language, for each 1-year increase in age, we predict an increase in blood pressure of 2.862 mm Hg. Based on the t test result in Figure 16.2, this is statistically significant. Taking into account that people in wealthy countries often live to age 70 or older, this implies substantial age-related increases in blood pressure; for example, for a 10-year increase in age, we predict an increase of 28.62 mm Hg in blood pressure, and that is sufficiently large to be clinically important. This tells us that the total effect of age on systolic blood pressure is reasonably large in terms of clinical or practical importance. From the second regression, we find that the effect of age on weight is a = 1.432; this is also statistically significant, based on the t test in Figure 16.3. For a 1-year increase in age, we predict almost 1.5 pounds in weight gain. Again, over a period of 10 years, this implies a sufficiently large increase in predicted body weight (about 14.32 pounds) to be of clinical importance. The last regression (in Figure 16.4) provides information about two paths, b and c′. The b coefficient that represents the effect of weight on blood pressure was b = .49; this was statistically significant. For each 1-pound increase in body weight, we predict almost a half-point increase in blood pressure. If we take into account that people may gain 30 or 40 pounds over the course of a lifetime, this would imply weight-related increases in blood pressure on the order of 15 or 20 mm Hg. This also seems large enough to be of clinical interest. The indirect effect of age on blood pressure is found by multiplying a × b, in this case, 1.432 × .49 ≈ .701. For each 1-year increase in age, a .7 mm Hg increase in blood pressure is predicted through the effects of age on weight. Finally, the direct effect of age on blood pressure when the mediating variable weight is statistically controlled/taken into account is represented by c′ = 2.161. Over and above any weight-related increases in blood pressure, we predict about a 2.2 mm Hg increase in blood pressure for each additional year of age.
Of the total effect of age on blood pressure (a predicted 2.862 mm Hg increase in SBP for each 1-year increase in age), a relatively small part is mediated by weight (.701), and the remainder is not mediated by weight (2.161). (Because these are hypothetical data, this outcome does not accurately describe the importance of weight as a mediator in real-life situations.) The mediation analysis partitions the total effect of age on blood pressure (c = 2.862) into a direct effect (c′ = 2.161) and a mediated effect (ab = .701). Within rounding error, c = c′ + ab, that is, the total effect c is the sum of the direct (c′) and mediated (ab) effects.
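The arithmetic in this section can be reproduced from the unstandardized coefficients reported in Figures 16.2 through 16.4. This is a small sketch; the variable names and the 10-year span are choices made here for illustration. Note that the product of the rounded a and b values is about .702, which agrees with the .701 given in the text only within rounding error.

```python
# Unstandardized coefficients from Figures 16.2 to 16.4.
a, b, c_prime, c = 1.432, 0.490, 2.161, 2.862

indirect = a * b                  # mediated effect of age on SBP, per year of age
total_from_parts = c_prime + indirect

years = 10                        # illustrative span
print(f"indirect effect per year (ab): {indirect:.3f} mm Hg")
print(f"c' + ab = {total_from_parts:.3f} versus c = {c:.3f}")
print(f"predicted SBP change over {years} years: {c * years:.2f} mm Hg")
print(f"predicted weight change over {years} years: {a * years:.2f} pounds")
print(f"predicted SBP change for a 30- to 40-pound gain: {b * 30:.1f} to {b * 40:.1f} mm Hg")
```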
16.9 ♦ Evaluating Statistical Significance
Several methods to test statistical significance of mediated models have been proposed. The four most widely used procedures are briefly discussed: Baron and Kenny’s (1986) causal-steps approach, joint significance tests for the a and b path coefficients, the Sobel test (Sobel, 1982) for H0: ab = 0, and the use of bootstrapping to obtain confidence intervals for the ab product that represents the mediated or indirect effect.
16.9.1 ♦ Causal-Steps Approach
Fritz and MacKinnon (2007) reviewed and evaluated numerous methods for testing whether mediation is statistically significant. A subset of these methods is described here. Their review of mediation studies conducted between 2000 and 2003 revealed that the most frequently reported method was the causal-steps approach described by Baron and Kenny (1986). In Baron and Kenny’s initial description of this approach, in order to conclude that mediation may be present, several conditions were required: first, a significant total relationship between X1, the initial cause, and Y, the final outcome variable (i.e., a significant path c); significant a and b paths; and a significant ab product using the Sobel (1982) test or a similar method, as described in Section 16.9.3. The decision whether to call the outcome partial or complete mediation then depends on whether the c′ path that represents the direct path from X1 to Y is statistically significant; if c′ is not statistically significant, the result may be interpreted as complete mediation; if c′ is statistically significant, then only partial mediation may be occurring. Kenny has also noted elsewhere (http://www.davidakenny.net/cm/mediate.htm) that other factors, such as the sizes of coefficients and whether they are large enough to be of practical significance, should also be considered and that, as with any other regression analysis, meaningful results can only be obtained from a correctly specified model.
This approach is widely recognized, but it is not the most highly recommended procedure at present for two reasons. First, there are (relatively rare) cases in which mediation may occur even when the original X1, Y association is not significant. For example, if one of the paths in the mediation model is negative, a form of suppression may occur such that positive direct and negative indirect effects tend to cancel each other out to yield a small and nonsignificant total effect. (If a is negative, while b and c′ are positive, then when we combine a negative ab product with a positive c′ coefficient to reconstitute the total effect c, the total effect c can be quite small even if the separate positive direct path and negative indirect paths are quite large.) MacKinnon, Fairchild, and Fritz (2007) refer to this as “inconsistent mediation”; the mediator acts as a suppressor variable. See Section 10.12.5.3 for further discussion and an example of inconsistent mediation. Second, among the methods compared by Fritz and MacKinnon (2007), this approach had relatively low statistical power.
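A small numerical illustration may make the cancellation in inconsistent mediation concrete. The coefficient values below are invented for illustration only; they do not come from any data set in this book.

```python
# Hypothetical path coefficients (illustrative values, not from the chapter's data).
a = -0.60       # X1 -> X2 (negative)
b = 0.50        # X2 -> Y, controlling for X1 (positive)
c_prime = 0.35  # direct effect of X1 on Y (positive)

indirect = a * b            # negative indirect effect through the mediator
total = c_prime + indirect  # total effect c = c' + ab

print(f"indirect effect ab = {indirect:.2f}")
print(f"direct effect c'   = {c_prime:.2f}")
print(f"total effect c     = {total:.2f}")
```

Here the direct and indirect effects are both sizable, yet the total effect is close to zero, so a causal-steps analysis that requires a significant c path as its first condition could stop before the mediated path is ever examined.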
16.9.2 ♦ Joint Significance Test
Fritz and MacKinnon (2007) also discussed a joint significance test approach to testing the significance of mediation. The data analyst simply asks whether the a and b coefficients that constitute the mediated path are both statistically significant; the t tests from the regression results are used. (On his mediation Web page at http://www.davidakenny.net/cm/mediate.htm, Kenny suggested that if this approach is used, and if an overall risk of Type I error of .05 is desired, each test should use α = .025, two-tailed, as the criterion for significance.) This approach is easy to implement and has moderately good statistical power compared with the other test procedures reviewed by Fritz and MacKinnon (2007). However, it is not the most frequently reported method; journal reviewers may prefer better known procedures.
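The decision rule for the joint significance test is simple enough to express in a few lines. The sketch below is only an illustration; the function name is hypothetical, and the p values are the two-tailed values for the a and b paths reported for the hypothetical data (see the Indirect output in Figure 16.10).

```python
def joint_significance(p_a: float, p_b: float, alpha: float = 0.05) -> bool:
    """Return True when both the a path and the b path are significant at alpha."""
    return p_a < alpha and p_b < alpha


# Two-tailed p values for the a and b paths in the hypothetical data.
p_a, p_b = 0.0012, 0.0142

significant_at_05 = joint_significance(p_a, p_b)                # conventional criterion
significant_at_025 = joint_significance(p_a, p_b, alpha=0.025)  # stricter per-test criterion

print(significant_at_05, significant_at_025)
```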
16.9.3 ♦ Sobel Test of H0: ab = 0
Another method to assess the significance of mediation is to examine the product of the a, b coefficients for the mediated path. (This is done as part of the Baron & Kenny [1986] causal-steps approach.) The null hypothesis, in this case, is H0: ab = 0. To set up a z test statistic, an estimate of the standard error of this ab product (SEab) is needed. Sobel (1982) provided the following approximate estimate for SEab.
$$SE_{ab} \approx \sqrt{b^{2} s_{a}^{2} + a^{2} s_{b}^{2}}, \qquad (16.1)$$
where
a and b are the raw (unstandardized) regression coefficients that represent the effect of X1 on X2 and the effect of X2 on Y, respectively;
sa is the standard error of the a regression coefficient;
sb is the standard error of the b regression coefficient.
Using the standard error from Equation 16.1 as the divisor, the following z ratio for the Sobel (1982) test can be set up to test the null hypothesis H0: ab = 0:
z = ab/SEab. (16.2)
The ab product is judged to be statistically significant if z is greater than +1.96 or less than –1.96. This test is appropriate only for large sample sizes. The Sobel (1982) test is relatively conservative, and among the procedures reviewed by Fritz and MacKinnon (2007), it had moderately good statistical power. It is sometimes used in the context of the Baron and Kenny (1986) causal-steps procedure and sometimes reported without the other causal steps. The Sobel test can be done by hand; Preacher and Hayes (2008) provide an online calculator at http://people.ku.edu/~preacher/sobel/sobel.htm to compute this z test given either the unstandardized regression coefficients and their standard errors or the t ratios for the a and b path coefficients. Their program also provides z tests based on alternate methods of estimating the standard error of ab suggested by the Aroian test (Aroian, 1947) and Goodman test (Goodman, 1960).
The Sobel (1982) test was carried out for the hypothetical data on age, weight, and blood pressure. (Note again that the N in this demonstration dataset is too small for the Sobel test to yield accurate results; these data are used only to illustrate the use of the techniques.) For these hypothetical data, a = 1.432, b = .490, sa = .397, and sb = .187. These values were entered into the appropriate lines of the calculator provided at the Preacher Web page; the results appear in Figure 16.6. Because z = 2.119, with p = .034, two-tailed, the ab product that represents the effect of age on blood pressure mediated by weight can be judged statistically significant.
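The hand calculation can also be scripted. The sketch below applies Equations 16.1 and 16.2 to the values just given. The Aroian and Goodman standard errors are computed by adding or subtracting the term sa² × sb² under the square root; those two formulas are not given in this chapter and are stated here as an assumption about what the online calculator computes. The function names are hypothetical.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def product_tests(a: float, b: float, s_a: float, s_b: float) -> dict:
    """Return {test name: (z, SE, p)} for the Sobel, Aroian, and Goodman tests of ab."""
    ab = a * b
    base = b ** 2 * s_a ** 2 + a ** 2 * s_b ** 2  # Equation 16.1, before the square root
    extra = s_a ** 2 * s_b ** 2                   # term added (Aroian) or subtracted (Goodman)
    results = {}
    for name, variance in (("Sobel", base), ("Aroian", base + extra), ("Goodman", base - extra)):
        # Note: the Goodman variance can be negative for some inputs; it is positive here.
        se = math.sqrt(variance)
        z = ab / se                               # Equation 16.2
        results[name] = (z, se, two_tailed_p(z))
    return results


results = product_tests(a=1.432, b=0.490, s_a=0.397, s_b=0.187)
for name, (z, se, p) in results.items():
    print(f"{name:8s} z = {z:.3f}  SE = {se:.3f}  p = {p:.3f}")
```

The values printed should agree with those in Figure 16.6 to within rounding in the third decimal place.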
Note that the z tests for the significance of ab assume that values of this ab product are normally distributed across samples from the same population; it has been demonstrated empirically that this assumption is incorrect for many values of a and b. Because of this, authorities on mediation analysis (MacKinnon, Preacher, and their colleagues) now recommend bootstrapping methods to obtain confidence intervals for estimates of ab.
16.9.4 ♦ Bootstrapped Confidence Interval for ab
Bootstrapping has become widely used in situations where the analytic formula for the standard error of a statistic is not known and/or there are violations of assumptions of normal distribution shape (Iacobucci, 2008). A new sample of the same size as the original data set is drawn, with replacement, from the observed cases (the observed data serve as a stand-in for the population), and values of a, b, and ab are calculated for this sample. This process is repeated many times (bootstrapping procedures typically allow users to request from 1,000 up to 5,000 different samples). The value of ab is tabulated across these samples; this provides an empirical sampling distribution that can be used to derive a value for the standard error of ab. Results of such bootstrapping indicate that the distribution of ab values is often asymmetrical, and this asymmetry should be taken into account when setting up confidence interval (CI) estimates of ab. This CI provides a basis for evaluation of the single estimate of ab obtained from analysis of the entire data set. Bootstrapped CIs do not require that the ab statistic have a normal distribution across samples. If this CI does not include zero, the analyst concludes that there is statistically significant mediation. Some bootstrapping programs include additional refinements, such as bias correction (see Fritz & MacKinnon, 2007). Most structural equation modeling (SEM) programs, such as Amos, can provide bootstrapped CIs (a detailed example is presented in Section 16.13).
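The logic of the bootstrap can be sketched in a short program. The example below is a minimal illustration, not the procedure implemented in any particular package: it computes a simple percentile CI (without the bias correction or acceleration mentioned above), it resamples whole cases with replacement, and it runs on simulated stand-in data because the chapter's data file is not reproduced here. All function names, variable names, and data-generating values are assumptions made for the illustration.

```python
import random
import statistics


def indirect_effect(x, m, y):
    """Return ab: a is the slope of m on x; b is the slope of y on m, controlling for x."""
    mean_x, mean_m, mean_y = statistics.fmean(x), statistics.fmean(m), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    smm = sum((mi - mean_m) ** 2 for mi in m)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    smy = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def percentile_bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=1):
    """Percentile CI for ab based on n_boot resamples of cases drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample whole cases
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * n_boot)]
    upper = estimates[int((1.0 - tail) * n_boot) - 1]
    return lower, upper


# Simulated stand-in data (hypothetical; not the chapter's data set).
rng = random.Random(42)
age = [rng.uniform(30, 80) for _ in range(100)]
weight = [100 + 1.4 * a_i + rng.gauss(0, 15) for a_i in age]
sbp = [10 + 2.2 * a_i + 0.5 * w_i + rng.gauss(0, 20) for a_i, w_i in zip(age, weight)]

ab_hat = indirect_effect(age, weight, sbp)
ci_lower, ci_upper = percentile_bootstrap_ci(age, weight, sbp)
print(f"ab = {ab_hat:.3f}, 95% percentile CI = [{ci_lower:.3f}, {ci_upper:.3f}]")
```

If the resulting interval excludes zero, the indirect effect would be judged statistically significant by this criterion.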
For data analysts who do not have access to an SEM program, Preacher and Hayes (2008) provide online scripts and macros for SPSS and SAS that provide bootstrapped CIs for tests of mediation (go to Hayes’s Web page at http://www.afhayes.com/spss-sas-and-mplus-macros-and-code.html and look for the link to download the SPSS script, on the
Figure 16.6 ♦ Sobel Test Results for H0: ab = 0, Using Calculator Provided by Preacher and Leonardelli at http://www.people.ku.edu/~preacher/sobel/sobel.htm
Input:                          Test statistic:   Std. Error:   p-value:
a    1.432     Sobel test:      2.119             0.330         0.034
b    .490      Aroian test:     2.068             0.339         0.038
sa   .397      Goodman test:    2.175             0.322         0.029
sb   .187

Input:                          Test statistic:   p-value:
ta   3.605     Sobel test:      2.120             0.033
tb   2.623     Aroian test:     2.069             0.038
               Goodman test:    2.176             0.029
NOTE: This test is only recommended for use with large N samples. The dataset used for this example has N = 30; this was used only as a demonstration.
line that says “Script: Indirect.sbs”; download the indirect.sbs file to your computer). An SPSS script is a syntax file that generates a dialog window for the procedure that makes it easy for the user to enter variable names and select options.
To run the script, open your SPSS data file; from the top-level menu on the Data View page, select the menu options <File> → <Open>; from the pull-down menu, select Script as the type of file to open. See Figure 16.7 for an SPSS screen shot. Then locate the file indirect.sbs downloaded from the Hayes website and open it. This will appear as shown in Figure 16.8. Do not modify the script in any way. To run the script, on the menu bar across the top of the indirect.sbs script window, click on the right arrow button (that resembles the play button on an audio or video player). This opens a dialog window for the Indirect procedure, as shown in Figure 16.9.
For the hypothetical data in this chapter, the dependent variable blood pressure is moved into the window for dependent variable Y. The proposed mediator is weight. The independent variable (X) is age. Note that this procedure allows entry of multiple mediators; this will be discussed in a later section of the chapter; it also allows one or more covariates to be included in the analysis. Under the heading Bootstrap Samples, the number of samples can be selected from a menu (with values that range from 1,000 to 5,000). The confidence level for the CI for ab is set at 95% as a default value, and this can be changed by the user. In addition, there are different choices of estimation procedures for the CI; the default is “Bias corrected and accelerated.” (Accelerated refers to a correction for possible skewness in the sampling distribution of ab.)
Figure 16.7 ♦ SPSS Menu Selections to Open the SPSS Indirect Script File
NOTE: <File> → <Open>, then select Script from the pull-down menu.
When these selections have been made, click OK; the output appears in Figure 16.10. Many of the results duplicate those from the earlier regression results; for example, the estimates of the unstandardized path coefficients for paths a, b, c, and c′ are the same as those obtained using regression methods. From this printout, we can confirm that the (unstandardized) path coefficients are a = 1.432, b = .4897, c′ = 2.161, and c = 2.8622 (these agree with the regression values reported earlier, except for some rounding error). The value of ab = .7013. A normal theory test (i.e., a test that assumes that a z statistic similar to the Sobel test is valid) in the output from the Indirect procedure provides z = 2.1842; this is close to the Sobel test value reported in Figure 16.6.
Figure 16.8 ♦ SPSS Script Indirect.sbs in Syntax Editor Window (Preacher & Hayes, 2008)
Figure 16.9 ♦ SPSS Dialog Window for Indirect Script
Run MATRIX procedure:

Dependent, Independent, and Proposed Mediator Variables:
DV   = BloodPre
IV   = Age
MEDS = Weight

Sample size
30

IV to Mediators (a paths)
            Coeff       se        t        p
Weight     1.4321    .3972   3.6054    .0012

Direct Effects of Mediators on DV (b paths)
            Coeff       se        t        p
Weight      .4897    .1867   2.6228    .0142

Total Effect of IV on DV (c path)
            Coeff       se        t        p
Age        2.8622    .4317   6.6308    .0000

Direct Effect of IV on DV (c-prime path)
            Coeff       se        t        p
Age        2.1610    .4749   4.5507    .0001

Model Summary for DV Model
  R-sq   Adj R-sq         F      df1       df2        p
 .6899      .6670   30.0390   2.0000   27.0000    .0000

******************************************************************

NORMAL THEORY TESTS FOR INDIRECT EFFECTS

Indirect Effects of IV on DV through Proposed Mediators (ab paths)
           Effect       se        Z        p
TOTAL       .7013    .3211   2.1842    .0289
Weight      .7013    .3211   2.1842    .0289

*****************************************************************
Figure 16.10 ♦ Output From SPSS Indirect Script: One Mediating Variable
Based on bootstrapping, the Indirect procedure also provides a 95% CI for the value of the indirect effect ab (again, this is in terms of unstandardized coefficients). The lower limit of this CI is .0769; the upper limit is 2.0792. Because this CI does not include zero, the null hypothesis that ab = 0 can be rejected.
16.10 ♦ Effect-Size Information
Effect-size information is usually given in unit-free form (Pearson’s r and r2 can both be interpreted as effect sizes). The raw or unstandardized path coefficients from mediation analysis can be converted to standardized slopes; alternatively, we can examine the correlation between X1 and X2 to obtain effect-size information for the a path, as well as the partial correlation between X2 and Y (controlling for X1) to obtain effect-size information for the b path. There are potential problems with comparisons among standardized regression or path coefficients. For example, if the same mediation analysis involving the same set of three variables is conducted in two different samples (e.g., a sample of women and a sample of men), these samples may have different standard deviations on variables such as the predictor X1 and the outcome variable Y. Suppose that the male and female samples yield b and c′ coefficients that are very similar, suggesting that the amount of change in Y as a function of X1 is about the same across the two groups. When we convert raw score slopes to standardized slopes, this may involve multiplying and dividing by
BOOTSTRAP RESULTS FOR INDIRECT EFFECTS

Indirect Effects of IV on DV through Proposed Mediators (ab paths)
             Data     boot     Bias       SE
TOTAL       .7013    .7788    .0775    .5315
Weight      .7013    .7788    .0775    .5315

Bias Corrected and Accelerated Confidence Intervals
            Lower    Upper
TOTAL       .0769   2.0792
Weight      .0769   2.0792

*****************************************************************

Level of Confidence for Confidence Intervals: 95

Number of Bootstrap Resamples: 5000

------ END MATRIX -----
Figure 16.10 ♦ (Continued)
different standard deviations for men and women, and different standard deviations within these groups could make it appear that the groups have different relationships between variables (different standardized slopes but similar unstandardized slopes).
Unfortunately, both raw score (b) and standardized (β) regression coefficients can be influenced by numerous sources of artifact that may operate differently in different groups. Chapter 7 reviewed numerous factors that can artifactually influence the size of r (such as outliers, curvilinearity, different distribution shapes for X and Y, unreliability of measurement of X and Y, etc.). Chapter 11 demonstrated that β coefficients can be computed from bivariate correlations and that b coefficients are rescaled versions of β. When Y is the outcome and X is the predictor, b = β × (SDY/SDX). Both b and β coefficients can be influenced by many of the same problems as correlations. Therefore, if we try to compare regression coefficients across groups or samples, differences in regression coefficients across samples may be partly due to artifacts discussed in Chapter 7. Considerable caution is required whether we want to compare standardized or unstandardized coefficients.
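The rescaling relation b = β × (SDY/SDX) can be turned around to show how two groups with the same raw score slope can end up with different standardized slopes. The standard deviations below are invented for illustration, and the function name is hypothetical.

```python
def standardized_slope(b: float, sd_x: float, sd_y: float) -> float:
    """Convert a raw score slope b to a standardized slope: beta = b * (SD_X / SD_Y)."""
    return b * sd_x / sd_y


# Hypothetical groups: identical raw score slopes, different predictor variability.
b = 0.50
beta_group1 = standardized_slope(b, sd_x=10.0, sd_y=10.0)
beta_group2 = standardized_slope(b, sd_x=5.0, sd_y=10.0)

print(f"group 1: b = {b:.2f}, beta = {beta_group1:.2f}")
print(f"group 2: b = {b:.2f}, beta = {beta_group2:.2f}")
```

In this illustration the predicted change in Y per unit change in X is identical in the two groups, but the standardized slope in the second group is half as large simply because scores on X are less variable in that group.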
Despite concerns about potential problems with standardized regression slopes (as discussed by Greenland et al., 1991), data analysts often include standardized path coefficients in reports of mediation analysis, particularly when some or all of the variables are not measured in meaningful units. In reporting results, authors should make it clear whether standardized or unstandardized path coefficients are reported. Given the difficulties just discussed, it is a good idea to include both types of path coefficients.
16.11 ♦ Sample Size and Statistical Power
Assuming that the hypothesis of primary interest is H0: ab = 0, how large does sample size need to be to have an adequate level of statistical power? Answers to questions about sample size depend on several pieces of information: the alpha level, desired level of power, the type of test procedure, and the population effect sizes for the strength of the association between X1 and X2, as well as X2 and Y. Often, information from past studies can help researchers make educated guesses about effect sizes for correlations between variables. In the discussion that follows, α = .05 and desired power of .80 are assumed. We can use the correlation between X1 and X2 as an estimate of the effect-size index for a and the partial correlation between X2 and Y, controlling for X1, as an estimate of the effect size for b. Based on recommendations about verbal labels for effect size given by Cohen (1988), Fritz and MacKinnon (2007) designated a correlation of .14 as small, a correlation of .39 as medium, and a correlation of .59 as large. They reported statistical power for combinations of small (S), medium (M), and large (L) effect sizes for the a and b paths. For example, if a researcher plans to use the Sobel (1982) test and expects that both the a and b paths correspond to medium effects, the minimum recommended sample size from Table 16.1 would be 90.
A few cautions are in order: Sample sizes from this table may not be adequate to guarantee significance, even if the researcher has not been overly optimistic about anticipated effect size. Even when the power table suggests that fewer than 100 cases might be adequate for statistical power for the test of H0: ab = 0, analysts should keep in mind that small samples lead to more sampling error in estimates of path coefficients. For most studies that test mediation models, minimum sample sizes of 150 to 200 would be advisable if possible.
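Empirical power estimates of this kind come from simulation: many samples are generated from a population with known a and b paths, and the proportion of samples in which the test is significant is tabulated. The sketch below illustrates the idea for the joint significance test only. It is not the program used by Fritz and MacKinnon (2007); the data-generating model (standard normal X, residuals with unit variance, c′ = 0 by default), the use of the normal critical value 1.96 in place of the exact t critical value, and all function names are simplifying assumptions made for this illustration.

```python
import math
import random
import statistics


def path_t_ratios(x, m, y):
    """Return (t for a, t for b): a from m regressed on x; b from y regressed on x and m."""
    n = len(x)
    mean_x, mean_m, mean_y = statistics.fmean(x), statistics.fmean(m), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    smm = sum((mi - mean_m) ** 2 for mi in m)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    smy = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y))
    # Path a and its standard error (simple regression of m on x).
    a = sxm / sxx
    rss_m = smm - a * sxm
    se_a = math.sqrt(rss_m / (n - 2) / sxx)
    # Paths b and c' and the standard error of b (regression of y on x and m).
    det = sxx * smm - sxm ** 2
    b = (smy * sxx - sxm * sxy) / det
    c_prime = (sxy * smm - sxm * smy) / det
    rss_y = syy - c_prime * sxy - b * smy
    se_b = math.sqrt(rss_y / (n - 3) * sxx / det)
    return a / se_a, b / se_b


def estimate_power(n, a, b, c_prime=0.0, reps=500, z_crit=1.96, seed=0):
    """Proportion of simulated samples in which both a and b are significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        m = [a * xi + rng.gauss(0, 1) for xi in x]
        y = [b * mi + c_prime * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
        t_a, t_b = path_t_ratios(x, m, y)
        if abs(t_a) > z_crit and abs(t_b) > z_crit:  # normal approximation to the t test
            hits += 1
    return hits / reps


# Medium a and b effects with N = 74 (the joint significance entry for MM in Table 16.1).
print(f"estimated power: {estimate_power(n=74, a=0.39, b=0.39):.2f}")
```

Because of the simplifying assumptions and the modest number of replications, the estimate from this sketch should be treated as approximate and need not match the published table exactly.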
ab Effect Size^a    Joint Significance^b    Sobel^c    Bootstrapped Confidence Interval^d
SS                  530                     667        558
SM                  403                     422        406
SL                  403                     412        398
MS                  405                     421        404
MM                   74                      90         78
ML                   58                      66         59
LS                  405                     410        401
LM                   59                      67         59
LL                   36                      42         36
Table 16.1 ♦ Empirical Estimates of Sample Size Needed for Power of .80 When Using α = .05 as the Criterion for Statistical Significance in Three Different Types of Mediation Analysis
Fritz and MacKinnon (2007) have also made SAS and R programs available so that data analysts can input other values for population effect sizes and desired statistical power; see http://www.public.asu.edu/~davidpm/ripl/mediate.htm (scroll down to the line that says “Programs for Estimating Empirical Power”).
16.12 ♦ Additional Examples of Mediation Models
Several variations of the basic mediation model in Figure 16.1 are possible. For example, the effect of X1 on Y could be mediated by multiple variables instead of just one (see Figure 16.11). Mediation could involve a multiple-step causal sequence. Mediation and moderation can both occur together. The following sections provide a brief introduction to each of these research situations; for more extensive discussion, see MacKinnon (2008).
16.12.1 ♦ Tests of Multiple Mediating Variables
In many situations, the effect of a causal variable X1 on an outcome Y might be mediated by more than one variable. Consider the effects of personality traits (such as extraversion and neuroticism) on happiness. Extraversion is moderately positively correlated with happiness. Tkach and Lyubomirsky (2006) suggested that the effects of trait extraversion on happiness may be at least partially mediated by behaviors such as social activity. For example, people who score high on extraversion tend to engage in more social
SOURCE: Adapted from Fritz and MacKinnon (2007, Table 3, p. 237).
NOTE: These power estimates may be inaccurate when measures of variables are unreliable, assumptions of normality are violated, or categorical variables are used rather than quantitative variables.
a. SS indicates both a and b are small effects; SM indicates a is small and b is medium; SL indicates a is small and b is large.
b. Joint significance test: Requirement that the a and b coefficients each are statistically significant.
c. A z test for H0: ab = 0 using a method to estimate SEab proposed by Sobel (1982).
d. Without bias correction.
activities, and people who engage in more social activities tend to be happier. They demonstrated that, in their sample, the effects of extraversion on happiness were partially mediated by engaging in social activity, but there was still a significant direct effect of extraversion on happiness. Their mediation analyses examined only one behavior at a time as a potential mediator. However, they also noted that there are many behaviors (other than social activity) that may influence happiness. What happens if we consider multiple behaviors as possible mediators? The SPSS script Indirect.sbs (discussed in Section 16.9.4) can be used to conduct simultaneous tests for more than one mediating variable. Figure 16.11 shows standardized path coefficients obtained
Figure 16.11 ♦ Path Model for Multiple Mediating Variables Showing Standardized Path Coefficients
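[Path diagram not reproduced. Variables: Extraversion (initial cause); Positive/Proactive Behaviors, Spiritual Behaviors, and Health Behaviors (mediators); Happiness (outcome). Standardized coefficients printed on the paths, in the order extracted: .45**, .13**, .33***, .44*** (.60***), .24**, .08, .16**.]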
SOURCE: Adapted from Warner and Vroman (2011).
NOTE: Coefficient estimates and statistical significance tests were obtained using the Indirect.sbs script (output not shown). The effect of extraversion on happiness was partially mediated by behaviors. Positive/proactive behaviors (a1 × b1) and health behaviors (a3 × b3) were significant mediators; spiritual behaviors did not significantly mediate effects of extraversion on happiness.
using the Indirect.sbs script to test a multiple-mediation model (Warner & Vroman, 2011) that included three different behaviors as mediators between extraversion and happiness. (Output similar to Figure 16.10 was obtained but is not included here.) Results indicated that the effects of extraversion on happiness were only partially mediated by behavior. Positive/proactive behaviors and health behaviors were both significant mediators of the effect of extraversion on happiness. Spiritual behaviors did not significantly mediate the effects of extraversion on happiness (the path from spiritual behaviors to happiness was not statistically significant).
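With several mediators, each mediator has its own specific indirect effect (ai × bi), and the total indirect effect is the sum of these products. The sketch below shows the arithmetic with invented standardized coefficients; these are not the values from Figure 16.11, and the mediator labels are placeholders.

```python
# Hypothetical standardized paths as (a_i, b_i) pairs; NOT the values from Figure 16.11.
paths = {
    "mediator_1": (0.40, 0.30),
    "mediator_2": (0.10, 0.05),
    "mediator_3": (0.25, 0.20),
}
c_prime = 0.35  # hypothetical direct effect of X1 on Y

# Specific indirect effect for each mediator, then their sum.
specific_indirect = {name: a_i * b_i for name, (a_i, b_i) in paths.items()}
total_indirect = sum(specific_indirect.values())
total_effect = c_prime + total_indirect  # c = c' + sum of a_i * b_i

for name, value in specific_indirect.items():
    print(f"{name}: a x b = {value:.3f}")
print(f"total indirect = {total_indirect:.3f}, total effect c = {total_effect:.3f}")
```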
16.12.2 ♦ Multiple-Step Mediated Paths
It is possible to examine a mediation sequence that involves more than one intermediate step, as in the sequence X1 → X2 → X3 → Y. If only partial mediation occurs, additional paths would need to be included in this type of model; for further discussion, see Taylor, MacKinnon, and Tein (2008).
16.12.3 ♦ Mediated Moderation and Moderated Mediation
It is possible for moderation (as described in Chapter 15) to co-occur with mediation in two different ways. Mediated moderation occurs when two initial causal variables (let’s call these variables A and B) have an interaction (A × B), and the effects of this interaction involve a mediating variable. In this situation, A, B, and the A × B interaction are included as initial causal variables, and the mediation analysis is conducted to assess the degree to which a potential mediating variable explains the impact of the A × B interaction on the outcome variable.
Moderated mediation occurs when you have two different groups (e.g., males and females), and the strength or signs of the paths in a mediation model for the same set of variables differ across these two groups. Many structural equation modeling programs, such as Amos, make it possible to compare path models across groups and to test hypotheses about whether one, or several, path coefficients differ between groups (e.g., males vs. females). Further discussion can be found in Edwards and Lambert (2007); Muller, Judd, and Yzerbyt (2005); and Preacher, Rucker, and Hayes (2007). Comparison of models across groups using the Amos structural equation modeling program is demonstrated by Byrne (2009).
16.13 ♦ Use of Structural Equation Modeling Programs to Test Mediation Models
SEM programs such as LISREL, EQS, MPLUS, and Amos make it possible to test models that include multiple-step paths (e.g., mediation hypotheses) and to compare results across groups (to test moderation hypotheses). In addition, SEM programs make it possible to include multiple indicator variables for some or all of the constructs; in theory, this makes it possible to assess multiple indicator measurement reliability. Most SEM programs now also provide bootstrapping; most analysts now view SEM programs as the preferred method for assessment of mediated models. More extensive discussion of other types of analyses that can be performed using structural equation modeling is beyond the scope of this book; for further information, see Byrne (2009) or Kline (2010).
16.13.1 ♦ Comparison of Regression and SEM Tests of Mediation
As described in earlier sections of this chapter, simple mediated models can be tested by using OLS linear regression in SPSS and then conducting the Sobel test to assess whether the indirect path(s) are significant. In the following example, Amos Graphics will be used to analyze the same empirical example. Amos is an add-on structural equation modeling program for IBM SPSS that is licensed separately from IBM SPSS. Use of SEM programs provides two advantages compared to the regression methods described earlier in this chapter. First, they make it possible to test more complex path models involving a larger number of variables. Second, most SEM programs provide bootstrapped confidence intervals and associated statistical significance tests for ab indirect paths; bootstrapped confidence intervals are now regarded as the best method for statistical significance testing for indirect effects, particularly when assumptions of normality may be violated. In this section, Amos is used only to perform one specific type of analysis, that is, to obtain CIs and significance tests for the ab indirect effect for a simple three-variable mediated model.
16.13.2 ♦ Steps in Running Amos
Running analyses in Amos Graphics involves the following steps, each of which is discussed in more detail in subsequent sections.
1. Open the Amos Graphics program and use the drawing tools to draw a path diagram that represents the hypothesized mediated causal model.
2. Name the variables in this diagram (the variable names must correspond exactly to the names of the variables in the SPSS data file).
3. Open the SPSS data file.
4. Edit the Analysis Properties to specify how the analysis will be performed and what output you want to see.
5. Run the analysis and check to make sure that the analysis ran successfully; if it did not, you may need to correct variable names and/or make changes in the path model.
6. View and interpret the output.
16.13.3 ♦ Opening the Amos Graphics Program
From the Windows operating system, begin with the <Start> Menu (usually this is in the lower left corner of the screen). When you click the <Start> button, make the following selections from the popup menus, as shown in Figure 16.12: <IBM SPSS Statistics> → <IBM SPSS Amos 19> → <Amos Graphics>. The initial view of the Amos worksheet appears in Figure 16.13 (if there is already a path diagram in the right-hand panel, click the <File> → <New> menu selections from the top menu bar to start with a blank worksheet).
Figure 16.12 ♦ Initial Menu Selection From Start Menu to Start Amos 19 Amos Graphics Program
NOTE: From the <Start> Menu, Select <IBM SPSS Statistics> → <IBM SPSS Amos 19> → <Amos Graphics>.
Figure 16.13 ♦ Initial Screen View in Amos Graphics
The Amos Graphics worksheet has several parts. Across the top, as in most Windows applications, there is a menu bar. Down the left-hand side are icons that represent numerous tools for drawing and modifying models and doing other operations (shown in greater detail in Figure 16.14). Just to the right of the tools is a set of small windows with headings: Group Number 1, Default Model, and so forth. (In this example, only one group is used; Amos can estimate and compare model parameters for multiple groups; for example, it can compare mediated models for males and females.) These windows are used later to select which part of the output you want to see. To the right, the largest window is a blank drawing sheet that provides space for you to draw a path model that represents your hypotheses about causal connections.
Figure 16.14 ♦ Amos Drawing Tools
16.13.4 ♦ Amos Tools
In this brief introduction, only a few of the drawing tools in Figure 16.14 are used (Byrne, 2009, provides more extensive examples of tool use). Beginning in the upper left-hand corner: The rectangle tool creates a rectangle that corresponds to an observed (measured) variable. (An example at the end of Chapter 20 also includes latent variables; these are represented by ovals.) The single-headed arrow tool is used to draw a causal path (a detailed discussion of types of paths was presented in Chapter 10). (The double-headed arrow tool, not used in this example, is used to indicate that predictor variables are correlated.) The error term tool is used to create an error term for each dependent variable in the path model (error terms must be explicitly included in SEM path models). Three additional tools that are not used in this example are useful to know: The moving truck tool can be used to move objects in the path model, the delete tool is used to delete objects from the graph, and the clipboard is used to copy an Amos path model into the Windows clipboard so that it can be pasted into other applications (such as Word or PowerPoint).
16.13.5 ♦ First Steps Toward Drawing and Labeling an Amos Path Model
The path model for this example is the same as the one that appeared earlier in the bottom panel of Figure 16.1. The goal of the analysis is to assess the degree to which the effects of age on blood pressure may be mediated by weight. All of the steps are shown below; you can see a similar analysis (using different variable names) as an animated tutorial at this URL: http://amosdevelopment.com/video/indirect/flash/indirect.html (you need the Adobe Flash player to view this animation).
To draw the path model, start with the observed variables. Left click on the rectangle tool, move the cursor over to the blank worksheet on the right, then right click; a popup menu appears; left click on the menu option to “draw observed variable” (see top panel of Figure 16.15). The popup menu will then disappear. Left click (and continue to hold the button on the mouse down) on the blank worksheet in the location where you want the variable to appear and drag the mouse; a rectangle will appear. Drag the mouse until the location and dimensions of the rectangle look the way you want and then release the mouse button. Your worksheet should now contain a rectangle similar to the one that appears in the bottom panel of Figure 16.15.
Figure 16.15 ♦ Drawing a Rectangle That Corresponds to an Observed/Measured Variable
To give this variable a name, point the cursor at the rectangle and right click. From the popup menu that appears (as shown in Figure 16.16), click on Object Properties. This opens the Object Properties dialog window; this appears in Figure 16.16 near the bottom of the worksheet. In the space to the right of “Variable name,” type in the name of the first variable (age). Font size and style can be modified. (Variable labels are not used in this example. The name typed in the Variable name window must correspond exactly to the name of the variable in the SPSS data file. If you want to have a different label appear in the Amos diagram, enter this in the box for the Variable label.)
16.13.6 ♦ Adding Variables and Paths to the Amos Path Diagram
For this analysis, the path model needs to include the following additional elements. Rectangles must be added for the other observed variables (weight, blood pressure). Note that conventionally, causal sequences are diagrammed from left to right (or from top to bottom). Age is the initial cause, and so it is placed on the left. Blood pressure is the final outcome, so it is placed on the right. The hypothesized mediator, weight, is placed between and above the other two variables, as shown in Figure 16.17.
To add paths to the model, left click on the unidirectional arrow tool, left click on the initial causal variable (rectangle) in the path model and continue to hold the mouse button down, and drag the mouse until the cursor points at the outcome or dependent variable, then release the mouse button. An arrow will appear in the diagram. For this model, you need three unidirectional arrows: from age to weight, from weight to blood pressure, and from age to blood pressure, as shown in Figure 16.17.
Figure 16.16 ♦ The Object Properties Popup Menu
16.13.7 ♦ Adding Error Terms for Dependent Variables
Each dependent variable (a variable is dependent if it has a unidirectional arrow pointing toward it) must have an explicit error term. To create the error terms shown in Figure 16.17, left click on the error term tool, move the mouse to position the cursor over a dependent variable such as weight, and left click again. An error term (a circle with an arrow that points toward the observed variable) will appear in the path model. Note that this arrow has a coefficient of 1 associated with it; this predetermined value for this path is required so that Amos can scale the error term to be consistent with the variance of the observed variable. In Figure 16.17, this 1 was edited to display as a larger font than initially appeared in Amos Graphics; to do this, right click near the arrow that represents this path (positioning is tricky for this) and click on the Object Properties window; within the Object Properties window, the font size for this path coefficient can be changed. Each error term also is preassigned a mean of 0; for this reason, a small 0 appears near each circle that represents an error term. You must give each error term a name, and the names for error terms must not correspond to the names of any SPSS observed variables. It is conventional to give error terms brief names such as e1 and e2, as shown in Figure 16.17.
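In equation form, the path diagram with its two error terms corresponds to a pair of regression equations, one for each dependent variable. The following is only a sketch of that correspondence, not Amos output. The intercept symbols i_1 and i_2 are notation introduced here for illustration, and the assignment of e1 to weight and e2 to blood pressure is an assumption; a, b, and c′ are the path labels used earlier in this chapter.

```latex
\begin{aligned}
\text{Weight} &= i_1 + a\,\text{Age} + e_1 \\
\text{BloodPressure} &= i_2 + c'\,\text{Age} + b\,\text{Weight} + e_2
\end{aligned}
```

Each error term enters its equation with a coefficient of 1, which matches the fixed value of 1 on the arrow from each error term in the diagram.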
16.13.8 ♦ Correcting Mistakes and Printing the Path Model
During this process, if you make a mistake or want to redraw some element of the model, you can use the delete tool to remove any variable or path from the model. (Amos has other tools that can be used to make the elements of these path model diagrams look nicer, such as the moving truck; see Byrne, 2009, for details.) If you want to paste a copy of this diagram into a Word document or other application, left click on the clipboard icon in the tool bar and then use the Paste command within Word. When you have completed all these drawing steps, your path diagram should look similar to the final path model that appears in Figure 16.17.
Figure 16.17 ♦ Final Path Model for the Mediation Analysis
[Path diagram: rectangles for Age, Weight, and BloodPressure; error terms e1 and e2, each shown with a mean of 0 and a path coefficient fixed at 1.]
16.13.9 ♦ Opening a Data File From Amos
The next step is to open the SPSS data file that contains scores for the observed variables in this model (age, weight, blood pressure). From the top-level menu, make the following selections: <File> → <Data Files…>, as shown in Figure 16.18. This opens the dialog window for Data Files, as shown in Figure 16.19. Click on the File Name button to open a browsing window (not shown here); in this window, you can navigate to the folder that contains your SPSS data file. (The first time you open this window, the default directory is Amos examples; you will need to navigate to one of your own data directories to locate your data file.) When you have located the SPSS data file (for this example, it is the file named ageweightbp.sav), highlight it, then click the Open button and then the OK button. This will return you to the screen view in Figure 16.17.
16.13.10 ♦ Specification of Analysis Method and Request for Output
The next step is to tell Amos how to do the analysis and what output you want to see. To do this, go to the top-level menu (as shown in Figure 16.17) and make these menu selections (see Figure 16.20): <View> → <Analysis Properties…>. This opens the Analysis Properties dialog window, as shown in Figure 16.21. Note the series of tabs across the top of this window from left to right; only a few of these are used in this example. Click on the “Estimation” tab to specify the estimation method, select the radio button for “Maximum likelihood,” and check the box for “Estimate means and intercepts.” The radio button for “Fit the saturated and independence models” (see Note 3) is also selected in this example. (Amos is not very forgiving about missing data. Some options are not available, and other options must be selected, if the SPSS data file contains any missing values. These limitations can be avoided by either removing cases with missing data from the SPSS data file or using imputation methods to replace missing values; refer to Chapter 4 for further discussion of missing values in SPSS Statistics.)
Figure 16.18 ♦ Amos Menu Selections to Open the SPSS Data File
Next, still in the “Analysis Properties” dialog window, click on the “Output” tab; in the checklist that appears (see Figure 16.22), check the boxes for “Minimization history,” “Standardized estimates,” “Squared multiple correlations,” and “Indirect, direct and total effects.”
Figure 16.19 ♦ Amos Dialog Window: Data Files
Figure 16.20 ♦ Amos Pull-Down Menu: Analysis Properties
Figure 16.21 ♦ Estimation Tab in Analysis Properties Dialog Window
Figure 16.22 ♦ Output Tab in Analysis Properties Dialog Window
Continuing in the “Analysis Properties” dialog window, click on the “Bootstrap” tab. Click the checkbox for “Perform bootstrap,” and in the window for “Number of bootstrap samples,” type in a reasonably large number (usually between 1,000 and 5,000; in this example, 2,000 bootstrap samples were requested). Also check the box for “Bias-corrected confidence intervals.”
To finish work in the “Analysis Properties” window, click the X in the upper right-hand corner of this window to close it. This returns you to the screen view that appears in Figure 16.17.
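The logic behind the bootstrap requested on the “Bootstrap” tab can be sketched outside Amos. The Python example below is a minimal illustration under stated assumptions, not a reproduction of the Amos procedure: it uses a simple percentile interval rather than the bias-corrected interval selected above, it estimates a and b by ordinary least squares, and the data are simulated (hypothetical values, not the ageweightbp.sav file). The function names are introduced here for illustration only.

```python
import numpy as np

def indirect_effect(x, m, y):
    """Return the ab product from two OLS regressions: m on x, and y on x and m."""
    ones = np.ones(len(x))
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a * b

def percentile_bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for ab (a simpler stand-in for a bias-corrected CI)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [tail, 1.0 - tail])
    return lower, upper

# Hypothetical simulated scores for N = 30 cases (assumption, for illustration).
rng = np.random.default_rng(42)
age = rng.uniform(25, 75, size=30)
weight = 100 + 1.4 * age + rng.normal(0, 20, size=30)
bp = 10 + 2.2 * age + 0.5 * weight + rng.normal(0, 15, size=30)

ab = indirect_effect(age, weight, bp)
lower, upper = percentile_bootstrap_ci(age, weight, bp, n_boot=2000)
print(f"ab = {ab:.3f}, 95% percentile CI = [{lower:.3f}, {upper:.3f}]")
```

If the resulting interval does not include zero, the indirect effect would be judged statistically significant by this criterion. A bias-corrected interval adjusts the percentile cutoffs and will generally give somewhat different limits.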
16.13.11 ♦ Running the Amos Analysis and Examining Preliminary Results
The next step is to run the requested analysis. From the top-level menu (as it appears in Figure 16.17), make the following menu selections: <Analyze> → <Calculate Estimates>. After you do this, new information appears in the center column of the worksheet that reports preliminary information about results, as shown in Figure 16.24. Numbers were added to this screen shot to highlight the things you will want to look at. Number 1 points to a pair of icons on the screen. This pair of icons provides a way to toggle between two views of the model in Amos. If you click on the left-hand icon, this puts you in model specification mode; in this view, you can draw or modify the path model. When you click on the right-hand icon, results of the most recent analysis are displayed as path coefficients superimposed on the path model, as shown in Figure 16.24.
Figure 16.23 ♦ Bootstrap Tab in Analysis Properties Window
The first thing you need to know after you have run an analysis is whether the analysis ran successfully. Amos can fail to run an analysis for many reasons; for example, the path model was not drawn correctly, or missing data in the SPSS file require different specifications for the analysis. See number 2 in Figure 16.24; you may need to scroll up and down in this window. If the analysis failed to run, there will be an error message in this window. If the analysis ran successfully, then numerical results (such as the chi-square for model fit; see Note 4) will appear in this window.
16.13.12 ♦ Unstandardized Path Coefficients on Path Diagram
Figure 16.24 ♦ Project View After Analysis Has Run Successfully
NOTE: Initial view shows unstandardized (b) path coefficients.
Path coefficients now appear on the path model diagram, as indicated by number 3 in Figure 16.24 (the initial view shows unstandardized path coefficients). The user can toggle back and forth between viewing unstandardized versus standardized coefficient estimates by highlighting the corresponding terms in the window indicated by number 4. In Figure 16.24, the unstandardized path coefficients (these correspond to b coefficients in regression) appear. When the user highlights the option for standardized coefficients indicated by number 4, the path model is displayed with standardized coefficients (these correspond to β coefficients in regression), as shown in Figure 16.25. Object properties were modified to make the display fonts larger for these coefficients. Because this model does not include latent variables, the values of path coefficients reported by Amos are the same as those reported earlier from linear regression analyses (see Figure 16.5; see also Note 5). The values adjacent to the rectangles that represent weight (.32) and blood pressure (.69), in Figure 16.25, are the squared multiple correlations or R² values for the prediction of these dependent variables. (Sometimes the locations of the numbers on Amos path model diagrams do not make it clear what parameter estimates they represent; ambiguity about this can be resolved by looking at the text output, as described next.)
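The relation between unstandardized and standardized path coefficients, and the squared multiple correlations shown beside the dependent variables, can be illustrated with ordinary least squares. The Python sketch below is not Amos output; it assumes simulated (hypothetical) scores and uses helper functions named here for illustration. It estimates a, b, c′, and c, rescales a into its standardized form, and computes R² for each dependent variable.

```python
import numpy as np

def ols(y, *predictors):
    """Return (coefficients with the intercept first, R squared) from an OLS fit."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    r2 = 1.0 - resid.var() / y.var()
    return coef, r2

def zscore(v):
    """Convert scores to z scores using the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

# Hypothetical simulated scores (assumption; not the chapter's data file).
rng = np.random.default_rng(7)
age = rng.uniform(25, 75, size=30)
weight = 100 + 1.4 * age + rng.normal(0, 20, size=30)
bp = 10 + 2.2 * age + 0.5 * weight + rng.normal(0, 15, size=30)

# Unstandardized (b) path coefficients.
(_, a), r2_weight = ols(weight, age)
(_, c_prime, b), r2_bp = ols(bp, age, weight)
(_, c), _ = ols(bp, age)

# Standardized coefficients: the same regressions run on z scores.
(_, a_std), _ = ols(zscore(weight), zscore(age))
(_, c_prime_std, b_std), _ = ols(zscore(bp), zscore(age), zscore(weight))

print(f"a = {a:.3f}, b = {b:.3f}, c' = {c_prime:.3f}, c = {c:.3f}")
print(f"standardized a = {a_std:.3f}, b = {b_std:.3f}, c' = {c_prime_std:.3f}")
print(f"R squared: weight = {r2_weight:.3f}, blood pressure = {r2_bp:.3f}")
```

With a single predictor, the standardized a coefficient equals the correlation between age and weight, so the R² for weight is the square of that coefficient. In OLS the total effect also satisfies c = ab + c′ exactly.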
16.13.13 ♦ Examining Text Output From Amos
To view the text output, from the top-level menu, make the following menu selections: <View> → <Text Output>, as shown in Figure 16.26.
This opens up the Text Output window. The left-hand panel of this window provides a list of the output that is available (this is similar to the list of output that appears on the left-hand side of the SPSS Statistics output window). Only selected output will be examined and interpreted here. Use the cursor to highlight the Estimates portion of the output, as shown on the left-hand side of Figure 16.28. The complete output for Amos estimates (of path coefficients, multiple R² values, indirect effects, and other results) appears in Figure 16.29.
Figure 16.25 ♦ Standardized Path Coefficients and Squared Multiple Correlations
Figure 16.26 ♦ Amos Menu: View Text Output
Figure 16.27 ♦ List of Available Text Output (Left-Hand Side)
Figure 16.28 ♦ Screen View: Amos Estimates
Figure 16.29 ♦ Complete Amos Estimates
In Figure 16.29, the unstandardized path coefficients are reported where it is marked with the letter a; these coefficients correspond to the b/unstandardized regression coefficients reported earlier (in Figure 16.4, for example). The column headed C. R. (this stands for “critical ratio,” and this is similar but not identical to a simple t ratio) reports the ratio of each path coefficient estimate to its standard error; the computation of standard error is different in SEM than in linear regression. The p value (shown as capital P in Amos output) appears as *** by default when it is zero to three decimal places (i.e., p < .001). Double clicking on any element in the output (such as a specific p value) opens up a text box that provides an explanation of each term. Although the C. R. values are not identical to the t values obtained when the same analysis was performed using linear regression earlier (results in Figure 16.4), the b coefficient estimates from Amos and the judgments about their statistical significance are the same as for the linear regression results. The output table labeled “b” contains the corresponding standardized path coefficients.
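The arithmetic behind the C. R. column can be sketched in a few lines. The Python example below is an illustration only: the estimate and standard error are hypothetical values, the function names are introduced here, and the p value assumes that the critical ratio is referred to a standard normal distribution. The formatting function mimics the *** display for p values below .001.

```python
import math

def critical_ratio(estimate, standard_error):
    """Ratio of a parameter estimate to its standard error."""
    return estimate / standard_error

def two_tailed_p(cr):
    """Two-tailed p value, assuming the ratio follows a standard normal distribution."""
    return math.erfc(abs(cr) / math.sqrt(2.0))

def display_p(p):
    """Show *** when p is below .001; otherwise show three decimal places."""
    return "***" if p < 0.001 else f"{p:.3f}"

# Hypothetical estimate and standard error (assumption, for illustration only).
estimate, standard_error = 1.432, 0.390
cr = critical_ratio(estimate, standard_error)
p = two_tailed_p(cr)
print(f"C.R. = {cr:.3f}, P = {display_p(p)}")
```

A t ratio from linear regression is computed the same way, but it uses a differently estimated standard error and is evaluated against a t distribution, which is one reason the two sets of values do not match exactly.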
Moving to the bottom of Figure 16.29, unstandardized and standardized estimates of the strength of the indirect effect (denoted ab in earlier sections of this chapter) are reported where the letter c appears.
16.13.14 ♦ Locating and Interpreting Output for Bootstrapped CI for the ab Indirect Effect
To obtain information about statistical significance, we must examine the output for bootstrapped CIs. To see this information, double click the left mouse button on the Estimates in the list of available output (upper left-hand panel of Figure 16.30). This opens up a list that includes Scalars and Matrices. Double click the left mouse button on Matrices to examine the options within this Matrices list. Within this list, select Indirect effects (move the cursor to highlight this list entry and then left click on it). Now you will see the Estimates/Bootstrap menu in the window in the lower left-hand side of Figure 16.30. Left click on “Bootstrap confidence.” (To see an animation that shows this series of menu selections, view the video at this URL: http://amosdevelopment.com/video/indirect/flash/indirect.html.) The right-hand panel in Figure 16.30 shows the 95% CI results for the estimate of the unstandardized ab indirect effect (in this example, the effect of age on blood pressure, mediated by weight). The lower and upper limits of this 95% CI are .122 and 2.338. The result of a statistical significance test for H0: ab = 0, using an error term derived from bootstrapping, is p = .011. While there are differences in some numerical values, the results of the Amos analysis of the mediation model presented earlier (Figure 16.1) were generally similar to those obtained using linear regression.
Figure 16.30 ♦ Output for Bootstrapped Confidence Interval for ab Indirect or Mediated Effect of Age on Blood Pressure Through Weight
NOTE: For an animated demonstration of the series of selections in the text output list that are required to view this result, view the video at this URL: http://amosdevelopment.com/video/indirect/flash/indirect.html.
16.13.15 ♦ Why Use Amos/SEM Rather Than OLS Regression?
There are two reasons why it is worthwhile to learn how to use Amos (or other SEM programs) to test mediated models. First, it is now generally agreed that bootstrapping is the preferred method to test the statistical significance of indirect effects in mediated models; bootstrapping may be more robust to violations of assumptions of normality. Second, once a student has learned to use Amos (or other SEM programs) to test simple mediation models similar to the example in this chapter, the program can be used to add additional predictor and/or mediator variables, as shown in Figure 16.11. SEM programs have other uses that are briefly discussed at the end of Chapter 20; for example, SEM programs can be used to do confirmatory factor analysis, and SEM models can include latent variables with multiple indicators.
16.14 ♦ Results Section
For the hypothetical data in this chapter, a Results section could read as follows. Results presented here are based on the output from linear regression (Figures 16.2–16.4) and the Sobel test result in Figure 16.6. (Results would include slightly different numerical values if the Amos output is used.)
Results
A mediation analysis was performed using the Baron and Kenny (1986) causal-steps approach; in addition, a bootstrapped confidence interval for the ab indirect effect was obtained using procedures described by Preacher and Hayes (2008). The initial causal variable was age, in years; the outcome variable was systolic blood pressure (SBP), in mm Hg; and the proposed mediating variable was body weight, measured in pounds. [Note to reader: The sample N, mean, standard deviation, minimum and maximum scores for each variable, and correlations among all three variables would generally appear in earlier sections.] Refer to Figure 16.1 for the path diagram that corresponds to this mediation hypothesis. Preliminary data screening suggested that there were no serious violations of assumptions of normality or linearity. All coefficients reported here are unstandardized, unless otherwise noted; α = .05, two-tailed, is the criterion for statistical significance.
The total effect of age on SBP was significant, c = 2.862, t(28) = 6.631, p < .001; each 1-year increase in age predicted approximately a 3-point increase in SBP in mm Hg. Age was significantly predictive of the hypothesized mediating variable, weight; a = 1.432, t(28) = 3.605, p = .001. When controlling for age, weight was significantly predictive of SBP, b = .490, t(27) = 2.623, p = .014. The estimated direct effect of age on SBP, controlling for weight, was c′ = 2.161, t(27) = 4.551, p < .001.
SBP was predicted quite well from age and weight, with adjusted R² = .667 and F(2, 27) = 30.039, p < .001.
The indirect effect, ab, was .701. This was judged to be statistically significant using the Sobel (1982) test, z = 2.119, p = .034. [Note to reader: The Sobel test should be used only with much larger sample sizes than the N of 30 for this hypothetical dataset.] Using the SPSS script for the Indirect procedure (Preacher & Hayes, 2008), bootstrapping was performed; 5,000 samples were requested; a bias-corrected and accelerated confidence interval (CI) was created for ab. For this 95% CI, the lower limit was .0769 and the upper limit was 2.0792.
Several criteria can be used to judge the significance of the indirect path. In this case, both the a and b coefficients were statistically significant, the Sobel test for the ab product was significant, and the bootstrapped CI for ab did not include zero. By all these criteria, the indirect effect of age on SBP through weight was statistically significant. The direct path from age to SBP (c′) was also statistically significant; therefore, the effects of age on SBP were only partly mediated by weight.
The upper diagram in Figure 16.5 shows the unstandardized path coefficients for this mediation analysis; the lower diagram shows the corresponding standardized path coefficients.
Comparison of the coefficients for the direct versus indirect paths (c′ = 2.161 vs. ab = .701) suggests that a relatively small part of the effect of age on SBP is mediated by weight. There may be other mediating variables through which age might influence SBP, such as other age-related disease processes.
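The arithmetic that links the coefficients in this Results section can be checked directly. The Python sketch below uses only the rounded values reported above, so its results may differ from the reported values in the third decimal place. It rests on two assumptions that are not part of the original output: each standard error is recovered as the coefficient divided by its t ratio, and the Sobel standard error uses the first-order formula, the square root of (b² × SEa² + a² × SEb²).

```python
import math

# Coefficients and t ratios as reported in the Results section (rounded values).
a, t_a = 1.432, 3.605      # age -> weight
b, t_b = 0.490, 2.623      # weight -> SBP, controlling for age
c_prime = 2.161            # direct effect of age on SBP
c_total = 2.862            # total effect of age on SBP

# Assumption: recover each standard error as coefficient / t ratio.
se_a = a / t_a
se_b = b / t_b

# Indirect effect and the decomposition c = ab + c'.
ab = a * b
reconstructed_total = ab + c_prime

# First-order Sobel test for H0: ab = 0.
sobel_se = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
z = ab / sobel_se
p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed p from the normal distribution

# Share of the total effect that is carried by the indirect path.
proportion_mediated = ab / c_total

print(f"ab = {ab:.3f}, ab + c' = {reconstructed_total:.3f}")
print(f"Sobel z = {z:.3f}, p = {p:.3f}")
print(f"proportion of total effect mediated = {proportion_mediated:.3f}")
```

With these inputs, the indirect path accounts for roughly a quarter of the total effect, which is consistent with the conclusion that weight only partly mediates the effect of age on SBP.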
16.15 ♦ Summary
This chapter demonstrates how to assess whether a proposed mediating variable (X2) may partly or completely mediate the effect of an initial causal variable (X1) on an outcome variable (Y). The analysis partitions the total effect of X1 on Y into a direct effect, as well as an indirect effect through the X2 mediating variable. The path model represents causal hypotheses, but readers should remember that the analysis cannot prove causality if the data are collected in the context of a nonexperimental design. If controlling for X2 completely accounts for the correlation between X1 and Y, this could happen for reasons that have nothing to do with mediated causality; for example, this can occur when X1 and X2 are highly correlated with each other because they measure the same construct. A mediation analysis should be undertaken only when there are good reasons to believe that X1 causes X2 and that X2 in turn causes Y. In addition, it is highly desirable to collect data in a manner that ensures temporal precedence (i.e., X1 occurs first, X2 occurs second, and Y occurs third).
These analyses can be done using OLS regression; however, use of SPSS scripts provided by Preacher and Hayes (2008) provides bootstrapped estimates of confidence intervals, and most analysts now believe this provides better information than statistical significance tests that assume normality. SEM programs provide even more flexibility for assessment of more complex models.
If a mediation analysis suggests that partial or complete mediation may be present, additional research is needed to establish whether this is replicable and real. If it is possible to manipulate or block the effect of the proposed mediating variable experimentally, experimental work can provide stronger evidence of causality (MacKinnon, 2008).
Notes
1. It is also possible to hypothesize bidirectional causality, such that X causes Y and that Y in return also influences X; this hypothesis of reciprocal causation would be denoted with two unidirectional arrows, X ⇄ Y, not with a double-headed arrow. Information about additional predictors of X and Y is needed to obtain separate estimates of the strengths of these two causal paths; see Felson and Bohrnstedt (1979) and Smith (1982).
2. For discussion of potential problems with comparisons among standardized regression coefficients, see Greenland et al. (1991). Despite the problems they and others have identified, research reports still commonly report standardized regression or path coefficients, particularly in situations where variables have arbitrary units of measurement.
3. Because each variable has a direct path to every other variable in this example, the chi-square for model fit is 0 (this means that the path coefficients can perfectly reconstruct the variances and covariances among the observed variables). In more advanced applications of SEM, some possible paths are omitted, and then the model usually cannot exactly reproduce the observed variances and covariances. In those situations, it becomes important to examine several different indexes of model fit to evaluate the consistency between model and observed data. For further discussion, see Kline (2010).
4. For reasons given in Note 3, in this example, chi-square equals 0.
5. SEM programs such as Amos typically use some form of maximum likelihood estimation, while linear regression uses ordinary least squares estimation methods (see Glossary for definitions of these terms). For this reason, the estimates of path coefficients and other model parameters may differ, particularly for more complicated models.
Comprehension Questions
1. Suppose that a researcher first measures a Y outcome variable, then measures an X1 predictor and an X2 hypothesized mediating variable. Why would this not be a good way to collect data to test the hypothesis that the effects of X1 on Y may be mediated by X2?
2. Suppose a researcher wants to test a mediation model that says that the effects of math ability (X1) on science achievement (Y) are mediated by sex (X2). Is this a reasonable mediation hypothesis? Why or why not?
3. A researcher believes that the prediction of Y (job achievement) from X1 (need for power) is different for males versus females (X2). Would a mediation analysis be appropriate? If not, what other analysis would be more appropriate in this situation?
4. Refer to Figure 16.1. If a, b, and ab are all statistically significant (and large enough to be of practical or clinical importance), and c′ is not statistically significant and/or not large enough to be judged practically or clinically important, would you say that the effects of X1 on Y are partially or completely mediated by X2?
5. What pattern of outcomes would you expect to see for coefficient estimates in Figure 16.1 (for example, which coefficients would need to be statistically significant and large enough to be of practical importance) for the interpretation that X2 only partly mediates the effects of X1 on Y? Which coefficients (if any) should be not statistically significant if the effect of X1 on Y is only partly mediated by X2?
6. In Figure 16.1, suppose that you initially find that path c (the total effect of X1 on Y) is not statistically significant and too small to be of any practical or clinical importance. Does it follow that there cannot possibly be any indirect effects of X1 on Y that are statistically significant? Why or why not?
7. Using Figure 16.1 again, consider this equation: c = (a × b) + c′. Which coefficients represent direct, indirect, and total effects of X1 on Y in this equation?
8. A researcher believes that the a path in a mediated model (see Figure 16.1) corresponds to a medium unit-free effect size and the b path in a mediated model also corresponds to a medium unit-free effect size. If assumptions are met (e.g., scores on all variables are quantitative and normally distributed), and the researcher wants to have power of about .80, what sample size would be needed for the Sobel test (according to Table 16.1)?
9. Give an example of a three-variable study for which a mediation analysis would make sense. Be sure to make it clear which variable is the proposed initial predictor, mediator, and outcome.
10. Briefly comment on the difference between the use of a bootstrapped CI (for the unstandardized estimate of ab) versus the use of the Sobel test. What programs can be used to obtain the estimates for each case? Which approach is less dependent on assumptions of normality?