Correlation -Stats
Warner, R. M. (2021). Applied statistics I: Basic bivariate techniques (3rd ed.). Thousand Oaks, CA: Sage Publications. ISBN: 978-1-5063-5280-0.
CHAPTER 10 BIVARIATE PEARSON CORRELATION

10.1 RESEARCH SITUATIONS WHERE PEARSON’S R IS USED

Pearson’s r is used when researchers want to know whether scores on two quantitative variables, X and Y, are linearly related. Pearson’s r is also called the Pearson product-moment correlation, and in research reports it is just denoted r. Usually, X denotes a predictor or independent variable; Y denotes an outcome or dependent variable. The Pearson correlation obtained in a sample is denoted r. The value of the correlation in the population is denoted ρ (pronounced rho, lower case r in Symbol font). We used M to estimate μ in earlier chapters; now we use r to estimate ρ.

When researchers evaluate the correlation of X and Y variables, they usually have reasons for their choices of X and Y variables. Here are some common reasons for selections of X and Y variables.
The researcher has a theory that X might cause or influence Y.
Past research suggests that X predicts Y (even if it is not a cause of Y).
X and Y may be different ways to measure the same thing.
Researchers often hope to find large correlations between X and Y as evidence consistent with one of these three ideas. (Occasionally, they hope to find small correlations as evidence against these ideas.)

If data come from an experiment, the researcher may manipulate the X variable. For example, X scores may correspond to different dosages of a drug; Y scores may measure heart rate after receiving the drug. If a variable is manipulated, it is the independent variable. Correlations are often obtained in research situations where neither variable has been manipulated. In these situations, we can call X the predictor variable if X happens earlier in time than Y, or if there is a plausible theory that says X might cause or influence Y. In some situations, X and Y are measured at the same time, and there is no plausible theory to suggest that X might cause or influence Y, or that Y might cause or influence X.
In this situation the choice of which variable to call X is arbitrary. Sometimes X and Y are different measures of the same thing. Correlation tells us whether the two measures yield consistent (reliable) results. Suppose that N = 100 students take two tests that assess depression. The X variable is their scores on the Beck Depression Inventory; the Y variable is scores on the Center for Epidemiologic Studies Depression inventory. These are two widely used self-report measures of depression. If the correlation between X and Y is large and positive, this is evidence that these two tests yield consistent results.

10.2 CORRELATION AND CAUSAL INFERENCE

People often say that “correlation does not imply causation.” That can be stated more precisely as follows: A statistical relationship between X and Y, by itself, does not imply causation. If X causes Y, we would expect to find a statistical relationship between X and Y using the appropriate bivariate statistic (such as r, t test, chi squared, or other analyses). Evidence that X and Y co-occur or are statistically related is a necessary condition for any claim that X might cause or influence Y. Statistical association is a necessary, but not sufficient, condition for causal inference. We need to be able to rule out rival explanations before we claim that X causes Y. The additional evidence needed to make causal inferences was discussed in Chapter 2.

When we interpret correlation results, we must be careful not to use causal-sounding language unless other conditions for causal inference are met. We should not report correlation results using words such as cause, determine, and influence unless data come from a carefully designed study that makes it possible to rule out rival explanations.

10.3 HOW SIGN AND MAGNITUDE OF R DESCRIBE AN X, Y RELATIONSHIP

Before you obtain a correlation, you need to examine an X, Y scatterplot to see if the association between X and Y is approximately linear. Pearson’s r provides useful information only about linear relationships. Additional assumptions required for Pearson’s r will be discussed later. Values of r can range from –1.00 through 0 to +1.00. If assumptions for the use of r are satisfied, then the value of Pearson’s r tells us two things. The sign of r tells us the direction of the association. For positive values of r, as values of X increase, values of Y also tend to increase. For negative values of r, as values of X increase, values of Y tend to decrease.
This tells us the nature or direction of the association. The absolute magnitude of r (without the plus or minus sign) indicates the strength of the association. If r is near 0, there is little or no association between X and Y. As r increases in absolute value, there is a stronger association.

10.4 SETTING UP SCATTERPLOTS

Initial evaluation of linearity is based on visual examination of scatterplots. Consider the data in the file perfect linear association scatter data.sav in Figure 10.1. In this imaginary data, number of cars sold (X) is the predictor of a salesperson’s salary (Y). A scatterplot can be set up by hand. If you already know how to set up a scatterplot and graph a straight line, you may skip to Section 10.5.

To create a scatterplot for the X variable cars_sold and the Y variable salary, set up a graph with values of 0 through 10 marked on the X axis (this corresponds to the range of scores for cars_sold, the predictor), and $10,000 through $25,000 marked on the Y axis (the range of scores for salary, the outcome) as shown in Figure 10.2. If one variable is clearly the predictor or causal variable, that variable is placed on the X axis; in this example, cars_sold predicts salary. To graph one data point, look at one line of data in the file. The ninth line has X = 8 for number of cars sold and Y = $22,000 for salary. Locate the value of X (number of cars sold = 8) on the horizontal axis. Then, move straight up from that value of X to the corresponding value of Y, salary, which is $22,000. Place a dot at the location for that combination of values of X and Y. When pairs of X, Y scores are placed in the graph for all 11 cases, the graph appears as in Figure 10.2. In a later section you’ll see how to use SPSS to generate scatterplots.

Figure 10.2 shows a perfect positive linear association. It is positive because, as the number of cars sold increases, salary increases. It is a perfectly linear association because the dots lie exactly on a straight line. You can exactly predict each person’s salary from the number of units sold. The corresponding Pearson correlation is r = +1.00. In this example, an employee who sells no cars has a base salary of $10,000. For each additional car sold, the salary goes up by $1,500. A perfect negative correlation of r = –1.00 appears in Figure 10.3. For each additional hour of study time, there is one fewer error on the 10-point exam.
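The two perfect associations just described can be checked numerically. The following is a minimal Python sketch and is not part of the SPSS workflow used in this book. The helper name pearson_r and the 0-to-10 range for hours of study are assumptions made for illustration; the salary rule ($10,000 base plus $1,500 per car) comes from the example above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Perfect positive association: base salary plus a fixed amount per car sold.
cars_sold = list(range(11))                       # 0 through 10, 11 cases
salary = [10_000 + 1_500 * c for c in cars_sold]

# Perfect negative association: one fewer error per hour of study.
# The 0-to-10 range of hours is an assumption for illustration.
hours = list(range(11))
errors = [10 - h for h in hours]

r_pos = pearson_r(cars_sold, salary)
r_neg = pearson_r(hours, errors)
print(round(r_pos, 6), round(r_neg, 6))
```

Because every point lies exactly on a straight line in both data sets, r reaches its limits of +1.00 and –1.00; the slope of the line ($1,500 per car versus one error per hour) has no effect on the size of r.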
Figure 10.1 Data for Scatterplot of Perfect Positive Association Between X and Y (Correlation of +1)
Figure 10.2 How to Locate an X, Y Data Point in Scatterplot
Figure 10.3 Scatterplot for Perfect Negative Correlation, r = –1.00

10.5 MOST ASSOCIATIONS ARE NOT PERFECT

In behavioral and social science research, data rarely have correlations near –1.00 or +1.00; values of r tend to be below .30 in absolute value. When scores are positively associated (but not perfectly linearly related), they tend to fall within a roughly cigar-shaped ellipse in a scatterplot, as shown in Figure 10.4.

To see how patterns in scatterplots change as the absolute value of r decreases, consider the following scatterplots that show hypothetical data for SAT score (a college entrance exam in the United States, X) as a predictor of first-year college grades (Y). Figure 10.5 shows a scatterplot that corresponds to a correlation of +.75 between SAT score (X predictor) and college grade point average (GPA) (Y outcome). The association tends to be linear (GPA increases as SAT score increases), but it is not perfectly linear. If we draw a line through the center of the entire cluster of data points, it is a straight line with a positive slope (higher GPAs go with higher SAT scores). However, many of the data points are not very close to the line.

It is useful to think about the way average values of Y differ across selected values of X (such as low, medium, and high values of X). The vertical ellipses in Figure 10.5 identify groups of people with low, medium, and high SAT scores. You can see that the group of people with low SAT scores has a mean GPA of 1.1, while the group of students with high SAT scores has a mean GPA of 3.6. We can’t predict each person’s GPA exactly from SAT score, but we can see that the average GPA is higher when SAT score is high than when SAT score is low.

If the correlation between GPA and SAT score is about +.50, the scatterplot will look like the one in Figure 10.6. In real-world studies, correlations between SAT and GPA tend to be about +.50 (Stricker, 1991).
The difference in mean GPA for the low versus high SAT score groups in the graph for r = +.50 is less than it was in the graph for r = +.75. Also, within the low, medium, and high SAT score groups, GPA varies more when r = +.50 than when r = +.75. SAT scores are less closely related to GPA when r = .50 than when r = .75.

Now consider what the scatterplot looks like when correlation is even smaller, for example, r = +.20. Many correlations in behavioral science research reports are about this magnitude. A scatterplot for r = +.20 appears in Figure 10.7. In this scatterplot, points tend to be even farther away from the line than when r = +.50, and mean GPA does not differ much for the low SAT versus high SAT groups. For correlations below about r = .50, it becomes difficult to detect any association by visual examination of the scatterplot.
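The shrinking gap between group means as r drops from .75 to .50 to .20 can be reproduced with simulated scores. This is a minimal Python sketch under stated assumptions: the scores are simulated standard scores rather than real SAT or GPA values, the low and high groups are defined as the bottom and top thirds of X, and the function names simulate and mean_gap are hypothetical.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate(rho, n=5000, seed=0):
    """Draw n (x, y) pairs of standard scores with population correlation rho."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # y is part x and part independent noise; the weights give corr(x, y) = rho.
    ys = [rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    return xs, ys

def mean_gap(xs, ys):
    """Mean of y in the top third of x minus mean of y in the bottom third."""
    pairs = sorted(zip(xs, ys))
    k = len(pairs) // 3
    low = [y for _, y in pairs[:k]]
    high = [y for _, y in pairs[-k:]]
    return sum(high) / k - sum(low) / k

results = {}
for rho in (0.75, 0.50, 0.20):
    xs, ys = simulate(rho)
    results[rho] = (pearson_r(xs, ys), mean_gap(xs, ys))
    print(rho, round(results[rho][0], 2), round(results[rho][1], 2))
```

The second number printed for each value of rho is the difference in mean Y between the high-X and low-X groups, in standard deviation units. It becomes smaller as the correlation weakens, which is the pattern described for Figures 10.5 through 10.7.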
Figure 10.4 Ellipse Drawn Around Scores in an X, Y Scatterplot With a Strong Positive Correlation
Figure 10.6 Scatterplot for Hypothetical GPA and SAT Score With r = +.50
Figure 10.7 Hypothetical Scatterplot for r = +.20

10.6 DIFFERENT SITUATIONS IN WHICH R = .00

Finally, consider what scatterplots can look like when r is close to 0. An r of 0 tells us that there is no linear relationship between X and Y. However, there are two different ways r close to 0 can happen. If X and Y are completely unrelated, r will be close to 0. If X and Y have a nonlinear or curvilinear relationship, r can also be close to 0.

Figure 10.8 shows a scatterplot for a situation in which X is not related to Y at all. If SAT scores were completely unrelated to GPA, the results would look like Figure 10.8. The two groups (low and high SAT scores) have the same mean GPA, and mean GPA for each of these groups is equal to mean GPA for all persons in the sample. Also note that the overall distribution of points in the scatterplot is approximately circular in shape (instead of elliptical).

However, an r of 0 does not always correspond to a situation where X and Y are completely unrelated. An r close to 0 can be found when there is a strong but not linear association between X and Y. Figure 10.9 shows hypothetical data for an association sometimes found in research on anxiety (X) and task performance (such as exam scores, Y). The plot shows a strong, but not linear, association between anxiety and exam score. An inverse U-shaped curve corresponds closely to the pattern of change in Y. In this example, students very low in anxiety obtain low exam scores (perhaps they are not motivated to study and do not concentrate). Students with medium levels of anxiety have high mean exam scores (they are motivated to study). However, students with the highest level of anxiety also have low exam scores; at high levels of anxiety, panic may set in and students may do poorly on exams. Pearson’s r is close to 0 for the data in this plot. Pearson’s r does not tell us anything about the strength of this type of association, and it is not an appropriate analysis for this situation.
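A small numerical illustration shows how a strong inverse U-shaped association can produce r = 0. This is a minimal Python sketch; the anxiety and exam scores are invented for illustration (they are not the data behind Figure 10.9), and exam score is made an exact quadratic function of anxiety so that the association is as strong as possible.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores: exam performance peaks at a medium anxiety level of 5.
anxiety = list(range(11))                          # 0 through 10
exam = [90 - 2 * (a - 5) ** 2 for a in anxiety]    # inverse U shape

r_all = pearson_r(anxiety, exam)            # whole range of anxiety
r_low = pearson_r(anxiety[:6], exam[:6])    # anxiety 0 through 5 only
r_high = pearson_r(anxiety[5:], exam[5:])   # anxiety 5 through 10 only
print(round(r_all, 3), round(r_low, 3), round(r_high, 3))
```

Exam score can be predicted exactly from anxiety in these data, yet r for the whole range is 0 because the rising left half and the falling right half cancel. Within each half the correlation is strong, positive on the left and negative on the right.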
An example of a different curvilinear function appears in Figure 10.10 (height in feet, Y, and grade in school, X). In this example, r would be large and positive; however, a straight line is not a good description of the pattern. Height increases rapidly from Grades 4 through 7; after that, height increases slowly and levels off. If you flip Figures 10.9 and 10.10 upside down, they correspond to other possible curvilinear patterns.
Figure 10.8 An r of 0 That Represents No Association Between X and Y

For correlations near 0, you need to examine a scatterplot to evaluate whether the correlation suggests that there is no relationship between variables or a relationship that is not linear.

Figure 10.9 Curvilinear Association Between Anxiety and Exam Score; r = .00
Figure 10.10 Another Curvilinear Association Between X and Y

In all studies that report Pearson’s r, scatterplots should be examined ahead of time to evaluate whether assumptions for linear correlation are satisfied and, after the correlation is obtained, to understand the pattern in the data.

10.7 ASSUMPTIONS FOR USE OF PEARSON’S R

10.7.1 Sample Must Be Similar to Population of Interest

If X, Y scores are obtained from a sample that has been randomly selected from the population of interest, it is reasonable to generalize beyond the sample value of r to make inferences about the value of ρ in the population. However, r values are often obtained using convenience samples. In these situations, researchers need to evaluate whether their convenience sample is similar to (representative of) the population of interest. I think most people understand that use of M to estimate μ does not make sense if the sample is not similar to the population. I have seen occasional claims that we can use statistics (such as r) to estimate ρ even when convenience samples are not similar to populations of interest. The implicit assumption is that the value of r is independent of the kinds of people or cases in the study. That is incorrect. Inferences about ρ from r require similarity of sample to population, just as much as generalizations about μ based on M. Values of r can be highly dependent on context (e.g., composition of sample, setting, methods of intervention, measures of outcomes, variances of X and Y in the sample).
10.7.2 X, Y Association Must Be Reasonably Linear

When X, Y scatterplots show drastically nonlinear associations, Pearson correlation cannot detect strength of association. Appendix 10E briefly discusses simple analyses that can be used to evaluate nonlinear associations between X and Y.

10.7.3 No Extreme Bivariate Outliers

Like the sample mean M, r is not robust against the effect of outliers (values of MX and MY are involved in computation of r). Values of r can be inflated or deflated by outliers (and that makes sample values of r poor estimates of the unknown population correlation coefficient, denoted ρ). The presence of outliers is a serious problem in practice. Visual examination of an X, Y scatterplot is a reasonable way to detect bivariate outliers. (Advanced textbooks provide other methods; see Volume II [Warner, 2020], Chapter 2.) A nonparametric alternative to r, such as Spearman’s r, may reduce the impact of bivariate outliers.

10.7.4 Independent Observations for X and Independent Observations for Y

If persons or cases in your sample have opportunities to influence one another’s scores on X and/or Y, through processes such as cooperation, competition, imitation, and so on, this assumption will be violated. For further discussion of the assumption of independence among observations and the data collection methods that tend to create problems with this assumption, refer to Chapter 2. When people in the sample have not had opportunities to influence one another, this assumption is usually met. When this assumption is violated, values of r and significance tests of r can be incorrect.

10.7.5 X and Y Must Be Appropriate Variable Types

Some textbooks say that both X and Y must be quantitative variables for Pearson’s r; that is the most common situation when Pearson’s r is reported. However, Pearson’s r can also be used if either X or Y, or both, is a dichotomous variable (for example, if X represents membership in just two groups).
When one or both variables are dichotomous, Pearson’s r can be reported with different names. If we correlate the dichotomous variable sex with the quantitative variable height, this is called a point biserial r (rpb). If we correlate the dichotomous variable sex with the dichotomous variable political party coded 1 = Republican, 2 = non-Republican, that correlation is called a phi coefficient, denoted ϕ.

If X and/or Y have three or more categories, Pearson’s r cannot be used, because it is possible for the pattern of means on a Y variable to show a nonlinear increase or decrease across groups defined by a categorical X variable. For example, if X is political party membership (coded 1 = Democrat, 2 = Republican, 3 = Socialist, and so forth), and Y is a rating of the president’s performance, we cannot expect changes in Y across values of X to be linear.

Consider the scatterplot in Figure 10.11 that represents an association between sex (X) and height (Y). Pearson’s r requires that the association between X and Y be linear. When an X predictor variable has only two possible values, the only association that X can have with Y is linear.
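The point biserial r and the phi coefficient are obtained by applying the ordinary Pearson formula to dichotomous codes. The following minimal Python sketch illustrates this with invented data; the eight cases, the code assignments (1 = male, 2 = female), and the heights in inches are all assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for eight people.
sex = [1, 1, 1, 1, 2, 2, 2, 2]               # 1 = male, 2 = female (assumed codes)
height = [70, 72, 68, 71, 64, 66, 63, 65]    # inches
party = [1, 1, 2, 2, 1, 2, 2, 2]             # 1 = Republican, 2 = non-Republican

# Point biserial r: dichotomous X with quantitative Y.
r_pb = pearson_r(sex, height)

# Recoding the two groups as 0 and 1 leaves r unchanged.
r_pb_01 = pearson_r([s - 1 for s in sex], height)

# Reversing which group gets the higher code flips only the sign.
r_pb_reversed = pearson_r([3 - s for s in sex], height)

# Phi coefficient: both variables dichotomous.
phi = pearson_r(sex, party)
print(round(r_pb, 3), round(r_pb_01, 3), round(r_pb_reversed, 3), round(phi, 3))
```

The sign of a point biserial r or phi coefficient depends on which group is given the higher code, so the coding must be reported for the sign to be interpretable.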
In the bivariate regression chapter, you will see that the X independent variable can be either quantitative or dichotomous. However, the Y dependent variable in regression analysis cannot be dichotomous; it must be quantitative. Correlation analysis does not require us to distinguish variables as independent versus dependent; regression does require that distinction.

10.7.6 Assumptions About Distribution Shapes

Textbooks sometimes say that the joint distribution of X and Y must be bivariate normal and/or that X and Y must each be normally distributed. In practice this is often difficult to evaluate. The bivariate normal distribution is not discussed further here. In practice, it is more important that X and Y have similar distribution shapes (Tabachnick & Fidell, 2018). When X and Y have different distribution shapes, values of r in the sample are restricted to a narrower range, for example, –.40 to +.40.
Figure 10.11 Scatterplot for Dichotomous Predictor (Sex) and Quantitative Outcome (Height)
Note: M1 = mean male height; M2 = mean female height.

When there are problems with assumptions, nonparametric alternative correlations such as Spearman’s r or Kendall’s tau may be preferred (see Appendix 10A and Kendall, 1962). These also require assumption of linearity, but they are likely to be less influenced by bivariate outliers.

10.8 PRELIMINARY DATA SCREENING FOR PEARSON’S R
The following information is needed to evaluate the assumptions. To evaluate representativeness of the sample and independence of observations, you need to know how the sample was obtained and how data were collected (see Chapter 2). In addition:
Examine frequency tables to evaluate problems with missing values and/or outliers and/or implausible values for X or Y (as for all analyses). Document numbers of missing values and outliers and how they are handled.
Obtain histograms for X and Y. Evaluate whether distribution shapes are reasonably normal and X and Y have similar shapes. For a dichotomous variable, the closest to normal shape is a 50/50 split in group membership.
Obtain an X, Y scatterplot. This is the most important part of data screening for correlation. The scatterplot is used to evaluate linearity and identify potential bivariate outliers.

Pearson’s r is not robust against violations of most of its assumptions. A statistic such as the median is described as robust if departures from assumptions and/or the presence of outliers do not have much impact on the value of that sample statistic. Partly because r is affected badly by violations of its assumptions and additional problems discussed later, sample sizes for r should be large, ideally at least N = 50 or 100, and data screening should include evaluation of bivariate outliers.

10.9 EFFECT OF EXTREME BIVARIATE OUTLIERS

Prior chapters discussed methods for the detection of univariate outliers (i.e., outliers in the distribution of a single variable) through examination of histograms and boxplots. In correlation, we also need to consider possible bivariate outliers. These do not necessarily have extreme values on X or on Y (although they may). A bivariate outlier often represents an unusual combination of values of X and Y. For example, if X is height and Y is weight, height of 6 ft and body weight of 120 lb would be a very unusual combination of values, even though these are not extreme scores in histograms for height and weight.
If you visualize the location of points in your scatterplot as a cloud, a bivariate outlier is an isolated data point that lies outside that cloud. Figure 10.12 shows an extreme bivariate outlier (the circled point at the upper right). For this scatterplot, r = +.64. If the outlier is removed, and a new scatterplot is set up for the remaining data, the plot in Figure 10.13 is obtained; when the outlier is excluded, r = –.11 (not significantly different from 0). This example illustrates that the presence of a bivariate outlier can inflate the value of a correlation. It is not desirable to have the result of an analysis depend so much on one outlier score.

The presence of an outlier does not always increase the magnitude of r; consider the example in Figure 10.14. In this example, when the circled outlier at the lower right of the plot is included, r = +.532; when it is excluded, r = +.86. When this bivariate outlier is included, it decreases the value of r. These examples demonstrate that decisions to retain or exclude outliers can have substantial impact on r values.
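The effect of a single extreme case can be demonstrated with a very small data set. This minimal Python sketch uses invented scores (not the data shown in Figures 10.12 through 10.14): eight cases with no linear association, plus one hypothetical case that is extreme on both X and Y.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Eight hypothetical cases with no linear association between X and Y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 4, 5, 3, 6, 4]

# One hypothetical bivariate outlier, far above the cloud on both variables.
outlier = (20, 20)

r_without = pearson_r(x, y)
r_with = pearson_r(x + [outlier[0]], y + [outlier[1]])
print(round(r_without, 3), round(r_with, 3))
```

With the outlier excluded, r is 0; with it included, r is large and positive. One case out of nine accounts for the entire apparent association, which is why rules for identifying and handling outliers need to be set before correlations are examined.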
Figure 10.12 Scatterplot That Includes a Bivariate Outlier
Figure 10.13 Subset of Data From Figure 10.12 After Outlier Is Removed
Note: With the bivariate outlier included, Pearson’s r(48) = +.64, p < .001; with the bivariate outlier removed, Pearson’s r(47) = –.10, not significant.

It is dishonest to run two correlations (one that includes and one that excludes bivariate outliers) and then report only the larger correlation. It can be acceptable to report both correlations so that readers can see the effect of the outlier. Decisions about identification and handling of outliers should be made before data are collected.
Figure 10.14 A Bivariate Outlier That Deflates the Size of r
Note: With the bivariate outlier included, Pearson’s r(48) = +.532, p < .001; with the bivariate outlier removed, Pearson’s r(47) = +.86, p < .001.

When you are learning statistics, it can be informative to “experiment” with data and see whether correlations change when outliers are dropped. (The website for this book at edge.sagepub.com/warner3e includes data files that are not used in these chapters that you can use to “play with” different analyses.) However, when you analyze data for publication, you need to decide what to do about outliers and data transformations before you obtain correlations. Dropping outliers after you have obtained a correlation that is not significant, in order to obtain a correlation that is statistically significant, is a form of p-hacking that must be avoided.

10.10 RESEARCH EXAMPLE
Consider a set of survey data collected from 118 university students about their heterosexual dating relationships. The variables in the data set love.sav are described in Table 10.1. Only students currently involved in serious dating relationships were included. They provided several kinds of information, including their own sex, partner sex, and a single-item rating of attachment style. They also filled out Sternberg’s Triangular Love Scale (Sternberg, 1997), which measures three aspects of love: quantitative measures of intimacy, commitment, and passion felt toward the current relationship partner.

Later in the chapter, we will use Pearson’s r to describe the strength of the linear relationships between pairs of these variables and to test whether these correlations are statistically significant. For example, we can ask whether there is a strong positive correlation between scores on intimacy and commitment, as well as between passion and intimacy.

Table 10.1 Description of Variables in the File love.sav
Note that attachment is a categorical variable with three groups; this variable cannot be used in correlation analysis. Any other pair of these variables can be used in correlation. Relationship length is coded in a way that makes it reasonable to treat it as quantitative. Correlation analysis for each pair of variables, X and Y, will proceed as follows.
Preliminary data screening: Examine frequency tables, histograms, and/or boxplots for X and Y to evaluate their distribution shapes (you hope to see few or no missing data, no implausible score values, and no extreme outliers).
Examine an X, Y scatterplot to evaluate linearity and identify potential bivariate outliers. You hope to see a linear association without bivariate outliers.
If rules have been established prior to analysis regarding the criteria for calling a score an outlier, and a method for handling outliers, handle outliers according to these rules, and document what you have done.
Researchers prefer to be able to say that data are approximately normally distributed and do not have outliers. However, real data are often messy, and problems with data must be explained in the write-up.
Obtain the correlation, r, between X and Y.
Test the statistical significance of r. Many studies report numerous Pearson’s r values. Statistical significance tests should be modified when more than one correlation is reported, as discussed later in this chapter.
Evaluate r and r² as effect sizes.
Set up a confidence interval (CI) for r.
When interpreting r, keep in mind factors that can make r a poor estimate of the true strength of association between X and Y. These include violations of the assumptions described previously and additional problems discussed in Appendix 10D.
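The later steps in this list (significance test, effect size, and confidence interval) are developed in later sections. As a preview only, the following minimal Python sketch uses two widely used textbook formulas that are assumed here rather than taken from this section: the t ratio for r with N – 2 degrees of freedom, and an approximate 95% confidence interval based on the Fisher r-to-z transformation. The function names and the values r = .50 and N = 103 are hypothetical.

```python
from math import atanh, tanh, sqrt

def t_for_r(r, n):
    """t ratio for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate CI for rho using the Fisher r-to-z transformation.

    z_crit = 1.96 gives an approximate 95% interval.
    """
    z = atanh(r)                 # Fisher z for the sample r
    se = 1 / sqrt(n - 3)         # standard error of Fisher z
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

# Hypothetical result: r = .50 obtained in a sample of N = 103.
r, n = 0.50, 103
t = t_for_r(r, n)
r_squared = r ** 2               # proportion of variance shared by X and Y
ci_low, ci_high = r_confidence_interval(r, n)
print(round(t, 2), round(r_squared, 2), round(ci_low, 2), round(ci_high, 2))
```

The interval is not symmetric around r because the sampling distribution of r is skewed when ρ is not 0; the transformation to Fisher z and back corrects for this.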
10.11 DATA SCREENING FOR RESEARCH EXAMPLE

We will examine the correlation between the two variables commitment and intimacy. For the data in love.sav, histograms of scores on these two variables were obtained by selecting the <Analyze> → <Descriptive Statistics> → <Frequencies> procedure. In the Frequencies: Charts dialog box, select “Histograms.” These SPSS menu selections appear in Figure 10.15.

The histograms for commitment and intimacy had similar shapes (the commitment and intimacy histograms appear in Figure 10.16 and Figure 10.17, respectively). These are not perfect normal distribution shapes; both distributions were negatively skewed; distribution shapes were similar. Possible scores on these variables ranged from 15 to 75; most people rated their relationships very high on both these scales. Boxplots were also obtained; the boxplot for commitment scores appears in Figure 10.18. There was one outlier and one extreme outlier at the lower end of the distribution of commitment scores.

The bivariate scatterplot for self-reported intimacy and commitment (in Figure 10.19) shows a positive, linear, and strong association between scores; that is, persons who reported higher scores on intimacy also reported higher scores on commitment. This is not an ideal situation for Pearson’s r (it does not resemble a bivariate normal distribution), but this is a situation where Pearson’s r will work reasonably well. A possible bivariate outlier was identified by visual examination (circled). Other cases could also be called potential bivariate outliers. I did not remove outliers when obtaining the correlation.
Figure 10.15 SPSS Menu Selections to Obtain Histograms

When sample sizes are large, there may be multiple cases that have exactly the same combination of values for X and Y. These all correspond to the same circle or case marker on the plot and cannot be distinguished by eye.

Figure 10.16 Negatively Skewed Histogram for Commitment Scores
Figure 10.17 Negatively Skewed Histogram for Intimacy Scores

In this situation I have not identified variables as independent or dependent; they are treated as correlates. By convention, an independent variable is placed on the X axis in a scatterplot. When variables cannot be identified as independent versus dependent, the decision of which to place on the X axis is arbitrary. In this example, the decision to place intimacy on the X axis is arbitrary, and it makes no difference when the correlation is estimated.
Figure 10.18 Boxplot for Commitment Scores Showing Low-End Outliers
Note: Descriptive statistics for commitment: M = 66.63, SD = 8.16, N = 118.
Figure 10.19 Scatterplot for Intimacy and Commitment

10.12 COMPUTATION OF PEARSON’S R

Pearson’s r is a standardized index of strength of association. Its value does not depend on the units of measurement used for either the X or Y variable; its value also does not depend on N, the number of cases. If we have a sample of heights and weights for a set of people, r between height and weight will be the same whether height is given in inches, feet, centimeters, or meters and whether weight is given in kilograms or pounds.

These are the steps involved in computing Pearson’s r. A numerical example using a small batch of data with hypothetical scores for heart rate and tension appears in Table 10.2.

First, convert scores on both X and Y to z scores:

$$z_X = \frac{X - M_X}{s_X} \qquad (10.1a)$$

$$z_Y = \frac{Y - M_Y}{s_Y} \qquad (10.1b)$$

where MX and sX are the mean and standard deviation of X, and MY and sY are the mean and standard deviation of Y. This conversion gets rid of units of measurement such as inches, meters, or kilograms; conversion to z scores makes the value of the correlation unit free. The z scores for this small set of hypothetical data appear in Table 10.2.
Second, multiply the value of zX and zY for each person to obtain a product (zX × zY). These products appear in the far-right column of Table 10.2. When zX and zY have the same sign, their product is positive (i.e., a negative value times a negative value is positive). When zX and zY have opposite signs, their product is negative.
Third, sum the (zX × zY) products across all persons in the sample; that is, sum the last column in Table 10.2; this gives us Σ(zX × zY).
Fourth, divide the sum of these products, Σ(zX × zY), by N – 1, where N is the number of cases (number of X, Y pairs of scores). This corrects for the fact that the sum gets larger as N increases and makes the final value of r independent of N. This gives you the value of r.
Table 10.2 Computation of Pearson’s r for a Set of Scores on Heart Rate (HR) and Self-Reported Tension
All these steps can be summarized in one equation:

$$r = \frac{\sum (z_X \times z_Y)}{N - 1} \qquad (10.1)$$

An r value obtained from Equation 10.1 will lie within the range –1.00 to +1.00. The full range is attainable only if the distribution shapes for X and Y are the same; if the distribution shapes differ, the largest possible values for r can be much less than 1 in absolute value. The order in which we list the variables does not matter; whether we speak of the correlation of X with Y, denoted rXY, or the correlation of Y with X, denoted rYX, the value of r is the same. There are other formulas for Pearson’s r that yield the same numerical result (Appendix 10F).
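The steps summarized in Equation 10.1 can be written out directly in code. This is a minimal Python sketch; the function name pearson_r_from_z and the six height and weight scores are invented for illustration (they are not the heart rate and tension data in Table 10.2). The standard deviations use N – 1 in the denominator, which is what statistics.stdev computes.

```python
from statistics import mean, stdev

def pearson_r_from_z(xs, ys):
    """Pearson's r as the sum of z-score products divided by N - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)            # sample SDs (N - 1 denominator)
    zx = [(x - mx) / sx for x in xs]         # Equation 10.1a
    zy = [(y - my) / sy for y in ys]         # Equation 10.1b
    products = [a * b for a, b in zip(zx, zy)]
    return sum(products) / (len(xs) - 1)     # Equation 10.1

# Hypothetical heights and weights for six people.
height_in = [62, 65, 67, 70, 72, 74]
weight_lb = [120, 140, 135, 160, 175, 190]

# The same scores expressed in different units of measurement.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_in_lb = pearson_r_from_z(height_in, weight_lb)
r_cm_kg = pearson_r_from_z(height_cm, weight_kg)
r_swapped = pearson_r_from_z(weight_lb, height_in)
print(round(r_in_lb, 4), round(r_cm_kg, 4), round(r_swapped, 4))
```

The three printed values agree: changing units rescales X and Y but leaves their z scores unchanged, and swapping the roles of X and Y changes only the order of the two factors in each product.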
10.13 HOW COMPUTATION OF CORRELATION IS RELATED TO PATTERN OF DATA POINTS IN THE SCATTERPLOT

To understand how the formula for correlation provides information about the location of points in a scatterplot, and how it detects a tendency for high scores on Y to co-occur with high or low scores on X, it is helpful to look at the arrangement of points in an X, Y scatterplot (see Figure 10.20). Consider what happens when the scatterplot is divided into four quadrants or regions: scores that are above and below the mean on X and scores that are above and below the mean on Y. The MX line divides the range of X values into those below and above the mean of X, while the MY line divides the Y values into those below and above the mean of Y.
Figure 10.20 X, Y Scatterplot Divided Into Quadrants (Above and Below the Means of X and Y)

The data points in Regions II and III in Figure 10.20 are concordant pairs; these are cases for which high X scores were associated with high Y scores or low X scores were paired with low Y scores. In Region II, both zX and zY are positive, and their product is also positive; in Region III, both zX and zY are negative, so their product is also positive. If most of the data points fall in Regions II and/or III, it follows that most of the contributions to the Σ(zX × zY) sum of products will be positive, and the correlation will tend to be large and positive.

The data points in Regions I and IV in Figure 10.20 are discordant pairs because these are cases where high X goes with low Y and/or low X goes with high Y. In Region I, zX is negative and zY is positive; in Region IV, zX is positive and zY is negative. This means that the product of zX and zY for each point that falls in Region I or IV is negative. If there are numerous data points in Region I and/or IV, most of the contributions to Σ(zX × zY) will be negative, and r will tend to be negative.

If the data points are about evenly distributed among the four regions, then positive and negative values of zX × zY will be about equally common, they will tend to cancel each other out when summed, and the overall correlation will be close to zero. This can happen either because X and Y are unrelated (as in Figure 10.8) or in situations where there is a strongly curvilinear relationship (as in Figure 10.9). In either of these situations, high X scores are associated with high Y scores about as often as high X scores are associated with low Y scores.

As you continue to study statistics, you will see that when statistical formulas include products between variables of the form Σ(X × Y) or Σ(zX × zY), the computation provides information about correlation or linear association. Terms that involve ΣX² or ΣY² provide information about variability. Recognizing these terms makes it possible to decode the information included in more complex computational formulas.

The computation of r is straightforward. However, many characteristics of data, such as restricted ranges of scores, non-normal distribution shape, outliers, and low reliability, can lead to over- or underestimation of the correlations between variables. Here is a brief list of some artifacts (problems) that can make sample Pearson’s r values poor estimates of the true strength of association between variables. Appendix 10D explains these in greater detail. Advanced students should take these into consideration when interpreting or comparing obtained r values.
Pearson’s r will be deflated if the association between X and Y is nonlinear or curvilinear.
Pearson’s r can be either inflated or deflated by the presence of bivariate outliers.
Pearson’s r will be deflated if the X and Y variables have different distribution shapes.
Pearson’s r can be deflated if scores on X and/or Y have restricted ranges.
Pearson’s r is usually overestimated if only groups of persons with extremely low and extremely high scores on X and Y are examined.
Samples that include members of different groups can provide misleading information.
If X or Y has poor measurement reliability, its correlations with other variables will be attenuated (reduced).
Part-whole correlations: If X and Y contain overlapping information such as duplicate survey questions, or if Y = X + something, then X and Y can be highly correlated because they contain duplicate information.
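The quadrant argument in Section 10.13 can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function names are hypothetical, and it assumes z scores are computed with the sample standard deviation so that r equals Σ(z_X × z_Y)/(N − 1).

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw scores to z scores using the sample SD (N - 1 divisor)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def r_from_z_products(x, y):
    """Return Pearson's r plus counts of concordant and discordant pairs.

    A pair is concordant when z_X * z_Y > 0 (both scores above, or both
    below, their means) and discordant when z_X * z_Y < 0.
    """
    products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    r = sum(products) / (len(products) - 1)
    concordant = sum(p > 0 for p in products)
    discordant = sum(p < 0 for p in products)
    return r, concordant, discordant


# Hypothetical scores, invented for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r, concordant, discordant = r_from_z_products(x, y)
print(f"r = {r:.3f}, concordant = {concordant}, discordant = {discordant}")
```

Because four of the six products are positive and only two are negative (and the negative ones are small), the sum of products is positive and r comes out large and positive.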
Correlations (and related values called covariances) are the starting point for many later multivariate analyses. Artifacts that inflate or deflate values of sample correlations will also distort the results of other multivariate analyses. It is therefore extremely important for researchers to understand that artifacts that influence the size of Pearson’s r also influence the magnitudes of regression coefficients, factor loadings, and other coefficients used in multivariate models.

10.14 TESTING THE HYPOTHESIS THAT ρ0 = 0

Statistical significance tests can be obtained for most statistics you will learn. Just as sample M was used to estimate the unknown population mean μ, sample r is used to estimate the unknown population value of the correlation between X and Y, denoted by the Greek letter rho (ρ). Recall that a large M (relative to the standard error of M) is evidence that is inconsistent with the belief that μ = 0. Similarly, a value of r that is large in absolute value is evidence inconsistent with the belief that ρ, the true strength of the correlation in the population, is 0. The formal null hypothesis is that ρ, the true population correlation between X and Y, is 0; in other words, that X and Y are not linearly related.

$$H_0: \rho = 0 \tag{10.2}$$

Recall the general form of a t ratio used to test hypotheses like this:

$$t = \frac{\text{sample statistic} - \text{hypothesized population parameter}}{SE_{\text{sample statistic}}}$$

In this chapter the sample statistic of interest is r, so the t ratio is:

$$t = \frac{r - \rho_0}{SE_r} \tag{10.3}$$

The hypothesized value of ρ0 is 0; therefore the ρ0 term is dropped in later equations. (Other hypotheses about correlations can be tested; see Appendix 10C at the end of this chapter.) The t ratio for the correlation has df = N − 2, where N is the number of cases that have X, Y pairs of scores.

$$t = \frac{r}{SE_r} \tag{10.4}$$

To find the t ratio, we need to obtain the value of SE_r; this is given by Equation 10.5. SE_r depends only on the values of r² and N, the number of cases in the sample:

$$SE_r = \sqrt{\frac{1 - r^2}{N - 2}} \tag{10.5}$$

Substituting the value of SE_r from Equation 10.5 into Equation 10.4 and rearranging the terms yields the most widely used formula for a t test for the significance of a sample r value, Equation 10.6. This t test has N − 2 degrees of freedom; because the hypothesized value of ρ0 is 0, the ρ0 term drops out.

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} \tag{10.6}$$

A large value of t tells us that r is large relative to its standard error. In practice you won’t need to examine t values or look up critical values of t. You can use the “Sig.” value in the SPSS output to evaluate the statistical significance of r. By default, SPSS reports p values for two-tailed tests. Using the most common criterion for significance (α = .05, two tailed), an r value is judged statistically significant if “Sig.” or p < .05. Note that if SPSS reports “Sig.” or p = .000, this must be reported as p < .001. A p value is an estimate of risk for Type I error, and although that risk may be zero to many decimal places, it can never be exactly zero.

This section shows you that a Pearson’s r can be converted into a t ratio. In Chapter 12, on the independent-samples t test, you will see that a t ratio can also be converted into a correlation (r point biserial, denoted r_pb). As you continue to study you will see that many statistical analyses that appear to be different on the surface are just different ways of presenting the same information about associations between variables.

Other significance tests that can be used to compare sample values of r appear in Appendix 10C. Caution is needed when comparing values of correlations across samples or variables; many factors, discussed in Appendix 10D, can make sample values of Pearson’s r poor estimates of population correlations.

10.15 REPORTING MANY CORRELATIONS AND INFLATED RISK FOR TYPE I ERROR

Journal articles rarely report just a single Pearson’s r. Unfortunately, however, some studies report such large numbers of correlations that evaluation of statistical significance becomes problematic. Suppose that k = 20 variables are measured in a nonexperimental study. If a researcher obtains all possible bivariate correlations for a set of k variables, there will be k × (k − 1)/2 different correlations, in this case (20 × 19)/2 = 190 different correlations.
When we set the risk for Type I error at α = .05 (that is, reject H0 if p < .05) for each individual correlation, then out of each set of 100 statistical significance tests, about 5 tests will be “significant” only because of Type I error (rejection of H0 when H0 is true). If you generated an entire data set with k variables using a random number generator and then started to run correlations among these variables, about 5% of the correlations would be judged statistically significant even if the data are completely random. When a journal article reports 100 correlations, for example, one would expect that about 5% or 5 of these correlations would be judged statistically significant using the α = .05 significance criterion, even if values in the data set came from a random number generator. If a researcher runs 100 correlations and finds that most (say, 65 of 100) are significant, then it seems likely that at least some of these correlations are not due to Type I error. However, the researcher should suspect that about 5 of the correlations are due to Type I error, and there is no way to tell which of the 65 correlations were due to sampling error. If a researcher reports 100 correlations and only 5 are significant, it’s quite possible that the researcher has found nothing more than the expected number of Type I errors.

The logic of null hypothesis significance testing implicitly assumes that we conduct one and only one test and then stop. If we follow this rule, and adhere to all other assumptions for significance tests, then the risk for rejecting H0 when H0 is true should be limited, in theory, to α (usually α = .05). When we run many tests (and when we violate other assumptions), the actual risk for committing at least one Type I error is larger than .05 (“inflated”). When we run large numbers of significance tests, we should view decisions about statistical significance very skeptically.

10.15.1 Call Results Exploratory and De-emphasize or Avoid Statistical Significance Tests

One way to avoid the problem of inflated risk is to report results as purely exploratory. You can report numerous correlations in an exploratory study, provided you make it clear that evaluation of statistical significance is problematic in this situation. In this case, I would not include p values at all. If they are included, state clearly that they have not been corrected for inflated risk for Type I error; the p values provided by SPSS are not corrected for inflated risk. Thus, one way to deal with the problem of inflated risk is to “come clean” and simply say that inflated risk exists.

10.15.2 Limit the Number of Correlations

A better way to limit risk for Type I error is to limit the number of correlations that will be obtained at the outset, before looking at the data, on the basis of theoretical assumptions about which predictive relations are of interest. A possible drawback of this approach is that it may preclude serendipitous discoveries. Sometimes correlations that were not predicted indicate relationships among variables that might be confirmed in subsequent replications.
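The arithmetic behind inflated risk can be made concrete with a short sketch. The function names are hypothetical. The "at least one Type I error" formula, 1 − (1 − α)^k, is an assumption added here for illustration: it holds only when every null hypothesis is true and the k tests are independent, which correlations computed on the same variables generally are not.

```python
def n_pairwise_correlations(n_vars):
    """Number of distinct bivariate correlations among n_vars variables."""
    return n_vars * (n_vars - 1) // 2


def expected_type_i_errors(k, alpha=0.05):
    """Expected count of 'significant' results among k tests when every H0 is true."""
    return k * alpha


def risk_of_at_least_one_type_i_error(k, alpha=0.05):
    """Risk of one or more Type I errors in k tests.

    Assumes independent tests and that every H0 is true (an idealization).
    """
    return 1 - (1 - alpha) ** k


print(n_pairwise_correlations(20))   # k = 20 variables, as in the text
print(expected_type_i_errors(100))   # 100 tests at alpha = .05
for k in (1, 10, 100):
    print(k, round(risk_of_at_least_one_type_i_error(k), 3))
```

Even under these simplified assumptions, the risk of at least one Type I error rises quickly with the number of tests, which is why large tables of correlations call for skepticism.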
10.15.3 Replicate or Cross-Validate Correlations

A third way to handle inflated risk is cross-validation or replication. Obtain new samples of data and see whether the same X, Y correlations are significant in the new batch of data as in the first batch (i.e., see whether the correlation results can be replicated or reproduced). In a cross-validation study, the researcher randomly divides data into two batches; thus, if the entire study had data for N = 500 participants, each batch would contain 250 cases. The researcher then does extensive exploratory analysis on the first batch of data and decides on a limited number of correlations or predictive equations that seem to be interesting and useful. Then, the researcher reruns this small set of correlations on the second half of the data. If the relations between variables remain significant in this fresh batch of data, it is less likely that these relationships were just instances of Type I error. The main problem with this approach is that researchers often don’t have large enough numbers of cases to make this possible.

10.15.4 Bonferroni Procedure: Use More Conservative Alpha Level for Tests of Individual Correlations

The Bonferroni procedure is another way to limit risk for Type I error. Suppose a researcher plans to do k = 10 correlations and wants to have an experiment-wise alpha (EWα) of .05.
Statisticians speak of the entire set of significance tests as an “experiment,” even in situations where the study is not an experiment; “experiment-wise” refers to all the statistical significance tests in a table, study, or research report. To keep the risk for obtaining at least one Type I error as low as 5% for a set of k = 10 significance tests, it is necessary to set a per comparison alpha (PCα) level that is lower for each individual test. The PCα level is the criterion used when deciding whether each individual correlation is statistically significant. Using the Bonferroni procedure, if there are k significance tests in the “experiment” or study, the PCα used to test the significance of each individual r value is set at EWα/k, where k is the number of significance tests. To obtain the Bonferroni-corrected PCα level used to evaluate the significance of each individual correlation, calculate PCα as follows:

$$\text{PC}\alpha = \frac{\text{EW}\alpha}{k} \tag{10.7}$$

where EWα is the acceptable risk for error for the entire “experiment.” The value of EWα is arbitrarily chosen by the researcher; it may be higher than the conventional α = .05 used in many single-test situations (for example, EWα could be set at .10). The value of k is the number of significance tests included in the set. PCα is the alpha level that is used for each individual test to obtain the desired EWα for the set of tests. For example, if we set EWα = .05 and have k = 10 correlations, then each individual correlation will need to have an observed p value less than EWα/k = .05/10, or .005, to be judged statistically significant. Data analysts can say that Bonferroni-corrected alpha levels are used; they should state the numerical values of EWα, k, and PCα explicitly.

This approach is very conservative. Sometimes the number of correlations that are tested in exploratory studies is quite large (I have seen papers that report 100 or 200 correlations). If the per comparison error rate is obtained by dividing .05 by 100, the resulting PCα would be so low that it would almost never be possible to judge individual correlations significant. Sometimes, instead of defining all 100 correlations reported in a paper as the “set,” a data analyst might divide the 100 tests into 5 sets of 20 tests. Other procedures exist for control of inflated risk for Type I error; the Bonferroni approach is the simplest but also the most conservative.

10.15.5 Common Bad Practice in Reports of Numerous Significance Tests

When large numbers of tests are reported, such as tables that contain lists of many correlations, it is common for authors to use asterisks to denote values that reached some prespecified levels of statistical significance. In table footnotes, * indicates p < .05, ** indicates p < .01, and *** indicates p < .001.
I do not think the practice of using asterisks to indicate statistical significance is a good idea; in effect, it sets the alpha levels used to evaluate correlations after results have been obtained. However, this is common practice, so you need to recognize this in journal articles. In most situations where asterisks appear in tables, nothing has been done to correct for inflated risk for Type I error (unless noted otherwise).
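The Bonferroni rule from Section 10.15.4, PCα = EWα/k, is simple enough to sketch directly. The function names and the list of p values below are hypothetical, invented only to show how the corrected criterion would be applied.

```python
def bonferroni_pc_alpha(ew_alpha, k):
    """Per-comparison alpha: experiment-wise alpha divided by the number of tests."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return ew_alpha / k


def significant_after_bonferroni(p_values, ew_alpha=0.05):
    """Flag each p value as significant only if it is below EW alpha / k."""
    pc_alpha = bonferroni_pc_alpha(ew_alpha, len(p_values))
    return [p < pc_alpha for p in p_values]


# Hypothetical p values for k = 10 correlations (not from any real study).
p_values = [0.001, 0.004, 0.012, 0.030, 0.049, 0.200, 0.350, 0.510, 0.740, 0.900]
print(bonferroni_pc_alpha(0.05, len(p_values)))
print(significant_after_bonferroni(p_values))
```

With these made-up values, five correlations would pass the uncorrected p < .05 criterion, but only the two with p below .005 remain significant after correction, which illustrates how conservative the procedure is.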
10.15.6 Summary: Reporting Numerous Correlations

If researchers do not try to limit the risk for Type I error in any of these ways, they must explain in the write-up that reported p values are not corrected for inflated risk for Type I error and therefore they underestimate the true overall risk for Type I error. The discussion should reiterate that the study is exploratory, that relationships detected by running large numbers of significance tests are likely to include large numbers of Type I errors, and that replications with new samples are needed before researchers can be confident that the relationships are not simply due to chance or sampling error. The most deceptive way to ignore this problem is to run dozens or hundreds of correlations and then select a small number of correlations to report, without informing the reader about the many unreported analyses.

10.16 OBTAINING CONFIDENCE INTERVALS FOR CORRELATIONS

There is increasing emphasis on the need to report CIs in research reports. SPSS does not provide confidence intervals for Pearson’s r. Unfortunately, computing them by hand is a somewhat complex procedure. CIs for r can be obtained using online calculators. Lowry (2019) provides an online calculator for confidence intervals for Pearson’s r at www.vassarstats.net/rho.html. A screenshot appears in Figure 10.21. Enter the values of r and N (Lowry denotes this as n) into the windows as shown and click the calculate button. I used r = +.65 and n = 34 in this example. Lower and upper bounds are provided for both 95% and 99% CIs. This result can be reported as follows: “The 95% CI for r = +.65 with N of 34 is [.40, .81].” Appendix 10B outlines the by-hand procedures to obtain confidence intervals for values of r.
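For readers who prefer a script to an online calculator, the interval can be sketched with the Fisher r-to-z transformation. This is an assumption: the chapter does not state here which method the calculator or Appendix 10B uses, although Fisher’s z is the standard approach. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r_confidence_interval(r, n, confidence=0.95):
    """Approximate CI for Pearson's r using the Fisher r-to-z transformation.

    Steps: z = atanh(r); SE_z = 1 / sqrt(N - 3); build the interval on the
    z scale with a normal critical value; convert the bounds back with tanh.
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    if n <= 3:
        raise ValueError("N must be greater than 3")
    z = math.atanh(r)
    se_z = 1 / math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.tanh(z - critical * se_z), math.tanh(z + critical * se_z)


lower, upper = pearson_r_confidence_interval(0.65, 34)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

For r = +.65 and N = 34 this sketch gives bounds that round to [.40, .81], matching the values reported above. Note that the interval is not symmetric around r, because the transformation back from the z scale compresses values near ±1.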
Figure 10.21 Screenshot: Online Calculator for Pearson’s r Confidence Intervals
Source: Lowry (2019).

10.17 PEARSON’S R AND R² AS EFFECT SIZES AND PARTITION OF VARIANCE

Both Pearson’s r and r² are indexes of effect size. They are standardized (their values do not depend on the original units of measurement of X and Y), and they are independent of sample size N. Sometimes r² is called the coefficient of determination; I prefer to avoid that term because it suggests causality, and as noted earlier, correlation is not sufficient evidence for causality. An r² estimates the proportion of variance in Y that can be predicted from X (or, equivalently, the proportion of variance in X that is predictable from Y). Proportion of predicted variance (r²) can be diagrammed by overlapping circles, as shown in Figure 10.22. Each circle represents the total variance of one variable. The area of overlap between circles is proportional to r², the shared or predicted variance. The remaining area of each circle corresponds to 1 − r²; this represents the proportion of variance in Y that is not predictable from X.

Cohen (1988) suggested the following verbal labels for sizes of r in the behavioral and social sciences: r of about .10 or less (r² < .01) is small; r of about .30 (r² = .09) is medium; and r greater than .50 (r² > .25) is large. Guidelines are summarized in Table 10.3. Below r of about .10, a correlation represents a relation between variables that we can detect in statistical analyses using large samples, but the relation is so weak that it is not noticeable in everyday life. When r ≈ .30, relations between variables may be strong enough that we can detect them in everyday life. When r is above ≈ .50, relations may be easily noticeable in everyday life. These guidelines for effect size labels are well known and generally accepted by researchers in social and behavioral sciences, but they are not set in stone.

In other research situations, an effect size index other than r and/or different cutoff values for small, medium, and large effects may be preferable (Fritz, Morris, & Richler, 2012). When findings are used to make important decisions that affect people’s lives (such as whether a new medical treatment produces meaningful improvements in patient outcomes), additional information is needed; see further discussion about effect size in Chapter 12, on the independent-samples t test.

Figure 10.22 Overlapping Circles: Proportions of Areas Correspond to r² and (1 − r²)

Table 10.3 Widely Reported Effect Size Labels for Pearson’s r and Pearson’s r²
Why did Cohen choose .30 as the criterion for a medium effect? I suspect it was because correlations of approximately r = .30 and below are common in research in areas such as personality and social psychology. For example, Mischel (1968) remarked that values of r greater than .30 are rare in personality research. In some fields, such as psychophysics and behavior analysis research in psychology, proportions of explained variance tend to be much higher (sometimes on the order of 90%). The effect size guidelines in Table 10.3 would not be used in research fields where stronger effects are common. In practice, you will want to compare your r and r² values with those obtained by other researchers who study similar variables. This will give you some idea of how your effect sizes compare with those in other studies in your research domain.

As shown in Figure 10.22, r² and (1 − r²) provide a partition of variance for the scores in the sample. For example, the variance in Y scores can be partitioned into a proportion that is predictable from X (r²) and a proportion that is not predictable from X (1 − r²). An r² is often referred to as “explained” or predicted variance in Y; (1 − r²) is error variance or variance that cannot be predicted from X. In everyday life, error usually means “mistake” (and mistakes sometimes do happen when statistics are calculated and reported). The term error means several different things in statistics. In the context of correlation and many other statistical analyses, error refers to the collective influence of several kinds of factors that include other predictor or causal variables not included in the study, problems with measurements of X and/or Y, and randomness.

Suppose the Y variable is first-year college GPA, and X is SAT score. Correlations between these variables are on the order of .4 to .5 in many studies. If r = .5, then r² = .25, so 25% of the variance in GPA is predictable from SAT score, and (1 − r²) = 75% of the variance in GPA is error variance, or variance that is not predicted by SAT score. In statistics, error refers to all other variables that are not included in the analysis that may influence GPA. For example, GPA may also depend on variables such as amount of time each student spends partying and drinking, difficulty of the courses taken by the student, amount of time spent on outside jobs, life stress, physical illness, and a potentially endless list of other factors. If we have not measured these other variables and have not included them in our data analysis, we have no way to evaluate their effects.

By now, you may be thinking, If we could measure these other variables and include them in the analysis, then the percentage of variance in GPA that we could predict should be higher (and if r² becomes larger, then 1 − r², proportion of error variance, will become lower). If you are thinking that, you are correct. However, analyses that include multiple variables are beyond the scope of this volume. Before you can work with larger numbers of variables, you need to learn numerous bivariate analyses (analyses used to examine one predictor and one outcome variable). The bivariate statistics in these chapters are the building blocks that are included in the more complicated, multiple-variable analyses. As you continue to study statistics, you will find that partitioning the variance in a dependent variable into proportions of variance that can be explained by one, two, or many predictor variables is a major goal of statistical analysis.
The ability to predict large proportions of variance in outcome variables is highly desired by data analysts. It is important to understand that the r and r² values you obtain in a study are not “facts of nature.” Proportions of variance will vary across studies and can differ substantially between research situations and the situations of interest in the real world. The r² we obtain in a study depends on many features unique to the data set (such as the presence or absence of outliers) and on the kinds of cases included in the study.

As an example, consider adult height (Y) as an outcome we want to predict or explain. Two of the major variables that affect this are genetic inheritance and quality of nutrition. Malnutrition during childhood can severely stunt growth. Suppose we want to evaluate what proportion of variance in height (Y) is due to quality of nutrition (X). I’ll use an extreme example; however, the issues are similar in less extreme situations. In imaginary Study 1, suppose that all the people are genetic clones and that they vary a great deal in nutritional status: Some are well fed, others starved. The r² between quality of nutrition and height would be very high in imaginary Study 1. Now imagine Study 2, in which all persons have been fed exactly the same diet, and there is a lot of variation in genetics. In Study 2, there would be little or no association between height and quality of nutrition; r² would be close to 0. What can a researcher say about the association between nutrition and height out in the world, given the r² from one study? The correct answer is, It depends on the situation. The numerical values of statistics (whether M or r or r²) depend a lot on the composition of the sample in the study, and we can’t generalize results to populations of persons who differ substantially from the persons in the study.

10.18 STATISTICAL POWER AND SAMPLE SIZE FOR CORRELATION STUDIES

Statistical power is the likelihood of obtaining a sample r large enough to reject H0: ρ0 = 0 when the population correlation ρ is not 0. Statistical power for a correlation depends on the following factors: the true effect size in the population (e.g., the real value of ρ or ρ² in the population of interest), the alpha level, choice of one- or two-tailed test, and N, the number of subjects for whom we have X, Y scores. Power analysis should be done during the planning stages of a study (not after results are in). If you plan to examine correlations between SAT scores and GPAs, and you know that past similar studies have found r on the order of .50, it would be reasonable to guess that you need a sample size that gives you a good chance of detecting a population effect size ρ of .50. In statistical power analysis, power of .80 is used as the goal (i.e., you want an 80% chance of rejecting H0 if H0 is false).

Using Table 10.4, it is possible to look up the minimum N of participants required to obtain adequate statistical power for different population correlation values. For example, let α = .05, two tailed; set the desired level of statistical power at .80 or 80%; and assume that the true population value of the correlation is ρ = .5. This implies a population ρ² of .25. From Table 10.4, a minimum of N = 28 subjects would be required to have power of 80% to obtain a significant sample result if the true population correlation is ρ = .50. Note that for smaller effects (e.g., a ρ² value on the order of .05), sample sizes need to be substantially larger; in this case, N = 153 would be needed to have power of .80.

Table 10.4 Statistical Power for Pearson’s r
Post hoc (or postmortem) power analyses should not be conducted. In other words, if you found a sample r² of .03 using an N of 10, do not say, “The r in my study would have been statistically significant if I had N of 203” (or some other larger value of N). Even if power analysis suggests that a smaller sample size is adequate, it is generally a good idea to have at least N = 100 cases where correlations are reported, to avoid situations where there is not enough information to evaluate whether assumptions (such as normality and linearity) are satisfied and situations where one or two extreme outliers can have a large effect on the size of the sample correlation.

The following is slightly paraphrased from Schönbrodt (2011):

From my experience as a personality psychologist, I do not trust correlations with N < 80. … N of 100–120 is better. In this region, correlations get stable (this is of course only a rule of thumb and certainly depends on the magnitude of the correlation). The p value by itself is bad guidance, as in small samples the CIs are very huge … the CI for r = .34 (with N = 35) goes from .008 to .60, which is “no association” to “a strong association.” Furthermore, r is rather susceptible to outliers, which is even more serious in small samples.

Guidelines about sample size are not chiseled into stone. When data points are difficult to obtain, sometimes researchers have no choice but to use small N’s. Be aware that small N’s are not ideal and that results obtained using small samples will have wide confidence intervals and may not replicate closely in later studies.

10.19 INTERPRETATION OF OUTCOMES FOR PEARSON’S R

10.19.1 When r is Not Statistically Significant

If r does not differ significantly from 0, this does not prove there is no association between X and Y (i.e., you cannot accept the null hypothesis).
A small r might occur due to sampling error, bivariate outliers that deflate the size of r, a nonlinear association between X and Y, or other sources of artifact (see Appendix 10D). Failure to obtain a statistically significant correlation may be due to small sample size. If assumptions for Pearson’s r are violated, you need a different analysis to evaluate association. If assumptions for Pearson’s r are met, and you find r values close to 0 in many studies with large samples, and you can rule out artifacts such as bivariate outliers and nonlinear associations, eventually you may develop a stronger belief that X and Y may not be associated. If X and Y are not associated, then it is not plausible that X causes Y or that Y causes X.

10.19.2 When r is Statistically Significant

On the other hand, if you find that r is large in a single study, you cannot conclude that X causes Y or that X is always associated with Y or predictive of Y. A large r may arise because of sampling error, bivariate outliers that inflate the size of r, or other sources of artifact that influence the magnitude of r. If a large r replicates across many studies, situations, and types of subjects, we gradually develop more confidence that there is a real association, but additional evidence is needed to evaluate whether the association might be causal. Experiments in which X is manipulated under well-controlled situations can provide evidence of whether X may cause or influence Y. In practice, large r’s across many studies lead many investigators to retain their provisional hypotheses that X might cause Y, or at least that X is a useful predictor of Y. However, they should never claim this is proof that X causes Y. Whether r is large or small, it is important to examine scatterplots before drawing conclusions about the nature of possible associations between X and Y.

10.19.3 Sources of Doubt

Always keep in mind that a nonsignificant r may be an instance of Type II error (possibly because of small sample size) and that a statistically significant r may be an instance of Type I error. Chance is “the ever-present rival conjecture” (Polya, 1954/2014). In other words, whether we obtain large or small correlations, the outcome may be due to chance.

10.19.4 The Problem of Spuriousness

Keep in mind that some correlations are spurious. In everyday speech, when we say that something is spurious, we mean that it is false, fake, or not what it looks like or pretends to be. It is more difficult to define spuriousness in discussions of correlations. It is easy to identify a spurious correlation when it is just plain silly. Here is a commonly used example. Suppose that each case is one month of the year; X is ice cream sales in that month, and Y is homicides in that month. Imagine that there is a positive correlation between ice cream sales and homicides. Would this correlation be meaningful? Is it conceivable that ice cream consumption influences homicide rates (Peters, 2013)? Obviously, it would be nonsense to believe these variables are related in any meaningful way. There is no theory that would connect them or explain how one might cause the other. We know that eating ice cream does not cause people to commit homicide. This is an example of a spurious correlation. Silly correlations can almost always be dismissed as spurious. A spurious correlation may occur because of chance or coincidence. Spurious correlations can also arise because a third variable (sometimes referred to as a “confounded variable” or “lurking variable”) is involved.
For the example of ice cream sales and homicide, the confounded variable is temperature. In hotter months, homicide rates increase, and ice cream sales increase (Peters, 2013).

However, correlations that seem silly are not always spurious. Some evidence suggests that consumption of diet drinks is related to weight gain (Wootson, 2017). At first glance this may seem silly. Diet drinks have few or no calories; how could they influence body weight? In this case, I’m not sure whether the correlation is spurious or not. On one hand, some artificial sweeteners might alter metabolism in ways that promote weight gain. If that is the case, then this correlation is not spurious: the artificial sweeteners may cause weight gain. On the other hand, it is possible that weight gain increases diet drink consumption (i.e., people who worry about their weight may switch to diet drinks) or that consumption of diet drinks is related to confounded or lurking variables, such as exercise. People who don’t exercise may gain weight; if people who don’t exercise also consume diet drinks, then consumption of diet drinks will have a spurious (not directly causal) correlation with weight gain.
When correlations are puzzling or difficult to explain, more research is needed to decide whether they might indicate real relationships between variables. If a study were done in which all participants had the same daily calorie consumption, and participants were randomly divided into groups that did and did not consume diet drinks, then if the diet drink group gained more weight, this outcome would be consistent with the hypothesis that diet drinks cause weight gain. Further research would then be needed to figure out a possible mechanism, for example, specific ways in which diet drinks might change metabolism.

As a beginning statistics student, you have not yet learned statistical techniques that can be used to assess more subtle forms of spuriousness. Even using these methods can lead to mistaken judgments about whether a correlation is spurious or not.

When unexpected or odd or even silly correlations arise, researchers should not go through mental gymnastics trying to explain them. In a study of folk culture, a researcher once reported that nations with high milk production (X) also scored high in ornamentation of folk song style (Y). There was a positive correlation between X and Y. The author suggested that the additional protein provided by milk provided the energy to generate more elaborate song style. This is a forced and unlikely explanation.

For beginning students in statistics, here is my advice: Be skeptical about what you read; be careful what you say. There are many reasons why sample correlations may not be good estimates of population correlations. Spurious correlations can happen; large correlations sometimes turn up in situations where variables are not really related to each other in any meaningful way. A decision to call a correlation statistically significant can be a Type I error; a decision to call a correlation not significant can be a Type II error. (Later you will learn that correlation values also depend on which other variables are included in the analysis.)
Researchers often select X and Y variables for correlation analysis because they believe X and Y have a meaningful, perhaps causal, association. However, correlation does not provide a rigorous way to test this belief. When you report correlations, use language that is consistent with their limitations. Avoid using terms such as proof, and do not say that X causes, influences, or determines Y when your data come from nonexperimental research.

10.20 SPSS EXAMPLE: RELATIONSHIP SURVEY

The file called love.sav is used in the following example. Table 10.1 lists the names and characteristics of variables. To obtain a Pearson correlation, the menu selections (from the menu bar above the data worksheet) are <Analyze> → <Correlate> → <Bivariate>, as shown in Figure 10.23. The term bivariate means that each requested correlation involves two (bi means two) variables. These menu selections open the Bivariate Correlations dialog box, shown in Figure 10.24. The data analyst uses the cursor to highlight the names of at least two variables in the left-hand pane (which lists all the variables in the active data file) for correlations. Then, the user clicks on the arrow button to move selected variable names into the list of variables to be analyzed. In this example, the variables to be correlated are named commit (commitment) and intimacy. Other boxes can be checked to determine whether significance tests are to be displayed and whether two-tailed or one-tailed p values are desired. To run the analyses, click the OK button.
Figure 10.23 Menu Selections for Bivariate Correlations

The output from this procedure is displayed in Figure 10.25, which shows the value of the Pearson correlation (r = +.745), the p value (which would be reported as p < .001, two tailed), and the number of data pairs the correlation was based on (N = 118). The degrees of freedom for this correlation are given by N – 2, so in this example, the correlation has 116 df. (A common student mistake is confusion of df with N. You may report either of these as information about sample size, but in this example, note that df = 116 [N – 2] and N = 118.)
Figure 10.24 SPSS Dialog Box for Bivariate Correlations
Figure 10.25 Output for One Pearson Correlation

Note that the same correlation appears twice in Figure 10.25. The correlation of intimacy with commit (.745) is the same as the correlation of commit with intimacy (.745). The correlation of a variable with itself is 1 (by definition). Only one of the four cells in Figure 10.25 contains useful information. If you have only one correlation, it makes sense to report it in sentence form (“The correlation between commitment and intimacy was r(116) = +.745, p < .001, two tailed”). The value in parentheses after r is usually assumed to be the df, unless clearly stated otherwise.

It is possible to run correlations among many pairs of variables. The SPSS Bivariate Correlations dialog box that appears in Figure 10.26 includes a list of five variables: intimacy, commit, passion, length (of relationship), and times (the number of times the person has been in love). If the data analyst enters a list of five variables, as shown in this example, SPSS runs the bivariate correlations among all possible pairs of these five variables (as shown in Figure 10.27). If there are k variables, the number of possible different pairs of variables is given by [k × (k – 1)]/2. In this example with k = 5 variables, (5 × 4)/2 = 10 different correlations are reported in Figure 10.27. Note that because correlation is “symmetrical” (i.e., the correlation between X and Y is the same as the correlation between Y and X), the correlations that appear in the upper right-hand corner of the table in Figure 10.27 are the same as those that appear in the lower left-hand corner. When such tables are presented in journal articles, usually only the correlations in the upper right-hand corner are shown.
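The pair count [k × (k – 1)]/2 can be checked with a short Python sketch. This is only an illustration of the counting rule, not a reproduction of what SPSS does internally; the list below simply reuses the five variable names from the example.

```python
from itertools import combinations

# Variable names reused from the example; any list of k names would work.
variables = ["intimacy", "commit", "passion", "length", "times"]

k = len(variables)
n_pairs_formula = k * (k - 1) // 2        # [k x (k - 1)] / 2
pairs = list(combinations(variables, 2))  # each unordered pair appears once

print(n_pairs_formula, len(pairs))
for first, second in pairs:
    print(first, "with", second)
```

Because each unordered pair is listed once, the list corresponds to the correlations on one side of the diagonal of the full correlation table.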
Figure 10.26 Bivariate Correlations Dialog Box: Correlations Among All Variables in List
Figure 10.27 Correlation Output: All Variables in List (in Figure 10.26)
Figure 10.28 Initial SPSS Syntax Generated by Paste Button
Figure 10.29 Edited SPSS Syntax to Obtain Selected Pairs of Correlations

Correlations among long lists of variables can generate huge tables, and often researchers want to obtain smaller tables. Suppose a data analyst wants to obtain summary information about the correlations between a set of three outcome variables Y1, Y2, and Y3 (intimacy, commitment, and passion) and two predictor variables X1 and X2 (length and times). To do this, we need to paste and edit SPSS syntax. Look again at the Bivariate Correlations dialog box in Figure 10.26; there is a button labeled Paste. Clicking the Paste button opens a new window, called a Syntax window, and pastes the SPSS commands (or syntax) for correlation that were generated by the user’s menu selections into this window. The initial SPSS Syntax window appears in Figure 10.28. Syntax can be saved, printed, or edited. It is useful to save syntax to document what you have done or to rerun analyses later.

In this example, we will edit the syntax; in Figure 10.29, the SPSS keyword WITH has been placed within the list of variable names so that the list of variables in the CORRELATIONS command now reads, “intimacy commit passion WITH length times.” It does not matter whether the SPSS commands are in uppercase or lowercase; the word WITH appears in uppercase characters in this example to make it easy to see. In this example, each variable in the first list (intimacy, commit, and passion) is correlated with each variable in the second list (length and times). Variables can be grouped by kinds of variables. Length and times are objective information about relationship history that could be thought of as predictors; intimacy, commitment, and passion are subjective ratings of relationship quality that could be thought of as outcomes. This results in a table of six correlations, as shown in Figure 10.30.

If many variables are included in the list for the bivariate correlation procedure, the resulting table of correlations can be large. It is often useful to set up smaller tables for subsets of correlations that are of interest, using the WITH keyword to designate which variables should be paired. Note that judgments about the significance of p values indicated by asterisks in SPSS output are not adjusted to correct for the inflated risk for Type I error that arises when large numbers of significance tests are reported. If a researcher wants to control or limit the risk for Type I error, this can be done by using Bonferroni-corrected per comparison alpha levels to decide which, if any, of the p values reported by SPSS can be judged statistically significant. For example, to hold the EWα level to .05, the PCα level used to test the six correlations in Table 10.5 could be set to α = .05/6 = .008. Using this Bonferroni-corrected PCα level, none of the correlations would be judged statistically significant.
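The Bonferroni arithmetic can be sketched in a few lines of Python. The two lists mirror the variables placed before and after WITH; the helper `is_significant` and the p values passed to it are hypothetical and are not taken from the SPSS output.

```python
from itertools import product

outcomes = ["intimacy", "commit", "passion"]  # first list (before WITH)
predictors = ["length", "times"]              # second list (after WITH)

# Every outcome is paired with every predictor: 3 x 2 = 6 tests.
tests = list(product(outcomes, predictors))

ew_alpha = 0.05                    # experiment-wise alpha (EW alpha)
pc_alpha = ew_alpha / len(tests)   # per-comparison alpha (PC alpha)

def is_significant(p_value, alpha=pc_alpha):
    """Hypothetical helper: compare one p value with the corrected alpha."""
    return p_value < alpha

print(len(tests), round(pc_alpha, 3))
print(is_significant(0.03))   # an illustrative p value, not from the output
```

A p value such as .03 would pass the uncorrected .05 criterion but not the corrected one, which is the pattern the text describes.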
Figure 10.30 SPSS Correlation Output From Edited Syntax in Figure 10.29
Note: Variables in the first list (intimacy, commitment, passion) are correlated with variables in the second list (length of present dating relationship, number of times in love). The p values in Figure 10.30 were not corrected for inflated Type I error.

Table 10.5 Correlations Between Sternberg’s (1997) Triangular Love Scale (Intimacy, Commitment, and Passion) and Length of Relationship and Number of Times in Love (N = 118 Participants)
As a reader, you can generally assume that statistical significance assessments have not been corrected for inflated risk for Type I error unless the author explicitly says this was done.

10.21 RESULTS SECTIONS FOR ONE AND SEVERAL PEARSON’S R VALUES

Following is an example of a “Results” section that presents the results of one correlation analysis.

Results

A Pearson correlation was performed to assess whether levels of intimacy in dating relationships could be predicted from levels of commitment on a self-report survey administered to 118 college students currently involved in dating relationships. Commitment and intimacy scores were obtained by summing items on two of the scales from Sternberg’s (1997) Triangular Love Scale; the range of possible scores was from 15 (low levels of commitment or intimacy) to 75 (high levels of commitment or intimacy). Examination of histograms indicated that both variables had negatively skewed distributions. Scores tended to be high on both variables, possibly because of social desirability response bias and a ceiling effect (most participants reported very positive evaluations of their relationships). Skewness was not judged severe enough to require data transformation or removal of outliers. The scatterplot of intimacy with commitment showed a positive linear relationship. There was one bivariate outlier with unusually low scores for both intimacy and commitment; this outlier was retained. The correlation between intimacy and commitment was statistically significant, r(116) = +.75, p < .001 (two tailed). The r² was .56; about 56% of the variance in intimacy could be predicted from levels of commitment. This is a strong effect. The 95% CI was [.659, .819]. This relationship remained strong and statistically significant, r(108) = +.64, p < .001, two tailed, even when outliers with scores less than 56 on intimacy and 49 on commitment were removed from the sample.

Note that if SPSS reports p = .000, report this as p < .001. The p value describes a risk for Type I error, and that risk is never zero; it just becomes exceedingly small as the values of r and N increase. Thus, p = .000 is an incorrect statement.
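The reporting rule for very small p values can be written as a small Python helper. The function name `format_p` and its `floor` parameter are hypothetical; the sketch only encodes the convention described above (never print p = .000) together with the leading-zero-free style used in this chapter.

```python
def format_p(p, floor=0.001):
    """Return a reporting string for a p value, never 'p = .000'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < floor:
        # Software may display .000, but the true risk is never zero.
        return "p < " + f"{floor:.3f}".lstrip("0")
    return "p = " + f"{p:.3f}".lstrip("0")

print(format_p(0.0))     # what software might display as .000
print(format_p(0.032))
```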
Here is an example of a “Results” section that presents the results of several correlation analyses.

Results

Pearson correlations were performed to assess whether levels of self-reported intimacy, commitment, and passion in dating relationships could be predicted from the length of the dating relationship and the number of times the participant has been in love, on the basis of a self-report survey administered to 118 college students currently involved in dating relationships. Intimacy, commitment, and passion scores were obtained by summing items on scales from Sternberg’s (1997) Triangular Love Scale; the range of possible scores was 15 to 75 on each of the three scales. Examination of histograms indicated that the distribution shapes were not close to normal for any of these variables; distributions of scores were negatively skewed for intimacy, commitment, and passion. Most scores were near the high end of the scale, which indicated the existence of ceiling effects, and there were a few isolated outliers at the low ends of the scales. Skewness was not judged severe enough to require data transformation or removal of outliers. Scatterplots suggested that relationships between pairs of variables were (weakly) linear. The six Pearson correlations are reported in Table 10.5. Using p values that were not corrected for inflated risk for Type I error, only the correlation between commitment and length of relationship was statistically significant, r(116) = +.20, p < .05 (two tailed), 95% CI [.02, .367]. If Bonferroni-corrected PCα levels are used to control for the inflated risk for Type I error that occurs when multiple significance tests are performed, the PCα level is .05/6 = .008. Using this more conservative criterion for statistical significance, none of the six correlations in Table 10.5 would be judged statistically significant. The r² for the association between commitment and length of relationship was .04; thus, only about 4% of the variance in commitment scores could be predicted from length of the relationship; this is a weak relationship. There was a tendency for participants who reported longer relationship duration to report higher levels of commitment. However, this correlation was only judged statistically significant when no correction was made for inflated risk for Type I error.

10.22 REASONS TO BE SKEPTICAL OF CORRELATIONS

In practice, many of the assumptions required for Pearson’s r are commonly violated, and some of the assumptions are difficult to check. Appendix 10D lists many problems that can make sample values of r poor estimates of true population correlations. Research reports often include large numbers of correlations; this leads to increased risk for Type I decision error. You should be particularly skeptical about correlations in research reports if:
There is no evidence that assumptions for Pearson’s r, particularly linearity, are satisfied (or even checked).
Scores on one or both variables are likely to have outliers, but the report does not mention outliers.
Large numbers of variables are measured, but very few correlations are reported (this suggests that the correlations may have been cherry-picked from dozens or hundreds of analyses).
The research report includes dozens or hundreds of correlations, and no corrections are made for inflated risk for Type I error.
Correlations are reported that were not anticipated in the introduction; this may indicate the author is hypothesizing after results are known (Kerr, 1998).
Causal inferences are made (or suggested) for nonexperimental data. (Some advanced data analytic techniques use statistical controls that compensate, to some extent, for lack of experimental controls.)
Results are generalized to populations different from the persons in the sample and reported as if they are a “fact of nature” that applies to all situations.
Correlation seems likely to be spurious.

Pearson correlations are frequently included in research reports (even those that go on to discuss more complicated analyses). The term correlated is also common in mass media. The term correlated does not always refer specifically to Pearson’s r; it may refer to other analyses used to evaluate associations between pairs of variables.

10.23 SUMMARY

Why spend so much time on details about assumptions for Pearson’s r and artifacts that can influence the value of Pearson’s r? First, Pearson’s r is a very widely reported statistic. Second, people sometimes convert other statistics into Pearson’s r to evaluate effect size. These can be summarized in research reviews called meta-analyses. Third, analyses that you may learn later, which include multiple predictor and/or multiple outcome variables, are often based on or closely related to r values, and if these r values are incorrect, results from these more complex analyses will also be incorrect. (This is true for analyses that are part of the general linear model; there are other advanced analyses that are not based on, or closely related to, Pearson’s r.)

The next chapter introduces bivariate regression; this is closely related to Pearson’s r. Thinking about data in terms of regression helps us see how large the prediction errors for individuals often are. When X and Y variables have meaningful units of measurement (such as years of education and salary in dollars or euros), we can obtain a correlation between education and salary and evaluate the size of the correlation and the percentage of variance in salary that is predictable from education. We can also ask, For each 1 year of additional education, how much does salary go up in numbers of dollars or euros? In many real-world situations, that is a very useful way to describe effect size.
This chapter outlines many reasons why correlations can be poor estimates of the true strength of correlations between variables and how p values for correlations can underestimate the true risk of Type I error. It is important to keep these limitations in mind when you think about correlations.

APPENDIX 10A: NONPARAMETRIC ALTERNATIVES TO PEARSON’S R

When data violate assumptions for Pearson’s r, it may be preferable to use a nonparametric correlation statistic. Nonparametric statistics do not require the same assumptions as Pearson’s r, but they often have their own assumptions. Because nonparametric correlations convert scores to ranks, univariate outliers are not as problematic as for Pearson’s r. There are several nonparametric alternatives to Pearson’s r; two of the best known are Spearman’s r and Kendall’s τ (tau). Data used to demonstrate these analyses are in the SPSS file tiny example.sav, shown in Figure 10.31.

10.A.1 Spearman’s r

The scores on X and Y can be quantitative; they do not need to be normally distributed. If they are not already in the form of ranks, scores are converted to ranks before correlation is assessed. Spearman’s r does not require that X and Y be linearly related; it assesses whether they are monotonically related. It is denoted r_s, and it is more widely used than Kendall’s tau. SPSS converts scores on X and Y into ranks (e.g., each X score is ranked among other values of X), then finds the difference between each pair of ranks; for person i, d_i = rank X – rank Y. The number of cases is denoted n. If there are no tied ranks, Spearman’s r can be computed as follows:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

If there are many tied ranks, and/or the sample size is small, it may be better to use one of the versions of Kendall’s τ (Kendall, 1962). Computations for this are more complex (they involve counting the numbers of pairs that have concordant, discordant, and tied ranks). Kendall’s τ is usually smaller than r_s.

When you have opened the Bivariate Correlations dialog box (as shown in Figure 10.24), check the box for “Spearman” or “Kendall’s tau-b” to obtain nonparametric correlations. The SPSS output for the data in the tiny example.sav data set appears in Figures 10.32 and 10.33. In this example, all versions of the correlation lead to the same conclusion: There is a large positive correlation between X and Y. However, the exact values differ, with r > r_s > τ-b. You can see that if you convert the X scores to ranks (this variable is called rankx) and do a Pearson correlation on rankx, you obtain the same value as Spearman’s r. In other words, Spearman’s r is just Pearson’s r applied to ranked scores.
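The claim that Spearman’s r is Pearson’s r applied to ranks can be checked with a minimal Python sketch. The scores below are hypothetical (they are not the values in the example data file), and the helper functions `ranks`, `pearson_r`, and `spearman_from_d` are illustrative names. The rank-difference shortcut used here assumes there are no tied scores.

```python
def ranks(scores):
    """Rank scores from 1 (smallest) to n; assumes no tied scores."""
    ordered = sorted(scores)
    return [ordered.index(s) + 1 for s in scores]

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_from_d(x, y):
    """Spearman's r from squared rank differences (valid without ties)."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical scores with no ties; note the outlying X score of 30.
x = [2, 5, 7, 10, 30]
y = [1, 4, 3, 8, 9]

print(spearman_from_d(x, y))          # rank-difference formula
print(pearson_r(ranks(x), ranks(y)))  # Pearson's r on the ranks
print(pearson_r(x, y))                # Pearson's r on the raw scores
```

The first two printed values agree, while Pearson’s r on the raw scores differs because the outlying X score has more influence before scores are converted to ranks.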
Figure 10.31 The tiny example.sav Data Set
Figure 10.32 Pearson’s r Correlations for Comparison With Spearman’s r and Kendall’s Tau-b
Figure 10.33 Spearman’s r and Kendall’s Tau Correlations

APPENDIX 10B: SETTING UP A 95% CI FOR PEARSON’S R BY HAND

The sampling distribution for r does not have a normal shape, and therefore procedures to set up confidence intervals must be modified. Pearson’s r is converted to a value called Fisher’s Z; Fisher’s Z does have a normal distribution. Here is an overview of the procedure.
1. Convert Pearson’s r to Fisher’s Z. This can be done by looking up corresponding values in the table in Appendix G at the end of the book. Alternatively, Fisher’s Z can be obtained from the following formula or from online calculators.

$$Z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right) \tag{10.8}$$

2. Compute a confidence interval in terms of Fisher’s Z. Part of this involves computation of SE_Z, the standard error of Fisher’s Z (which fortunately is very simple).
3. Convert the confidence interval obtained in terms of Fisher’s Z into a CI in terms of r.
For this example, use r = .65 and N = 34.
First, look up the value of Fisher’s Z that corresponds to r = .65 in the table in Appendix G at the end of the book. Pearson’s r of .65 corresponds to Fisher’s Z of .775.
Second, you need to calculate SE_Z, the standard error of Fisher’s Z. The value of SE_Z depends only on N, the number of cases:

$$SE_Z = \frac{1}{\sqrt{N - 3}} \tag{10.9}$$

For N = 34, SE_Z = 1/√31 ≈ .18.
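For a 95% confidence interval for a normally distributed statistic (and Fisher’s Z is normally distributed), z_critical values of ±1.96 bound the center 95% of the distribution. The limits for a 95% CI for Z are:

$$\text{lower limit} = Z - z_{critical} \times SE_Z, \qquad \text{upper limit} = Z + z_{critical} \times SE_Z$$

Substituting in .775 for Z, 1.96 for z_critical, and .18 for SE_Z, we have:

$$\text{lower limit} = .775 - 1.96 \times .18 = .422, \qquad \text{upper limit} = .775 + 1.96 \times .18 = 1.1278$$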
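The by-hand procedure in this appendix can also be sketched in Python. Instead of table lookups, the sketch uses `math.atanh` for the r-to-Z conversion (which equals ½ ln[(1 + r)/(1 – r)]) and `math.tanh` for the conversion back to r. The function name `fisher_ci` is hypothetical, and the sketch assumes the usual standard error 1/√(N – 3).

```python
import math

def fisher_ci(r, n, z_critical=1.96):
    """Approximate CI for Pearson's r via Fisher's Z (a sketch)."""
    z = math.atanh(r)                 # step 1: convert r to Fisher's Z
    se_z = 1 / math.sqrt(n - 3)       # standard error depends only on N
    lower_z = z - z_critical * se_z   # step 2: CI in Fisher's Z units
    upper_z = z + z_critical * se_z
    # step 3: convert both limits back into r units
    return math.tanh(lower_z), math.tanh(upper_z)

lower, upper = fisher_ci(0.65, 34)
print(round(lower, 2), round(upper, 2))
```

Because the code does not round Z and SE_Z at intermediate steps, the limits in Fisher’s Z units differ slightly in the third decimal place from the hand calculation, but the limits in r units agree to two decimals.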
The 95% CI in terms of Fisher’s Z is [.422, 1.1278]. However, this is not what we need to know. We need to know what the lower and upper limits of the CI are in terms of Pearson’s r. To obtain this final result, convert the lower and upper limits of the CI out of Fisher’s Z units back into r, using the table in Appendix G at the end of the book to find corresponding values. For a lower limit Z of .422, the lower limit value of r = .40. For an upper limit Z of 1.1278, the upper limit value of r = .81. The final answer: The 95% CI for r = .65 with N = 34 is [.40, .81].

APPENDIX 10C: TESTING SIGNIFICANCE OF DIFFERENCES BETWEEN CORRELATIONS

It can be problematic to compare correlations that are based on different samples or populations, or that involve different variables, because so many factors can artifactually influence the size of Pearson’s r (as discussed in Appendix 10D). These artifacts might influence one of the correlations in the comparison differently than another. If two correlations differ significantly, this difference might arise because of artifact (such as a narrower range of scores used to compute one r) rather than because of a difference in the true strength of the relationship. For further discussion of problems with comparisons of correlations and other standardized coefficients to make inferences about differences in effect sizes across populations, see Greenland, Maclure, Schlesselman, Poole, and Morgenstern (1991) and Greenland, Schlesselman, and Criqui (1986).

Two types of comparisons between correlations are described here. The first test compares the strength of the correlation between the same two variables in two different groups or populations. Suppose that the same set of variables (such as X = emotional intelligence [EI] and Y = drug abuse [DA]) is correlated in two different groups of participants (Group 1 = men, Group 2 = women). We might ask whether the correlation between EI and DA is significantly different for men versus women.
The corresponding null hypothesis is

$$H_0: \rho_1 = \rho_2 \tag{10.10}$$

To test this hypothesis, the Fisher’s Z transformation (discussed in Appendix 10B) is applied to both sample r values. Let r1 be the sample correlation between EI and DA for men and r2 the sample correlation between EI and DA for women; N1 and N2 are the numbers of participants in the male and female groups, respectively. First, using the table in Appendix G at the end of this book, look up the Z value that corresponds to r1 (Z1) and the Z value that corresponds to r2 (Z2). Next, apply the following formula:

$$z = \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{N_1 - 3} + \dfrac{1}{N_2 - 3}}} \tag{10.11}$$

Ideally, N should be >100 in each sample; when N’s are this large, we can call the test ratio a z ratio rather than a t ratio. The test statistic z is evaluated using the standard normal distribution; if the obtained z ratio is greater than +1.96 or less than –1.96, then the correlations r1 and r2 are judged significantly different using α = .05, two tailed. This test should be used only when the N’s in each sample are large.

A second situation of interest involves comparison of correlations with the same dependent variable for two different predictor variables. Suppose a researcher wants to know whether the correlation of X with Z is significantly different from the correlation of Y with Z. The corresponding null hypothesis is

$$H_0: \rho_{XZ} = \rho_{YZ} \tag{10.12}$$

This test does not involve the use of Fisher’s Z transformations. Instead, we need to have all three possible bivariate correlations (r_XZ, r_YZ, and r_XY); N is the total number of participants. The test statistic (from Lindeman, Merenda, & Gold, 1980) is a t ratio of this form:

$$t = (r_{XZ} - r_{YZ}) \sqrt{\frac{(N - 3)(1 + r_{XY})}{2\left(1 - r_{XY}^2 - r_{XZ}^2 - r_{YZ}^2 + 2\, r_{XY}\, r_{XZ}\, r_{YZ}\right)}} \tag{10.13}$$

The resulting t value is evaluated using critical values from the t distribution with (N – 3) df.

Even if a pair of correlations is judged to be statistically significantly different using these tests, the researcher should be very cautious about interpreting this result. Remember that correlations can be inflated or deflated because of factors that affect the size of r, such as range of scores, reliability of measurement, outliers, and so forth (discussed in Appendix 10D). If two correlations differ, the difference in r values may be due to artifactual factors that affect the magnitude of r differently in the two situations, such as the presence of outliers in one batch of data and not in the other. Differences in correlation magnitude may be due to problems such as differences in reliability of measures, in addition to (or instead of) differences in the real strength of the association (ρ) for different samples or variables.
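The first comparison (the same pair of variables in two independent groups) can be sketched in Python, assuming Equation 10.11 is the usual Fisher Z test with standard error √[1/(N1 – 3) + 1/(N2 – 3)]. The function name is hypothetical. The input values are the hypothetical subgroup correlations reported later in the note to Figure 10.39: r(134) = –.73 for men (so N = 136) and r(112) = –.11 for women (so N = 114).

```python
import math

def compare_independent_rs(r1, n1, r2, n2):
    """z ratio for the difference between two independent correlations."""
    z1 = math.atanh(r1)   # Fisher's Z for group 1
    z2 = math.atanh(r2)   # Fisher's Z for group 2
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se_diff

# Men: r = -.73, N = 136; women: r = -.11, N = 114 (hypothetical data).
z = compare_independent_rs(-0.73, 136, -0.11, 114)
differ = abs(z) > 1.96   # alpha = .05, two tailed

print(round(z, 2), differ)
```

Both groups have N > 100, so treating the ratio as z is consistent with the guideline above. A large |z| still does not rule out the artifacts discussed in Appendix 10D.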
APPENDIX 10D: SOME FACTORS THAT ARTIFACTUALLY INFLUENCE MAGNITUDE OF R

D.1. Pearson’s r will be deflated if the association between X and Y is nonlinear or curvilinear. Examples were shown in this chapter.

D.2. Pearson’s r can be either inflated or deflated by the presence of bivariate outliers (depending on where they are located). Examples were shown in this chapter.

D.3. Maximum possible values for Pearson’s r will be deflated if the X and Y variables have different distribution shapes. To understand why different distribution shapes limit the sizes of correlations, you need to know that Pearson’s r can be used as a standardized regression coefficient to predict the standard score on Y from the standard score on X (or vice versa). This equation is as follows:

$$z'_Y = r \times z_X \tag{10.14}$$

In words, Equation 10.14 tells us that one interpretation of correlation is in terms of relative distance from the mean. That is, if X is 1 SD from its mean, we predict that Y will be r SD units from its mean. If husband and wife heights are correlated +.32, then a woman who is 1 SD above the mean in the distribution of female heights is predicted to have a husband who is about 1/3 SD above the mean in the distribution of male heights (and vice versa).

Note that we can obtain a perfect one-to-one mapping of z score locations, and therefore a correlation of +1.00, for X and Y scores only if X and Y both have the same distribution shape. Figure 10.34 shows how z′Y scores are mapped onto zX scores if r = 1.00 and for other values of r that are less than 1. For r to equal 1, we need to have the same numbers of X scores and Y scores that are +1 SD, +2 SDs, and +3 SDs from their means. This can occur only when X and Y have the same distribution shape. As an example, consider a situation where X has a normal distribution shape and Y has a skewed distribution shape (see Figure 10.35). It is not possible to make a one-to-one mapping of the scores with zX values greater than zX = +1 to corresponding zY values greater than +1; the Y distribution has far more scores with zY values greater than +1 than the X distribution in this example.

This example illustrates that when we correlate scores on two quantitative variables that have different distribution shapes, the maximum possible correlation that can be obtained will artifactually be less than 1 in absolute value. Perfect one-to-one mapping (which corresponds to r = 1) can arise only when the distribution shapes for the X and Y variables are the same. It is desirable for all the quantitative variables in a multivariable study to have nearly normal distribution shapes.
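The ceiling that distribution shape places on r can be illustrated with a minimal Python sketch. The two small score sets are hypothetical. Pairing the scores in sorted order (lowest with lowest, highest with highest) gives the largest correlation any pairing of those scores can produce, so the sorted-order correlation is the maximum attainable r for that pair of distributions.

```python
def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

symmetric = [1, 2, 3, 4, 5, 6, 7]   # hypothetical symmetric scores
skewed = [1, 1, 2, 2, 3, 5, 20]     # hypothetical positively skewed scores

# Best-case pairing: both sets in sorted order.
r_same_shape = pearson_r(sorted(symmetric), sorted(symmetric))
r_diff_shape = pearson_r(sorted(symmetric), sorted(skewed))

print(round(r_same_shape, 3), round(r_diff_shape, 3))
```

Even with the most favorable pairing, the correlation between the symmetric and skewed score sets stays well below 1, whereas identical shapes allow r = 1.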
Be aware that if you want to compare X1 and X2 as predictors of Y, and the distribution shapes of X1 and X2 are different, then the variable with a distribution less similar to the distribution shape of Y will artifactually tend to have a lower correlation with Y. This is one reason why comparisons among correlations for different predictor variables can be misleading. It is important to assess distribution shape for all the variables before interpreting and comparing correlations. Correlations can be artifactually small because the variables that are being correlated have drastically different distribution shapes.

D.4. Pearson’s r may be deflated if scores on X and/or Y have restricted ranges. The ranges of scores on the X and Y variables can influence the size of the sample correlation. If the research goal is to estimate the true strength of the correlation between X and Y variables for some population of interest, then the ideal sample should be randomly selected from the population of interest and should have distributions of scores on both X and Y that are representative of, or similar to, the ranges of scores in the population of interest. That is, the mean, variance, and distribution shape of scores in the sample should be similar to the mean, variance, and distribution shape for the population of interest.
Figure 10.34 Mapping of Standardized Scores From zX to z′Y for Three Values of Pearson’s r

Figure 10.35 Failure of One-to-One Mapping of Score Location When X and Y Have Different Distribution Shapes
Note: For example, there are more scores located beyond 1 SD above the mean in the upper, normal distribution than in the lower, positively skewed distribution. If you try to match z scores one to one, there will not be “matches” for many of the z scores at the upper end of the normal distribution (thus, an r of 1 cannot be obtained). Drawing not precisely to scale.

Suppose that a researcher wants to assess the correlation between GPA and verbal SAT (VSAT) score. If data are obtained for a random sample of many students from a large high school with a wide range of student abilities, GPA and VSAT score are likely to have wide ranges (GPA from about 0 to 4.0, VSAT from about 250 to 800). See Figure 10.36 for hypothetical data that show a wide range of scores on both variables. In this example, when a wide range of scores are included, the sample correlation between VSAT score and GPA is fairly high (r = +.61).

However, samples are sometimes not representative of the population of interest; because of accidentally biased or intentionally selective recruitment of participants, the distribution of scores in a sample may differ from the distribution of scores in the population of interest. Some sampling methods result in a restricted range of scores (on X or Y or both variables). Suppose that the researcher obtains a convenience sample by using scores for a class of honors students. Within this subgroup, the range of scores on GPA may be quite restricted (3.3 to 4.0), and the range of scores on VSAT may also be rather restricted (640 to 800). Within this subgroup, the correlation between GPA and VSAT scores will tend to be smaller than the correlation in the entire high school, as an artifact of restricted range. Figure 10.37 shows the subset of scores from Figure 10.36 that includes only cases with GPAs greater than 3.3 and VSAT scores greater than 640. For this group, which has a restricted range of scores on both variables, the correlation between GPA and VSAT score drops to +.34. It may be more difficult to predict a 40- to 60-point difference in VSAT score from a .2- or .3-point difference in GPA for the relatively homogeneous group of honors students, whose data are shown in Figure 10.37, than to predict the 300- to 400-point differences in VSAT score from 2- or 3-point differences in GPA in the more diverse sample shown in Figure 10.36.
Figure 10.36 Correlation Between Grade Point Average (GPA) and Verbal SAT (VSAT) Score in Data With Unrestricted Range (r = +.61)

In addition, a correlation obtained on the basis of a limited range of X and Y values should not be extrapolated or generalized to describe associations between these variables outside that range. In the previous example, the lack of a strong association between GPA and VSAT score for the high-achieving students does not tell us how these variables are related across wider ranges of scores.

D.5. Pearson’s r is usually overestimated if only extreme groups are examined. A different type of bias in correlation estimates occurs when a researcher purposefully selects groups that are extreme on both X and Y variables. This is sometimes done in early stages of research in an attempt to ensure that a relationship can be detected. Figure 10.38 illustrates the data for GPA and VSAT score for two extreme groups (honors students vs. failing students) selected from the larger batch of data in Figure 10.36. The correlation between GPA and VSAT score for this sample that comprises two extreme groups was r = +.93. Pearson’s r obtained for samples that are formed by looking only at extreme groups tends to be much higher than the correlation for the entire range of scores. When extreme groups are used, the researcher should note that the correlation for this type of data typically overestimates the correlation that would be found in a sample that included the entire range of possible scores. Examination of extreme groups can be legitimate in early stages of research, provided that researchers understand that correlations obtained from such samples do not describe the strength of relationship for the entire range of scores.
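The effects described in D.4 and D.5 can be illustrated with a small simulation in Python. Everything here is hypothetical: the scores are simulated standardized values (not GPA or VSAT data), the population correlation of .6 and the cutoff of 0.8 SD are arbitrary choices, and the exact correlations depend on the random seed.

```python
import random

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def r_for(rows):
    """Correlation for a list of (x, y) pairs."""
    xs, ys = zip(*rows)
    return pearson_r(xs, ys)

random.seed(1)
rho = 0.6       # assumed population correlation
data = []
for _ in range(20000):
    x = random.gauss(0, 1)
    y = rho * x + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
    data.append((x, y))

cut = 0.8       # arbitrary cutoff in SD units
high = [(x, y) for x, y in data if x > cut and y > cut]
low = [(x, y) for x, y in data if x < -cut and y < -cut]

r_full = r_for(data)            # full range of scores
r_restricted = r_for(high)      # only cases high on both variables
r_extreme = r_for(high + low)   # two extreme groups combined

print(round(r_full, 2), round(r_restricted, 2), round(r_extreme, 2))
```

In runs like this one, the restricted-range correlation falls below the full-range value and the extreme-groups correlation rises above it, which is the same ordering as the +.34, +.61, and +.93 values in the GPA and VSAT example.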
Figure 10.37 Correlation Between Grade Point Average (GPA) and Verbal SAT (VSAT) Score in a Subset of Data With Restricted Range (r = +.34)
Note: This is the subset of the data in Figure 10.38 for which GPA > 3.3 and VSAT score > 640.
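The effects of range restriction and extreme-group selection can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the scores are simulated standardized values (not the GPA and VSAT data in the figures), the population correlation of .6 and the cutoffs of ±1 are hypothetical choices, and `pearson_r` and `r_for` are helper functions written for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
RHO = 0.6               # assumed population correlation (hypothetical)

# Simulate standardized X ("GPA") and Y ("VSAT") scores correlated at RHO.
pairs = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = RHO * x + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
    pairs.append((x, y))


def r_for(subset):
    """Correlation for a list of (x, y) pairs."""
    xs, ys = zip(*subset)
    return pearson_r(xs, ys)


# Restricted range: only cases that are high on both variables.
high = [(x, y) for x, y in pairs if x > 1 and y > 1]
# Extreme groups: cases high on both variables plus cases low on both.
low = [(x, y) for x, y in pairs if x < -1 and y < -1]

r_full = r_for(pairs)
r_restricted = r_for(high)
r_extreme = r_for(high + low)

print(f"full range:       r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
print(f"extreme groups:   r = {r_extreme:.2f}")
```

In runs like this one, the restricted-range correlation should fall below the full-range value and the extreme-groups correlation should exceed it, which is the same ordering as the values reported for Figures 10.36 through 10.38.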
Figure 10.38 Correlation Between Grade Point Average (GPA) and Verbal SAT (VSAT) Score on the Basis of Extreme Groups (r = +.93)
Note: Two subsets of the data in Figure 10.36 (low group, GPA < 1.8 and VSAT score < 400; high group, GPA > 3.3 and VSAT score > 640).
D.6. Samples that include members of different groups can provide misleading or confusing information.
It is important to realize that a correlation between two variables (for instance, X = EI and Y = drug use) may be different for different types of people. For example, Brackett, Mayer, and Warner (2004) found that EI was significantly predictive of illicit drug use behavior for men but not for women. The scatterplot of hypothetical data in Figure 10.39 illustrates a similar effect. Scores for men appear as triangular markers in Figure 10.39; there is a fairly strong negative correlation between EI and drug use for men. There was a tendency for men with higher EI to use drugs less. For women the data points are shown as circular markers; for women, drug use and EI were not significantly correlated.
Figure 10.39 Scatterplot for Interaction Between Gender and Emotional Intelligence (EI) as Predictors of Drug Use: “Different Slopes for Different Folks”
Note: The correlation between EI and drug use for the entire sample is r(248) = –.60, p < .001; the correlation within the female subgroup (circular markers) is r(112) = –.11, not significant; and the correlation within the male subgroup (triangular markers) is r(134) = –.73, p < .001.
Another example shows that when data for men and women are all treated as one sample, the nature of the association between variables can be different in the entire sample than in the two separate groups. The hypothetical data shown in Figure 10.40 show a positive correlation between height and violent behavior for an overall sample that includes both male and female participants (r = +.687). However, within the male and female groups, there was a very small correlation between height and violence (r = –.045 for men, r = –.066 for women). A spurious correlation between height and violence would be obtained by lumping these groups together into one analysis that did not take gender into account. Here is the situation: Men are taller than women, and men engage in more violent behavior than women. When data for both sexes are considered, it appears that taller persons show more violent behavior. When you examine groups separately, taller men show no more violence than shorter men, and taller women show no more violence than shorter women. The apparent association between height and violence is spurious; height appears to be related to violence only because both variables differ by sex. Sex “explains away” or “accounts for” the apparent association between height and violence. It would be a mistake to assume, on the basis of the overall graph in Figure 10.40, that taller men are more violent than shorter men or that taller women are more violent than shorter women.
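A short simulation can show how pooling two groups produces a spurious correlation of this kind. This is a sketch with made-up numbers, not the data in Figure 10.40: the group means, standard deviations, and sample sizes are hypothetical, and height and violence are generated independently within each group so that the within-group population correlation is zero by construction.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2)  # fixed seed so the run is repeatable


def simulate_group(n, mean_height, mean_violence):
    """Height and violence drawn independently: no association within the group."""
    heights = [rng.gauss(mean_height, 3.0) for _ in range(n)]
    violence = [rng.gauss(mean_violence, 1.5) for _ in range(n)]
    return heights, violence


# Hypothetical group means: men are taller and score higher on violence.
men_h, men_v = simulate_group(2000, mean_height=70.0, mean_violence=6.0)
women_h, women_v = simulate_group(2000, mean_height=64.0, mean_violence=3.0)

r_men = pearson_r(men_h, men_v)
r_women = pearson_r(women_h, women_v)
r_pooled = pearson_r(men_h + women_h, men_v + women_v)

print(f"men only:   r = {r_men:+.3f}")
print(f"women only: r = {r_women:+.3f}")
print(f"pooled:     r = {r_pooled:+.3f}")
```

The pooled correlation is substantial only because the two groups differ in their means on both variables; within each group the correlation stays near zero.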
Figure 10.40 Scatterplot of a Spurious Correlation Between Height and Violence: Apparent Association Between Height and Violence Disappears When Men and Women Are Examined Separately
Note: For the entire sample, r(248) = +.687, p < .001; for the male subgroup only, r(134) = –.045, not significant; and for the female subgroup only, r(112) = –.066, not significant.
In either of these two research situations, it can be misleading to look at a correlation for a batch of data that mix two or several different kinds of participants’ data together. It may be necessary to compute correlations separately within each group (separately for men and for women, in this example) to assess whether the variables are really related and, if so, whether the nature of the relationship differs within various subgroups in your data. These examples point to the need to take sex into account when evaluating how some X and Y variables are related. More generally, we need to take other variables into account when evaluating how X and Y are related; these other variables can be dichotomous or quantitative. The general term for adding a third variable is statistical control. Examining X, Y correlations separately within male and female groups is one form of statistical control for sex. More often, data analysts use analyses such as multiple regression that include several predictors so that each predictor can be assessed while statistically controlling for (or adjusting for) the other predictor variables.
D.7. If X has poor measurement reliability, its correlations with other variables will be attenuated (reduced).
Other things being equal, when X and Y variables have low measurement reliability, this low reliability tends to decrease or attenuate their observed correlations with other variables. A reliability coefficient for an X variable is often denoted by rXX. One way to estimate a reliability coefficient for a quantitative X variable would be to measure the same group of participants on
two occasions and correlate the scores at Time 1 and Time 2 (a test-retest correlation). (Values of reliability coefficients rXX range from 0 to 1.) The magnitude of this attenuation (of correlation) due to unreliability is given by the following formula:
$$r_{XY} = \rho_{XY} \times \sqrt{r_{XX} \times r_{YY}} \tag{10.15}$$
where rXY is the observed correlation between X and Y, ρXY is the “real” correlation between X and Y that would be obtained if both variables were measured without error, and rXX and rYY are reliabilities of the variables. Because reliabilities are less than perfect (rXX < 1) in practice, Equation 10.15 implies that the observed rXY will generally be smaller than the “true” population correlation ρXY. The lower the reliabilities, the greater the predicted attenuation or reduction in magnitude of the observed sample correlation. It is theoretically possible to correct for attenuation and estimate the true correlation ρXY, given obtained values of rXY, rXX, and rYY, but note that if the reliability estimates themselves are inaccurate, this estimated true correlation may be quite misleading. Equation 10.16 can be used to generate attenuation-corrected estimates of correlations; however, keep in mind that this correction will be inaccurate if the reliabilities of the measurements are not precisely known:
$$\hat{\rho}_{XY} = \frac{r_{XY}}{\sqrt{r_{XX} \times r_{YY}}} \tag{10.16}$$
A better way to deal with the reliability problem is through the use of latent variable models such as structural equation models, but this is a much more advanced topic.
D.8. Part–whole correlations: If X and Y include overlapping information such as the same survey questions, their correlation will be high.
If you create a new variable that is a function of one or more existing variables (as in X = Y + Z, or X = Y – Z), then the new variable X will be correlated with its component parts Y and Z. Thus, it is an artifact that the total Wechsler Adult Intelligence Scale (WAIS) score (which is the sum of WAIS verbal and WAIS quantitative) is correlated with the WAIS verbal subscale. Part–whole correlations can also occur as a consequence of item overlap: If two psychological tests include identical or very similar items, they will correlate artifactually because of item overlap. For example, many depression measures include questions about fatigue, sleep disturbance, and appetite disturbance. Many physical illness symptom checklists also include questions about fatigue and sleep and appetite disturbance. If a depression score is used to predict a physical symptom checklist score, a large part of the correlation between these could be due to duplication of items.
D.9. Aggregated data.
Correlations can turn out to be quite different when they are computed on individual participant data versus aggregated data, where units of analysis correspond to groups of participants. It can be misleading to make inferences about individuals on the basis of aggregated data; sociologists call this the “ecological fallacy.” Sometimes relationships appear much stronger when data are presented in aggregated form (e.g., when each data point represents a mean, median, or rate of occurrence for a geographical region). In one of the earliest studies of diet and cholesterol, Keys (1980) collected data on serum cholesterol and on coronary heart disease outcomes for N = 12,763 men from 19 different geographical regions around the world. He found a correlation near zero between serum cholesterol and coronary heart disease for individual men. That graph is not shown; it would be a cluster of more than 12,000 data points that form a circular cloud. Keys went on to look at the data another way: He aggregated data for each of the 19 regions. That is, he computed average dietary cholesterol and average serum cholesterol for the group of hundreds of men within each of the 19 regional samples. This aggregation or averaging reduced the data to 19 X, Y data points, as shown in Figure 10.41. The correlation for the 19 data points in Figure 10.41 was +.80. This result was widely reported as some of the first evidence that a diet high in cholesterol was related to higher serum cholesterol. However, results from aggregated data cannot be used to make inferences about individual cases. In this example, the correlations turn out to be drastically different when the data points represent individuals than when the data points represent regions. The correlation between dietary and serum cholesterol was close to 0 when scores for the 12,763 separate individual cases were analyzed.
D.10. Summary: factors that affect the magnitude of correlations.
When you see two correlations that have different values (such as .4 for men and .6 for women or .4 for the correlation of height with salary and .3 for the correlation of self-esteem with salary), you will be tempted to interpret these differences as evidence that the strength of association differs for the populations or variables. Always keep the following in mind. Sampling error is always a problem; one correlation can be larger than the other just because of sampling error. Conducting statistical significance tests and setting up confidence intervals takes sampling error into account but does not make it possible to rule out sampling error completely. In addition, if women have a wider range of scores on X and Y than men, the higher correlation for women may be partly or entirely due to the wider range of scores in the female sample. If height predicts salary better than self-esteem predicts salary, this might be because height can be measured with greater reliability than self-esteem. In the next chapter, you’ll move on to look at regression analysis to predict Y scores from X scores.
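The point about reliability can be made concrete with a little arithmetic. The sketch below assumes the usual form of the attenuation formula, observed r = true correlation × √(rXX × rYY), and its inverse for the correction (the forms taken here for Equations 10.15 and 10.16). The function names and the reliability values are hypothetical, chosen only for illustration.

```python
import math


def attenuated_r(rho_xy, r_xx, r_yy):
    """Observed correlation expected when X and Y are measured with error."""
    return rho_xy * math.sqrt(r_xx * r_yy)


def corrected_r(r_xy, r_xx, r_yy):
    """Attenuation-corrected estimate of the correlation between true scores."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: true correlation .50, reliabilities .70 and .80.
observed = attenuated_r(0.50, r_xx=0.70, r_yy=0.80)
recovered = corrected_r(observed, r_xx=0.70, r_yy=0.80)

print(f"observed r with unreliable measures: {observed:.3f}")
print(f"corrected estimate:                  {recovered:.3f}")

# If the reliability of X is overestimated (.90 instead of .70),
# the correction falls short of the true value of .50.
undercorrected = corrected_r(observed, r_xx=0.90, r_yy=0.80)
print(f"corrected using a wrong r_xx:        {undercorrected:.3f}")
```

The last line illustrates the caution in the text: the corrected value is only as good as the reliability estimates that go into it.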
Figure 10.41 Results of the Seven Nations Study: Each Data Point Represents One Region
Source: Reprinted by permission of the publisher from SEVEN COUNTRIES: A MULTIVARIATE ANALYSIS OF DEATH AND CORONARY HEART DISEASE by Ancel Keys, p. 122, Cambridge, Mass.: Harvard University Press, Copyright © 1980 by the President and Fellows of Harvard College.
APPENDIX 10E: ANALYSIS OF NONLINEAR RELATIONSHIPS
Suppose you obtain a scatterplot that shows an inverted U-shaped curve like Figure 10.9. In that example, the X predictor variable was anxiety and the Y dependent variable was exam score. The pattern in Figure 10.9 is curvilinear. As anxiety increases from 10 to 40, exam scores get better; as anxiety continues to increase beyond 40, exam scores get worse. If you obtain Pearson’s r for data like these, r will be small. Pearson’s r cannot tell you that there is a fairly strong relationship that is not linear. It might occur to you that you could split the data in half and run one correlation analysis for anxiety scores that range from 10 to 40 and a second correlation analysis for anxiety scores that range from 40 to 80. Within those ranges, the associations between anxiety and exam score are fairly linear. However, that would not be considered an elegant solution.
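The behavior of Pearson’s r with an inverted U-shaped pattern can be illustrated with simulated scores. This sketch is hypothetical and does not reproduce the data in Figure 10.9: anxiety runs from 10 to 70 so that the peak at 40 sits in the middle of the range, and the curve and the amount of noise are made-up values.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical inverted-U relation: exam scores peak at anxiety = 40.
anxiety = list(range(10, 71))
exam = [90 - 0.05 * (x - 40) ** 2 + rng.gauss(0, 3) for x in anxiety]

r_all = pearson_r(anxiety, exam)

# Split at the peak and correlate within each half.
low_half = [(x, y) for x, y in zip(anxiety, exam) if x <= 40]
high_half = [(x, y) for x, y in zip(anxiety, exam) if x >= 40]
r_low = pearson_r(*zip(*low_half))
r_high = pearson_r(*zip(*high_half))

print(f"all anxiety scores: r = {r_all:+.2f}")
print(f"anxiety 10 to 40:   r = {r_low:+.2f}")
print(f"anxiety 40 to 70:   r = {r_high:+.2f}")
```

The overall r is near zero even though exam score depends strongly on anxiety, while the two halves show strong correlations with opposite signs.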
Another approach would be to use anxiety scores to divide participants into three anxiety groups (low anxiety, medium anxiety, and high anxiety) and then compare mean values of Y across those three groups (using analysis of variance). However, converting quantitative scores into group membership involves loss of information, and data analysts usually do not recommend that. In Chapter 11 you will see that when a relationship is linear, you can predict Y from X using a regression equation like this: Y′ = b0 + b × X. You might recall from algebra that when equations include terms that are squared or cubed, they correspond to curves instead of straight lines. An equation of the form Y′ = b0 + (b1 × X) + (b2 × X²) can generate a curve that would fit the data in this figure. This volume does not cover regression with more than one predictor; for discussion, see Volume II (Warner, 2020). Fitting an equation that uses both X and X² (anxiety and anxiety squared) to predict exam score is the best analysis for this situation. Now consider these hypothetical data from a different perspective. If this graph captures the “real” association between X and Y, but a study does not include people with the full range of anxiety scores, the data obtained using different ranges of X values can lead to different conclusions. For example, if a study includes only people with anxiety scores below 40, a researcher may conclude that anxiety and exam score are positively correlated. If the study includes only persons with anxiety scores above 40, the data would suggest that anxiety and exam score are negatively correlated. There are two different situations in which researchers might conclude that these variables are uncorrelated. If the study includes only a narrow range of anxiety scores in the middle (from 35 to 45), that data set would yield a very small correlation.
If the researcher selected two extreme groups (low anxiety, anxiety scores from 10 to 20; and high anxiety, anxiety scores from 60 to 70), that would also lead to a conclusion that exam scores are not related to anxiety. Researchers cannot always obtain the range of scores they want for an X predictor variable in nonexperimental studies. However, it’s important to understand that conclusions about whether and how X and Y are related may differ depending on the range of X scores included in the study. The situation is particularly complicated if the true relationship between X and Y is curvilinear.
APPENDIX 10F: ALTERNATIVE FORMULA TO COMPUTE PEARSON’S R
Some textbooks give this formula for Pearson’s r:
$$r = \frac{N \sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N \sum X^2 - \left(\sum X\right)^2\right]\left[N \sum Y^2 - \left(\sum Y\right)^2\right]}}$$
In this equation you can see that the “building blocks” for r are the same as for some earlier analyses (for example, Σ[X²] and [ΣX]² are used in some formulas for the sum of squares). Another formula for Pearson’s r is based on the covariance between X and Y:
$$\text{Cov}(X, Y) = \frac{\sum \left[(X - M_X)(Y - M_Y)\right]}{N - 1} \tag{10.17}$$
where MX is the mean of the X scores, MY is the mean of the Y scores, and N is the number of X, Y pairs of scores. The elements include values of M for both variables. Note that the variance of X is equivalent to the covariance of X with itself:
$$s_X^2 = \text{Cov}(X, X) = \frac{\sum \left[(X - M_X)(X - M_X)\right]}{N - 1}$$
Pearson’s r can be calculated from the covariance of X with Y as follows. We standardize statistics by dividing them by SD or SE values; r is a standardized covariance.
$$r = \frac{\text{Cov}(X, Y)}{SD_X \times SD_Y}$$
A covariance, like a variance, is an arbitrarily large number; its size depends on the units used to measure the X and Y variables. This can make interpretation of covariance difficult, particularly in situations where the units of measurement are arbitrary. Pearson correlation can be understood as a standardized covariance: The values of r fall within a fixed range from –1 to +1, and the size of r does not depend on the units of measurement the researcher happened to use for variables. For students who pursue advanced study, it is important to understand the relation between covariance and correlation. In more advanced statistical methods such as structural equation modeling, covariances rather than correlations are used as the basis for estimation of model parameters and evaluation of model fit.
COMPREHENSION QUESTIONS
1. Why is it important to examine histograms, boxplots, and scatterplots along with correlations? What information can you obtain from these graphs that you do not know just from the value of r?
2. Given a small set of scores for an X and Y variable, compute Pearson’s r by hand.
3. A meta-analysis (Anderson & Bushman, 2001) reported that the average correlation between time spent playing violent video games (X) and engaging in aggressive behavior (Y) in a set of 21 well-controlled experimental studies was +.19. This correlation was judged to be statistically significant. In your own words, what can you say about the nature and strength of the relationship?
4. Harker and Keltner (2001) examined whether emotional well-being in later life could be predicted from the facial expressions of N = 141 women in their college yearbook photos. The predictor variable of greatest interest was the “positivity of emotional expression” in the college yearbook photo. They also had these photographs rated on physical attractiveness. They contacted the same women for follow-up psychological assessments at age 52 (and at other ages, data not shown here). Here are correlations of these two predictors (on the basis of ratings of the yearbook photo) with several of their self-reported social and emotional outcomes at age 52:
a) Which of the six correlations above are statistically significant: If you test each correlation using α = .05, two tailed? If you set EWα = .05 and use Bonferroni-corrected tests?
b) How would you interpret their results?
c) Can you make any causal inferences from this study?
d) Would it be appropriate for the researchers to generalize these findings to other groups, such as men?
e) What additional information would be available to you if you were able to see the scatterplots for these variables?
5. A researcher says that 49% of the variance in blood pressure can be predicted from heart rate (HR) and that blood pressure is positively (and linearly) associated with HR. What is the correlation between blood pressure and HR?
6. Explain: Conducting and/or reporting numerous correlations leads to an inflated risk for Type I error.
7. What can you do to limit the risk for Type I error when reporting numerous correlations?
8. Suppose that you want to do statistical significance tests for a set of four correlations, and you want your EWα to be .05. What PCα would you use to assess significance for each individual correlation if you apply the Bonferroni procedure?
9. How are r and r² interpreted?
10. Draw a diagram to show r² of these values as an overlap between circles (areas need not be exact):
11. What are some of the possible reasons for large correlations between a pair of variables in a sample, X and Y (other than a strong association between X and Y in the population)?
12. How can bivariate outliers influence magnitudes of correlations?
13. Suppose that two raters (Rater A and Rater B) each assign physical attractiveness scores (0 = not at all attractive to 10 = extremely attractive) to a set of seven facial photographs. Pearson’s r can be used as an index of interrater reliability or agreement on quantitative ratings. A correlation of +1 would indicate perfect rank-order agreement between raters, while an r of 0 would indicate no agreement about judgments of relative attractiveness. An r of .8 to .9 is considered desirable when reliability is assessed. For this example, Rater A’s score is the X variable and Rater B’s score is the Y variable. The ratings are as follows:
a) Compute the Pearson correlation between the Rater A and Rater B attractiveness ratings. What is the obtained r value?
b) Is your obtained r statistically significant (using α = .05, two tailed)?
c) Are the Rater A and Rater B scores “reliable”? Is there good or poor agreement between raters?
14. Explain how the formula r = Σ(zX × zY)/N is related to the pattern of points in a scatterplot (i.e., the numbers of concordant/discordant pairs).
15. What does it mean to say that a correlation is spurious? Is a silly correlation always spurious? Are spurious correlations always silly?
16. What assumptions are required for a correlation to be a valid description of the relation between X and Y?
17. Only if you have read Appendix 10A: Name two common nonparametric correlations.
18. Only if you have read Appendix 10B: What is Fisher’s Z, how is it obtained, and what is it used for?
19. Only if you have read Appendix 10C: What test would you do for each of the following null hypotheses about correlations? H0: ρ0 = .0; H0: ρ1 = ρ2; H0: ρXY = ρZY
20. Only if you have read Appendix 10D: In words, what does the equation z′Y = r × zX say? How does this equation tell us that X and Y cannot have a correlation of 1 unless X and Y have the same distribution shapes?
21. Only if you have read Appendix 10D: Discuss sources of artifact that can inflate or deflate magnitudes of correlations.
CHAPTER 11 BIVARIATE REGRESSION
11.1 RESEARCH SITUATIONS WHERE BIVARIATE REGRESSION IS USED
A bivariate regression analysis provides an equation that predicts scores on a quantitative Y variable from scores on a quantitative X variable. (The predictor variable X can be dichotomous, but if it is, we are more likely to do an independent-samples t test to see how scores on X and Y are related. The dependent variable Y cannot be dichotomous for bivariate linear regression.) For example, we might predict salary (Y) from years of education (X), or first-year college grade point average (GPA) (Y) from SAT score (X). Bivariate regression analysis is closely related to Pearson’s r and requires the same assumptions. We assume that the relation between Y and X is linear. Regression can be based on raw scores for X and Y, that is, scores in the original units of measurement (such as salary in dollars or education in years); this regression equation predicts salary in dollars from education in years. Another version of regression uses unit-free z scores or standardized scores; this version of regression predicts the z score of salary from the z score for education. Both versions of regression are useful. When Pearson’s r is used, it is not necessary to identify one variable as independent and the other as dependent. Pearson’s r is a symmetrical index of association; the correlation of X with Y (rXY) is the same as the correlation of Y with X (rYX). In bivariate regression it is necessary to identify one variable as the predictor even if that distinction is arbitrary. The independent or predictor variable is denoted X, and the dependent or outcome variable is denoted Y. Unlike Pearson’s r, bivariate regression is not symmetrical; the equation to predict scores on Y from X differs from the equation to predict scores on X from Y.
When a scatterplot is set up as part of bivariate regression, the predictor or X variable is always placed on the horizontal (X) axis. If one variable is thought to be a possible cause, or is measured earlier in time, it is used as the X variable.
11.2 NEW INFORMATION PROVIDED BY REGRESSION
Some information obtained in bivariate regression analysis duplicates what we know from correlation analysis. When we obtained Pearson’s r, we examined the following:
1. From evaluation of the scatterplot, we evaluate whether the X, Y association is linear and whether bivariate outliers are present.
2. On the basis of the sign of r, we know whether the X, Y association is positive or negative.
3. We can interpret absolute values of r as effect size information and label them small, medium, or large effects.
4. We can obtain a statistical significance test and confidence interval (CI) for r.
5. We obtain a partition of variance for scores on Y, the outcome variable. An r² is the estimated proportion of variance in Y that can be predicted from X; (1 – r²) is the estimated proportion of variance that cannot be predicted from X. Partition of variance is an important concept in later study of statistics.
Bivariate regression analysis adds two kinds of information. Regression analysis provides an equation that corresponds to a line that best predicts Y scores from X scores. The equation is of this form: predicted Y (denoted Y′) = intercept + slope × X, or more formally:
$$Y' = b_0 + b \times X \tag{11.1}$$
The slope coefficient (b) in a regression equation provides a different kind of information that complements effect size information provided by r and r². In addition, regression provides information about prediction errors, discussed later in this chapter. Neither of these (slope and prediction error) is available if we only obtain an rXY correlation. This introduction to bivariate regression sets the stage for understanding two major concepts that recur throughout statistics:
Partition of scores into components, which, in turn, is the basis for partitions of sums of squares when we do later analyses such as analysis of variance (ANOVA).
Evaluation of prediction errors (also called residuals).
A major advantage of bivariate regression is that it can be extended to include more than one predictor variable (Volume II [Warner, 2020]). For example, we can go beyond the question of whether GPA (Y) is predictable from SAT score (X1) to ask whether GPA is predictable from SAT score (X1), high school class rank (X2), number of extracurricular activities (X3), and other variables.
11.3 REGRESSION EQUATIONS AND LINES
A regression equation to predict Y from X corresponds to a straight line in an X, Y scatterplot. Consider this hypothetical example. Years of education after high school (X) is used to predict occupational prestige rating (Y, on a scale from 0 to 100, with 100 indicating the highest prestige). If there were a perfect association (r = +1) between these variables, the scatterplot would look like the one in Figure 11.1.
Each person has scores on both X and Y. Later in this chapter you will see that analysis of the X, Y pairs of scores can produce an equation of the following form, presented earlier as Equation 11.1.
$$Y' = b_0 + b \times X$$
where Y′ is the predicted score on the outcome (Y) variable (some books use Ŷ to denote predicted scores). The b0 term is called the intercept or constant, and b is the regression slope or regression coefficient. The intercept b0 corresponds to the predicted score for Y when X equals 0. In some samples, the X variable does not have values of 0; in these cases b0 is still needed as part of the regression equation, but it does not have a meaningful interpretation. The slope b answers the question, For each one-unit increase in X, how much does the Y score change? If you need a review of graphing lines on the basis of an equation and two data points, see Appendix 11A.
Figure 11.1 Graph of Regression Line for r = 1.00, Perfect Linear Association Between X and Y
The graph in Figure 11.1 shows a perfect association between prestige (Y) and years of education (X), with r = 1.00. For a person with X = 4 years of education, occupational prestige is predicted exactly by b0 + b × X = 30 + 10 × 4 = 70. When r = 1.00, the predicted value for Y is correct for all values of X. If people’s actual occupational prestige scores fall exactly on a straight line, then given a person’s score on X (years of education), we can exactly predict that person’s score on Y (occupational prestige). Of course, you know the world does not work in such a simple way. In the real world, 1 additional year of education would not result in the same increase in occupational prestige for every person. Occupational prestige may be influenced by other factors (and not just by education), such as the reputation of the college the student attends, the student’s IQ and GPA, and the student’s motivation. When other variables influence Y, the correlation of X with Y is less than 1, and values of Y do not lie exactly on the regression line. Figure 11.2 shows a situation that is more like the real world. The regression line is the same as in Figure 11.1. Actual values of Y appear as circles; with r < 1, many of these Y values do not fall on the regression line. The correlation is still positive. If data points lie reasonably near a regression line, it is still useful to use the equation for that line to generate approximate predicted Y′ scores for each person, but the predictions will have some error. When points in a scatterplot are reasonably close to a regression line, we can use values of points on the line as predictions for actual Y values. Although the predictions won’t be exact for many cases, they will be reasonably close. Consider persons with X = 4 years of education. On the basis of the equation, predicted Y′ = 30 + 10 × 4 = 70. The predicted score Y′ = 70 on the basis of an X score of 4 appears as a triangle in Figure 11.2. Every point on the regression line that corresponds to a specific X value represents a possible prediction. The circles below and above the line represent two different actual values of Y for persons with 4 years of education.
Figure 11.2 Graph in Which Dots Represent Actual Y Scores for Occupational Prestige and Triangle Represents Specific Predicted Y Score for X = 4
Look at the value X = 4 in Figure 11.2, and you will see that there were two persons who had X scores of 4 (their actual Y scores are represented by circles). If you substitute X = 4 into the regression equation (i.e., Y′ = b0 + b × X = 30 + 10 × 4 = 70), you obtain the value marked by a triangle on the regression line. For both persons with X = 4, the predicted value Y′ = 70 is used to predict their actual Y scores. The predicted score, Y′ = 70, is lower than one person’s actual score of Y = 80 and higher than the other person’s actual Y score of 57; on average, it is a reasonably good prediction for the “average” person with a score of X = 4. When researchers use bivariate regression, they are interested primarily in these two questions: How strong is the association between X and Y (for this, we use r and r²)? How much increase in Y occurs, on average, for each one-unit increase in X (for this, we examine the b slope coefficient)? Recall that r was a unit-free or standardized index of effect size. In contrast, b provides effect size information given in the original units of measurement. In this example b answers the question, For a 1-year increase in amount of education, how many points of change are predicted in occupational prestige? When X and Y are measured in meaningful units, such as kilograms, euros, dollars, heart rate in beats per minute, and so on, the value of b provides useful information about the magnitude of changes in Y variable scores for each one-unit increase in the X score. When X and Y are not measured in meaningful units, the value of b is less useful. Later in the chapter you’ll see that the differences between actual Y and predicted Y′ values, called prediction errors or residuals, can be examined for additional information.
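The prediction for X = 4 and the two prediction errors described above can be worked through in a few lines. The intercept (30), slope (10), and actual scores (80 and 57) come from the example in the text; the function and variable names are chosen here only for illustration.

```python
B0 = 30.0  # intercept: predicted prestige when years of education = 0
B = 10.0   # slope: predicted change in prestige per additional year


def predict_prestige(years):
    """Predicted Y' from the raw-score equation Y' = b0 + b * X."""
    return B0 + B * years


predicted = predict_prestige(4)

# Actual Y scores for the two persons with X = 4 in the example.
actual_scores = [80, 57]

# Residual (prediction error) = actual Y minus predicted Y'.
residuals = [y - predicted for y in actual_scores]

print(f"predicted Y' for X = 4: {predicted}")
for y, e in zip(actual_scores, residuals):
    print(f"actual Y = {y}, residual = {e:+.0f}")
```

The first person’s score is 10 points above the prediction and the second person’s score is 13 points below it, so the same predicted value carries a different error for each person.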
To summarize, bivariate regression provides additional information that is not provided by r and r²: a regression line to predict Y′ from X and information about prediction errors. Discussion of regression also sheds more light on the question, How does r² provide information about partition of variance in Y into explained versus unexplained parts?
11.4 TWO VERSIONS OF REGRESSION EQUATIONS
In bivariate regression (and in some other analyses) you can write prediction equations in two forms.
11.4.1 Raw-Score Regression Equation
You can set up an equation to predict raw scores on Y from raw scores on X. Raw score is the score in the original unit of measurement, such as inches, kilograms, or dollars. This equation was shown earlier and is repeated here:
$$Y' = b_0 + b \times X$$
The b coefficient can be called a raw-score regression coefficient, a regression slope, or a regression coefficient. The value of b0, the intercept or constant, is included to adjust predicted Y scores for the means of X and Y. The value of b0 is often not of interest. Reports usually focus on regression slopes (these can also be called regression coefficients). The regression slope is denoted b. Values of b0 and b can be positive or negative. They can have any numerical value (ranging from much less than 1 to values of thousands or greater) depending on the units in which X and Y are measured. The b0 and b (intercept and slope) coefficients in Equation 11.1 are the coefficients that provide the best possible predictions for Y. The value of b0, the intercept, can have a meaningful interpretation in situations where the X predictor variable can take on the value of 0, as in the example above. However, in situations where X never equals 0 (e.g., in a prediction of salary from IQ), the intercept can be understood as an adjustment factor that takes the means of X and Y into account when generating predicted scores for Y. We can use a regression equation to generate predicted Y′ scores for individuals on the basis of their X scores. To do this, substitute the value of X for the individual into the regression equation and solve for Y′. For X = years of education beyond high school and Y = salary, the raw-score regression equation Y′ = b0 + b × X provides two kinds of information. The value of b0 tells us the predicted value of salary when years of education beyond high school = 0. (For some X predictor variables, such as IQ, values never equal 0; for these variables the intercept has no meaningful interpretation.) The value of b tells us the slope: the number of units of increase in Y predicted for each one-unit increase in X.
If predicted salary in dollars = 30,000 + 4,000 × years, then we predict a salary of $30,000 for persons with 0 years of education past high school and a $4,000 increase for each year of education. However, as you’ll see later in the chapter, predictions for individuals are often inaccurate.
11.4.2 Standardized Regression Equation
A second form of regression equation predicts standardized (z) scores on Y from standardized (z) scores on X:
zY′ = β × zX (11.2)
In equations that predict z scores from z scores, the slope coefficients are called β (beta) coefficients. When a regression equation has only one predictor variable, β is the same as r, so we can rewrite Equation 11.2 as
zY′ = r × zX (11.3)
Prediction of z scores from z scores does not require an intercept term to adjust for the means of X and Y, because z scores have means of 0. The β coefficient is often just called a “beta” or beta weight; it can also be called a standardized regression coefficient. Values of β, like values of r, range from –1 to +1 (except in rare situations in more complex analyses).
Equation 11.3 says that for each 1 standard deviation (z score) increase in years, we predict an average salary increase of β standard deviations. If zY′ = β × zX = +.4 × zX, then for each 1 standard deviation increase in years of education, predicted salary goes up by .4 (or 4/10) standard deviation. Values of β fall in the range –1 to +1. For bivariate regression, β = Pearson’s r.
11.4.3 Comparing the Two Forms of Regression
Why do we have two different versions of the regression equation? When X and Y are measured in meaningful units, the raw-score coefficient b can tell us something about magnitude of impact. For a one-unit increase in X, how much improvement or change in Y can you expect? Suppose you want to predict future salary in dollars from years of education after high school. Return to this hypothetical raw-score regression equation: salary′ = b0 + b × years = 30,000 + 4,000 × years of education. A hypothetical high school graduate similar to the people in the imaginary study could predict his or her future salary under different scenarios. For a student who obtains 4 more years of education after high school, this equation predicts salary as $30,000 + $4,000 × 4 = $46,000. If this equation represented reality, this information could help a person weigh the costs of the 4 additional years of education against the predicted salary gain when deciding whether getting more education is worthwhile. For instance, if each additional year of education predicted only a $500 increase in salary (i.e., if b = 500), a person might not think this outcome very valuable. On the other hand, if b = 4,000 (i.e., if each additional year of education predicted a $4,000 increase in salary), a person would think this outcome more valuable.
Scores on many variables in behavioral and social science are not given in meaningful units. In these situations, predictions made from raw-score regression equations are usually not very informative.
Suppose you want to predict how much a person will be attracted to you (Y, rated on a scale from 1 to 15) from a rating of how often you behave positively toward them (1 = never, 2 = occasionally, 3 = sometimes, 4 = often, 5 = all the time). If the prediction equation is attraction′ = 3 + 1.33 × frequency of positive behavior, and your positive behavior score is 4, then predicted attraction on the 1-to-15 rating scale would be 3 + 1.33 × 4 = 8.32. Without more context, the increase of 1.33 units and the predicted outcome of 8.32 units on the attraction scale on the basis of the raw-score regression equation are meaningless.
When X and Y are measured in meaningful units, a raw-score b coefficient can provide useful information. When X and Y are not measured in meaningful units, then removing units of measurement provides unit-free information about strength of association. In bivariate regression, the β coefficient to predict zY′ from zX is identical to r. Predictions made using standardized regression equations (zY′ = β × zX) are less easy to understand in everyday language. If β = .50, we can say that for each 1 standard deviation increase in zX, we predict half a standard deviation increase in zY. This is an unfamiliar way to think about the situation (but you will get used to it).
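The claim that β equals r in bivariate regression can be checked numerically. The following is a minimal sketch, assuming a small set of invented ratings (the data and the helper names z_scores, ols_slope_intercept, and pearson_r are hypothetical, not from the textbook). It fits a least squares line to the raw scores and again to the z scores, then compares the standardized slope with Pearson’s r.

```python
from statistics import mean, stdev

# Hypothetical ratings, invented for illustration only (not textbook data).
positive_behavior = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]   # X: 1-to-5 frequency rating
attraction = [4, 6, 5, 8, 7, 9, 8, 11, 9, 10]        # Y: 1-to-15 attraction rating


def z_scores(values):
    """Convert raw scores to z scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols_slope_intercept(x, y):
    """Least squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sp_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    slope = sp_xy / ss_x
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson's r as the sum of z-score cross-products divided by (N - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Raw-score equation: slope depends on the units of X and Y.
b, b0 = ols_slope_intercept(positive_behavior, attraction)

# Standardized equation: the same fit applied to z scores.
beta, z_intercept = ols_slope_intercept(z_scores(positive_behavior),
                                        z_scores(attraction))
r = pearson_r(positive_behavior, attraction)

print(f"raw-score equation:    Y' = {b0:.2f} + {b:.2f} * X")
print(f"standardized equation: zY' = {beta:.3f} * zX (intercept {z_intercept:.3f})")
print(f"Pearson's r = {r:.3f}")
```

In this sketch the standardized intercept comes out as 0 (up to rounding error) and the standardized slope matches r, which is why the z-score form of the equation needs no intercept term.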
Why do researchers often prefer to talk about β (instead of b) coefficients? Because a β coefficient, like an r coefficient, provides unit-free information about effect size and strength of association. Researchers are usually much more interested in the nature and strength of associations between variables than in outcomes for individual cases in the sample. One advantage of β coefficients is that they can be used (very cautiously!) to compare the predictive strength of two or more different X predictors that are given in different units of measurement. Suppose that Y is salary; X1 is occupational prestige, rated on a scale from 1 to 100; X2 is total years of education, ranging from 8 to 20; and X3 is household income for family of origin in tens of thousands of dollars. Suppose that you want to evaluate which of these three variables is the “strongest” predictor of salary. Is salary predicted better by education, or by occupational prestige? You cannot directly compare b coefficients for predictors that have different units of measurement (the magnitude of b depends on sX, the standard deviation of each X predictor). It is more reasonable to compare β coefficients, which are unit free. However, values of β, just like values of r, do not perfectly represent the true strength of association in the population. Values of r and β in your sample may over- or underestimate ρ and the true population standardized slope for many reasons (see the discussion of factors that influence the magnitude of r in Appendix 10D).
In practice, some researchers report only β coefficients because they want readers to focus on the strength of the association. I recommend reporting both b and β in most situations. As noted earlier, b coefficients tend to be most useful when X and Y are measured in meaningful units.
11.5 STEPS IN REGRESSION ANALYSIS
If outliers are anticipated, you should decide on rules for the identification and handling of outliers ahead of time.
Conduct all preliminary data screening that you would do for correlation analysis. Examine histograms and/or boxplots for both X and Y. Examine an X, Y scatterplot to assess linearity and see if there are bivariate outliers.
Use the SPSS regression procedure to obtain the correlation between X and Y and both the raw-score and standardized regression coefficients to predict Y from X (as explained later in this chapter).
Graph the regression line. Use chart editing procedures to add a regression line to your X, Y scatterplot (or use other graphing programs such as ggplot; see Rasco, 2020, for the R supplement). These graphs do not usually appear in published research reports, but it is a good idea for researchers to examine them.
Evaluate prediction errors, also called residuals.
Here are questions you can answer using bivariate regression:
1. What is the nature of the relation between X and Y? We need to know the direction (sign); that is, does Y tend to increase as X increases? The values of b, β, and r all have the same sign, and the sign will tell you whether Y increases as X increases.
2. Is b statistically significant? (When there is only one predictor variable, if r is significant, then b is also significant.) What is the CI for b, the raw-score coefficient?
3. What is the effect size? When scores are not given in meaningful units, r and β are better ways to describe effect size. When scores on both X and Y are given in meaningful units, the raw-score b coefficient provides a useful way to assess effect size in terms of real-world, practical, or clinical significance. For example, how much does each one-unit increase in a drug reduce blood pressure?
4. How is the variance in Y partitioned between variance predictable from X (r2) and variance not predictable from X (1 – r2)? (1 – r2) is also called the proportion of error variance.
5. Do plots of residuals indicate violations of assumptions for bivariate linear regression?
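The quantities behind these five questions can be computed directly from sums of squares. The following is a minimal sketch, not the SPSS procedure: the data values and the function name bivariate_regression are invented for illustration, and the standard error uses the usual least squares expression (residual variance with N – 2 df, divided by the sum of squared deviations of X).

```python
from math import sqrt
from statistics import mean

# Hypothetical data, invented for illustration (not from the textbook).
years = [0, 1, 2, 3, 4, 5, 6, 8]            # X: years of experience
salary = [31, 35, 36, 41, 44, 43, 50, 55]   # Y: salary in $1,000s


def bivariate_regression(x, y):
    """Return the basic bivariate regression results as a dictionary."""
    n = len(x)
    mx, my = mean(x), mean(y)
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    sp_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

    r = sp_xy / sqrt(ss_x * ss_y)   # Question 1: sign and strength
    b = sp_xy / ss_x                # raw-score slope
    b0 = my - b * mx                # intercept

    residuals = [yi - (b0 + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    se_b = sqrt(sse / (n - 2)) / sqrt(ss_x)

    return {
        "r": r,
        "beta": r,                  # with one predictor, beta equals r
        "b": b,
        "b0": b0,
        "se_b": se_b,
        "t": b / se_b,              # Question 2: t ratio with N - 2 df
        "df": n - 2,
        "r2": r ** 2,               # Questions 3 and 4: predictable variance
        "error_proportion": 1 - r ** 2,
        "residuals": residuals,     # Question 5: inspect or plot these
    }


fit = bivariate_regression(years, salary)
print(f"b = {fit['b']:.3f}, b0 = {fit['b0']:.3f}, r = {fit['r']:.3f}")
print(f"t({fit['df']}) = {fit['t']:.3f}, r2 = {fit['r2']:.3f}")
```

The sketch stops short of a p value because the Python standard library has no t distribution; the t ratio would be compared with a critical value from a t table with N – 2 df.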
11.6 PRELIMINARY DATA SCREENING
Preliminary data screening for bivariate regression is the same as for Pearson correlation. When we examine a scatterplot in a regression analysis, the X or predictor variable is conventionally graphed on the X or horizontal axis of the scatterplot, and the Y or outcome variable is graphed on the vertical axis. Preliminary examination of the univariate distributions of X and Y tells the researcher whether these distributions are reasonably normal, whether there are univariate outliers, and whether the ranges of scores on both X and Y are wide enough to avoid problems with restricted range. Preliminary examination of the scatterplot enables the researcher to assess whether the X, Y relation is linear, whether the variance of Y scores is reasonably equal across levels of X, and whether there are bivariate outliers that require attention. Because most of the data screening is the same as for correlations, details of data screening procedures are not repeated.
The assumption that Y scores have the same variances for each value of X is often not mentioned in discussion of assumptions for correlation. This assumption is called homoscedasticity (or equality of variance in Y across levels of X). Figure 11.3 shows hypothetical data that violate this equal variance assumption. Imagine that you plot monthly household water use (Y) against household income (X). Low-income households may all use about the same (low) amount of water. However, high-income households may include people who live in small but expensive apartments (who may not use very much water) and people with huge estates (who fill swimming pools and water vast lawns). The vertical arrows indicate the approximate range or variance of scores for low- versus high-income groups. Variance in water use is much greater for high values of income than low values of income.
In this situation, Pearson’s r would not be a complete description; information about the differences in variances in amount of water use for different levels of income would also be needed.
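A crude numerical check of the equal variance assumption is to compare the variance of Y in the lower and upper halves of X. The sketch below assumes invented household data (the values and variable names are hypothetical) and a median split on income; it is a rough screening aid, not a formal test of homoscedasticity, and a scatterplot remains the main tool.

```python
from statistics import median, variance

# Hypothetical households: (income in $1,000s, monthly water use in 1,000s of gallons).
# Values are invented to mimic the pattern described in the text.
households = [
    (20, 3.0), (25, 3.2), (30, 2.9), (35, 3.3), (40, 3.1),
    (120, 2.5), (150, 9.0), (180, 3.5), (200, 14.0), (250, 6.0),
]

# Split cases at the median income.
cut = median(income for income, _ in households)
low_income_use = [water for income, water in households if income <= cut]
high_income_use = [water for income, water in households if income > cut]

var_low = variance(low_income_use)
var_high = variance(high_income_use)

print(f"variance of water use, low income:  {var_low:.3f}")
print(f"variance of water use, high income: {var_high:.3f}")
print(f"ratio (high / low): {var_high / var_low:.1f}")
```

A ratio far from 1, as in these invented data, suggests that prediction errors would be much larger at one end of the X range than at the other.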
Figure 11.3 Hypothetical Data: Water Use Plotted by Household Income Showing Heteroscedasticity
Violation of this assumption would result in larger prediction errors in the outcome variable (water use) for people with high incomes than for people with low incomes.
11.7 FORMULAS FOR BIVARIATE REGRESSION COEFFICIENTS
We want to find values of b0 and b that give us the equation that falls as close as possible to all the points in the scatterplot. Another way to say this is that we want coefficients that minimize the prediction errors. The prediction error or residual for each case is just the difference between each person’s actual value of Y and the value of Y′ predicted using the regression line:
Prediction error = (Y – Y′) (11.4)
If Ann’s actual salary Y is $40,000, and her predicted salary, Y′, is $43,000, the prediction error is (40,000 – 43,000) = –3,000. Her actual salary is $3,000 lower than the salary predicted for her using the regression line. The “best” regression equation is the one that provides predicted values (Y′) that are as close as possible to the actual Y. We need to summarize information about the magnitude of prediction errors across all persons in the sample. Can we just add up the prediction errors? No. I will tell you (without proof) that the sum of the prediction errors across all cases, Σ(Y – Y′), always equals 0. This should not surprise you; you have seen that deviations often sum to zero. Therefore, adding prediction errors won’t provide useful information. We encountered the same problem when we wanted to compute a sample variance, because the sum of deviations of X scores from the sample mean was also 0 in that situation. Recall that the “bag of tricks” in statistics is fairly small. The solution that was used to compute the variance of X was to square each deviation and then sum the squared deviations. The same trick is used here. We compute SSE (sum of squared prediction errors) as follows:
SSE = Σ(Y – Y′)² (11.5)
Equation 11.5 says that we calculate a prediction error for each individual case, square each prediction error, then sum the squared errors for all cases. We want to obtain the values of b0 and b that minimize SSE. The method used by mathematical statisticians to derive the formulas to compute b0 and b is called ordinary least squares (OLS). Many other statistics you’ll learn are also based on OLS estimation. Appendix 11B explains how the formulas for b0 and b that yield the minimum SSE (the smallest prediction errors) were obtained. Fortunately, because optimal formulas for the best values of b0 and b are known, you don’t have to solve this problem every time you want to do a regression.
In practice, here is how to calculate the estimated values of b and b0: First, find the means and standard deviations for X and Y and their bivariate correlation rXY. (By now, these computations should be familiar.) Once you have these values, the estimate for the raw-score slope b to predict Y from X is:
b = r × (sY / sX) (11.6)
See Appendix 11C for an alternative equation to compute the estimate of b. Note that the ratio of standard deviations in this equation is sY/sX or, more generally, the standard deviation of the outcome variable divided by the standard deviation of the predictor variable.
To understand this ratio, you can think of it this way. When you want to predict Y scores from X scores:
First you need to divide X scores by sX to take them out of the X score units of measurement.
You multiply that result by r; this provides information about the sign and strength of linear association between Y and X scores.
Then you multiply by sY to convert the result into units of the Y outcome variable.
We also need to adjust predictions to take into account the differences between the means of X and Y. The intercept b0 (the predicted value of Y when X = 0, or the point on the graph where the regression line crosses the Y axis) can be computed from the means of X and Y, and the raw-score slope b, as follows:
b0 = MY – b × MX (11.7)
where MY is the mean of the Y scores in the sample and MX is the mean of the X scores in the sample. Including b0 in a regression equation adjusts for differences between the means of X and Y. Equation 11.6 makes it clear that b is essentially a rescaled version of r. Unlike the unit-free Pearson’s r, which has a range from –1 to +1, the range of b depends on the units in which X and Y are measured. Notice several implications of Equation 11.6:
If r equals 0, then b also equals 0.
The sign of b is determined by the sign of r.
The magnitude of the raw-score b coefficient depends on the standard deviations of both variables. Values of b can range from extremely small into the thousands and above.
As sY (the standard deviation of the outcome variable) increases, holding sX constant, b increases.
On the other hand, as sX (the standard deviation of the predictor variable) decreases, holding sY constant, b also increases.
Because b is a rescaled version of r, factors that can inflate or deflate r (discussed in Appendix 10D) can also influence the magnitude of b.
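The slope and intercept formulas (b = r × sY/sX and b0 = MY – b × MX) and the least squares property can be illustrated with a short computation. This is a sketch under stated assumptions: the data are invented, the function names are hypothetical, and the comparison with nearby coefficient values only illustrates (it does not prove) that these formulas give the smallest SSE.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores, invented for illustration only.
x = [2, 4, 5, 7, 9, 10, 12, 15]
y = [70, 74, 73, 80, 79, 85, 84, 90]


def pearson_r(x, y):
    """Pearson's r from deviation cross-products."""
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (c - my) for a, c in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((c - my) ** 2 for c in y)
    return sp / sqrt(ss_x * ss_y)


def coefficients(x, y):
    """Intercept and slope from r, the standard deviations, and the means."""
    b = pearson_r(x, y) * stdev(y) / stdev(x)   # slope: r rescaled by sY / sX
    b0 = mean(y) - b * mean(x)                  # intercept: adjusts for the means
    return b0, b


def sse(x, y, b0, b):
    """Sum of squared prediction errors for a candidate line."""
    return sum((yi - (b0 + b * xi)) ** 2 for xi, yi in zip(x, y))


b0, b = coefficients(x, y)
best = sse(x, y, b0, b)

# SSE for lines with slightly different intercepts and slopes.
nearby = [
    sse(x, y, b0 + d0, b + d1)
    for d0 in (-0.5, 0, 0.5)
    for d1 in (-0.1, 0, 0.1)
    if (d0, d1) != (0, 0)
]

# Doubling every Y score doubles sY (r and sX unchanged), so b doubles.
doubled_b = coefficients(x, [2 * v for v in y])[1]

print(f"b0 = {b0:.3f}, b = {b:.3f}, SSE = {best:.3f}")
print(f"smallest SSE among nearby lines = {min(nearby):.3f}")
print(f"slope after doubling Y scores = {doubled_b:.3f}")
```

Every perturbed line in this sketch has a larger SSE than the fitted line, and rescaling Y changes b in proportion to sY, consistent with the implications listed above.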
The β (beta coefficient) to predict zY′ from zX is β = r. The standard-score (z-score) version of the regression equation does not require an intercept to adjust for means of the variables, because z scores have means of 0. Because β = r, factors that can inflate or deflate r also influence the magnitude of β. Because the magnitudes of b and β are influenced by many of the same problems that can make sample r’s poor estimates of the true population correlation ρ, comparisons of values of b or β (across samples or predictor variables) should only be made with great caution.
11.8 STATISTICAL SIGNIFICANCE TESTS FOR BIVARIATE REGRESSION
The null hypothesis for a regression with one predictor variable can be stated as follows: H0: b = 0. (We can also test the null hypothesis H0: b0 = 0, but this is usually not of interest.) When you run the SPSS regression procedure, you will obtain a t ratio to test this null hypothesis. As in earlier situations, this t test has the following form:
t = b / SEb (11.8)
with (N – 2) df. An estimate of SEb is needed to set up the t ratio. SPSS provides this estimate; if you need to calculate it by hand, the formula is as follows:
SEb = √[Σ(Y – Y′)² / (N – 2)] / √[Σ(X – MX)²] (11.9)
This t ratio is evaluated relative to a t distribution with N – 2 df, where N is the number of cases. SPSS provides this t test along with a two-tailed p value. SPSS also provides a test of whether the intercept b0 equals zero; however, this test is rarely of interest and usually is not reported.
11.9 CONFIDENCE INTERVALS FOR REGRESSION COEFFICIENTS
Using SEb, the upper and lower limits of the 95% CI around b can be calculated using the usual formula:
Lower limit of 95% CI = b – tcrit × SEb (11.10)
Upper limit of 95% CI = b + tcrit × SEb (11.11)
where tcrit is the critical value of t that separates the bottom 2.5%, the middle 95%, and the top 2.5% of the area in a t distribution with N – 2 df. SPSS provides the upper and lower limits of the 95% CI for the b coefficient.
11.10 EFFECT SIZE AND STATISTICAL POWER
For a bivariate regression, the assessment of statistical power can be done using the same statistical power tables as those used for Pearson’s r. In general, if the researcher assumes that the strength of the squared correlation between X and Y in the population is weak, the number of cases required to have power of .80 or higher is rather large. Tabachnick and Fidell (2018) suggested that the ratio of cases (N) to number of predictor variables (k) should be on the order of N > 50 + 8k or N > 104 + k (whichever is larger) for regression analysis. This implies that N should be at least 105 when using one predictor variable. This is consistent with sample size suggestions from Schönbrodt (2011), discussed in Chapter 10. Even if statistical power tables may suggest that N < 100 can give adequate statistical power for significance tests of b and r, it is preferable to have N > 100.
11.11 EMPIRICAL EXAMPLE USING SPSS: SALARY DATA
A small set of hypothetical data for the empirical example is given in the SPSS file salary.sav (with N = 50 cases). The predictor, X, is the number of years employed at a company; the outcome, Y, is the annual salary in dollars. The research question is whether salary changes in a systematic (linear) way as years of job experience or job seniority increases. In other words, how many dollars more salary can an individual expect to earn for each additional year of employment?
To run a bivariate linear regression, make the following menu selections: <Analyze> → <Regression> → <Linear> (as shown in Figure 11.4). This opens the main dialog box for the SPSS Linear Regression procedure, which appears in Figure 11.5. The name of the dependent variable (salary) and the name of the predictor variable (years) were moved into the panes marked “Dependent” and “Independent(s)” in this main dialog box. It is possible at this point to click OK and run a regression analysis; however, for this example, the following additional selections were made. The Statistics button was clicked to open the Linear Regression: Statistics dialog box; a checkbox in this window was marked to request values of the 95% CI for b, the slope to predict raw scores on Y from raw scores on X (Figure 11.6).
Figure 11.4 SPSS Menu Selections for Linear Regression
Figure 11.5 Main Dialog Box for SPSS Linear Regression Procedure
Figure 11.6 SPSS Regression Dialog Box to Request Statistics
11.12 SPSS OUTPUT: SALARY DATA
To see the equivalence between Pearson’s r and parts of the bivariate regression results, Pearson’s r between years and salary was obtained using the SPSS correlations procedure; results appear in Figure 11.7. Complete SPSS regression output includes additional information (discussed in Volume II [Warner, 2020]). Figure 11.8 shows the results needed to find the proportion of predicted and unpredicted variance (R2 and 1 – R2) and to write out the two versions of the regression equations (raw score and standardized). From the top of Figure 11.8, the proportion of variance in salary that can be predicted from years of experience is r2 or R2, that is, .688 or about 69%. When regression includes more than one predictor, multiple R tells us how well the entire set of predictor variables can predict Y. In this example, the regression equation has only one predictor. When there is only one predictor variable, Pearson’s r between X and Y is the same as multiple R for the equation that uses X to predict Y. (You can ignore the other information in the top panel of Figure 11.8 for now. The standard error of the estimate is discussed later in the chapter and is not usually included in research reports. The adjusted R2 value is only used when a regression has more than one predictor variable.)
Table 11.1 relabels and rearranges the elements of the coefficient table in the SPSS output so that you can relate them to terms in the textbook. The top panel of the SPSS output in Figure 11.8 gives results for R (capital R is called multiple R). On the basis of information in Table 11.1 we can write the unstandardized regression equation to predict salary in dollars from experience in years, as follows:
salary′ = 31,416.72 + 2,829.57 × years
Figure 11.7 Pearson’s r for Years and Salary
Figure 11.8 Selected SPSS Linear Regression Output: Prediction of Salary From Years at Work
Table 11.1 Unstandardized Regression Coefficients for Prediction of Salary From Years of Job Experience
The predicted salary for a person with X = 0 years of experience is $31,416.72, that is, b0. In this example, 0 is a possible value for years of experience, so $31,416.72 represents starting salary. The slope b tells us the predicted increase in salary (in dollars) for each additional one-unit increase in experience; this is $2,829.57. This corresponds to the average salary raise per year. The value of the beta (β) coefficient also appears in Figure 11.8. SPSS denotes the column for the standardized coefficients with the word Beta. In this example, β = .83. It is not just a coincidence that β = r = R. This always happens when regression includes only one predictor variable. We can use the value of β to write the following standardized regression equation:
zsalary′ = .83 × zyears
This equation says that, for each 1 standard deviation increase in years of experience, the z score for salary goes up by .83 standard deviations. If salary were perfectly related to years of experience (i.e., r = 1.00), then for each 1 SD increase in experience, there would be a 1 SD increase in salary. Values of β, like values of r, are limited to the range –1 to +1. Each t ratio tests the null hypothesis that a regression coefficient equals zero (using a two-tailed test). Each t ratio is the coefficient divided by its standard error. For b, t = b/SEb. The df for each t test is N – 2, where N is the number of cases in the sample. The t ratios in this example have 48 df. We can say that the b slope is statistically significant, t(48) = 10.295, p < .001, two tailed. The number in parentheses after a t value is the df. A 95% CI was also obtained for each coefficient.
11.13 RESULTS SECTION: HYPOTHETICAL SALARY DATA
Following is an example of a “Results” section that presents the results of the foregoing bivariate regression of salary and job experience.
Results
A bivariate regression was performed to evaluate how well salary (in dollars) could be predicted from years of job experience in a hypothetical data set with N = 50 cases. Preliminary data screening [not reported in this chapter] indicated that scores on salary were reasonably normally distributed. Scores on years were positively skewed. There were no missing data, and no outliers were removed in the subsequent analysis. A scatterplot indicated that the relation between X and Y was positive and reasonably linear, and there were no bivariate outliers. The correlation between years of job experience and salary was statistically significant, r(48) = .83, p < .001. The regression equation for predicting salary from years of job experience was found to be Y′ = 31,416.72 + 2,829.57 × X. The r2 for this equation was .69; that is, 69% of the variance in salary was predictable from years of job experience. This is a very strong relationship; increases in years of job experience were closely related to increases in salary. The slope to predict salary from years, b = 2,829.57, was statistically significant, t(48) = 10.295, p < .001, two tailed. The 95% CI for the slope to predict salary in dollars from years of job experience ranged from 2,277 to 3,382; for each 1-year increase in job experience, the predicted salary increased by about $2,300 to $3,400. A scatterplot of the fitted regression line appears in Figure 11.10.
11.14 PLOTTING THE REGRESSION LINE: SALARY DATA
Now let’s examine the scatterplot and regression line for the salary data. After you run the SPSS scatterplot procedure to create the graph, click on the graph to open it in the SPSS Chart Editor. The initial Chart Editor window appears in Figure 11.9. From the menu bar at the top of the Chart Editor, select <Elements>, then select <Fit Line at Total> from the drop-down menu, as shown in Figure 11.10. In the Properties dialog box, shown on the right side of Figure 11.11, select the radio button for “Linear.” The best fitting regression line will appear on the graph. You can close the chart editor or do additional editing to improve the appearance of the graph.
Figure 11.9 Initial View in SPSS Chart Editor
Figure 11.10 SPSS Menu Selections to Add Fit Line: <Elements> → <Fit Line at Total>
Figure 11.11 SPSS Dialog Box for Properties of Fit Line: Select “Linear”
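The values reported for the salary example can be cross-checked by hand using the t ratio and CI formulas from Sections 11.8 and 11.9. The sketch below starts from the reported b = 2,829.57, t(48) = 10.295, and N = 50. Two things are assumptions rather than SPSS output: the critical value 2.0106 (the two-tailed .05 critical value of t for 48 df, taken from a t table) and the identity r = t / √(t² + df), which holds for a regression with one predictor.

```python
from math import sqrt

# Values reported in the chapter for the salary example.
b = 2829.57      # raw-score slope (dollars per year)
t_ratio = 10.295
n = 50
df = n - 2

# Assumption: two-tailed critical t for alpha = .05 with 48 df, from a t table.
T_CRIT_48 = 2.0106

# Because t = b / SEb, the standard error can be recovered as b / t.
se_b = b / t_ratio

# 95% confidence limits for the slope.
lower = b - T_CRIT_48 * se_b
upper = b + T_CRIT_48 * se_b

# With a single predictor, r can be recovered from t and df.
r = t_ratio / sqrt(t_ratio ** 2 + df)
r2 = r ** 2

print(f"SEb = {se_b:.2f}")
print(f"95% CI for b: {lower:.0f} to {upper:.0f}")
print(f"r = {r:.2f}, r2 = {r2:.3f}")
```

The limits round to 2,277 and 3,382, and the recovered r and r2 round to .83 and .688, matching the values given in the Results section.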
11.15 USING A REGRESSION EQUATION TO PREDICT SCORE FOR INDIVIDUAL (JOE’S HEART RATE DATA)
In an earlier chapter, this question was raised: What might explain why some people have higher heart rates than average? Bivariate regression provides a way to answer this question (keeping in mind that no one batch of data, and no one analysis, can provide a definitive answer). If mean heart rate for a sample is 81.2 beats per minute, and Joe’s heart rate is 88 beats per minute, that tells us Joe’s heart rate is (88 – 81.2) = 6.8 beats per minute above average. Can we explain why Joe’s heart rate is 6.8 beats per minute higher than average, or predict that his heart rate is this far above average? Bivariate regression provides a way to think about this.
Data for this example are in Figure 11.12 and in the file named joeshr.sav. For now, look only at the first two columns. Suppose Joe is a member of a sample, and all 10 members of the sample have scores for anxiety (the X independent variable) and hr (the Y dependent variable). We’ll assume Joe is the person whose row of data is highlighted in Figure 11.12; he has anxiety = 12 and hr = 88. Using the same regression procedures as in the previous example, we can obtain a bivariate regression to predict heart rate from anxiety for these data. Results appear in Figure 11.13. This example includes another part of regression output, an ANOVA table, not shown in the previous example. The abbreviation ANOVA stands for analysis of variance. The ANOVA reports how the total sum of squares (SS) for the Y variable can be divided into an SS for predicted parts of Y scores and an SS for errors in prediction.
Appendix 11D provides a fully worked example that demonstrates how a single person’s score (such as Joe’s) can be divided into a predicted part and a prediction error and how information about the magnitudes of these two score components can be summarized across all persons in the sample by squaring and summing them. We can see that the correlation between heart rate and anxiety is R = .751, R2 = .565. A little more than half the variance in hr scores is predictable from anxiety in this sample. Because R is positive, we know that the association is positive (higher anxiety tends to go with higher hr). From the coefficient table, we see that the b slope for the raw-score regression is 1.029 and that it is statistically significant (p = .012). Confidence intervals for b were not obtained in this example. (Unless otherwise stated, the criterion α = .05, two tailed, is used to decide whether p values are statistically significant.)
Figure 11.12 Heart Rate and Anxiety in joeshr.sav File
Figure 11.13 SPSS Output for Bivariate Regression to Predict Heart Rate From Anxiety Using joeshr.sav Data
The raw-score regression equation to predict heart rate in beats per minute from anxiety scores is:
hr′ = 71.833 + 1.029 × anxiety
Figure 11.14 Compute Statement Used to Predict Heart Rate Scores From Raw-Score Regression Equation
Figure 11.15 Compute Statement Used to Calculate Residuals
In words, for an anxiety score of 0, we predict hr of 71.833 beats per minute. (You would need to look at the data file or scatterplot to see whether anxiety of 0 is a score that occurs in this sample.) For each 1-point increase in anxiety, we expect about a 1 beat per minute increase in hr. The standardized score regression to predict zhr is:
zhr′ = .75 × zanxiety
In words, for each 1 standard deviation increase in anxiety, we predict a .75 standard deviation increase in heart rate. We can use the regression equation in the SPSS output to generate a predicted score Y′ for Joe. The SPSS regression procedure can calculate and save predicted scores, but it’s useful for you to see explicitly how predictions are made by applying an equation to scores. The raw-score prediction equation from the SPSS output in Figure 11.13 was: hr′ = 71.833 + 1.029 × anxiety. To generate a predicted heart rate for Joe, we substitute in Joe’s score for anxiety: hr′ = 71.833 + 1.029 × 12 = 84.18, rounded to 84.2. Given Joe’s anxiety, we predict his heart rate to be 84.2 beats per minute. His actual hr was higher, so our prediction isn’t exactly correct. To obtain predicted hr scores for all 10 persons in the sample, we can use the regression equation to compute them, as shown in Figure 11.14.
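Applying the raw-score equation to each case is a one-line computation. The sketch below uses the coefficients reported in the text (71.833 and 1.029) and Joe’s anxiety score of 12; the function name predict_hr and the two additional anxiety scores are hypothetical, included only to show the equation applied to several cases.

```python
# Coefficients reported in the chapter for the heart rate example.
B0 = 71.833   # intercept: predicted hr when anxiety = 0
B = 1.029     # slope: predicted change in hr per 1-point increase in anxiety


def predict_hr(anxiety):
    """Predicted heart rate (beats per minute) from an anxiety score."""
    return B0 + B * anxiety


# Joe's anxiety score comes from the text; the other two scores are hypothetical.
anxiety_scores = {"Joe": 12, "hypothetical case A": 5, "hypothetical case B": 20}

predicted = {name: predict_hr(score) for name, score in anxiety_scores.items()}

for name, value in predicted.items():
    print(f"{name}: predicted hr = {value:.1f}")
```

For Joe this gives 84.181, which rounds to the 84.2 beats per minute used in the text.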
The variable predictedhr corresponds to Y′, the predicted score for each person. We can now find the prediction error or residual, the difference between actual and predicted heart rate: Y – Y′. To obtain residuals for all cases, we use the compute statement in Figure 11.15 to subtract the variable predictedhr from hr; we call the resulting new variable residualhr. For Joe, with Y = 88 and Y′ = 84.2, the prediction error (Y – Y′) = 88 – 84.2 = 3.8. His actual heart rate was 3.8 beats per minute faster than the equation predicted.
Now let’s look at the other parts of Joe’s hr score. We know that part of his score was not predicted by the regression equation. How much of his score was predicted? The predicted part of his score corresponds to (Y′ – MY). If we knew nothing about other variables that might predict hr, the best predicted hr for all persons in the sample would be MY, mean hr. The Y′ predicted value is an adjustment to the prediction; how much higher or lower than MY would we expect each person’s score to be when we generate a predicted heart rate using anxiety? We now have three pieces of information:
Joe’s actual hr, Y = 88
Joe’s predicted hr, Y′ = 84.18 (rounded to 84.2)
The mean hr for the sample, MY = 81.2
We can use these numbers to divide Joe’s total deviation from the sample mean (Y – MY) into two parts:
Total deviation of Joe’s heart rate from mean = (Y – MY) = (88 – 81.2) = 6.8. Joe’s hr is 6.8 beats per minute above the average in the sample.
Difference between Joe’s actual and predicted hr = (Y – Y′) = (88 – 84.2) = 3.8. Joe’s actual hr was about 4 beats per minute higher than the value predicted by the regression equation from his anxiety. This was the part of Joe’s hr that the regression did not predict or explain. This is Joe’s residual or prediction error.
(Y′ – MY) is the difference between Joe’s predicted heart rate (from the regression) and the “default” prediction we would make in the absence of other information.
If no other information were available, we would predict his hr to be MY, the sample mean. This deviation is (Y′ – MY) = (84.2 – 81.2) = 3. In words, the regression equation predicted, on the basis of his anxiety, that Joe’s heart rate would be 3 beats per minute faster than the sample mean. You can check to see that Joe’s total deviation (column 1) equals the sum of the residual (column 2) and the predicted part (column 3) of the score:
Column 1, total deviation: (Y – MY) = 88 – 81.2 = 6.8
Column 2, residual: (Y – Y′) = 88 – 84.2 = 3.8
Column 3, predicted part: (Y′ – MY) = 84.2 – 81.2 = 3
6.8 = 3.8 + 3
We can write an equation to summarize the components or pieces of Joe’s hr score:
Y = MY + (Y – Y′) + (Y′ – MY); for Joe, 88 = 81.2 + 3.8 + 3
In words, Joe’s heart rate can be constructed from the sample mean (81.2), plus the part of Joe’s score that could not be predicted from his anxiety (3.8), plus the part of Joe’s score that could be predicted from anxiety (3). We can do this for every individual in the sample (see fully worked example in Appendix 11D). Note that the values of (Y – Y′) and (Y′ – MY) will differ for other persons, and in many cases, one or both of these score components will be negative numbers. You may find it helpful to calculate predicted scores and deviations for a few other cases and compare your results with those in Appendix 11D.
We can locate Joe’s score in a scatterplot, as shown in Figure 11.16. Joe’s actual score is above the regression line; his predicted score lies on the line. Figure 11.17 zooms in on Joe’s score to show the three deviations that describe Joe’s situation. Each deviation corresponds to a vertical arrow in Figure 11.17. You can see (Y – MY) (Joe’s total deviation from average), (Y – Y′) (Joe’s prediction error), and (Y′ – MY) (the predicted part of Joe’s score). Notice that when you sum the predicted and residual components labeled in Figure 11.16, the values of Y′ “cancel out,” so they add up to (Y – MY):
$$(Y - M_Y) = (Y - Y') + (Y' - M_Y) \tag{11.12}$$

For Joe, these values are 6.8 = 3.8 + 3. We have broken down Joe’s total difference from mean heart rate (6.8) into two parts: a part that cannot be predicted from anxiety, (Y – Y′) = 3.8, and a part that can be predicted from anxiety, (Y′ – MY) = 3. The fact that Joe has a higher than average hr can be partly (but not completely) predicted from his anxiety.

Sometimes prediction errors matter to people. Gray (1985) used regression analysis to predict faculty salaries from variables such as years at the college or university. Her methods were widely used by others to evaluate possible discrimination. Consider the case of a faculty member who has 10 years of experience and whose actual salary is $5,000 less than the salary predicted from a regression equation. The predicted salary from the regression equation for 10 years of experience is often fairly close to the actual salaries of other faculty who have 10 years of experience. When the faculty member sees that he or she is paid less than other faculty who have the same amount of experience, that faculty member may well wonder: Why am I paid less than other faculty who are similar to me? Is it because I am a poor teacher? Because I have fewer research accomplishments and grant dollars than other faculty members with 10 years of experience? (The faculty member might think, okay, that’s fair.) Or is it because the department chair who recommends starting salaries and amounts for annual raises does not like me? Or because I am female, or a person of color? If that is why I am paid less than other faculty with 10 years of experience, that is not fair. Gray described the use of residuals from regressions that predict faculty salary as evidence in court cases about potential discrimination.
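Returning to Joe’s heart rate, the arithmetic behind this breakdown of a score can be checked with a few lines of Python. This is only a minimal sketch: the three numbers come from the text (with Y′ rounded to 84.2), and the variable names are made up for illustration.

```python
# Joe's values as reported in the text (Y' rounded to 84.2).
y_actual = 88.0      # Joe's actual heart rate, Y
y_predicted = 84.2   # Joe's predicted heart rate, Y'
mean_y = 81.2        # sample mean heart rate, M_Y

total_deviation = y_actual - mean_y      # (Y - M_Y)
residual = y_actual - y_predicted        # (Y - Y'), not predicted from anxiety
predicted_part = y_predicted - mean_y    # (Y' - M_Y), predicted from anxiety

# Rounded to one decimal place these are 6.8, 3.8, and 3.0.
print(round(total_deviation, 1), round(residual, 1), round(predicted_part, 1))

# The residual and the predicted part add up to the total deviation
# (compared with a tolerance because of floating-point rounding).
print(abs(total_deviation - (residual + predicted_part)) < 1e-9)
```

The same three subtractions can be repeated for any other case in the sample; only the values of Y and Y′ change.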
Figure 11.16 Scatterplot for Anxiety and Heart Rate Scores (Joe’s Score Identified)

Figure 11.17 The Two Components of Joe’s Score in the Scatterplot: Predicted Part of Score and Residual

Except in situations like Gray’s (1985) analysis of salary data, researchers are usually more interested in the strength of association between a predictor and an outcome variable and not very interested in the outcomes for individual cases.

11.16 PARTITION OF SUMS OF SQUARES IN BIVARIATE REGRESSION

As the next step, we will want to summarize information about the sizes of these three deviations across all persons in the sample. As in the situation where we compute the variance of X by looking at deviations of X from its mean, the deviations in the regression situation will sum to zero. We get out the usual bag of tricks to avoid this problem: We will square the deviations before we sum them. (For the sample variance of X, the equation that includes all the operations is $s^2 = \sum (X - M_X)^2 / (N - 1)$.) We’ll do something similar for the three different kinds of deviations we obtain in regression. As before, we will call these sums of squared deviations sums of squares (SS). In the regression situation, we will have three different SS terms:
The total deviation (Y – MY) is the sum of the two deviations for unexplained and explained parts of scores:

$$(Y_i - M_Y) = (Y_i - Y'_i) + (Y'_i - M_Y) \tag{11.13}$$

First we can square all the deviations¹ (e.g., square the entire equation in 11.12):

$$(Y_i - M_Y)^2 = \left[(Y_i - Y'_i) + (Y'_i - M_Y)\right]^2 \tag{11.14}$$

Then sum these squared values across all participants in the sample. (Squaring the right-hand side also produces a cross-product term, $2(Y_i - Y'_i)(Y'_i - M_Y)$; for a least-squares regression line this term sums to zero across cases, so it drops out of the sum.)

$$\sum_{i=1}^{N} (Y_i - M_Y)^2 = \sum_{i=1}^{N} (Y_i - Y'_i)^2 + \sum_{i=1}^{N} (Y'_i - M_Y)^2 \tag{11.15}$$

The i subscripts in these equations indicate that a deviation is computed and squared for each person (for person numbers i = 1, 2, 3, … N), and then the squared deviations are summed across all persons in the sample. In Equations 11.13, 11.14, and 11.15:

MY is the mean of the dependent variable Y
Y′i is the predicted value of Y for each case (i.e., Y value predicted from X)
Yi is the actual value of Y for each case
(Yi – Y′i) is the prediction error (i.e., the difference between predicted and actual Y for each case)

The terms in Equation 11.15 can be given names, and we can relabel the equation as follows:

$$SS_{total} = SS_{residual} + SS_{regression} \tag{11.16}$$
SSresidual can also be called SSE or SSerror (variation not explained by the regression). SSregression can also be called SSexplained or SSpredicted (variation explained by the regression). Just as (Y – Y′) and (Y′ – MY) divided an individual heart rate score into unexplained and explained parts, SSresidual and SSregression partition (or divide) SStotal into unexplained and explained variation in hr.

The idea that we can partition total SS for an outcome variable of interest, such as heart rate, into variation we can predict versus variation we cannot predict from an analysis is one of the most fundamental ideas in statistics. Partition of SS is a basic building block for most analyses you will learn later. A major goal of research and statistical analysis is to evaluate what proportion of variance we can explain or predict.

Much of the information in the SPSS output for bivariate regression is redundant. Capital R (which is called multiple R when we have more than one predictor) is the same as Pearson’s r between X and Y. Both R and r are the same as the β coefficient in the standardized regression equation. (When multiple predictors are used in regression, these pieces of information begin to tell us different things.) SSresidual will be small if prediction errors are small, and usually researchers do want prediction errors to be small.

This discussion explains why, and how, the SS terms in the ANOVA portion of the regression output (in Figure 11.13) were obtained. At this point, you can ignore the df, MS, and F values in the ANOVA source table; these will be explained in later chapters about ANOVA. A completely worked example, in which SPSS is used to compute these deviations for each case, square the deviations for each case, and then sum the squared deviations for all cases, appears in Appendix 11D. You may find it instructive to compute a few deviations, square a few deviations, and add up squared deviations by hand and then use the results in the appendix to check your work.
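The partition of SStotal described above can also be verified numerically. The sketch below uses a small made-up data set (six hypothetical anxiety and heart rate scores, not the joeshr.sav data) and plain Python; the variable names are illustrative only. It fits the least-squares line, computes the three SS terms, and also shows that the cross-product term left over from squaring is zero, which is why SSresidual and SSregression add up to SStotal.

```python
# Hypothetical scores (NOT the joeshr.sav data): anxiety (X) and heart rate (Y).
anxiety = [2, 4, 5, 7, 8, 10]
hr = [70, 74, 79, 78, 85, 88]

n = len(hr)
mean_x = sum(anxiety) / n
mean_y = sum(hr) / n

# Least-squares slope b and intercept b0 for Y' = b0 + b * X.
sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(anxiety, hr))
ss_x = sum((x - mean_x) ** 2 for x in anxiety)
b = sum_products / ss_x
b0 = mean_y - b * mean_x
predicted = [b0 + b * x for x in anxiety]

# Square each kind of deviation for every case, then sum across cases.
ss_total = sum((y - mean_y) ** 2 for y in hr)
ss_residual = sum((y - yp) ** 2 for y, yp in zip(hr, predicted))
ss_regression = sum((yp - mean_y) ** 2 for yp in predicted)

# Cross-product term that appears when (Y - Y') + (Y' - M_Y) is squared.
cross_product = sum((y - yp) * (yp - mean_y) for y, yp in zip(hr, predicted))

print(f"SStotal      = {ss_total:.3f}")
print(f"SSresidual   = {ss_residual:.3f}")
print(f"SSregression = {ss_regression:.3f}")
print(f"SSresidual + SSregression = {ss_residual + ss_regression:.3f}")
print(f"cross-product sum = {cross_product:.6f}")
```

With any other data set the three SS values will differ, but the two component sums should still add up to SStotal as long as the predicted scores come from the least-squares line.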
Values of SS terms can range from 0 to very large numbers. By itself, a single SS value can’t be interpreted as effect size. However, because SSresidual and SSregression sum to SStotal, we can figure out the proportion or percentage of these SS terms relative to SStotal and use that to talk about percentage of predicted variance. In Chapter 10 you saw that r² corresponds to proportion of explained variance. This chapter presents another way to estimate proportion of explained variance: SSregression/SStotal. In fact, these two ideas are equivalent: r² and (1 – r²) can be calculated from the SS terms in bivariate regression as shown in Equations 11.17 and 11.18.
$$r^2 = \frac{SS_{regression}}{SS_{total}} \tag{11.17}$$

$$1 - r^2 = \frac{SS_{residual}}{SS_{total}} \tag{11.18}$$

These equations indicate that r², and SSregression/SStotal, is the proportion of variance in Y that is linearly predictable from X; (1 – r²), and SSresidual/SStotal, is the proportion of variance in Y that is not linearly predictable from X.

11.17 WHY IS THERE VARIANCE (REVISITED)?

This is the first detailed example you have seen where total variation in scores for a variable (now we are talking about a Y dependent variable) was divided into “unexplained” versus “explained” components. You will see similar partitions of SS in future analyses, sometimes under different names, such as SSbetween groups versus SSwithin groups in ANOVA.

We can now answer an earlier question about heart rate. What factors might explain why some people have hr above average and some people have hr below average? The results for data in the joeshr.sav file show that, in this imaginary sample, differences in anxiety predicted a bit more than half of the variance in hr (r² = .565). If this is confirmed in real studies, and if we can show that experimental manipulations of anxiety lead to increases in hr under controlled conditions, we begin to have a case that anxiety may be one of the factors that can potentially explain variation in hr. It might be possible to explain more of the variance in hr by adding more predictor variables (e.g., smoking, fitness, alcohol and drug use).

11.18 ISSUES IN PLANNING A BIVARIATE REGRESSION STUDY

Here are issues to keep in mind when planning a bivariate regression study.

Choice of variables: Usually X, Y pairs of variables are selected because researchers suspect that X may cause or influence Y, or that an X variable that happens earlier in time than Y may be a useful predictor of Y. If the regression to predict Y from X is statistically significant with a reasonably large effect size, this outcome is consistent with a hunch that X influences Y, but not proof of that hunch.
You must distinguish between independent and dependent variables when you set up a regression analysis.

Types of variables: The predictor or independent variable X can be quantitative or dichotomous (but not categorical with more than two categories). The outcome or dependent variable Y must be quantitative.

Ideally the sample of participants or cases is randomly sampled from an actual population of interest. If a convenience sample is used, generalizations can be made only to hypothetical populations that resemble the sample.
Ranges of scores on X and Y: You cannot generalize predictions beyond the ranges of X and Y values in your sample, so make sure that the ranges of scores in your sample are similar to ranges of scores in the populations of interest.

Some variables (such as income) tend to have outliers. If outliers are likely, then establish rules for identification and handling of outliers before data collection. Document the number of outliers encountered and how they were handled.

Statistical power analysis (same as for Pearson’s r) can be used to assess whether the N in the sample provides sufficient power to detect the effect size. N should be near or above 100 (if possible) even when power analysis suggests that a smaller sample size might be adequate.

Regression analysis assumes that scores on X and Y are reliable and valid. Reliability means consistency. Some variables (such as height in inches) are easy to measure consistently. It can be much more difficult to obtain consistent or reliable results for variables such as depression. Find information about measurement for your variables, if you can. When researchers develop measures for depression and other psychological variables, they publish information about reliability; a reliability coefficient such as Cronbach’s alpha ranges from 0 to 1, with higher values indicating better reliability. Reliability above .70 is often considered adequate. Construct validity is whether a variable really measures what it purports to measure. When new measures are proposed, researchers also assess validity, often using correlations between the new measure and accepted older measures. When you read past studies similar to your own, look for information about the reliability and validity of measures, and reference that information when you describe your own study. If your variables do not measure the construct you are interested in, the results of your analysis cannot be very useful.
Carmines and Zeller (1979) provided an accessible introduction to reliability and validity.

You need to distinguish between predictor and outcome variables in bivariate regression (that distinction was not needed for Pearson’s r). Otherwise, the same issues apply to design of a bivariate regression study as for a bivariate correlation study. It is important to avoid restricted ranges of scores on both X and Y because a restricted range often reduces the size of the correlation. Also, you cannot extrapolate (extend or generalize your results) beyond the ranges of values of X and Y that are included in your sample. Therefore, it is important to obtain a sample in which the ranges of X and/or Y scores correspond to the range of behaviors in the population of interest. If you know ahead of time that either X or Y tends to have extreme outliers, it is important to decide on procedures for identification and handling of outliers prior to data analysis. Because b is essentially a rescaled version of r, factors that artifactually influence the size of Pearson’s r can also affect the magnitude of b, the regression slope coefficient (see Appendix 10D).

11.19 PLOTTING RESIDUALS
Up until now, when you have looked for potential violations of assumptions, you have examined histograms of scores for X and Y variables obtained before you conducted analyses. However, for some analyses (such as bivariate regression), we can also look for potential violations of assumptions after the analysis has been run by plotting residuals (prediction errors). Often, prediction errors are graphed as a function of Y (actual score on outcome variable) or Y′ (predicted score on outcome variable). For simple analyses such as bivariate regression, plots of residuals usually reveal the same problems you would already have seen in preliminary data screening. For more complex analyses with multiple predictor variables, residuals can be a valuable source of information about problems with the analysis.

Plots of residuals are not generally presented in journal articles. Some data analysts and some disciplines view evaluation of residuals as extremely important. Plots of residuals are rarely discussed in behavioral and social science research (although perhaps they should be).

Here’s a potential problem with analysis of residuals. A data analyst can be tempted to “throw out” scores with large prediction errors and then rerun the analysis. That amounts to pruning the data to fit the analysis, reminiscent of the myth of Procrustes, who was said to have cut off people’s legs to make them fit his bed.

The Plots button in the Linear Regression dialog box can be used to open the dialog box to request diagnostic plots of residuals (Figure 11.18). SPSS creates temporary variables that contain values for the standardized predicted scores from the regression (denoted either *ZPRED or ZPR_1) and the standardized residuals from the regression (denoted either *ZRESID or ZRE_1), as well as other versions of the predicted and residual values for each case. In this example, ZRE_1 was plotted (as the Y variable) against scores on ZPR_1 (which was designated as the X variable for the residuals plot).
Finally, the Save button was used to request that SPSS save the values for standardized predicted scores and residuals into the SPSS worksheet; saved scores on these specific variables were requested by placing checkmarks in the checkboxes for these variables in the Linear Regression: Save dialog box, which appears in Figure 11.19. A view of the data file that includes these new saved variables appears in Figure 11.20. The plot of residuals as a function of predicted scores appears in Figure 11.21.

When assumptions for regression are satisfied, the points in the residuals plot in Figure 11.21 should appear within a fairly uniform band from left to right, with a mean of 0; and most standardized residuals should be between –3 and +3 (typical range for z scores, although z values can be larger in absolute value). Figure 11.22 provides examples of possible patterns in plots of residuals. Ideally, when residuals (either standardized or unstandardized) are graphed against predicted scores, the graph should look like the one in Figure 11.22a. Figures 11.22b, 11.22c, and 11.22d show what graphs of residuals might look like in the presence of three different kinds of violations of assumptions for linear regression. Figure 11.22b shows what the residuals might look like if scores on the Y
Figure 11.18 Command to Request Plot of Standardized Residuals by Standardized Predicted Salaries
Figure 11.19 Dialog Box to Save Standardized Predicted Scores and Standardized Residuals From Regression
Figure 11.20 SPSS Data With Saved Variables: Standardized Predicted Scores (ZPR_1) and Standardized Residuals (ZRE_1) From Regression That Predicts Salary From Years
Figure 11.21 Scatterplot: Standardized Residuals (Y Axis) and Standardized Predicted Salary Score (X Axis)
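For readers who want to see what standardized predicted scores and standardized residuals look like outside SPSS, the following sketch computes rough analogues of ZPR_1 and ZRE_1 in plain Python. It rests on two assumptions that are not stated in the text: that the standardized residual is the raw residual divided by the standard error of the estimate, and that the standardized predicted score is a z score of Y′ computed with N – 1 in the denominator. The data are six made-up scores, and all names are hypothetical.

```python
import math

# Hypothetical scores (not the data used in the figures).
x_scores = [2, 4, 5, 7, 8, 10]
y_scores = [70, 74, 79, 78, 85, 88]

n = len(y_scores)
mean_x = sum(x_scores) / n
mean_y = sum(y_scores) / n

# Least-squares line Y' = b0 + b * X.
sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores))
ss_x = sum((x - mean_x) ** 2 for x in x_scores)
b = sum_products / ss_x
b0 = mean_y - b * mean_x
predicted = [b0 + b * x for x in x_scores]
residuals = [y - yp for y, yp in zip(y_scores, predicted)]

# ASSUMPTION: standardized residual = residual / standard error of the estimate.
se_est = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
z_resid = [e / se_est for e in residuals]

# ASSUMPTION: standardized predicted score = z score of Y' (its mean equals M_Y).
sd_predicted = math.sqrt(sum((yp - mean_y) ** 2 for yp in predicted) / (n - 1))
z_pred = [(yp - mean_y) / sd_predicted for yp in predicted]

# Flag cases whose standardized residual falls outside the -3 to +3 band.
flagged = [case for case, z in enumerate(z_resid, start=1) if abs(z) > 3]

for case, (zp, zr) in enumerate(zip(z_pred, z_resid), start=1):
    print(f"case {case}: z_pred = {zp:6.2f}, z_resid = {zr:6.2f}")
print("cases outside +/-3:", flagged)
```

Plotting z_resid against z_pred would give the kind of residual plot described above; flagged cases deserve a closer look, not automatic deletion.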
Figure 11.22 Interpretations for Plots of Residuals From Regression

Note: (a) Normally distributed residuals, satisfies assumptions. (b) Curvilinear pattern in residuals, violates assumptions. (c) Non-normally distributed residuals at each level of zY′, violates assumptions. (d) Residuals with unequal variances across levels of zY′, violates assumptions.

The residual plot obtained for the data on salary and years of job experience most closely resembles the graph in Figure 11.22a; assumptions of normality, linearity, and homoscedasticity were not violated in the data presented for this empirical example.

11.20 STANDARD ERROR OF THE ESTIMATE

Standard error of the estimate, denoted sY.X or SEest, is rarely included in research reports (perhaps because it is often embarrassingly large). When you see a subscript that includes a dot, such as the subscript Y.X, this tells you that the variable to the left of the dot is predicted from the variable(s) to the right of the dot. SPSS denotes this term as SEest. The sY.X notation indicates error variance when Y is predicted from specific values of X. The value of the standard error of the estimate depends on the value of r; other factors being equal, sY.X becomes smaller as r increases in absolute value. Its value also depends on the standard deviation of Y, the outcome variable; the maximum possible value for sY.X is the SD for Y. Researchers usually hope that this standard error, SEest, will be relatively small.

$$SE_{est} = s_{Y.X} = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}} \tag{11.19}$$
Figure 11.23 Schematic Diagram of Ideal Situation in Bivariate Regression

The standard error of the estimate is an indication of the typical size of residuals or prediction errors. An alternative way to calculate SEest shows that it is a function of the variability of scores on Y, the dependent variable, and the strength of the relationship between the predictor and outcome variables given by r²:

$$SE_{est} = s_Y \sqrt{(1 - r^2)\,\frac{N - 1}{N - 2}} \tag{11.20}$$

Note that if r = 0, SEest takes on its maximum possible value, which is essentially sY (the factor (N – 1)/(N – 2) is close to 1 unless N is very small). When r = 0, the best predicted value of Y is MY no matter what the X score happens to be, and the typical size of a prediction error in this situation is sY, the standard deviation of the Y scores. If r = +1 or –1, SEest takes on its minimum possible value of 0, and the best prediction of Y is (exactly) Y = b0 + bX. That is, when r = 1 in absolute value, all the actual Y values fall on the regression line (as in Figure 11.1). The size of SEest provides the researcher with a sense of the typical size of prediction errors; when SEest is close to 0, the prediction errors are small. As SEest approaches sY, this tells us that information about a participant’s X score does not improve the ability to predict the participant’s score on Y very much.

In the idealized situation represented in Figure 11.23, there is a normal distribution of Y scores for each specific value of X; in this example, a Y distribution for X = 1, 2, and 3. Ideally, these normal distributions have the same standard deviation (SEest). When a Y point does not fall on the regression line, its distance from the line indicates the magnitude of prediction error.
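The two routes to the standard error of the estimate can be compared numerically. The sketch below assumes the usual definitions, SEest = √[Σ(Y – Y′)²/(N – 2)] and the equivalent form sY√[(1 – r²)(N – 1)/(N – 2)], where sY is computed with N – 1 in the denominator. The six scores are made up for illustration, and the variable names are hypothetical.

```python
import math

# Hypothetical scores (NOT the joeshr.sav data).
x_scores = [2, 4, 5, 7, 8, 10]
y_scores = [70, 74, 79, 78, 85, 88]

n = len(y_scores)
mean_x = sum(x_scores) / n
mean_y = sum(y_scores) / n

sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores))
ss_x = sum((x - mean_x) ** 2 for x in x_scores)
ss_total = sum((y - mean_y) ** 2 for y in y_scores)

# Least-squares line and the SS terms.
b = sum_products / ss_x
b0 = mean_y - b * mean_x
predicted = [b0 + b * x for x in x_scores]
ss_residual = sum((y - yp) ** 2 for y, yp in zip(y_scores, predicted))
ss_regression = sum((yp - mean_y) ** 2 for yp in predicted)

# Pearson's r and the sample standard deviation of Y (N - 1 in the denominator).
r = sum_products / math.sqrt(ss_x * ss_total)
s_y = math.sqrt(ss_total / (n - 1))

# Route 1: directly from the residuals.
se_from_residuals = math.sqrt(ss_residual / (n - 2))
# Route 2: from s_Y and r squared.
se_from_r = s_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))

print(f"r squared              = {r ** 2:.4f}")
print(f"SSregression / SStotal = {ss_regression / ss_total:.4f}")
print(f"SEest from residuals   = {se_from_residuals:.4f}")
print(f"SEest from s_Y and r   = {se_from_r:.4f}")
print(f"s_Y                    = {s_y:.4f}")
```

The two SEest values should agree apart from floating-point rounding, and both should be well below sY whenever r is large in absolute value.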
The standard deviation or standard error that describes the distance of actual Y values from Y′ is estimated by SEest (assuming that the variance of Y is equal across values of X). In actual data sets, there are usually too few observations to assess the normality of the distribution of Y at each level of X, and to judge whether the variances of Y are uniform across levels of X.

11.21 SUMMARY

Assuming that X and Y are linearly related, scores on a quantitative Y variable can be predicted from scores on a quantitative (or dichotomous) X variable by using a prediction equation in either a raw-score form,

$$Y' = b_0 + bX$$

or a standardized, z-score form,

$$z'_Y = \beta z_X$$

The b coefficient can be interpreted in terms of “real” units; for example, for each additional hour of exercise (X), how many pounds of weight (Y) can a person expect to lose? The beta coefficient is interpretable as a unit-free or standardized index of the strength of the linear relationship: For each 1 SD increase in zX, how many standard deviations do we predict that zY will change? The significance of a regression prediction equation can be tested by using a t test (to test whether the raw-score slope b differs significantly from 0). (In multiple regression, where there are several predictor variables, tests of each b slope coefficient provide information about the significance of the contribution of each individual predictor, while the F test of multiple R assesses the statistical significance of prediction by the entire set of all predictors.)

Regression can be understood as a partition of scores into components: a part that can be predicted from X and a part that cannot be predicted from X. Squaring and summing these components gives us a partition of SStotal into SSregression + SSresidual. An r² is equivalent to the ratio SSregression/SStotal.

Researchers often choose to use certain variables as predictors in regression because they believe that these variables may possibly cause or influence the dependent variable. However, finding that an X variable is significantly predictive of a Y variable in a regression does not prove that X caused Y (unless X and Y represent data from a well-controlled experiment). The most we can say when we find that an X variable is predictive of Y in correlational research is that this finding is consistent with theories that X might be a cause of Y.

In Volume II (Warner, 2020), multiple regression is used; this involves equations that include multiple predictor variables, as in this example: Y′ = b0 + b1X1 + b2X2 + ··· + bkXk.
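To make the link between the raw-score and standardized forms concrete, the sketch below computes r, b, and β for six made-up scores and shows that both forms give the same predicted Y for a new X value once the z-score prediction is converted back to raw units. It assumes the standard bivariate relations b = r × (sY/sX), b0 = MY – b × MX, and β = r; the data and names are hypothetical.

```python
import math

# Hypothetical scores: hours of exercise (X) and pounds of weight lost (Y).
x_scores = [2, 4, 5, 7, 8, 10]
y_scores = [1.0, 2.5, 2.0, 4.0, 4.5, 6.0]

n = len(x_scores)
mean_x = sum(x_scores) / n
mean_y = sum(y_scores) / n
ss_x = sum((x - mean_x) ** 2 for x in x_scores)
ss_y = sum((y - mean_y) ** 2 for y in y_scores)
sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores))

s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))
r = sum_products / math.sqrt(ss_x * ss_y)

# Raw-score form: Y' = b0 + b * X, with b a rescaled version of r.
b = r * s_y / s_x
b0 = mean_y - b * mean_x

# Standardized form: z'_Y = beta * z_X, with beta equal to r for one predictor.
beta = r

new_x = 9  # a hypothetical new X score within the sampled range
raw_prediction = b0 + b * new_x

z_x = (new_x - mean_x) / s_x
z_prediction = beta * z_x
back_to_raw_units = mean_y + z_prediction * s_y

print(f"b = {b:.4f} (change in Y units per one-unit change in X)")
print(f"beta = {beta:.4f} (change in SD units of Y per 1 SD change in X)")
print(f"raw-score prediction for X = {new_x}: {raw_prediction:.4f}")
print(f"standardized prediction converted back: {back_to_raw_units:.4f}")
```

The slope b changes if X or Y is measured in different units, whereas β does not, which is why β is described as a unit-free index.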