Module/Week 7 PPOL 505 P.E.5

Mz Genuine
SalkindCh15.docx

15 TESTING RELATIONSHIPS USING THE CORRELATION COEFFICIENT: COUSINS OR JUST GOOD FRIENDS?

Difficulty Scale

(easy—you don’t even have to figure anything out!)

WHAT YOU WILL LEARN IN THIS CHAPTER

· Testing the significance of the correlation coefficient

· Interpreting the correlation coefficient

· Distinguishing between significance and meaningfulness (again!)

· Using SPSS to analyze correlational data and how to understand the results of the analysis

INTRODUCTION TO TESTING THE CORRELATION COEFFICIENT

In his research article on the relationship between the quality of a marriage and the quality of the relationship between the parent and the child, Daniel Shek told us that there are at least two possibilities. First, a poor marriage might enhance parent–child relationships. This is because parents who are dissatisfied with their marriage might substitute their relationship with their children for the emotional gratification lacking in their marriage. Or, according to the spillover hypothesis, a poor marriage might damage the parent–child relationship by setting the stage for increased difficulty in parenting children.

Shek examined the link between marital quality and parent–child relationships in 378 Chinese married couples over a 2-year period. He found that higher levels of marital quality were related to higher levels of quality of parent–child relationships; this was found for concurrent measures (at the present time) as well as longitudinal measures (over time). He also found that the strength of the relationship between parents and children was the same for both mothers and fathers. This is an obvious example of how the use of the correlation coefficient gives us the information we need about whether sets of variables are related to one another. Shek computed a whole bunch of different correlations across mothers and fathers as well as at Time 1 and Time 2, but all with the same purpose: to see if there was a significant correlation between the variables. Remember that this does not say anything about the causal nature of the relationship, only that the variables are associated with one another.

Want to know more? Go online or to the library and find …

Shek, D. T. L. (1998). Linkage between marital quality and parent-child relationship: A longitudinal study in the Chinese culture. Journal of Family Issues, 19, 687–704.

THE PATH TO WISDOM AND KNOWLEDGE

Here’s how you can use the flowchart to select the appropriate test statistic, the test for the correlation coefficient. Follow along the highlighted sequence of steps in Figure 15.1.

1. The relationship between variables, not the difference between groups, is being examined.

2. Only two variables are being used.

3. The appropriate test statistic to use is the t test for the correlation coefficient.

Quality of Marriage    Quality of the Parent–Child Relationship
76                     43
81                     33
78                     23
76                     34
76                     31
78                     51
76                     56
78                     43
98                     44
88                     45
76                     32
66                     33
44                     28
67                     39
65                     31
59                     38
87                     21
77                     27
79                     43
85                     46
68                     41
76                     41
77                     48
98                     56
98                     56
99                     55
98                     45
87                     68
67                     54
78                     33

COMPUTING THE TEST STATISTIC

Here’s something you’ll probably be pleased to read: The correlation coefficient can act as its own test statistic. This makes things much easier because you don’t have to compute any test statistics, and examining the significance is very easy indeed.

Let’s use, as an example, the data listed above, which examine the relationship between two variables, the quality of marriage and the quality of parent–child relationships. Let’s pretend these hypothetical (made-up) scores can range from 0 to 100, and the higher the score, the happier the marriage and the better the parenting.

You can use Formula 5.1 from Chapter 5 to compute the Pearson correlation coefficient. When you do, you will find that r = .437 or .44 when we round to two places past the decimal. Now let’s go through the steps of actually testing the value for significance and deciding what the value means.
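If you would like to check that value without a calculator, here is a minimal Python sketch that computes Pearson's r for the 30 pairs of scores in the example. It uses the raw-score (computational) form of the formula, which we assume is the form given as Formula 5.1. The function name pearson_r and the list names are our own and are not part of the book or of SPSS.

```python
from math import sqrt

# Hypothetical scores from the chapter's example (30 couples).
marriage = [76, 81, 78, 76, 76, 78, 76, 78, 98, 88,
            76, 66, 44, 67, 65, 59, 87, 77, 79, 85,
            68, 76, 77, 98, 98, 99, 98, 87, 67, 78]
parent_child = [43, 33, 23, 34, 31, 51, 56, 43, 44, 45,
                32, 33, 28, 39, 31, 38, 21, 27, 43, 46,
                41, 41, 48, 56, 56, 55, 45, 68, 54, 33]

def pearson_r(x, y):
    """Pearson correlation via the raw-score formula (assumed to match Formula 5.1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(marriage, parent_child)
print(f"n = {len(marriage)}, r = {r:.3f}")
```

Run on these scores, the result should agree with the r = .437 reported above, which rounds to .44.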

If you need a quick review of the basics of correlations, take a look back at Chapter 5.

Here are the famous eight steps and the computation of the t test statistic:

· 1. State the null and research hypotheses.

The null hypothesis states that there is no relationship between the quality of the marriage and the quality of the relationship between parents and children. (Technically, it hypothesizes that the correlation coefficient is 0.0.) The research hypothesis is a two-tailed, nondirectional research hypothesis because it posits that a relationship exists between the two variables but not the direction of that relationship. Remember that correlations can be positive (or direct) or negative (or indirect), and the most important characteristic of a correlation coefficient is its absolute value or size and not its sign (positive or negative).

The null hypothesis is shown in Formula 15.1:

(15.1)

$H_0: \rho_{xy} = 0$

The Greek letter ρ, or rho, represents the population estimate of the correlation coefficient.

The research hypothesis (shown in Formula 15.2) states that there is a relationship between the two values and that the relationship differs from a value of zero:

(15.2)

$H_1: r_{xy} \neq 0$

· 2. Set the level of risk (or the level of significance or Type I error) associated with the null hypothesis.

The level of risk or probability of a Type I error or level of significance is .05.

· 3. and 4. Select the appropriate test statistic.

Using the flowchart shown in Figure 15.1, we determined that the appropriate test is the correlation coefficient. In this instance, we do not need to compute a test statistic because the sample r value (rxy = .44) is, for our purposes, the test statistic.

· 5. Determine the value needed for rejection of the null hypothesis using the appropriate table of critical values for the particular statistic.

Table B.4 lists the critical values for the correlation coefficient.

Our first task is to determine the degrees of freedom (df), which approximates the sample size. For this particular test statistic, the degrees of freedom is n − 2, or 30 − 2 = 28, where n is equal to the number of pairs used to compute the correlation coefficient. This is the degrees of freedom only for this particular analysis and not for those with different sample sizes.

Using this number (28), the level of risk you are willing to take (.05), and a two-tailed test (because there is no direction to the research hypothesis), the critical value is .381 (using df = 25 because 28 isn’t shown in the table). So, at the .05 level, with 28 degrees of freedom for a two-tailed test, the value needed for rejection of the null hypothesis is .381.

· 6. Compare the obtained value and the critical value.

The obtained value is .44, and the critical value for rejection of the null hypothesis that the two variables are not related is .381.

· 7. and 8. Make a decision.

Now comes our decision. If the obtained value (or the value of the test statistic) is more extreme than the critical value (or the tabled value), the null hypothesis cannot be accepted. If the obtained value does not exceed the critical value, the null hypothesis is the most attractive explanation.

In this case, the obtained value (.44) does exceed the critical value (.381)—it is extreme enough for us to say that the relationship between the two variables (quality of marriage and quality of parent–child relationships) did occur in our sample due to something other than chance. It’s likely that the population’s correlation, whatever it is, is greater than 0.0.
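The comparison in Steps 5 through 8 can be sketched in a few lines of Python. This is only an illustration of the decision rule described above: the critical value of .381 is the one the text reads from Table B.4 (using the df = 25 row), and the function name is_significant is our own invention.

```python
def is_significant(r_obtained, r_critical):
    """Two-tailed decision rule: reject the null when |r| exceeds the critical value."""
    return abs(r_obtained) > r_critical

n_pairs = 30
df = n_pairs - 2          # degrees of freedom for the correlation test
r_obtained = 0.44         # sample correlation from the example
r_critical = 0.381        # tabled value the text uses (two-tailed, .05 level)

reject_null = is_significant(r_obtained, r_critical)
print(f"df = {df}, reject null hypothesis: {reject_null}")
```

Taking the absolute value is what makes the rule two-tailed: a correlation of −.44 would lead to the same decision as +.44.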

One tail or two? It’s pretty easy to conceptualize what a one-tailed versus a two-tailed test is when it comes to differences between means. And it may even be easy for you to understand a two-tailed test of the correlation coefficient (where any difference from zero is what’s tested). But what about a one-tailed test? It’s really just as easy. A directional test of the research hypothesis that there is a relationship posits that relationship as being either direct (positive) or indirect (negative). So, for example, if you think that there is a positive correlation between two variables, then the test is one-tailed. Similarly, if you hypothesize that there is a negative correlation between two variables, the test is one-tailed as well. It’s only when you don’t predict the direction of the relationship that the test is two-tailed. Got it?

Okay, we cheated a little. Actually, you can compute a t value (just like for the test for the difference between means) for the significance of the correlation coefficient. The formula is not any more difficult than any other you have dealt with up to now, but you won’t see it here. The point is that some smart statisticians have computed the critical r value for different sample sizes (and, likewise, degrees of freedom) for one- and two-tailed tests at different levels of risk (.01, .05), as you see in Table B.4. But if you are reading along in an academic journal and see that a correlation was tested using a t value, you’ll now know why.
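For the curious, here is a hedged sketch of that conversion in Python. It uses the standard relationship t = r√(n − 2) / √(1 − r²) and its inverse, which is one way a table of critical r values can be built from critical t values. The two critical t values below (2.060 for df = 25 and 2.048 for df = 28, two-tailed at .05) are copied from a standard t table and are inputs we supply, not values from this chapter.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def critical_r_from_t(t_critical, df):
    """Convert a critical t value into the matching critical r."""
    return t_critical / sqrt(t_critical ** 2 + df)

t_obtained = t_from_r(0.437, 30)
print(f"t(28) = {t_obtained:.2f}")

# Two-tailed .05 critical t values from a standard t table (assumed inputs).
print(f"critical r, df = 25: {critical_r_from_t(2.060, 25):.3f}")
print(f"critical r, df = 28: {critical_r_from_t(2.048, 28):.3f}")
```

The df = 25 conversion reproduces the tabled .381. The df = 28 value comes out somewhat smaller, which suggests that falling back on the df = 25 row is, if anything, a slightly conservative choice.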

So How Do I Interpret r(28) = .44, p < .05?

· r represents the test statistic that was used.

· 28 is the number of degrees of freedom.

· .44 is the obtained value using the formula we showed you in Chapter 5.

· p < .05 (the really important part of this little phrase) indicates that the probability is less than 5% on any one test of the null hypothesis that the relationship between the two variables is due to chance alone. Because we defined .05 as our criterion for the research hypothesis to be more attractive than the null hypothesis, our conclusion is that a significant relationship exists between the two variables. This means that, in our data set, as the level of marital quality increases, so does the level of quality of the parent–child relationship. Similarly, as the level of marital quality decreases, so does the level of quality of the parent–child relationship.

Causes and Associations (Again!)

You’re probably thinking that you’ve heard enough of this already, but this is so important that we really can’t emphasize it enough. So, we’ll emphasize it again. Just because two variables are related to one another (as in the previous example), that has no bearing on whether one causes the other. In other words, having a terrific marriage of the highest quality in no way ensures that the parent–child relationship will be of a high quality as well. These two variables may be correlated because some of the same traits that might make a person a good spouse also make that person a good parent (patience, understanding, willingness to sacrifice), but it’s certainly possible to see how someone could be a good husband or wife and have a terrible relationship with his or her children.

Effect sizes are indices that indicate the strength of relationships among variables. We’ve learned about Cohen’s d, the effect size for t tests; eta squared, the effect size for one-way analysis of variance; and so on. The correlation coefficient is, itself, an effect size! In fact, it’s the simplest of effect sizes as it provides a direct measure of the relationship between two variables, right? It’s interpreted this way: .2 = small, .5 = medium, and .8 = large. Remember, too, that another way to use a correlation as an effect size is to square it and interpret that coefficient of determination as the proportion of the variance in one variable that overlaps with another.

Remember the crimes and ice cream example from Chapter 5? As ice cream consumption goes up in a city, so does the crime rate, but no one thinks that eating ice cream actually affects crime. It’s the same here. Just because things are related and share something in common with one another has no bearing on whether there is a causal relationship between the two.

Correlation coefficients are used for lots of different purposes, and you’re likely to read journal articles in which they’ve been used to estimate the reliability of a test. But you already know all this because you read Chapter 6 and mastered it! In Chapter 6, you may remember that we talked about such reliability coefficients as test–retest (the correlation of scores from the same test at two points in time), parallel forms (the correlation between scores on different forms of the same test), and internal consistency (the correlation between different parts of a test). Reliability is basically the correlation of a test with itself. Correlation coefficients also are the standard units used in more advanced statistical techniques, such as those we discuss in Chapter 18.

Significance Versus Meaningfulness (Again, Again!)

In Chapter 5, we reviewed the importance of the use of the coefficient of determination for understanding the meaningfulness of the correlation coefficient. You may remember that you square the correlation coefficient to determine the amount of variance in one variable that is accounted for by another variable. In Chapter 9, we also went over the general issue of significance versus meaningfulness.

But we should mention and discuss this topic again. Even if a correlation coefficient is significant (as was the case in the example in this chapter), it does not mean that the amount of variance accounted for is meaningful. For example, in this case, the coefficient of determination for a simple Pearson correlation value of .437 is equal to .191, indicating that 19% of the variance is accounted for and a whopping 81% of the variance is not. This leaves lots of room for doubt, doesn’t it?
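The arithmetic behind those percentages is short enough to sketch in Python. The variable names here are ours, chosen only for illustration.

```python
r = 0.437
r_squared = r ** 2                 # coefficient of determination
shared = r_squared * 100           # percent of variance accounted for
unshared = (1 - r_squared) * 100   # percent of variance not accounted for
print(f"r^2 = {r_squared:.3f}; shared = {shared:.0f}%; unshared = {unshared:.0f}%")
```

Squaring always shrinks a correlation between −1 and 1, which is why a significant r can still leave most of the variance unexplained.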

So, even though we know that there is a positive relationship between the quality of a marriage and the quality of a parent–child relationship and they tend to “go together,” the relatively small correlation of .437 indicates that lots of other things are going on in that relationship that may be important as well. So, if ever you wanted to adapt a popular saying to statistics, here goes: “What you see is not always what you get.”

USING SPSS TO COMPUTE A CORRELATION COEFFICIENT (AGAIN)

Here, we are using Chapter 15 Data Set 1, which has two measures of quality—one of marriage (time spent together in one of three categories, with the value of 1 indicating a lower quality of marriage than a value of 3) and one of parent–child relationships (strength of affection, with the higher score being more affectionate).

1. Enter the data in the Data Editor (or just open the file). Be sure you have two columns, each for a different variable. In Figure 15.2, the columns were labeled Qual_Marriage (for Quality of Marriage) and Qual_PC (Quality of Parent–Child Relationship).

Figure 15.2 ⬢ Chapter 15 Data Set 1

2. Click Analyze → Correlate → Bivariate, and you will see the Bivariate Correlations dialog box, as shown in Figure 15.3.

Figure 15.3 ⬢ Bivariate Correlations dialog box

3. Double-click on the variable named Qual_Marriage to move it to the Variables: box, and then double-click on the variable named Qual_PC to move it to the Variables: box.

4. Click Pearson and click Two-tailed.

5. Click OK. The SPSS output is shown in Figure 15.4.

Figure 15.4 ⬢ SPSS output for testing the significance of the correlation coefficient

Understanding the SPSS Output

This SPSS output is simple and straightforward.

The correlation between the two variables of interest is .081, with a significance (p) value of .637. To be (much) more precise, the probability of committing a Type I error is .637. That means that the likelihood of rejecting the null when true (that the two variables are not related) is about 63.7%, which is spookily high and very bad odds! It seems that quality of parenting and quality of marriage, at least for this set of data, are unrelated.

Real-World Stats

With all the attention given to social media (in the social media, of course), it’s interesting to look at how researchers evaluate “social” well-being in college students who study and live outside the United States. This study investigated the relationship between the five personality domains and social well-being among 236 students enrolled at the University of Tehran. Correlations showed relationships between some personality factors and dimensions of social well-being. Neuroticism was negatively related to social acceptance, social contribution, and social coherence. Conscientiousness was positively related to social contribution. Openness was positively related to social contribution and social coherence, and agreeableness was related to social acceptance and social contribution. The researchers discussed these findings in light of cultural differences between countries and the general relationship between personality traits and well-being. Interesting stuff!

Want to know more? Go online or to the library and find …

Joshanloo, M., Rastegar, P., & Bakhshi, A. (2012). The Big Five personality domains as predictors of social wellbeing in Iranian university students. Journal of Social and Personal Relationships, 29, 639–660.

Summary

Correlations are powerful tools that point out the direction and size of a relationship and help us to understand what two different outcomes share with one another. Remember that correlations tell us about associations but not about whether one variable affects another.

Time to Practice

1. Given the following information, use Table B.4 in Appendix B to determine whether the correlations are significant and how you would interpret the results.

a. The correlation between speed and strength for 20 women is .567. Test these results at the .01 level using a one-tailed test.

b. The correlation between the number correct on a math test and the time it takes to complete the test is –.45. Test whether this correlation is significant for 80 children at the .05 level of significance. Choose either a one- or a two-tailed test and justify your choice.

c. The correlation between number of friends and grade point average (GPA) for 50 adolescents is .37. Is this significant at the .05 level for a two-tailed test?

2. Use the data in Chapter 15 Data Set 2 to answer the questions below. Do the analysis manually or by using SPSS.

a. Compute the correlation between motivation and GPA.

b. Test for the significance of the correlation coefficient at the .05 level using a two-tailed test.

c. True or false? The more motivated you are, the more you will study. Which answer did you select, and why?

3. Use the data in Chapter 15 Data Set 3 to answer the questions below. Do the calculations manually or use SPSS.

a. Compute the correlation between income and level of education.

b. Test for the significance of the correlation.

c. What argument can you make to support the conclusion that “lower levels of education cause low income”?

4. Use the data in Chapter 15 Data Set 4 to answer the following questions. Compute the correlation coefficient manually.

a. Test for the significance of the correlation coefficient at the .05 level using a two-tailed test between hours of studying and grade.

b. Interpret this correlation. What do you conclude about the relationship between the number of hours spent studying and the grade received on a test?

c. How much variance is shared between the two variables?

d. How do you interpret the results?

5. A study was completed that examined the relationship between coffee consumption and level of stress for a group of 50 undergraduates. The correlation was .373, and it was a two-tailed test done at the .01 level of significance. First, is the correlation significant? Second, what’s wrong with the following statement: “As a result of the data collected in this study and our rigorous analyses, we have concluded that if you drink less coffee, you will experience less stress”?

6. Use the data below to answer the questions that follow. Do these calculations manually.

Age in Months    Number of Words Known
12               6
15               8
9                4
7                5
18               14
24               20
15               7
16               6
21               18
15               17

a. Compute the correlation between age in months and number of words known.

b. Test for the significance of the correlation at the .05 level of significance.

c. Go way back and recall what you learned in Chapter 5 about correlation coefficients and interpret this correlation.

7. Discuss the general idea that just because two things are correlated, one does not necessarily cause the other. Provide an example (other than ice cream and crime!).

8. Chapter 15 Data Set 5 contains four variables:

· Age (in years)

· Shoe_Size (small, medium, and large as 1, 2, and 3)

· Intelligence (as measured by a standardized test)

· Level_of_Education (in years)

Which variables are significantly correlated, and more important, which correlations are meaningful?

Time to Practice Video

Chapter 15: Problem 8

Chapter 15 problem 8 will require you to open up Chapter 15 data set 5. Once you open it, you'll see that there are four variables: age measured in years, shoe size, intelligence, and level of education. Each of these can be considered continuous. Age is clearly continuous, but shoe size has the three levels, 1, 2, and 3; intelligence is measured on a standardized test with a score, and level of education in years. The question is, which of these variables are significantly correlated? You might imagine as you get older, your shoe size tends to increase until you reach your full-grown adulthood. But does shoe size correlate with intelligence? Does level of education correlate with your shoe size?

Let's look at our data set, and you'll see that right now the level of measurement for all of them is indicated as nominal. Well, that's not correct. So we want to go to variable view and change them. Age, we know, is continuous. Shoe size could be considered ordinal, but let's call it a scale for now, because it increases up. Intelligence measured with a test is going to be a scale, and level of education is also continuous and measured on a scale. Look back here and make sure that all of them have changed.

What we want to do now is, under Analyze, go to Correlate, and we're going to do Bivariate. Now, bivariate means two, so it's going to correlate them by pairs. So when we click that, let's put all of them over here. And we also want to think about under Options. Did we want to report our means and standard deviations? In this case, we did not need to. And under Style, was there anything that we wanted to do with formatting? And we don't need to at this point. We do want Pearson's clicked. Make sure it's a two-tailed test. And then we want to flag the significant correlations.

So when we hit OK, here is our correlation table. And this is repeated information. So I just want to point out that age and age is a 1. It's a perfect correlation. And on the diagonal, you will see all the 1s. The information above the diagonal is the exact same as below the diagonal. So let's look at the line below the diagonal. It's more traditionally what we do to see if things were correlated.

Is age correlated with shoe size? It is: a significant correlation and a really strong one. Remember, correlation goes from a negative 1 to a positive 1, so this is an incredibly strong correlation. Is age correlated with intelligence? No, it does not appear to be; that is not a significant correlation. Is age correlated with level of education? Significance is smaller. Yes. The older you are, the higher your level of education, again, until you balance out. And then we can look at, is shoe size correlated with intelligence? Again, we really wouldn't have expected that. Is shoe size correlated with level of education? Yes. But that's probably not because of their shoe size, but because of the age. So this is the correlation matrix.

And the second question is, what is the amount of variance that is explained? In this case, the correlation squared gives us the effect size for a correlation. So all you really need to do is square the correlation, and it tells you the effect size. In other words, for shoe size and age, about 88% of the variance between these two variables is accounted for in this situation. And that's how you answer this problem and do correlation with multiple variables.
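To see why the correlation table has 1s on the diagonal and repeats itself above and below it, here is a small Python sketch that builds a correlation matrix by pairs. The numbers are made-up illustrative values and are NOT the contents of Chapter 15 Data Set 5, and the helper pearson_r is our own, so treat this as a picture of the table's structure rather than an answer to the problem.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

# Made-up illustrative values, NOT the real Data Set 5.
data = {
    "Age": [6, 8, 10, 12, 14, 16],
    "Shoe_Size": [1, 1, 2, 2, 3, 3],
    "Intelligence": [101, 97, 104, 99, 103, 98],
}

names = list(data)
# Correlate every variable with every other variable, pair by pair.
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(a.ljust(12), "  ".join(f"{matrix[a][b]:6.3f}" for b in names))
```

Each variable correlates perfectly with itself, and the correlation of A with B equals that of B with A, so only the values below (or above) the diagonal carry new information.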