Module/Week 6 PPOL 505 P.E.4
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS
Difficulty Scale
(A little longer than the previous chapter but basically the same kind of procedures and very similar questions. Not too hard, but you have to pay attention.)
WHAT YOU WILL LEARN IN THIS CHAPTER
· Using the t test for independent means when appropriate
· Computing the observed t value
· Interpreting the t value and understanding what it means
· Computing the effect size for a t test for independent means
INTRODUCTION TO THE T TEST FOR INDEPENDENT SAMPLES
Even though eating disorders are recognized for their seriousness, little research has been done that compares the prevalence and intensity of symptoms across different cultures. John P. Sjostedt, John F. Schumaker, and S. S. Nathawat undertook this comparison with groups of 297 Australian and 249 Indian university students. Each student was measured on the Eating Attitudes Test and the Goldfarb Fear of Fat Scale. High scores on both measures indicate the presence of an eating disorder. The groups’ scores were compared with one another. On a comparison of means between the Indian and the Australian participants, Indian students scored higher on both of the tests, and this was due mainly to the scores of women. The results for the Eating Attitudes Test were t(544) = −4.19, p < .0001, and the results for the Goldfarb Fear of Fat Scale were t(544) = −7.64, p < .0001.
Now just what does all this mean? Read on.
Why was the t test for independent means used? Sjostedt and his colleagues were interested in finding out whether there was a difference in the average scores of one (or more) variable(s) between the two groups. The t test is called independent because the two groups were not related in any way. Each participant in the study was tested only once. The researchers applied a t test for independent means, arriving at the conclusion that for each of the outcome variables, the differences between the two groups were significant at or beyond the .0001 level. Such a small chance of a Type I error means that there is very little probability that the difference in scores between the two groups was due to chance and not something like group membership, in this case representing nationality, culture, or ethnicity.
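The 544 in t(544) is the degrees of freedom. For this test it is the combined sample size minus 2, a rule explained later in the chapter. As a quick check using the sample sizes reported above:

$$df = n_1 + n_2 - 2 = 297 + 249 - 2 = 544$$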
Want to know more? Go online or to the library and find …
Sjostedt, J. P., Schumaker, J. F., & Nathawat, S. S. (1998). Eating disorders among Indian and Australian university students. Journal of Social Psychology, 138(3), 351–357.
THE PATH TO WISDOM AND KNOWLEDGE
Here’s how you can use Figure 11.1, the flowchart introduced in Chapter 9, to select the appropriate test statistic, the t test for independent means. Follow along the highlighted sequence of steps in Figure 11.1.
1. The differences between the groups of Australian and Indian students are being explored.
2. Participants are being tested only once.
3. There are two groups.
4. The appropriate test statistic is the t test for independent samples.
Figure 11.1 ⬢ Determining that a t test is the correct statistic
Almost every statistical test has certain assumptions that underlie the use of the test. For example, the t test makes the major assumption that the amount of variability in each of the two groups is equal. This is the homogeneity of variance assumption. Homogeneity means sameness. Although it’s no big deal if this assumption is violated when the sample size is big enough, with smaller samples, one can’t be too sure of the results and conclusions. Don’t knock yourself out worrying about these assumptions because they are beyond the scope of this book. However, you should know that while such assumptions aren’t usually violated, they might be, and more advanced statistics courses talk about how to deal with that if it happens.
As we mentioned earlier, there are tons of statistical tests. The only inferential one that uses one sample that we cover in this book is the one-sample z test (see Chapter 10). But there is also the one-sample t test, which compares the mean score of a sample with another score, and sometimes that score is, indeed, the population mean, just as with the one-sample z test. In any case, you can use the one-sample z test or one-sample t test to test the same hypothesis, and you will reach the same conclusions (although you will be using different values and tables to do so). We discussed running the one-sample t test a little bit in Chapter 10 when we talked about using SPSS to compare one sample with a population.
COMPUTING THE T TEST STATISTIC
The formula for computing the t value for the t test for independent means is shown in Formula 11.1. The difference between the means makes up the numerator, the top of the equation; the amount of variation within and between each of the two groups makes up the denominator.
(11.1)
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{n_1 + n_2}{n_1 n_2}\right]}},$$
where
· $\bar{X}_1$ is the mean for Group 1,
· $\bar{X}_2$ is the mean for Group 2,
· $n_1$ is the number of participants in Group 1,
· $n_2$ is the number of participants in Group 2,
· $s_1^2$ is the variance for Group 1, and
· $s_2^2$ is the variance for Group 2.
This is a bigger formula than we’ve seen before, but there’s really nothing new here at all. It’s just a matter of plugging in the correct values.
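For readers who would rather let a computer do the plugging in, here is a minimal Python sketch of Formula 11.1. The function name `independent_t` and the sample numbers are made up for illustration; the function takes standard deviations and squares them to get the variances the formula calls for.

```python
import math


def independent_t(mean1, mean2, sd1, sd2, n1, n2):
    """Observed t for two independent groups (pooled variance, Formula 11.1)."""
    # Pooled variance: each group's variance weighted by its n - 1.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Scale by the sample sizes to get the standard error of the mean difference.
    std_error = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / std_error


# Hypothetical summary values, chosen only to exercise the function.
t_value = independent_t(mean1=10.0, mean2=8.0, sd1=2.0, sd2=2.0, n1=20, n2=20)
print(round(t_value, 3))  # 3.162
```

Swapping the order of the two groups flips the sign of the result but not its size.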
Time for an Example
Here are some data reflecting the number of words remembered following a program designed to help Alzheimer’s patients remember the order of daily tasks. Group 1 was taught using visuals, and Group 2 was taught using visuals and intense verbal rehearsal. We’ll use the data to compute the test statistic in the following example.
| Group 1 | | | Group 2 | | |
|---|---|---|---|---|---|
| 7 | 5 | 5 | 5 | 3 | 4 |
| 3 | 4 | 7 | 4 | 2 | 3 |
| 3 | 6 | 1 | 4 | 5 | 2 |
| 2 | 10 | 9 | 5 | 4 | 7 |
| 3 | 10 | 2 | 5 | 4 | 6 |
| 8 | 5 | 5 | 7 | 6 | 2 |
| 8 | 1 | 2 | 8 | 7 | 8 |
| 5 | 1 | 12 | 8 | 7 | 9 |
| 8 | 4 | 15 | 9 | 5 | 7 |
| 5 | 3 | 4 | 8 | 6 | 6 |
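The means and standard deviations used in the computation that follows can be checked with a few lines of Python. This is only a sketch: it assumes the first three scores in each row of the table belong to Group 1 and the last three to Group 2, and the variable names are illustrative.

```python
import statistics

# Scores transcribed from the table above (30 per group).
group1 = [7, 5, 5, 3, 4, 7, 3, 6, 1, 2, 10, 9, 3, 10, 2,
          8, 5, 5, 8, 1, 2, 5, 1, 12, 8, 4, 15, 5, 3, 4]
group2 = [5, 3, 4, 4, 2, 3, 4, 5, 2, 5, 4, 7, 5, 4, 6,
          7, 6, 2, 8, 7, 8, 8, 7, 9, 9, 5, 7, 8, 6, 6]

for label, scores in (("Group 1", group1), ("Group 2", group2)):
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (divides by n - 1)
    print(f"{label}: n = {len(scores)}, mean = {mean:.2f}, SD = {sd:.2f}")
```

Under that reading of the table, the script prints a mean of 5.43 and a standard deviation of 3.42 for Group 1, and 5.53 and 2.06 for Group 2.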
Here are the famous eight steps and the computation of the t test statistic.
1. State the null and research hypotheses.
As represented by Formula 11.2, the null hypothesis states that there is no difference between the means for Group 1 and Group 2. For our purposes, the research hypothesis (shown as Formula 11.3) states that there is a difference between the means of the two groups. The research hypothesis is two-tailed and nondirectional because it posits a difference but in no particular direction.
The null hypothesis is
(11.2)
$$H_0: \mu_1 = \mu_2.$$
The research hypothesis is
(11.3)
$$H_1: \bar{X}_1 \neq \bar{X}_2.$$
2. Set the level of risk (or the level of significance or chance of Type I error) associated with the null hypothesis.
The level of risk or probability of Type I error or level of significance (any other names?) here is .05, but this is totally the decision of the researcher.
3. Select the appropriate test statistic.
Using the flowchart shown in Figure 11.1, we determined that the appropriate test is a t test for independent means, because the groups are independent of one another.
4. Compute the test statistic value (called the obtained value).
Now’s your chance to plug in values and do some computation. The formula for the t value was shown in Formula 11.1. When the specific values are plugged in, we get the equation shown in Formula 11.4. (We already computed the mean and the standard deviation.)
(11.4)
$$t = \frac{5.43 - 5.53}{\sqrt{\left[\frac{(30 - 1)3.42^2 + (30 - 1)2.06^2}{30 + 30 - 2}\right]\left[\frac{30 + 30}{30 \times 30}\right]}}.$$
With the numbers plugged in, Formula 11.5 shows how we get the final value of −0.137. The value is negative because a larger value (the mean of Group 2, which is 5.53) is being subtracted from a smaller number (the mean of Group 1, which is 5.43). Remember, though, that because the test is nondirectional—the research hypothesis is that any difference exists—the sign of the difference is meaningless.
(11.5)
$$t = \frac{-0.1}{\sqrt{\left(\frac{339.20 + 123.06}{58}\right)\left(\frac{60}{900}\right)}} = -0.137.$$
When a nondirectional test is discussed, you may find that the t value is represented as an absolute value looking like this, |t| or t = |0.137|, which ignores the sign of the value altogether. Your teacher may even express the t value as such to emphasize that the sign is relevant for a one-directional test but not for a nondirectional one.
5. Determine the value needed for rejection of the null hypothesis using the appropriate table of critical values for the particular statistic.
Here’s where we go to Table B.2 in Appendix B, which lists the critical values for the t test.
We can use this distribution to see whether two independent means differ from one another by comparing what we would expect by chance if there was no difference in the population (the tabled or critical value) with what we observe (the obtained value).
Our first task is to determine the degrees of freedom (df), which approximates the sample size (but, for fancy technical reasons, adjusts it slightly to make for a more accurate outcome). For this particular test statistic, the degrees of freedom is n1 − 1 + n2 − 1 or n1 + n2 − 2 (putting the terms in either order results in the same value). So add the sizes of the two samples and subtract 2. In this example, 30 + 30 − 2 = 58. This is the degrees of freedom for this particular application of the t test but not necessarily for any other.
The idea of degrees of freedom means pretty much the same thing no matter what statistical test you use. But the way that the degrees of freedom is computed for specific tests can differ from teacher to teacher and from book to book. We tell you that the correct degrees of freedom for the preceding test is computed as (n1 − 1) + (n2 − 1). However, some teachers believe that you should use the smaller of the two ns (a more conservative alternative you may want to consider).
Using this number (58), the level of risk you are willing to take (earlier defined as .05), and a two-tailed test (because there is no direction to the research hypothesis), you can use the t test table to look up the critical value. At the .05 level, with 58 degrees of freedom for a two-tailed test, the value needed for rejection of the null hypothesis is … oops! There’s no 58 degrees of freedom in the table! What do you do? Well, if you select the value that corresponds to 55, you’re being conservative in that you are using a value for a sample smaller than what you have (and the critical t value you need to reject the null hypothesis will be larger).
If you go for 60 degrees of freedom (the closest to your value of 58), you will be closer to the size of the population, but you’ll be a bit liberal in that 60 is larger than 58. Although statisticians differ in their viewpoint as to what to do in this situation, let’s always go with the value that’s closer to the actual sample size. So the value needed to reject the null hypothesis with 58 degrees of freedom at the .05 level of significance is 2.001.
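If the table has no row for your degrees of freedom, the critical value can also be approximated numerically. The sketch below is one way to do that using only Python's standard library: it integrates the t curve with Simpson's rule and searches for the cutoff by bisection. The function names are illustrative, and a statistics package would normally do this for you.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def central_area(t, df, steps=2000):
    """Area under the t curve between -t and +t (Simpson's rule, even steps)."""
    h = 2 * t / steps
    total = t_pdf(-t, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-t + i * h, df)
    return total * h / 3


def two_tailed_critical_t(alpha, df):
    """Cutoff |t| that leaves alpha split across both tails (bisection)."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if central_area(mid, df) < 1 - alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2


for df in (55, 58, 60):
    print(df, round(two_tailed_critical_t(0.05, df), 3))
```

All three cutoffs land very close to 2.00, which is why choosing the row for 55 or for 60 degrees of freedom makes no practical difference to the decision in this example.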
6. Compare the obtained value and the critical value.
The obtained value is −0.14 (−0.137 rounded to the nearest hundredth), and the critical value for rejection of the null hypothesis that Group 1 and Group 2 did not perform differently is 2.001. The critical value of 2.001 represents the largest value at which chance is the most attractive explanation for any of the observed sample differences between the two groups, given 30 participants in each group and the willingness to take a .05 level of risk.
7. and 8. Decision time!
Now comes our decision. If the obtained value is more extreme than the critical value (remember Figure 9.2), the null hypothesis should not be accepted. If the obtained value does not exceed the critical value, the null hypothesis is the most attractive explanation. In this case, the obtained value (−0.14) does not exceed the critical value (2.001)—it is not extreme enough for us to say that the difference between Groups 1 and 2 occurred due to anything other than chance. If the value were greater than 2.001, that would be just like getting 8, 9, or 10 heads in a coin toss—too extreme a result for us to believe that mere chance is at work. In the case of the coin, the cause would be an unfair coin; in this example, it would be that there is a better way to teach memory skills to these older people.
So, to what can we attribute the small difference between the two groups? If we stick with our current argument, then we could say the difference is due to something like sampling error or simple variability in participants’ scores. Most important, we’re pretty sure (but, of course, not 100% sure—that’s what level of significance and Type I errors are all about, right?) that the difference is not due to anything in particular that one group or the other experienced to make its scores better.
So How Do I Interpret t(58) = −0.14, p > .05?
· t represents the test statistic that was used.
· 58 is the number of degrees of freedom.
· −0.14 is the obtained value, calculated using the formula we showed you earlier in the chapter.
· p > .05 (the really important part of this little phrase) indicates that, if the null hypothesis were true, the probability of seeing a difference at least this large is greater than 5%, so we have no grounds to say that the two groups differ because of the way they were taught. Note that p > .05 can also appear as p = ns for nonsignificant.
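To put a number on that p value, here is a rough Python sketch that computes the two-tailed probability for an obtained t and its degrees of freedom. It uses only the standard library and a simple numerical integration, so treat it as an approximation; the function name `two_tailed_p` is illustrative.

```python
import math


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: area of the t curve outside -|t| .. +|t|."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule over [-|t|, |t|]; steps must be even.
    a = abs(t)
    h = 2 * a / steps
    area = pdf(-a) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(-a + i * h)
    return 1 - area * h / 3


p = two_tailed_p(-0.137, 58)
print(f"t(58) = -0.14, p = {p:.2f}")
print("p > .05" if p > 0.05 else "p < .05")
```

For the memory example this gives a p value of roughly .89, far above .05, which matches the decision not to reject the null hypothesis.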
THE EFFECT SIZE AND T(EA) FOR TWO
You learned in Chapters 5 and 10 that effect size is a measure of how strongly variables relate to one another—with group comparisons, it’s a measure of the magnitude of the difference. Kind of like, How big is big?
And what’s especially interesting about computing effect size is that sample size is not taken into account. Calculating effect size, and making a judgment about it, adds a whole new dimension to understanding significant outcomes.
Let’s take the following example. A researcher tests whether participation in community-sponsored services (such as card games, field trips, etc.) increases the quality of life (as rated from 1 to 10) for older Americans. The researcher implements the treatment over a 6-month period and then, at the end of the treatment period, measures quality of life in the two groups (each consisting of 50 participants over the age of 80 where one group got the services and one group did not). Here are the results.
| | No Community Services | Community Services |
|---|---|---|
| Mean | 6.90 | 7.46 |
| Standard deviation | 1.03 | 1.53 |
And the verdict is that the difference is significant at the .034 level (which is p < .05, right?). So there’s a significant difference, but what about the magnitude of the difference?
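The text reports only the significance level for this comparison. As a rough check, and assuming the result came from the pooled-variance formula (Formula 11.1) with 50 participants in each group, plugging in the summary values gives

$$t = \frac{7.46 - 6.90}{\sqrt{\left[\frac{(50 - 1)1.03^2 + (50 - 1)1.53^2}{50 + 50 - 2}\right]\left[\frac{50 + 50}{50 \times 50}\right]}} = \frac{0.56}{\sqrt{(1.7009)(0.04)}} \approx 2.15.$$

With 50 + 50 − 2 = 98 degrees of freedom, a t value of about 2.15 corresponds to a two-tailed p of roughly .03, which is in line with the .034 reported.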
Computing and Understanding the Effect Size
As we showed you in Chapter 10, the most direct and simple way to compute effect size is to simply divide the difference between the means by any one of the standard deviations. Danger, Will Robinson—this does assume that the standard deviations (and the amount of variance) between groups are equal to one another.
For our example, we’ll do this:
(11.6)
$$ES = \frac{\bar{X}_1 - \bar{X}_2}{s},$$
where
· ES is the effect size,
· $\bar{X}_1$ is the mean for Group 1,
· $\bar{X}_2$ is the mean for Group 2, and
· s is the standard deviation from either group.
So, in our example,
(11.7)
$$ES = \frac{7.46 - 6.90}{1.53} = .366.$$
So, the effect size for this example is .37.
You saw from our guidelines in Chapter 10 (p. 196) that an effect size of .37 is categorized as medium. In addition to the difference between the two means being statistically significant, one might conclude that the difference also is meaningful in that the effect size is not negligible. Now, how meaningful you wish to make it in your interpretation of the results depends on many factors, including the context within which the research question is being asked.
So, you really want to be cool about this effect size thing. You can do it the simple way, as we just showed you (by subtracting means from one another and dividing by either standard deviation), or you can really wow that interesting classmate who sits next to you. The grown-up formula for the effect size uses the “pooled standard deviation” in the denominator of the ES equation that you saw previously. The pooled standard deviation is a kind of average of the standard deviation from Group 1 and the standard deviation from Group 2: the square root of the mean of the two variances. Here’s the formula:
(11.8)
$$ES = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}},$$
where
· ES is the effect size,
· $\bar{X}_1$ is the mean of Group 1,
· $\bar{X}_2$ is the mean of Group 2,
· $\sigma_1^2$ is the variance of Group 1, and
· $\sigma_2^2$ is the variance of Group 2.
If we applied this formula to the same numbers we showed you previously, you’d get a whopping effect size of .43—not very different from .37, which we got using the more direct method shown earlier (and still in the same category of medium size). But this is a more precise method and one that is well worth knowing about.
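Both versions of the effect size are easy to compute side by side. The following Python sketch applies Formula 11.6 and Formula 11.8 to the community services example; the function names are made up for illustration.

```python
import math


def effect_size_simple(mean1, mean2, sd):
    """Formula 11.6: mean difference divided by one group's SD."""
    return (mean1 - mean2) / sd


def effect_size_pooled(mean1, mean2, sd1, sd2):
    """Formula 11.8: mean difference divided by the pooled SD."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


# Community services example from the text.
simple = effect_size_simple(7.46, 6.90, 1.53)
pooled = effect_size_pooled(7.46, 6.90, 1.53, 1.03)
print(f"simple ES = {simple:.2f}, pooled ES = {pooled:.2f}")
# simple ES = 0.37, pooled ES = 0.43
```

Note that the simple version depends on which standard deviation you pick: dividing by 1.03 instead of 1.53 gives about .54. The pooled version avoids having to make that choice.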
Two Very Cool Effect Size Calculators
Why not take the A train and just go right to http://www.uccs.edu/~lbecker/, where statistician Lee Becker from the University of Colorado Colorado Springs developed an effect size calculator? Or ditto for the one located at http://www.psychometrica.de/effect_size.html created by Drs. Wolfgang and Alexandra Lenhard? With these calculators, you just plug in the values, click Compute, and the program does the rest, as you see in Figure 11.2.
USING SPSS TO PERFORM A T TEST
SPSS is willing and ready to help you perform these inferential tests. Here’s how to perform the one that we just discussed and interpret the output. We are using the data set named Chapter 11 Data Set 1. From your examination of the data, you can see how the grouping variable (Group 1 or Group 2) is in column 1 and the test variable (Memory) is in column 2.
Figure 11.2 ⬢ The very cool effect size calculator
1. Enter the data in the Data Editor or download the file. Be sure that there is one column for the grouping variable and that no more than two groups are represented in that column.
2. Click Analyze → Compare Means → Independent-Samples T Test, and you will see the Independent-Samples T Test dialog box shown in Figure 11.3.
3. Click on the variable named Group and drag it to move it to the Grouping Variable(s) box.
4. Click on the variable named Memory_Test and drag it to place it in the Test Variable(s) box.
5. SPSS will not allow you to continue until you define the grouping variable. This basically means telling SPSS how many levels of the group variable there are (wouldn’t you think that a program this smart could figure that out?) and what to name them. In any case, click Define Groups and enter the values 1 for Group 1 and 2 for Group 2, as shown in Figure 11.4. The name of the grouping variable (in this case, Group) has to be highlighted before you can define it.
6. Click Continue and then click OK, and SPSS will conduct the analysis and produce the output you see in Figure 11.5.
Notice how SPSS uses a capital T to represent this test while we have been using a small t? That’s because it is in a title or heading, like we have been doing sometimes. What’s important for you to know is that there is a difference in upper- or lowercase only—it’s the same exact test.
Figure 11.3 ⬢ The Independent-Samples T Test dialog box
Figure 11.4 ⬢ The Define Groups dialog box
Understanding the SPSS Output
There’s a ton of SPSS output from this analysis, and for our purposes, we’ll deal only with selected output shown in Figure 11.5. There are three things to note:
1. The obtained t value is −0.137, exactly what we got when we computed the value by hand earlier in this chapter (−0.14, rounded from −0.1368).
2. The number of degrees of freedom is 58 (which you already know is computed using the formula n1 + n2 − 2).
3. The significance of this finding is .891, or p = .89, which means that on one test of this null hypothesis, the likelihood of rejecting the hypothesis when it is true is pretty high (89 out of 100)! So the Type I error rate is certainly greater than .05, the same conclusion we reached earlier when we went through this process manually. No difference!
Figure 11.5 ⬢ Copy of SPSS output for a t test between independent means
Real-World Stats
There’s nothing like being prepared, so say Boy and Girl Scouts. But what’s the best way to teach preparedness? In this study by Serkan Celik from Kirikkale University in Turkey, online versus face-to-face first-aid instruction courses were compared with one another to test the effectiveness of the modes of delivery. Interestingly, with the same instructor teaching in both modes, the online form resulted in higher achievement scores at the end of the course. What did Celik do for his analysis? Independent t tests, of course. And in fact, he used SPSS to produce the resulting t values showing that there was a difference between the two teaching approaches.
Want to know more? Go online or to the library and find …
Celik, S. (2013). A media comparison study on first aid instruction. Health Education Journal, 72, 95−101.
Summary
The independent t test is your first introduction to performing a real statistical test between two groups and trying to understand this whole matter of significance from an applied point of view. Be sure that you understand what is in this chapter before you move on. And be sure you can do by hand the few calculations that were asked for. Next, we move on to using another form of the same test, only this time, two sets of scores are taken from one group of participants rather than one set of scores taken from two separate groups.
Time to Practice
1. Using the data in the file named Chapter 11 Data Set 2, test the research hypothesis at the .05 level of significance that boys raise their hand in class more often than girls. Do this practice problem by hand, using a calculator. What is your conclusion regarding the research hypothesis? Remember to first decide whether this is a one- or two-tailed test.
2. Using the same data set (Chapter 11 Data Set 2), test the research hypothesis at the .01 level of significance that there is a difference between boys and girls in the number of times they raise their hand in class. Do this practice problem by hand, using a calculator. What is your conclusion regarding the research hypothesis? You used the same data for this problem as for Question 1, but you have a different hypothesis (one is directional and the other is nondirectional). How do the results differ and why?
3. Time for some tedious, by-hand practice just to see if you can get the numbers right. Using the following information, calculate the t test statistics by hand.
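a. $\bar{X}_1 = 62$, $\bar{X}_2 = 60$, $n_1 = 10$, $n_2 = 10$, $s_1 = 2.45$, $s_2 = 3.16$
b. $\bar{X}_1 = 158$, $\bar{X}_2 = 157.4$, $n_1 = 22$, $n_2 = 26$, $s_1 = 2.06$, $s_2 = 2.59$
c. $\bar{X}_1 = 200$, $\bar{X}_2 = 198$, $n_1 = 17$, $n_2 = 17$, $s_1 = 2.45$, $s_2 = 2.35$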
4. Using the results you got from Question 3 and a level of significance of .05, what are the two-tailed critical values associated with each result? Would the null hypothesis be rejected?
5. Use the following data and SPSS or some other computer application such as Excel or Google Sheets and write a brief paragraph about whether the in-home counseling is as effective as the out-of-home treatment for two separate groups. Here are the data. The outcome variable is level of anxiety after treatment on a scale from 1 to 10.
| In-Home Treatment | Out-of-Home Treatment |
|---|---|
| 3 | 7 |
| 4 | 6 |
| 1 | 7 |
| 1 | 8 |
| 1 | 7 |
| 3 | 6 |
| 3 | 5 |
| 6 | 6 |
| 5 | 4 |
| 1 | 2 |
| 4 | 5 |
| 5 | 4 |
| 4 | 3 |
| 4 | 6 |
| 3 | 7 |
| 6 | 5 |
| 7 | 4 |
| 7 | 3 |
| 7 | 8 |
| 8 | 7 |
8. Using the data in the file named Chapter 11 Data Set 3, test the null hypothesis that urban and rural residents both have the same attitude toward gun control. Use SPSS to complete the analysis for this problem.
9. Here’s a good one to think about. A public health researcher (Dr. L) tested the hypothesis that providing new car buyers with child safety seats will also act as an incentive for parents to take other measures to protect their children (such as driving more safely, childproofing the home, etc.). Dr. L counted all the occurrences of safe behaviors in the cars and homes of the parents who accepted the seats versus those who did not. The findings? A significant difference at the .013 level. Another researcher (Dr. R) did exactly the same study, and for our purposes, let’s assume that everything was the same—same type of sample, same outcome measures, same car seats, and so on. Dr. R’s results were marginally significant (remember that from Chapter 9?) at the .051 level. Whose results do you trust more and why?
10. Here are the results of three experiments where the means of the two groups being compared are exactly the same but the standard deviation is quite different from experiment to experiment. Compute the effect size using Formula 11.6 and then discuss why this size changes as a function of the change in variability.
| Experiment | Group 1 mean | Group 2 mean | Standard deviation | Effect size |
|---|---|---|---|---|
| 1 | 78.6 | 73.4 | 2.0 | __________ |
| 2 | 78.6 | 73.4 | 4.0 | __________ |
| 3 | 78.6 | 73.4 | 8.0 | __________ |
11. Using the data in Chapter 11 Data Set 4 and SPSS, test the null hypothesis that there is no difference in group means between the number of words spelled correctly for two groups of fourth graders. What is your conclusion?
12. For this question, you have to do two analyses. Using Chapter 11 Data Set 5, compute the t score for the difference between two groups on the test variable (named score). Then, use Chapter 11 Data Set 6 and do the same. Notice that there is a difference between the two t values, even though the mean scores for each group are the same. What is the source of this difference, and why—with the same sample size—did the t value differ?
13. Here’s an interesting (and hypothetical) situation. The average score for one group on an aptitude test is 89.5 and the average score for the second group is 89.2. Both groups have sample sizes of about 1,500 and the differences between the means are significant, but the effect size is very small (let’s say .1). What do you make of the fact that there can be a statistically significant difference between the two groups without there being a meaningful effect size?