In this chapter, we discuss the common types of statistical analyses used with simple two-group designs. The inferential statistics discussed in this chapter differ from those presented in Chapter 5 in that in Chapter 5, single samples were being compared to populations (z test and t test). In this section, the statistics are designed to test differences between two equivalent groups of participants.

Several factors influence which statistic should be used to analyze the data collected. For example, the type of data collected and the number of groups being compared must be considered. Moreover, the statistic used to analyze the data will vary depending on whether the study involves a between-subjects design, in which different subjects are used in each of the groups, or a correlated-groups design, in which the subjects in the experimental and control groups are related in some way. (Correlated-groups designs are of two types: within-subjects designs, in which the same subjects are used repeatedly in each group, and matched-subjects designs, in which different subjects are matched between conditions on variables that the researcher believes are relevant to the study.) We will look at the typical inferential statistics used to analyze interval-ratio data for two-group between-subjects designs and correlated-groups designs.

MODULE 11

The t Test for Independent Groups (Samples)

Learning Objectives

•Explain when the t test for independent groups should be used.

•Calculate an independent-groups t test.

•Interpret an independent-groups t test.

•Calculate and interpret Cohen's d and $r^2$.

•Explain the assumptions of the independent-groups t test.

•Calculate confidence intervals.

In the two-group design, two samples (representing two populations) are compared by having one group receive nothing (the control group) and the second group receive some level of the manipulated variable (the experimental group). It is also possible to have two experimental groups and no control group. In this case, members of each group receive a different level of the manipulated variable. The null hypothesis tested in a two-group design is that the populations represented by the two groups do not differ:

$$H_0: \mu_1 = \mu_2$$

The alternative hypothesis may be that we expect differences in performance between the two populations but are unsure which group will perform better or worse (a two-tailed test):

$$H_a: \mu_1 \neq \mu_2$$

or, as discussed in Chapter 5, for a one-tailed test, the null hypothesis is either

$$H_0: \mu_1 \leq \mu_2 \quad \text{or} \quad H_0: \mu_1 \geq \mu_2$$

depending on which alternative hypothesis is being tested:

$$H_a: \mu_1 > \mu_2 \quad \text{or} \quad H_a: \mu_1 < \mu_2, \text{ respectively.}$$

A significant difference between the two groups (samples representing populations) depends on the critical value for the statistical test being conducted. As with the statistical tests described in Chapter 5, alpha is typically set at .05 (α = .05).

Remember from Chapter 5 that parametric tests, such as the t test, are inferential statistical tests designed for sets of data that meet certain requirements. The most basic requirement is that the data fit a bell-shaped distribution. In addition, parametric tests involve data for which certain parameters are known, such as the mean (μ) and the standard deviation (σ). Finally, parametric tests use interval-ratio data.

t Test for Independent Groups (Samples): What It Is and What It Does

The independent-groups t test is a parametric statistical test that compares the performance of two different samples of participants. It indicates whether the two samples perform so similarly that we conclude that they are likely from the same population, or whether they perform so differently that we conclude that they represent two different populations. Imagine, for example, that a researcher wants to study the effects on exam performance of massed versus spaced study. All subjects in the experiment study the same material for the same amount of time. The difference between the groups is that one group studies for 6 hours all at once (massed study), whereas the other group studies for 6 hours broken into three 2-hour blocks (spaced study). Because the researcher believes that the spaced-study method will lead to better performance, the null and alternative hypotheses are

$$H_0: \text{Spaced study} \leq \text{Massed study}, \text{ or } \mu_1 \leq \mu_2$$

$$H_a: \text{Spaced study} > \text{Massed study}, \text{ or } \mu_1 > \mu_2$$

independent-groups t test A parametric inferential test for comparing sample means of two independent groups of scores.

The 20 subjects are chosen by random sampling and assigned to the groups randomly. Because of the random assignment of subjects, we are confident that there are no major differences between the groups prior to the study. The dependent variable is the subjects’ scores on a 30-item test of the material; these scores are listed in Table 11.1.

Notice that the mean performance of the spaced-study group ($\bar{X}_1 = 22$) is better than that of the massed-study group ($\bar{X}_2 = 16.9$). However, we want to be able to say more than this. In other words, we need to statistically analyze the data to determine whether the observed difference is statistically significant. As you may recall from Chapter 5, statistical significance indicates that an observed difference between two descriptive statistics (such as means) is unlikely to have occurred by chance. For this analysis, we will use an independent-groups t test.

Calculations for the Independent-Groups t Test

The formula for an independent-groups t test is

$$t_{obt} = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

TABLE 11.1 Number of items answered correctly by each subject under spaced- versus massed-study conditions using a between-subjects design (N = 20)

SPACED STUDY	MASSED STUDY
23	17
18	18
23	21
22	15
20	15
24	16
21	17
24	19
21	14
24	17
$\bar{X}_1 = 22$	$\bar{X}_2 = 16.9$

This formula resembles that for the single-sample t test discussed in Chapter 5. However, rather than comparing a single sample mean to a population mean, we are comparing two sample means. The denominator in the equation represents the standard error of the difference between means, the estimated standard deviation of the sampling distribution of differences between the means of independent samples in a two-sample experiment. When conducting an independent-groups t test, we are determining how far from the difference between the population means the difference between the sample means falls. If the difference between the sample means is large, it will fall in one of the tails of the distribution (far from the difference between the population means). Remember, our null hypothesis says that the difference between the population means is zero.

To determine how far the difference between sample means is from the difference between the population means, we need to convert our mean differences to standard errors. The formula for this conversion is similar to the formula for the standard error of the mean, introduced in Chapter 5:

standard error of the difference between means The standard deviation of the sampling distribution of differences between the means of independent samples in a two-sample experiment.

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The standard error of the difference between the means does have a logical meaning. If you took thousands of pairs of samples from these two populations and found $\bar{X}_1 - \bar{X}_2$ for each pair, those differences between means would not all be the same. They would form a distribution. The mean of that distribution would be the difference between the means of the populations ($\mu_1 - \mu_2$), and its standard deviation is what $s_{\bar{X}_1 - \bar{X}_2}$ estimates.
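To make this thought experiment concrete, here is a minimal Python sketch that simulates it. The population values (means of 22 and 17, a common standard deviation of 2, and samples of 10) are assumptions chosen only for illustration; they are not taken from the study.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Assumed population values, for illustration only (not from the text).
MU_1, MU_2 = 22.0, 17.0   # population means
SIGMA = 2.0               # common population standard deviation
N_PER_GROUP = 10          # size of each sample
N_PAIRS = 10_000          # number of pairs of samples to draw

rng = random.Random(0)    # fixed seed so the sketch is repeatable


def sample_mean(mu, sigma, n):
    """Draw n scores from a normal population and return their mean."""
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n


# One difference between sample means for each pair of samples.
diffs = [
    sample_mean(MU_1, SIGMA, N_PER_GROUP) - sample_mean(MU_2, SIGMA, N_PER_GROUP)
    for _ in range(N_PAIRS)
]

# Standard deviation the sampling distribution should have in theory.
theoretical_se = sqrt(SIGMA**2 / N_PER_GROUP + SIGMA**2 / N_PER_GROUP)

print(f"mean of the differences: {mean(diffs):.2f} (population difference: {MU_1 - MU_2:.2f})")
print(f"SD of the differences:   {stdev(diffs):.2f} (formula gives {theoretical_se:.2f})")
```

With this many pairs, the mean of the simulated differences should sit close to the population difference, and their standard deviation should sit close to the value the standard error formula gives.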

Putting all of this together, we see that the formula for determining t is

$$t_{obt} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

where

$t_{obt}$ = the value of t obtained

$\bar{X}_1$ and $\bar{X}_2$ = the means for the two groups

$s_1^2$ and $s_2^2$ = the variances of the two groups (the standard deviation squared)

$n_1$ and $n_2$ = the number of subjects in each of the two groups (we use n to refer to the subgroups and N to refer to the total number of people in the study)

Let's use this formula to determine whether there are any significant differences between our spaced and massed study groups.

$$\bar{X}_1 = \frac{\Sigma X_1}{n_1} = \frac{220}{10} = 22 \qquad \bar{X}_2 = \frac{\Sigma X_2}{n_2} = \frac{169}{10} = 16.9$$

$$s_1^2 = \frac{\Sigma (X_1 - \bar{X}_1)^2}{n_1 - 1} = \frac{36}{9} = 4.00 \qquad s_2^2 = \frac{\Sigma (X_2 - \bar{X}_2)^2}{n_2 - 1} = \frac{38.9}{9} = 4.32$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{22 - 16.9}{\sqrt{\frac{4.00}{10} + \frac{4.32}{10}}} = \frac{5.1}{\sqrt{.832}} = \frac{5.1}{.912} = 5.59$$
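The same calculation can be checked with a short Python sketch using only the standard library. The scores are those in Table 11.1; the function name is ours, and the function assumes the equal-n formula given above.

```python
from math import sqrt
from statistics import mean, variance

# Scores from Table 11.1 (number of items answered correctly).
spaced = [23, 18, 23, 22, 20, 24, 21, 24, 21, 24]
massed = [17, 18, 21, 15, 15, 16, 17, 19, 14, 17]


def independent_t(group1, group2):
    """Return (t, standard error of the difference between means)."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance divides by n - 1, matching s^2 in the text.
    se_diff = sqrt(variance(group1) / n1 + variance(group2) / n2)
    t = (mean(group1) - mean(group2)) / se_diff
    return t, se_diff


t_obt, se_diff = independent_t(spaced, massed)

print(f"means: {mean(spaced):.1f} vs {mean(massed):.1f}")
print(f"standard error of the difference: {se_diff:.3f}")
print(f"t_obt: {t_obt:.2f}")
```

Rounded to two decimals, the result agrees with the hand calculation of 5.59.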

Interpreting the Independent-Groups t Test

The tobt = 5.59. We must now consult Table A.2 in Appendix A to determine the critical value for t (tcv). First we need to determine the degrees of freedom, which for an independent-groups t test are (n1 − 1) + (n2 − 1), or n1 + n2 − 2. In the present study, with 10 subjects in each group, there are 18 degrees of freedom (10 + 10 − 2 = 18). The alternative hypothesis was one-tailed, and α = .05.

Consulting Table A.2, we find that for a one-tailed test with 18 degrees of freedom, the critical value of t at the .05 level is 1.734. Our tobt falls beyond the critical value (is larger than the critical value). Thus, the null hypothesis is rejected, and the alternative hypothesis that subjects in the spaced-study condition performed better on a test of the material than did subjects in the massed-study condition is supported. Because the t score was significant at the .05 level, we should check for significance at the .025, .01, .005, and .0005 levels provided in Table A.2. Our tobt of 5.59 is larger than the critical values at all of the levels of significance provided in Table A.2. This result is pictured in Figure 11.1. In APA style, it would be reported as follows: t(18) = 5.59, p < .0005 (one-tailed). This conveys in a concise manner the t score and the degrees of freedom and that the result was significant at the .0005 level. Keep in mind that when a result is significant, the p value is reported as less than (<) .05 (or some smaller probability), not greater than (>), an error commonly made by students. Remember that the p value, or alpha level, indicates the probability of a Type I error. We want this probability to be small, meaning we are confident that there is only a small probability that our results were due to chance. This means it is highly probable that the observed difference between the groups is truly a meaningful difference, that is, one actually due to the independent variable. Instructions on using Excel, SPSS, or the TI-84 calculator to conduct this independent-groups t test appear in the Statistical Software Resources section at the end of this chapter.
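The decision rule just described can be written out as a short Python sketch. The critical value is copied from the Table A.2 lookup quoted in the text rather than computed, and the variable names are ours.

```python
# Values taken from the worked example in the text.
t_obt = 5.59
n1, n2 = 10, 10
T_CV = 1.734  # Table A.2: one-tailed, alpha = .05, df = 18

# Degrees of freedom for an independent-groups t test.
df = (n1 - 1) + (n2 - 1)

# One-tailed test predicting group 1 > group 2: reject H0 only if
# t_obt falls beyond the critical value in the predicted direction.
reject_null = t_obt > T_CV

print(f"df = {df}")
print(f"t({df}) = {t_obt:.2f}, critical value = {T_CV}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

If SciPy happens to be installed, `scipy.stats.t.ppf(0.95, 18)` should give the same critical value to three decimals, but the sketch does not depend on it.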

Graphing the Means

Typically, when a significant difference is found between two means, the means are graphed to provide a pictorial representation of the difference. In creating a graph, we place the independent variable on the x-axis and the dependent variable on the y-axis. As noted in Module 3, the y-axis should be 60% to 75% of the length of the x-axis. For a line graph, we plot each mean and connect them with a line. For a bar graph, we draw separate bars whose heights represent the means. Figure 11.2 shows a bar graph representing the data from the spaced- versus massed-study experiment. Recall that the mean number of items answered correctly by those in the spaced-study condition was 22, compared with a mean of 16.9 for those in the massed-study condition.

FIGURE 11.1 The obtained t score in relation to the t critical value

Effect Size: Cohen's d and $r^2$

In addition to the reported statistic, alpha level, and graph, the American Psychological Association (2010) recommends that we also look at effect size, the proportion of variance in the dependent variable that is accounted for by the manipulation of the independent variable. Effect size indicates how big a role the conditions of the independent variable play in determining scores on the dependent variable. Thus, it is an estimate of the effect of the independent variable, regardless of sample size. The larger the effect size, the more consistent is the influence of the independent variable. In other words, the greater the effect size, the more knowing the conditions of the independent variable improves our accuracy in predicting subjects’ scores on the dependent variable. For the t test, one formula for effect size, known as Cohen's d, is

$$d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{2} + \frac{s_2^2}{2}}}$$

effect size The proportion of variance in the dependent variable that is accounted for by the manipulation of the independent variable.

Cohen's d An inferential statistic for measuring effect size when using a t test.

Let's begin by working on the denominator, using the data from the spaced- versus massed-study experiment:

$$\sqrt{\frac{s_1^2}{2} + \frac{s_2^2}{2}} = \sqrt{\frac{4.00}{2} + \frac{4.32}{2}} = \sqrt{2.00 + 2.16} = \sqrt{4.16} = 2.04$$

FIGURE 11.2 Mean number of items answered correctly under spaced- and massed-study conditions

We can now put this denominator into the formula for Cohen's d:

$$d = \frac{22 - 16.9}{2.04} = \frac{5.1}{2.04} = 2.50$$

According to Cohen (1988, 1992), a small effect size is one of at least 0.20, a medium effect size is at least 0.50, and a large effect size is at least 0.80. Obviously, our effect size of 2.50 is far greater than 0.80, indicating a very large effect size (most likely a result of using fabricated data). Using APA style, we report that the effect size estimated with Cohen's d is 2.50, or you can report Cohen's d with the t score in the following manner:

t(18) = 5.59, p < .0005 (one-tailed), d = 2.50
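As a quick check on the arithmetic, Cohen's d and its verbal label can be computed with a few lines of Python. This is a sketch under the equal-n formula given above; the function names are ours, and the cutoffs are the Cohen benchmarks quoted in the text.

```python
from math import sqrt


def cohens_d(mean1, mean2, var1, var2):
    """Cohen's d for two equal-sized groups: mean difference over pooled SD."""
    return (mean1 - mean2) / sqrt(var1 / 2 + var2 / 2)


def label_d(d):
    """Verbal label using the benchmarks of 0.20, 0.50, and 0.80."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"


# Means and variances from the spaced- versus massed-study example.
d = cohens_d(22, 16.9, 4.00, 4.32)
print(f"d = {d:.2f} ({label_d(d)})")
```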

In addition to Cohen's d, we can also measure effect size for the independent-groups t test using $r^2$, also known as the coefficient of determination. The coefficient of determination ($r^2$) tells us how much of the variance in one variable can be determined from its relationship with the other variable. When we use it with the t test (based on experimental designs with one dependent and one independent variable), we are measuring the proportion of variance accounted for in the dependent variable based on knowing which treatment group the subjects were assigned to for the independent variable. To calculate $r^2$, use the following formula:

$$r^2 = \frac{t^2}{t^2 + df}$$

$r^2$ (coefficient of determination) A measure of the proportion of the variance in one variable that is accounted for by another variable.

Thus, in our example this would be

$$r^2 = \frac{5.59^2}{5.59^2 + 18} = \frac{31.25}{31.25 + 18} = \frac{31.25}{49.25} = .63$$

According to Cohen (1988), if $r^2$ is at least .01, the effect size is small; if it is at least .09, it is medium; and if it is at least .25, it is large. Thus, our effect size based on $r^2$ is large, just as it was when we used Cohen's d.
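The coefficient of determination is equally easy to verify in code. The sketch below plugs the obtained t and the degrees of freedom into the formula; the function names are ours, and the cutoffs are the ones attributed to Cohen (1988) in the text.

```python
def r_squared(t, df):
    """Proportion of variance accounted for, computed from t and df."""
    return t**2 / (t**2 + df)


def label_r2(r2):
    """Verbal label using the benchmarks of .01, .09, and .25."""
    if r2 >= 0.25:
        return "large"
    if r2 >= 0.09:
        return "medium"
    if r2 >= 0.01:
        return "small"
    return "below small"


r2 = r_squared(5.59, 18)
print(f"r squared = {r2:.2f} ({label_r2(r2)})")
```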

The preceding example illustrates a t test for independent groups with equal n values (sample sizes). In situations where the n values are unequal, a modified version of the previous formula is used. If you need this formula, you can find it in Appendix D.

Power Calculations for the Independent-Groups t Test

Look back at the formula for t and think about what will affect the size of the t score. We would like the t score to be large in order to increase the chance that it will be significant. What will increase the size of the t score? Anything that increases the numerator or decreases the denominator in the equation will increase the t score. What will increase the numerator? A larger difference between the means for the two groups (a greater difference produced by the independent variable) will increase the numerator. This difference is somewhat difficult to influence. However, if we minimize chance in our study and the independent variable truly does have an effect, then the means should be different. What will decrease the size of the denominator? Because the denominator is the standard error of the difference between the means ($s_{\bar{X}_1 - \bar{X}_2}$) and is derived by using s (the unbiased estimator of the population standard deviation), we can decrease $s_{\bar{X}_1 - \bar{X}_2}$ by decreasing the variability within each condition or group or by increasing sample size. Look at the formula and think about why this would be so. In summary, then, three aspects of a study can increase power:

•Greater differences produced by the independent variable

•Smaller variability of raw scores in each condition

•Increased sample size
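These three influences can be seen by recomputing t while changing one ingredient at a time. The sketch below starts from the spaced- versus massed-study values; the altered means, variances, and sample size are hypothetical and serve only to show the direction of each effect.

```python
from math import sqrt


def t_equal_n(mean1, mean2, var1, var2, n):
    """Independent-groups t for two groups of equal size n."""
    return (mean1 - mean2) / sqrt(var1 / n + var2 / n)


# Baseline: the values from the worked example in the text.
baseline = t_equal_n(22, 16.9, 4.00, 4.32, 10)

# Hypothetical variations, each changing one ingredient only.
smaller_effect = t_equal_n(22, 19.45, 4.00, 4.32, 10)    # half the mean difference
less_variability = t_equal_n(22, 16.9, 2.00, 2.16, 10)   # half the variance
larger_sample = t_equal_n(22, 16.9, 4.00, 4.32, 40)      # n = 40 per group

print(f"baseline:            t = {baseline:.2f}")
print(f"half the difference: t = {smaller_effect:.2f}")
print(f"half the variance:   t = {less_variability:.2f}")
print(f"n = 40 per group:    t = {larger_sample:.2f}")
```

Halving the mean difference shrinks t, whereas halving the variance or quadrupling the sample size enlarges it, which is the pattern summarized in the list above.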

We can actually calculate the power of a study using formulas derived by Cohen (1988). To determine the power of a study in which the independent-groups t test is used, we need the previously calculated Cohen's d, along with the sample size in one group (n), in order to calculate δ (delta). Once we calculate delta, we use it with Table A.3 in Appendix A to determine the power of the study. Delta is calculated as follows:

$$\delta = d\sqrt{\frac{n}{2}} = 2.50\sqrt{\frac{10}{2}} = 2.50(2.24) = 5.59$$

Once we have calculated delta, we use Table A.3 from Appendix A to determine the power of the study. As you can see in Table A.3, δ is the first column running down the left-hand side of the table. Follow this column to the second half of the page and you will see that our obtained δ is so large that it is beyond the values available in the table. Thus, based on this table, the power of the present study is approximately 1.00. As you may remember from Module 9, statistical power is the probability of correctly rejecting a false H0. Accordingly, for this study the probability of correctly rejecting a false H0 is approximately 1.00.
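The delta calculation itself takes only a line of code. The sketch below reproduces it; the function name is ours, and the power lookup in Table A.3 is not reproduced here because the table's values are not given in this module.

```python
from math import sqrt


def delta(d, n):
    """Cohen's delta from effect size d and the per-group sample size n."""
    return d * sqrt(n / 2)


# Cohen's d and per-group n from the spaced- versus massed-study example.
power_delta = delta(2.50, 10)
print(f"delta = {power_delta:.2f}")
```

Notice that with equal group sizes and the formula for d used in this module, delta works out to the same value as the obtained t (5.59), because both divide the mean difference by the same combination of variances and n.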

Assumptions of the Independent-Groups t Test

The assumptions of the independent-groups t test are similar to those of the single-sample t test. They are as follows:

•The data are interval-ratio scale.

•The underlying distributions are normal.

•The observations are independent.

•Homogeneity of variance: If we could compute the true variance of the population represented by each sample, the variance in each population would be the same.

If any of these assumptions were violated, it would be appropriate to use another statistic. For example, if the scale of measurement is not interval-ratio or if the underlying distribution is not normal, it may be more appropriate to use a nonparametric statistic (described in Chapter 10). If the observations are not independent, then it is appropriate to use a statistic for a correlated-groups design (described in the next module).

Confidence Intervals

As with the single-sample z and t tests discussed in Chapter 5, we can also compute confidence intervals for the independent-groups t test. We use the same basic formula we did for computing confidence intervals for the single-sample t test in Chapter 5, except that rather than using the sample mean and the standard error of the mean, we use the difference between the means and the standard error of the difference between means. The formula for the 95% confidence interval would be

$$CI_{95} = (\bar{X}_1 - \bar{X}_2) \pm t_{cv}(s_{\bar{X}_1 - \bar{X}_2})$$

We have already calculated the means for the two study conditions ($\bar{X}_1$ and $\bar{X}_2$) and the standard error of the difference between means ($s_{\bar{X}_1 - \bar{X}_2}$) as part of the previous t test problem. Thus, we simply need to determine tcv to compute the confidence interval, which should contain the difference between the means for the two conditions. Because we are determining a 95% confidence interval, we use tcv at the .05 level, and just as in Chapter 5, we always use the tcv for a two-tailed test because we are determining a confidence interval that contains values both above and below the difference between the means. Consulting Table A.2 in Appendix A for the tcv for 18 degrees of freedom and a two-tailed test, we find that it is 2.101. We can now determine the 95% confidence interval for this problem.

$$CI_{95} = (22 - 16.9) \pm 2.101(.912) = 5.1 \pm 1.92 = 3.18 \text{ to } 7.02$$

Thus, the 95% confidence interval that should contain the difference in mean test scores between the spaced and the massed groups is 3.18 to 7.02. This means that if someone asked us how big a difference study type makes on test performance, we could answer that we are 95% confident that the difference in performance on the 30-item test between the spaced- and massed-study groups would be between 3.18 and 7.02 correct answers.
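The interval can also be computed directly from the raw scores. The following Python sketch uses the Table 11.1 data and the two-tailed critical value of 2.101 quoted from Table A.2; the function name is ours.

```python
from math import sqrt
from statistics import mean, variance

# Scores from Table 11.1.
spaced = [23, 18, 23, 22, 20, 24, 21, 24, 21, 24]
massed = [17, 18, 21, 15, 15, 16, 17, 19, 14, 17]

T_CV_TWO_TAILED = 2.101  # Table A.2: two-tailed, alpha = .05, df = 18


def confidence_interval(group1, group2, t_cv):
    """Confidence interval for the difference between two independent means."""
    diff = mean(group1) - mean(group2)
    se_diff = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    margin = t_cv * se_diff
    return diff - margin, diff + margin


lower, upper = confidence_interval(spaced, massed, T_CV_TWO_TAILED)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```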

INDEPENDENT-GROUPS t TEST

What It Is	A parametric test for a two-group between-subjects design
What It Does	Compares performance of the two groups to determine whether they represent the same population or different populations
Assumptions	Interval-ratio data
	Underlying distributions are normal
	Independent observations
	Homogeneity of variance

1.How is effect size different from significance level? In other words, how is it possible to have a significant result, yet a small effect size?

2.How does increasing sample size affect a t test? Why does it affect it in this manner?

3.How does decreasing variability affect a t test? Why does it affect it in this manner?

REVIEW OF KEY TERMS

Cohen's d

effect size

independent-groups t test

$r^2$ (coefficient of determination)

standard error of the difference between means

MODULE EXERCISES

(Answers to odd-numbered questions appear in Appendix B.)

1.Explain when it would be appropriate to use the independent-groups t test.

2.What is the standard error of the difference between means?

3.Why does APA recommend that we calculate a measure of effect size in addition to calculating the test statistic?

4.A college student is interested in whether there is a difference between male and female students in the amount of time spent studying each week. The student gathers information from a random sample of male and female students on campus. Amount of time spent studying is normally distributed. The data follow:

Males	Females
27	25
25	29
19	18
10	23
16	20
22	15
14	19

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

e.If significant, compute the effect size and interpret this.

f.If significant, draw a graph representing the data.

g.Determine the 95% confidence interval.

h.Calculate the power of this test.

5.A student is interested in whether students who study with music playing devote as much attention to their studies as do students who study under quiet conditions. He believes students who study with music do not devote as much attention to their studies as do students who study under quiet conditions. He randomly assigns the 18 subjects to either music or no music conditions and has them read and study the same passage of information for the same amount of time. Participants are then all given the same 10-item test on the material. Their scores appear next. Scores on the test represent interval-ratio data and are normally distributed.

Music	No Music
6	10
5	9
6	7
5	7
6	6
6	6
7	8
8	6
5	9

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

e.If significant, compute the effect size and interpret this.

f.If significant, draw a graph representing the data.

g.Determine the 95% confidence interval.

h.Calculate the power of this test.

6.What is a confidence interval?

CRITICAL THINKING CHECK ANSWERS

Critical Thinking Check 11.1

1.The effect size indicates the magnitude of the experimental treatment regardless of sample size. A result can be statistically significant because sample size was very large, but the effect of the independent variable was not so large. Effect size would indicate whether this was the case, because in this type of situation the effect size should be small.

2.In the long run, it means that the obtained t is more likely to be significant. This is so because in terms of the formula used to calculate t, increasing sample size will decrease the standard error of the difference between means ($s_{\bar{X}_1 - \bar{X}_2}$). This in turn will increase the size of the obtained t, which means that it is more likely to exceed the critical value and be significant.

3.Decreasing variability also makes a t test more powerful (likely to be significant). It does so because decreasing variability also means that the standard error of the difference between means ($s_{\bar{X}_1 - \bar{X}_2}$) will be smaller. This in turn will increase the size of the obtained t, which means that it is more likely to exceed the critical value and be significant.

MODULE 12

t Test for Correlated Groups (Samples)

Learning Objectives

•Explain when the t test for correlated groups should be used.

•Calculate a correlated-groups t test.

•Interpret a correlated-groups t test.

•Calculate and interpret Cohen's d and $r^2$.

•Explain the assumptions of the correlated-groups t test.

•Calculate confidence intervals.

t Test for Correlated Groups: What It Is and What It Does

The correlated-groups t test, like the previously discussed t test, compares the performance of subjects in two groups. In this case, however, the same people are used in each group (a within-subjects design), or different participants are matched between groups (a matched-subjects design). The test indicates whether there is a difference in sample means and whether this difference is greater than would be expected based on chance. In a correlated-groups design, the sample includes two scores for each person (or matched pair in a matched-subjects design), instead of just one. To conduct the t test for correlated groups (also called the t test for dependent groups or samples), we must convert the two scores for each person into one score. That is, we compute a difference score for each person by subtracting one score from the other for that person (or for the two individuals in a matched pair). Although this may sound confusing, the dependent-groups t test is actually easier to compute than the independent-groups t test that we learned about in the previous module. Because the two samples are related, the analysis becomes easier because we work with pairs of scores. The null hypothesis is that there is no difference between the two scores; that is, a person's score in one condition is the same as that (or a matched) person's score in the second condition. The alternative hypothesis is that there is a difference between the paired scores, that is, that the individuals (or matched pairs) performed differently in each condition.

correlated-groups t test A parametric inferential test used to compare the means of two related (within- or matched-subjects) samples.

To illustrate the use of the correlated-groups t test, imagine that we conduct a study in which participants are asked to learn two lists of words. One list is composed of 20 concrete words (for example, desk, lamp, bus); the other is composed of 20 abstract words (for example, love, hate, deity). Each participant is tested twice, once in each condition.

Because each participant provides one pair of scores, a correlated-groups t test is the appropriate way to compare the means of the two conditions. We expect to find that recall performance is better for the concrete words. Thus, the null hypothesis is

$$H_0: \mu_1 - \mu_2 = 0$$

and the alternative hypothesis is

$$H_a: \mu_1 - \mu_2 > 0$$

representing a one-tailed test of the null hypothesis.

To better understand the correlated-groups t test, consider the sampling distribution for the test. This is a sampling distribution of the differences between pairs of sample means. In other words, imagine the population of people who must recall abstract words versus the population of people who must recall concrete words. Further, imagine that samples of 8 participants are chosen (the 8 subjects in each individual sample come from one population), and each sample's mean score in the abstract condition is subtracted from the mean score in the concrete condition. We do this repeatedly until the entire population has been sampled. If the null hypothesis is true, the differences between the sample means should be zero, or very close to zero. If, as the researcher suspects, subjects remember more concrete words than abstract words, the difference between the sample means should be significantly larger than zero.

The data representing each participant's performance are presented in Table 12.1.

TABLE 12.1 Number of abstract and concrete words recalled by each participant using a correlated-groups (within-subjects) design

PARTICIPANT	CONCRETE	ABSTRACT
1	13	10
2	11	9
3	19	13
4	13	12
5	15	11
6	10	8
7	12	10
8	13	13

TABLE 12.2 Number of concrete and abstract words recalled by each participant with difference scores provided

PARTICIPANT	CONCRETE	ABSTRACT	D (DIFFERENCE SCORE)
1	13	10	3
2	11	9	2
3	19	13	6
4	13	12	1
5	15	11	4
6	10	8	2
7	12	10	2
8	13	13	0
			Σ = 20

Notice that we have two sets of scores, one for the concrete word list and one for the abstract list. Our calculations for the correlated-groups t test involve transforming the two sets of scores into one set by determining difference scores. Difference scores represent the difference between subjects’ performance in one condition and their performance in the other condition. The difference scores for our study are shown in Table 12.2.

difference scores Scores representing the difference between subjects’ performance in one condition and their performance in a second condition.
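Computing the difference scores is a simple element-by-element subtraction. The Python sketch below reproduces the D column of Table 12.2 from the concrete and abstract scores; the variable names are ours.

```python
# Scores from Table 12.1, in participant order 1 through 8.
concrete = [13, 11, 19, 13, 15, 10, 12, 13]
abstract = [10, 9, 13, 12, 11, 8, 10, 13]

# Difference score for each participant: concrete minus abstract.
diff_scores = [c - a for c, a in zip(concrete, abstract)]

total = sum(diff_scores)
mean_diff = total / len(diff_scores)

print(f"difference scores: {diff_scores}")
print(f"sum = {total}, mean = {mean_diff}")
```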

Calculations for the Correlated-Groups t Test

After calculating the difference scores, we have one set of scores representing the performance of participants in both conditions. We can now compare the mean of the difference scores with zero (based on the null hypothesis stated previously). The computations from this point on for the correlated-groups t test are similar to those for the single-sample t test in Module 10.

$$t = \frac{\bar{D} - 0}{s_{\bar{D}}}$$

where

$\bar{D}$ = the mean of the difference scores

$s_{\bar{D}}$ = the standard error of the difference scores

The standard error of the difference scores ($s_{\bar{D}}$) represents the standard deviation of the sampling distribution of mean differences between dependent samples in a two-group experiment. It is calculated in a manner similar to the estimated standard error of the mean ($s_{\bar{X}}$) that we learned how to calculate in Module 10:

$$s_{\bar{D}} = \frac{s_D}{\sqrt{N}}$$

standard error of the difference scores The standard deviation of the sampling distribution of mean differences between dependent samples in a two-group experiment.

where sD = the estimated standard deviation of the difference scores. The standard deviation of the difference scores is calculated in the same manner as the standard deviation for any set of scores:

$$s_D = \sqrt{\frac{\Sigma (D - \bar{D})^2}{N - 1}}$$

where

D = each difference score, $\bar{D}$ = the mean of the difference scores, and N = the total number of difference scores.

Let's use these formulas to determine $s_D$, $s_{\bar{D}}$, and the final t score.

We begin by determining the mean of the difference scores ($\bar{D}$), which is 20/8 = 2.5, and then use this to determine the deviation of each difference score from that mean, the squared deviations, and the sum of the squared deviations, all needed to calculate the standard deviation ($s_D$). These are shown in Table 12.3. We then use this sum (24) to determine $s_D$.

$$s_D = \sqrt{\frac{24}{7}} = \sqrt{3.429} = 1.85$$

Next, we use the standard deviation ($s_D$ = 1.85) to calculate the standard error of the difference scores ($s_{\bar{D}}$):

$$s_{\bar{D}} = \frac{s_D}{\sqrt{N}} = \frac{1.85}{\sqrt{8}} = \frac{1.85}{2.83} = 0.65$$

Finally, we use the standard error of the difference scores ($s_{\bar{D}}$ = 0.65) and the mean of the difference scores (2.5) in the t test formula:

$$t = \frac{\bar{D} - 0}{s_{\bar{D}}} = \frac{2.5 - 0}{0.65} = \frac{2.5}{0.65} = 3.85$$

TABLE 12.3 Difference scores and squared difference scores for concrete and abstract words

D (DIFFERENCE SCORE)	$D - \bar{D}$	$(D - \bar{D})^2$
3	0.5	0.25
2	−0.5	0.25
6	3.5	12.25
1	−1.5	2.25
4	1.5	2.25
2	−0.5	0.25
2	−0.5	0.25
0	−2.5	6.25
		Σ = 24
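The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch that uses the difference scores from Table 12.3; the variable names (`diffs`, `mean_d`, `s_d`, `se_d`, `t_obt`) are our own labels for illustration and do not come from any statistical package.

```python
import math

# Difference scores (D) from Table 12.3
diffs = [3, 2, 6, 1, 4, 2, 2, 0]

n = len(diffs)
mean_d = sum(diffs) / n                      # mean of the difference scores
ss = sum((d - mean_d) ** 2 for d in diffs)   # sum of squared deviations from the mean
s_d = math.sqrt(ss / (n - 1))                # estimated standard deviation of D
se_d = s_d / math.sqrt(n)                    # standard error of the difference scores
t_obt = (mean_d - 0) / se_d                  # obtained t, tested against a mean difference of 0

print(f"mean D = {mean_d:.2f}, sum of squares = {ss:.2f}")
print(f"s_D = {s_d:.2f}, standard error = {se_d:.2f}, t = {t_obt:.2f}")
```

Because the script carries full precision through every step, it yields t of about 3.82 rather than 3.85. The hand calculation rounds the standard error to 0.65 before dividing, which accounts for the small difference; the software output described in the Statistical Software Resources section likewise reports 3.82.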

Interpreting the Correlated-Groups t Test and Graphing the Means

The degrees of freedom for a correlated-groups t test are equal to N − 1; in this case, 8 − 1 = 7. We can use Table A.2 in Appendix A to determine the critical value of t for a one-tailed test with α = .05 and df = 7. Consulting this table, we find that tcv = 1.895. Our tobt = 3.85 and therefore falls in the region of rejection. Because the t score was significant at the .05 level, we should check for significance at the .025, .01, .005, and .0005 levels provided in Table A.2. Our tobt of 3.85 is larger than the critical values at the .025, .01, and .005 levels. Figure 12.1 shows this tobt in relation to the tcv. In APA style, this would be reported as t(7) = 3.85, p < .005 (one-tailed), indicating that there is a significant difference in the number of words recalled in the two conditions. Instructions on using Excel, SPSS, or the TI-84 calculator to conduct this correlated-groups t test appear in the Statistical Software Resources section at the end of this chapter.

This difference in recall performance is illustrated in Figure 12.2, in which the mean number of concrete and abstract words recalled by the participants has been graphed. Thus, we can conclude that subjects performed significantly better in the concrete word condition, supporting the alternative (research) hypothesis.
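For readers who want to see where a table entry such as tcv = 1.895 comes from, the following sketch approximates the one-tailed probability for a given t and df by numerically integrating the t distribution. It uses only Python's standard library; the function names `t_pdf` and `upper_tail_p` are hypothetical helpers written for this illustration, and Simpson's rule is simply one convenient way to approximate the area, not the method used to build Table A.2.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """One-tailed p: area above t (for t >= 0), using Simpson's rule on [0, t].

    steps must be an even number.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd-numbered points, 2 for even-numbered points
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # Half of the distribution lies above 0, so subtract the area from 0 to t
    return 0.5 - area_0_to_t

p_one_tailed = upper_tail_p(3.85, 7)
print(f"one-tailed p for t(7) = 3.85: {p_one_tailed:.4f}")
print(f"area above the critical value 1.895: {upper_tail_p(1.895, 7):.3f}")
```

The area above 1.895 should come out very close to .05, and the probability for our tobt should fall between .001 and .005, which agrees with reporting p < .005.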

FIGURE 12.1 The obtained t score in relation to the t critical value

FIGURE 12.2 Mean number of words recalled correctly under concrete and abstract word conditions

Effect Size: Cohen's d and r2

As with the independent-groups t test, we should also compute Cohen's d (the size of the mean difference expressed in standard deviation units) for the correlated-groups t test. Remember, effect size indicates how big a role the conditions of the independent variable play in determining scores on the dependent variable. For the correlated-groups t test, the formula for Cohen's d is

$$d = \frac{\bar{D}}{s_D}$$

where $\bar{D}$ is the mean of the difference scores and $s_D$ is the standard deviation of the difference scores. We have already calculated each of these as part of the t test. Thus,

$$d = \frac{2.5}{1.85} = 1.35$$

Cohen's d for a correlated-groups design is interpreted in the same manner as d for an independent-groups design. That is, a small effect size is one of at least 0.20, a medium effect size is at least 0.50, and a large effect size is at least 0.80. Obviously, our effect size of 1.35 is far greater than 0.80, indicating a very large effect size.

We can also compute $r^2$ for the correlated-groups t test, using the same formula we used for the independent-groups t test in Module 11:

$$r^2 = \frac{t^2}{t^2 + df} = \frac{3.85^2}{3.85^2 + 7} = \frac{14.82}{21.82} = .68$$

Using the guidelines established by Cohen (1988) and noted earlier in the chapter, this is a large effect size.
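Both effect size measures are simple enough to script. The sketch below plugs in the rounded values from the hand calculation (a mean difference of 2.5, a standard deviation of 1.85, t = 3.85, and df = 7); the function names `cohens_d_paired` and `r_squared` are our own and are meant only as an illustration.

```python
def cohens_d_paired(mean_diff, sd_diff):
    """Cohen's d for a correlated-groups design: mean difference / SD of the differences."""
    return mean_diff / sd_diff

def r_squared(t, df):
    """Proportion of variance accounted for, computed from t and its degrees of freedom."""
    return t ** 2 / (t ** 2 + df)

d = cohens_d_paired(2.5, 1.85)
r2 = r_squared(3.85, 7)

print(f"d = {d:.2f}, r^2 = {r2:.2f}")
```

With these inputs the script reproduces the values worked out above, d = 1.35 and an r squared of .68.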

Power Calculations for the Correlated-Groups t Test

Just as we calculated power for the independent-groups t test, we can also do so for this t test. To determine the power of a study in which the correlated-groups t test is used, we need the previously calculated Cohen's d and the sample size (N) in order to calculate δ (delta). Delta is calculated as follows:

$$\delta = d\sqrt{N} = 1.35\sqrt{8} = 1.35(2.83) = 3.82$$

Once we have calculated delta, we use Table A.3 in Appendix A to determine the power of the study. As you can see in Table A.3, δ is the first column running down the left-hand side of the table. Follow this column to the second half of the page and you will see that our obtained δ of 3.82 for a one-tailed test yields a power value of .98. As you may remember from Module 9, statistical power is the probability of correctly rejecting a false H0. Accordingly, for this study the probability of correctly rejecting a false H0 is .98.
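The delta calculation, along with a rough cross-check on the table value, can be sketched as follows. Note the assumptions: the power estimate here uses a normal approximation (the area of the standard normal distribution below δ minus the one-tailed critical z of 1.645 for α = .05), which is our own stand-in for the lookup in Table A.3 and not necessarily how that table was constructed. The function names are hypothetical.

```python
import math

def delta_paired(d, n):
    """Delta for a correlated-groups t test: d times the square root of N."""
    return d * math.sqrt(n)

def normal_cdf(z):
    """Area under the standard normal curve below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_power(delta, z_crit=1.645):
    """Normal-approximation power estimate (an assumption, not the Table A.3 lookup)."""
    return normal_cdf(delta - z_crit)

delta = delta_paired(1.35, 8)
power = approx_power(delta)

print(f"delta = {delta:.2f}")
print(f"approximate power = {power:.3f}")
```

The script gives a delta of 3.82 and an approximate power between .98 and .99, which is in line with the tabled value of .98.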

Assumptions of the Correlated-Groups t Test

The assumptions for the correlated-groups t test are the same as those for the independent-groups t test, except for the assumption that the observations are independent. In this case, the observations are not independent—they are correlated (dependent).

Confidence Intervals

Just as with the independent-groups t test, we can calculate confidence intervals based on a correlated-groups t test. In this case, we use a formula very similar to that used for the single-sample t test from Module 10:

$$CI_{.95} = \bar{D} \pm t_{cv}(s_{\bar{D}})$$

We have already calculated $\bar{D}$ and $s_{\bar{D}}$ as part of the previous t test problem. Thus, we only need to determine tcv in order to calculate the 95% confidence interval. Once again, we consult Table A.2 in Appendix A for a two-tailed test (remember we are determining values both above and below the mean, so we use the tcv for a two-tailed test) with 7 degrees of freedom. We find that the tcv is 2.365. Using this, we calculate the confidence interval as follows:

$$CI_{.95} = 2.5 \pm 2.365(0.65) = 2.5 \pm 1.54 = 0.96 \text{ to } 4.04$$

Thus, the 95% confidence interval that should contain the difference in mean test scores between the concrete and the abstract words is 0.96 to 4.04. This means that if someone asked us how big a difference word type makes on memory performance, we could answer that we are 95% confident that the difference in performance on the 20-item memory test between the two word-type conditions would be between 0.96 and 4.04 words recalled correctly.
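The interval can also be computed in a short script. This sketch takes the critical value 2.365 directly from Table A.2 rather than computing it, and uses the rounded standard error of 0.65 from the hand calculation; `confidence_interval` is a hypothetical helper name.

```python
def confidence_interval(mean_diff, se_diff, t_cv):
    """Return the lower and upper bounds: mean difference +/- t_cv * standard error."""
    margin = t_cv * se_diff
    return mean_diff - margin, mean_diff + margin

# Values from the worked example: mean D = 2.5, standard error = 0.65, two-tailed t_cv (df = 7) = 2.365
lower, upper = confidence_interval(2.5, 0.65, 2.365)

print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Using the rounded standard error reproduces the interval of 0.96 to 4.04. Carrying an unrounded standard error through the calculation would shift each bound by about one hundredth.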

CORRELATED-GROUPS t TESTS

What It Is	A parametric test for a two-group within-subjects or matched-subjects design
What It Does	Analyzes whether each individual performed in a similar or different manner across conditions
Assumptions	Interval-ratio data
	Underlying distributions are normal
	Correlated (dependent) observations
	Homogeneity of variance

1.Explain what difference scores are and how they are calculated for a correlated-groups t test.

2.Why is H0 for a correlated-groups t test H0: µ1 − µ2 = 0? In other words, why should the difference scores be equal to 0 if H0 is true?

REVIEW OF KEY TERMS

correlated-groups t test

difference scores

standard error of the difference scores

MODULE EXERCISES

(Answers to odd-numbered questions appear in Appendix B.)

1.What is the difference between an independent-groups t test and a correlated-groups t test in terms of when each should be used?

2.What are the assumptions of a correlated-groups t test?

3.When using a correlated-groups t test, how do we take the two scores recorded for each participant and “turn them into” one score for each participant?

4.What measures of effect size are used for a correlated-groups t test?

5.A researcher is interested in whether participating in sports positively influences self-esteem in young girls. She identifies a group of girls who have not played sports before, but are now planning to begin participating in organized sports. She gives them a 50-item self-esteem inventory before they begin playing sports and administers it again after six months of playing sports. The self-esteem inventory is measured on an interval scale, with higher numbers indicating higher self-esteem. The researcher expects the self-esteem scores to be higher after the girls have played sports. In addition, scores on the inventory are normally distributed. The scores are:

Before	After
44	46
40	41
39	41
46	47
42	43
43	45

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

e.If significant, compute the effect size and interpret this.

f.If significant, draw a graph representing the data.

g.Determine the 95% confidence interval.

h.Calculate the power of this test.

6.The researcher in exercise number 5 from Module 11 decides to conduct the same study using a within-subjects design in order to control for differences in cognitive ability. He selects a random sample of participants and has them study different material of equal difficulty in both the music and no music conditions. The data appear next. As before, they are measured on an interval-ratio scale and are normally distributed.

Music	No Music
6	10
7	7
6	8
5	7
6	7
8	9
8	8

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

e.If significant, compute the effect size and interpret this.

f.If significant, draw a graph representing the data.

g.Determine the 95% confidence interval.

h.Calculate the power of this test.

CRITICAL THINKING CHECK ANSWERS

Critical Thinking Check 12.1

1.Difference scores represent the difference in performance for each participant between the score in one condition versus the other condition in the experiment. Thus, we simply take the score from one condition and subtract it from the score in the other condition—always subtracting in the same order (for example, condition one from condition two, or vice versa).

2.If H0 is true, then the independent variable in the study should not have had any effect. If this is the case, then the difference score for each participant should be zero because the performance in each condition should be the same.

CHAPTER SIX SUMMARY AND REVIEW

Two-Group t Tests

CHAPTER SUMMARY

Two inferential statistics were presented in this chapter. Both are parametric and for use with interval-ratio data. The statistics varied based on whether the design was between-subjects or correlated-groups. It is imperative that the appropriate statistic be used to analyze the data collected in an experiment. The first point to consider when determining which statistic to use is whether it should be a parametric or nonparametric statistic. This decision is based on the type of data collected, the type of distribution to which the data conform, and whether any parameters of the distribution are known. Second, we need to know whether the design is between-subjects or correlated-groups when selecting a statistic. Lastly, we need to determine how many groups we are comparing. For designs in which interval-ratio data were collected on two groups, we use a t test: independent-groups for between-subjects designs and correlated-groups for within-subjects and matched-subjects designs.

CHAPTER 6 REVIEW EXERCISES

(Answers to exercises appear in Appendix B.)

Fill-in Self-Test

Answer the following questions. If you have trouble answering any of the questions, restudy the relevant material before going on to the multiple-choice self-test.

1.A(n) __________ is a parametric inferential test for comparing sample means of two independent groups of scores.

2.__________ is an inferential statistic for measuring effect size with t tests.

3.A(n) __________ is a parametric inferential test used to compare the means of two related samples.

4.When using a correlated-groups t test, we calculate __________, scores representing the difference between subjects’ performance in one condition and their performance in a second condition.

5.The standard deviation of the sampling distribution of mean differences between dependent samples in a two-group experiment is the __________.

Multiple-Choice Self-Test

Select the single best answer for each of the following questions. If you have trouble answering any of the questions, restudy the relevant material.

1.When comparing the sample means for two unrelated groups, we use the

a.correlated-groups t test.

b.independent-groups t test.

c.z test.

d.single-sample t test.

2.The value of the t test will __________ as sample variance decreases.

a.increase

b.decrease

c.stay the same

d.not be affected

3.Which of the following t test results has the greatest chance of statistical significance?

a.t(28) = 3.12

b.t(14) = 3.12

c.t(18) = 3.12

d.t(10) = 3.12

4.If the null hypothesis is false, then the t test should be

a.equal to 0.00.

b.greater than 1.

c.greater than .05.

d.greater than .95.

5.Imagine that you conducted an independent-groups t test with 10 participants in each group. For a one-tailed test, the tcv at α = .05 would be

a.±1.729.

b.±2.101.

c.±1.734.

d.±2.093.

6.If a researcher reported for an independent-groups t test that t(26) = 2.90, p < .005, how many participants were there in the study?

a.13

b.26

c.27

d.28

7.Ha: µ1 ≠ µ2 is the __________ hypothesis for a _________-tailed test.

a.null; two

b.alternative; two

c.null; one

d.alternative; one

8.Cohen's d is a measure of __________ for a(n) __________.

a.significance; t test

b.significance; ANOVA

c.effect size; t test

d.effect size; ANOVA

9.tcv = ±2.15 and tobt = −2.20. Based on these results we ____________.

a.reject H0

b.fail to reject H0

c.accept H0

d.reject Ha

10.If a correlated-groups t test and an independent-groups t test both have df = 10, which experiment used fewer participants?

a.They both used the same number of participants (n = 10).

b.They both used the same number of participants (n = 11).

c.The correlated-groups t test

d.The independent-groups t test

11.If researchers reported that, for a correlated-groups design, t(15) = 2.57, p < .05, you can conclude that

a.a total of 16 people participated in the study.

b.a total of 17 people participated in the study.

c.a total of 30 people participated in the study.

d.there is no way to determine how many people participated in the study.

Self-Test Problems

1.A college student is interested in whether there is a difference between male and female students in the amount of time spent doing volunteer work each week. The student gathers information from a random sample of male and female students on her campus. The data, amount of time volunteering in minutes, appear next. They are measured on an interval-ratio scale and are normally distributed.

Males	Females
20	35
25	39
35	38
40	43
36	50
24	49

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

e.If significant, compute the effect size and interpret this.

f.If significant, draw a graph representing the data.

g.Determine the 95% confidence interval.

h.Calculate the power of this test.

2.A researcher is interested in whether studying with music helps or hinders the learner. In order to control for differences in cognitive ability, the researcher decides to use a within-subjects design. He selects a random sample of subjects and has them study different material of equal difficulty in both the music and no music conditions. Subjects then take a 20-item quiz on the material. The study is completely counterbalanced to control for order effects. The data appear next. They are measured on an interval-ratio scale and are normally distributed.

Music	No Music
17	17
16	18
15	17
16	17
18	19
18	18

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

e.If significant, draw a graph representing the data.

g.Determine the 95% confidence interval.

h.Calculate the power of this test.

CHAPTER SIX

Statistical Software Resources

If you need help getting started with Excel or SPSS, please see Appendix C: Getting Started with Excel and SPSS.

MODULE 11 Independent-Groups t Test

To illustrate how to calculate the independent-groups t test, let's use the example from Module 11 in which a researcher wants to study the effects on exam performance of massed versus spaced study. All participants in the experiment study the same material for the same amount of time. The difference between the groups is that one group studies for 6 hours all at once (massed study), whereas the other group studies for 6 hours broken into three 2-hour blocks (spaced study). The dependent variable is the subjects’ scores on a 30-item test of the material; these scores are listed in Table 11.1 in Module 11.

Using Excel

We'll use the data from Table 11.1 to illustrate how to use Excel to calculate an independent-groups t test. The data represent number of items answered correctly for two groups of subjects when one group used a spaced-study technique and the other used a massed-study technique. The researcher predicted that those in the spaced-study condition would perform better. The data from Table 11.1 have been entered into the following Excel worksheet, with the data from the spaced condition in Column A and the data from the massed condition in Column B.

Next, with the Data ribbon active, we click on Data Analysis in the top right corner of the screen and the following dialog box appears:

You can see that I have selected t-test: Two-Sample Assuming Equal Variances. After you have done the same, click OK and you will then see the following dialog box:

With the cursor in the Variable 1 Range box, highlight the data in the A column in the Excel spreadsheet so that they are entered into the Variable 1 Range box (do not highlight the column heading of Spaced Study). Do the same for Column B and enter these data into the Variable 2 Range box. Then click OK. You will see the following output:

We are provided with the t test statistic of 5.59 along with the probability and critical values for both one- and two-tailed tests. The one-tailed probability shows that the test is significant, p = .0000132. Because APA style does not report a probability as exactly zero, this would be reported as t(18) = 5.59, p < .001 (one-tailed).

Using SPSS

We'll use the same problem to illustrate the use of SPSS for an independent-groups t test. In this study, researchers have subjects use one of two types of study, spaced or massed, and then measure exam performance. The data from Table 11.1 are entered into SPSS as in the following window:

Notice that the independent variable of Type of Study has been converted to a numeric variable where the number 1 represents the spaced-study condition and the number 2 represents the massed-study condition. Thus, the data in rows 1–10 represent spaced-study data, and the data in rows 11–20 represent the massed-study data. Click on the Analyze tab and then Compare Means followed by Independent-Samples T Test as is illustrated next.

The following dialog box will appear:

We'll place the Examscore data into the Test Variable (dependent variable) box and the Typeofstudy data into the Grouping Variable (independent variable) box by highlighting each variable and using the arrow buttons in the middle of the dialog box to move the variables. The dialog box should appear as follows once you've completed this task:

Once you have done this, click on the Grouping Variable box and the Define Groups box below it, which will become active, and you will receive a dialog box as follows:

We have to let SPSS know what values we used to designate the spaced- versus the massed-study groups. Thus, enter a 1 into the Group 1 box and a 2 into the Group 2 box and click Continue. Then click OK in the Independent-Samples T Test dialog box. You should receive output similar to the following:

Descriptive statistics for the two conditions are reported in the first table followed by the t test score of 5.590. Because we are assuming equal variances, we use the df, t score, and other data from that row in the table. Moreover, the two-tailed significance level is provided, but because this was a one-tailed test, you should divide the p-value in half, or consult a critical values table for t in a statistics text. We are also provided with the 95% confidence interval for the t test.

Using the TI-84

Let's use the data from Table 11.1 to conduct the test using the TI-84 calculator.

1.With the calculator on, press the STAT key.

2.EDIT will be highlighted. Press the ENTER key.

3.Under L1 enter the data from Table 11.1 for the spaced-study group.

4.Under L2 enter the data from Table 11.1 for the massed-study group.

5.Press the STAT key once again and highlight TESTS.

6.Scroll down to 2-SampTTest. Press the ENTER key.

7.Highlight DATA. Enter L1 next to List1 (by pressing the 2nd key followed by the 1 key). Enter L2 next to List2 (by pressing the 2nd key followed by the 2 key).

8.Scroll down to µ1 and select >µ2 (for a one-tailed test in which we predict that the spaced-study group will do better than the massed-study group). Press ENTER.

9.Scroll down to Pooled: and highlight YES. Press ENTER.

10.Scroll down to and highlight CALCULATE. Press ENTER.

The t score of 5.59 should be displayed followed by the significance level of .000013 and the df of 18. In addition, descriptive statistics for both variables on which you entered data will be shown.

MODULE 12 Correlated-Groups t Test

To illustrate the use of the correlated-groups t test, let's use the example from Module 12 in which subjects are asked to learn two lists of words. One list is composed of 20 concrete words (for example, desk, lamp, bus); the other is composed of 20 abstract words (for example, love, hate, deity). Each subject is tested twice, once in each condition. Because each subject provides one pair of scores, a correlated-groups t test is the appropriate way to compare the means of the two conditions.

Using Excel

Using Excel to calculate a correlated-groups t test is very similar to using it to calculate an independent-groups t test. We'll use the data from Table 12.1 in Module 12 to illustrate its use. For this t test we are comparing memory for concrete versus abstract words for a group of 8 participants. Each participant served in both conditions. First enter the data from Table 12.1 into an Excel spreadsheet (as seen next). The data for the concrete-word condition are entered into Column A and the data for the abstract-word condition into Column B.

Then, with the Data ribbon active, click on Data Analysis and select t-test: Paired Two Sample for Means as indicated in the following dialog box. Click OK after doing this.

You will then get the following dialog box into which you will enter the data from Column A into the Variable 1 Range box by clicking in the Variable 1 Range box and then highlighting the data in Column A and then doing the same with the data in Column B and the Variable 2 Range box. After doing this, the dialog box should appear as follows:

Click OK and you will receive the output as it appears next.

We can see that t (7) = 3.82, p = .0033 (one-tailed).

Using SPSS

To illustrate the correlated-groups t test, we'll use the same problem described above in which a researcher has a group of participants study a list of 20 concrete words and 20 abstract words and then measures recall for the words within each condition. The researcher predicts that the subjects will have better recall for the concrete words. The data from Table 12.1 (in Module 12) are entered into SPSS as follows. We have 8 participants and each serves in both conditions. Thus the scores for each subject in both conditions appear in a single row.

Next we click on the Analyze tab followed by the Compare Means tab and then Paired-Samples T Test, as illustrated next.

These actions will produce the following dialog box:

Highlight the Concretewords variable and then click the arrow button in the middle of the screen. The Concretewords variable should now appear under Variable1 in the box on the right of the window. Do the same for the Abstractwords variable and it should appear under Variable2 in the box on the right. The dialog box should now appear as follows:

Click OK and the output will appear in an output window as below:

As in the independent-samples t test in the previous example, descriptive statistics appear in the first table, followed by the correlation between the variables. Lastly, the correlated-groups t test results appear in the third table with the t score of 3.819, 7 degrees of freedom, and the two-tailed significance level. Because this was a one-tailed test, we can find the significance level for this one-tailed t test by dividing the two-tailed significance level in half. Thus, for this problem t(7) = 3.82, p = .0035 (one-tailed). As with the independent-groups t test from Module 11, the 95% confidence interval is also reported.

Using the TI-84

Let's use the data from Table 12.1 to conduct the test using the TI-84 calculator.

1.With the calculator on, press the STAT key.

2.EDIT will be highlighted. Press the ENTER key.

3.Under L1 enter the data for Concrete Words.

4.Under L2 enter the data for Abstract Words.

5.Move the cursor so that L3 is highlighted and then enter the following formula: L1 − L2 and press ENTER (to enter L1, press the 2nd key followed by the 1 key; to produce L2, press the 2nd key followed by the 2 key). This will produce a list of difference scores (the Concrete Word scores minus the Abstract Word scores) for each subject.

6.Press the STAT key once again and highlight TESTS.

7.Scroll down to T-Test. Press the ENTER key.

8.Highlight DATA. Enter 0 next to µ0:. Enter L3 next to List (by pressing the 2nd key followed by the 3 key).

9.Scroll down to µ: and select >µ0 (for a one-tailed test in which we predict that the difference between the scores for each condition will be greater than 0). Press ENTER.

10.Scroll down to and highlight CALCULATE. Press ENTER.

The t score of 3.82 should be displayed followed by the significance level of .0033. In addition, descriptive statistics will be shown.