
7 Repeated Measures Designs for Interval Data


Chapter Learning Objectives

After reading this chapter, you should be able to do the following:

1. Explain how initial between-groups differences affect a t test or an analysis of variance.

2. Compare the independent t test to the dependent-groups t test.

3. Complete a dependent-groups t test.

4. Explain what “power” means in statistical testing.

5. Compare the one-way ANOVA to the within-subjects F.

6. Complete a within-subjects F.

tan82773_07_ch07_191-226.indd 191 3/3/16 12:26 PM

© 2016 Bridgepoint Education, Inc. All rights reserved. Not for resale or redistribution.


Introduction

Tests of significant difference, such as the t test and analysis of variance, take two basic forms, depending upon the independence of the groups. Up to this point, the text has focused only on independent-groups tests: tests where those in one group cannot also be subjects in other groups. However, dependent-groups procedures, in which the same group is used multiple times, offer some advantages.

This chapter focuses on the dependent-groups equivalents of the independent t test and the one-way ANOVA. Although they answer the same questions as their independent-groups equivalents (are there significant differences between groups?), under particular circumstances these tests can do so more efficiently and with more statistical power.

7.1 Reconsidering the t and F Ratios

The scores produced in both the independent t and the one-way ANOVA are ratios. In the case of the t test, the ratio is the result of dividing the difference between the means of the groups by the standard error of the difference:

$$t = \frac{M_1 - M_2}{SE_d}$$

With ANOVA, the F ratio is the mean square between (MSbet) divided by the mean square within (MSwith):

$$F = \frac{MS_{bet}}{MS_{with}}$$

With either t or F, the denominator in the ratio reflects how much scores vary within (rather than between) the groups of subjects involved in the study. These differences are easy to see in the way the standard error of the difference is calculated for a t test. When group sizes are equal, recall that the formula is

$$SE_d = \sqrt{SEM_1^2 + SEM_2^2}$$

with

$$SEM = \frac{s}{\sqrt{n}}$$

and s, of course, a measure of score variation in any group.

So the standard error of the difference is based on the standard error of the mean, which in turn is based on the standard deviation. Therefore, within-group score variance in a t test has its root in the standard deviation for each group of scores. If we reverse the order and work from the standard deviation back to the standard error of the difference, we note the following:

• When scores vary substantially in a group, the result is a large standard deviation.

• When the standard deviation is relatively large, the standard error of the mean must likewise be large because the standard deviation is the numerator in the formula for SEM.

• A large standard error of the mean results in a large standard error of the difference because that statistic is the square root of the sum of the squared standard errors of the mean.

• When the standard error of the difference is large, the difference between the means has to be correspondingly larger for the result to be statistically significant. The table of critical values indicates that, for a two-tailed test at p = 0.05, no t ratio (the ratio of the difference between the means to the standard error of the difference) less than 1.96 to 1 is going to be significant, and even that value requires an infinite sample size.
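The chain from standard deviation to SEM to SEd to t can be traced in a few lines of code. The following is a minimal Python sketch, not part of the original text: the function name `independent_t` and all four score lists are hypothetical, and it assumes equal group sizes as the formula above does. Both pairs of groups have the same means (12 and 10); only the spread within the groups differs.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return (t, SEd) for two equal-sized independent groups."""
    # Standard error of the mean for each group: s / sqrt(n)
    sem1 = statistics.stdev(group1) / math.sqrt(len(group1))
    sem2 = statistics.stdev(group2) / math.sqrt(len(group2))
    # Standard error of the difference: sqrt(SEM1^2 + SEM2^2)
    se_d = math.sqrt(sem1 ** 2 + sem2 ** 2)
    # t is the difference between the means divided by SEd
    t = (statistics.mean(group1) - statistics.mean(group2)) / se_d
    return t, se_d


# Hypothetical scores: same means in both comparisons, different spread.
tight_a, tight_b = [11, 12, 12, 12, 13], [9, 10, 10, 10, 11]
wide_a, wide_b = [6, 9, 12, 15, 18], [4, 7, 10, 13, 16]

t_tight, se_tight = independent_t(tight_a, tight_b)
t_wide, se_wide = independent_t(wide_a, wide_b)

print(f"tight groups: SEd = {se_tight:.3f}, t = {t_tight:.3f}")
print(f"wide groups:  SEd = {se_wide:.3f}, t = {t_wide:.3f}")
```

With the same two-point difference between the means, the more variable groups produce the larger SEd and therefore the smaller t.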

Error Variance

The point of the preceding discussion is that the value of t in the t test, and of F in an ANOVA, is greatly affected by the amount of variability within the groups involved. Other factors being equal, when the variability within the groups is extensive, the values of t and F are diminished and less likely to be statistically significant than when groups have relatively little variability within them.

These differences within groups stem from differences in the way individuals within the samples react to whatever treatment is the independent variable; different people respond differently to the same stimulus. These differences represent error variance—the outcome whenever scores differ for reasons not related to the IV.

But within-group differences are not the only source of error variance in the calculation of t and F. Both the t test and ANOVA assume that the groups involved are equivalent before the independent variable is introduced. In a t test where the impact of relaxation therapy on clients’ anxiety is the issue, the test assumes that before the therapy is introduced, the treatment group, which receives the therapy, and the control group, which does not, both begin with equivalent levels of anxiety. That assumption is the key to attributing any differences after the treatment to the therapy, the IV.

Confounding Variables

In comparisons like the one studying the effects of relaxation therapy, however, the initial equivalence of the groups can be uncertain. What if the groups had differences in anxiety before the therapy was introduced? The employment circumstances of each group might differ, and perhaps those threatened with unemployment are more anxious than the others. What if age-related differences exist between groups? These other influences that are not controlled in an experiment are sometimes called confounding variables.

A psychologist who wants to examine the impact that a substance abuse program has on addicts’ behavior might set up a study as follows. Two groups of the same number of addicts are selected, and one group participates in the substance-abuse program. After the program, the psychologist measures the level of substance abuse in both groups to observe any differences.

Photo caption: In a study of the impact of substance abuse programs on addicts’ behavior, confounding variables could include ethnic background, age, or social class.

Try It!: #1 If the size of the group affects the size of the standard deviation, what then is the relationship between sample size and error in a t test?

The problem is that the presence or absence of the program is not the only thing that might prompt subjects to respond differently. Perhaps subjects’ background experiences are different. Perhaps ethnic-group, age, or social-class differences play a role. If any of those differences affect substance-abuse behavior, the researcher can potentially confuse the influence of those factors with the impact of the substance-abuse program (the IV). If those other differences are not controlled and affect the dependent variable, they contribute to error variance. Error variance exists any time dependent-variable (DV) scores fluctuate for reasons unrelated to the IV.

Thus, the variability within groups reflects error variance, and any difference between groups that is not related to the IV represents error variance. A statistically significant result requires that the score variance from the independent variable be substantially greater than the error variance. The factor(s) the researcher controls must contribute more to score values than the factors that remain uncontrolled.

7.2 Dependent-Groups Designs

Ideally, any before-the-treatment differences between the groups in a study will be minimal. Recall that random selection entails every member of a population having an equal chance of being selected. The logic behind random selection dictates that when groups are randomly drawn from the same population, they will differ only by chance; as sample size increases, probabilities suggest that they become increasingly similar in their characteristics to the population. No sample, however, can represent the population with complete fidelity, and sometimes the chance differences affect the way subjects respond to the IV.

One way researchers reduce error variance is to adopt what are called dependent-groups designs. The independent t test and the one-way ANOVA required independent groups. Members of one group could not also be members of other groups in the same study. But in the case of the t test, if the same group is measured, exposed to a treatment, and then measured again, the study controls an important source of error variance. Using the same group twice makes the initial equivalence of the two groups no longer a concern. Other aspects being equal, any score difference between the first and second measure should indicate only the impact of the independent variable.

The Dependent-Samples t Tests

One dependent-groups test where the same group is measured twice is called the before/after t test. An alternative is called the matched-pairs t test, where each participant in the first group is matched to someone in the second group who has a similar characteristic. The before/after t test and the matched-pairs t test both have the same objective: to control the error variance that is due to initial between-groups differences. Following are examples of each test.

Try It!: #2 How does the use of random selection enable us to control error variance in statistical testing?


• The before/after design: A researcher is interested in the impact that positive reinforcement has on employees’ sales productivity. Besides the sales commission, the researcher introduces a rewards program that can result in increased vacation time. The researcher gauges sales productivity for a month, introduces the rewards program, and gauges sales productivity during the second month for the same people.

• The matched-pairs design: A school counselor is interested in the impact that verbal reinforcement has on students’ reading achievement. To eliminate between-groups differences, the researcher selects 30 people for the treatment group and matches each person in the treatment group to someone in a control group who has a similar reading score on a standardized test. The researcher then introduces the verbal reinforcement program to those in the treatment group for a specified period of time and then compares the performance of students in the two groups.

Although the two tests are set up differently, both calculate the t statistic the same way. The differences between the two approaches are conceptual, not mathematical. They have the same purpose: to control between-groups score variation stemming from nonrelevant factors.

Calculating t in a Dependent-Groups Design

The dependent-groups t may be calculated using several methods. Each method takes into account the relationship between the two sets of scores. One approach is to calculate the correlation between the two sets of scores and then to use the strength of the correlation as a mechanism for determining between-groups error variance: the higher the correlation between the two sets of scores, the lower the error variance. Because this text has yet to discuss correlation, for now we will use a t statistic that employs “difference scores.” The different approaches yield the same answer.

The distribution of difference scores came up in Chapter 5, when the text introduced the independent t test. Recall that the point of that distribution is to determine the point at which the difference between a pair of sample means (M1 − M2) is so great that the most probable explanation is that the samples came from different populations.

Dependent-groups tests use that same distribution, but rather than the difference between the means of the two groups (M1 − M2), the numerator in the t ratio is the mean of the differences between each pair of scores. If that mean is sufficiently different from the mean of the population of difference scores (which, recall, is 0), the t value is statistically significant; the first set of measures belongs to a different population than the second set of measures. That may seem odd since in a before/after test, both sets of measures come from the same subjects, but the explanation is that those subjects’ responses (the DV) were altered by the impact of the independent variable; their responses are now different.

The denominator in the t ratio is another standard error of the mean value, but in this case, it is the standard error of the mean of the difference scores. The researcher checks for significance using the same criteria as for the independent t:

• A critical value from the t table, determined by degrees of freedom, defines the point at which the calculated t value is statistically significant.

• The degrees of freedom are the number of pairs of scores minus 1 (n − 1).

Try It!: #3 How do the before/after t test and the matched-pairs t test differ?

The dependent-groups t test statistic uses this formula:

Formula 7.1

$$t = \frac{M_d}{SEM_d}$$

where

Md = the mean of the difference scores

SEMd = the standard error of the mean for the difference scores

The steps for completing the test are as follows:

1. From the two scores for each subject, subtract the second from the first to determine the difference score, d, for each pair.

2. Determine the mean of the d scores:

$$M_d = \frac{\sum d}{\text{number of pairs}}$$

3. Calculate the standard deviation of the d values, sd.

4. Calculate the standard error of the mean for the difference scores, SEMd, by dividing sd by the square root of the number of pairs of scores:

$$SEM_d = \frac{s_d}{\sqrt{\text{number of pairs}}}$$

5. Divide Md by SEMd, the standard error of the mean for the difference scores:

$$t = \frac{M_d}{SEM_d}$$

Figure 7.1 depicts these steps.
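The five steps translate almost line for line into code. The sketch below is one possible Python rendering, not a definitive implementation: the function name `dependent_t` and the four pairs of scores are hypothetical, chosen only to keep the arithmetic short. It assumes the sample standard deviation (n − 1 in the denominator), as in the rest of the chapter.

```python
import math
import statistics


def dependent_t(first, second):
    """Return (t, df) for paired scores, following the five steps."""
    if len(first) != len(second):
        raise ValueError("every score needs a partner in the other set")
    n_pairs = len(first)
    # Step 1: difference score d for each pair (first minus second)
    d = [a - b for a, b in zip(first, second)]
    # Step 2: mean of the difference scores, Md
    m_d = sum(d) / n_pairs
    # Step 3: standard deviation of the difference scores, sd
    s_d = statistics.stdev(d)
    # Step 4: standard error of the mean for the difference scores, SEMd
    sem_d = s_d / math.sqrt(n_pairs)
    # Step 5: t is Md divided by SEMd; df is the number of pairs minus 1
    return m_d / sem_d, n_pairs - 1


# Hypothetical before/after scores for four subjects.
before = [5, 7, 6, 8]
after = [7, 8, 9, 9]

t, df = dependent_t(before, after)
print(f"t = {t:.3f} with df = {df}")
```

The calculated t is then compared, in absolute value for a two-tailed test, with the critical value for the returned degrees of freedom.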

The following is an example of a dependent-measures t test: A psychologist is investigating the impact that verbal reinforcement has on the number of questions university students ask in a seminar. Ten upper-level students participate in two seminars where a presentation is followed by students’ questions. In the first seminar, the instructor provides no feedback after a student asks the presenter a question. In the second seminar, the instructor offers feedback—such as “That’s an excellent question” or “Very interesting question” or “Yes, that had occurred to me as well”—after each question.

Is there a significant difference between the number of questions students ask in the first seminar compared to the number of questions students ask in the second seminar? Problem 7.1 shows the number of questions asked by each student in both seminars and the solution to the problem.


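Problem 7.1: Calculating the before/after t test

| Student | Seminar 1 | Seminar 2 | d |
|---|---|---|---|
| 1 | 1 | 3 | −2 |
| 2 | 0 | 2 | −2 |
| 3 | 3 | 4 | −1 |
| 4 | 0 | 0 | 0 |
| 5 | 2 | 3 | −1 |
| 6 | 1 | 1 | 0 |
| 7 | 3 | 5 | −2 |
| 8 | 2 | 4 | −2 |
| 9 | 1 | 3 | −2 |
| 10 | 2 | 1 | 1 |

Σd = −11

(continued)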

Figure 7.1: Steps for calculating the before/after t test (flowchart: find d for each pair, then Md, then sd, then SEMd, then t = Md / SEMd)


Problem 7.1: Calculating the before/after t test (continued)

1. Determine the difference between each pair of scores, d, using subtraction.

2. Determine the mean of the differences, the d values (Md).

$$M_d = \frac{\sum d}{10} = \frac{-11}{10} = -1.1$$

3. Calculate the standard deviation of the d values (sd). Verify that

$$s_d = 1.101$$

4. Just as the standard error of the mean in the earlier test was s/√n, determine the standard error of the mean for the difference scores (SEMd) by dividing the result of step 3 by the square root of the number of pairs. Verify that

$$SEM_d = \frac{s_d}{\sqrt{n_p}} = \frac{1.101}{\sqrt{10}} = 0.348$$

5. Divide Md by SEMd to determine t.

$$t = \frac{M_d}{SEM_d} = \frac{-1.1}{0.348} = -3.161$$

6. As noted earlier, the degrees of freedom for the critical value of t for this test are the number of pairs of scores minus 1, np − 1.

$$t_{0.05(9)} = 2.262$$

The calculated value of t exceeds the critical value from Table 5.1 (Table B.2 in Appendix B). Therefore, the result is statistically significant. Note that we are interested in the absolute value of the calculated t. Because the question was whether there is a significant difference in the number of questions, it is a two-tailed test. It does not matter which session had the greater number, whether Session 1 is larger than Session 2 or the other way around. The students asked significantly more questions in the second session, where questions were followed by feedback, than in the first session, when no feedback was offered by the instructor.

Degrees of Freedom, the Dependent-Groups Test, and Power

With Md = −1.1, the two sets of scores show comparatively little difference. What makes such a small mean difference statistically significant? The answer is in the amount of error variance in this problem. When there is minimal error variance (here, the standard error of the mean for the difference scores is just 0.348), comparatively small mean differences can be statistically significant. The ability to detect such small differences, which are nevertheless statistically significant, is the rationale for using dependent-groups tests, which brings us back to power in statistical testing, a topic first raised in Chapter 6.

Table B.2 in Appendix B, the critical values of t, indicates that critical values decline as degrees of freedom increase. That occurs not only in the critical values for t, but also for F in analysis of variance and, in fact, for most tables of critical values for statistical tests.

• For the dependent-groups t test, the degrees of freedom are the number of pairs of related scores minus 1.

• For the independent-groups t test (Chapter 5),

$$df = n_1 + n_2 - 2$$

With the smaller numerical value for df, the dependent-groups test has the higher standard to meet for statistical significance, even though the number of raw scores is the same. But even a test with a larger critical value can produce significant results when it has less error variance. This is what dependent-groups tests do. The central point is that when each pair of scores comes from the same participant, or from a matched pair of participants, the random vari- ability from nonequivalent groups is minimal because scores tend to vary similarly for each pair, resulting in relatively little error variance. The reduced error more than compensates for the fewer degrees of freedom and the associated larger critical value.

Recall that in statistical testing, power is defined as the likelihood of detecting a significant difference when it is present. The more powerful statistical test is the one that will most readily detect a significant difference. As long as the sets of scores are closely related, the dependent-measures, or dependent-groups, test is more powerful than the independent-groups equivalent.
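The power claim can be explored with a small simulation. What follows is a rough Monte Carlo sketch under stated assumptions, not a result from the text: the population values (a baseline spread of 2.0, measurement noise of 0.5, a true treatment effect of 1.0), the trial count, and the seed are all hypothetical. Ten simulated subjects are measured twice, and each data set is analyzed both as a dependent-groups test (critical value 2.262 for df = 9) and, purely for comparison, as if the two sets of scores were independent (critical value 2.101 for df = 18).

```python
import math
import random
import statistics

N_PAIRS = 10            # fixed so the critical values below apply
CRIT_DEPENDENT = 2.262  # t0.05(9), two-tailed
CRIT_INDEPENDENT = 2.101  # t0.05(18), two-tailed


def paired_t(x, y):
    """t from difference scores: Md / (sd / sqrt(n))."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def independent_t(x, y):
    """t from group means and SEd = sqrt(SEM1^2 + SEM2^2)."""
    se_d = math.sqrt(statistics.variance(x) / len(x)
                     + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se_d


def estimate_power(trials=2000, effect=1.0, seed=1):
    """Proportion of simulated studies in which each test is significant."""
    rng = random.Random(seed)
    dependent_hits = independent_hits = 0
    for _ in range(trials):
        # Each subject has a stable baseline shared by both measurements.
        baseline = [rng.gauss(0, 2.0) for _ in range(N_PAIRS)]
        before = [b + rng.gauss(0, 0.5) for b in baseline]
        after = [b + effect + rng.gauss(0, 0.5) for b in baseline]
        if abs(paired_t(after, before)) > CRIT_DEPENDENT:
            dependent_hits += 1
        if abs(independent_t(after, before)) > CRIT_INDEPENDENT:
            independent_hits += 1
    return dependent_hits / trials, independent_hits / trials


dependent_power, independent_power = estimate_power()
print(f"dependent-groups test significant in {dependent_power:.0%} of trials")
print(f"independent test significant in {independent_power:.0%} of trials")
```

Because the two scores in each pair share the subject's baseline, the difference scores vary far less than the raw scores do, and the dependent-groups analysis detects the same true effect much more often despite its larger critical value.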

A Matched-Pairs Example

The other form of the dependent-groups t test is the matched-pairs design. In this approach, rather than measure the same people repeatedly, each participant in one group is paired with a similar participant from the other group.

For example, consider a psychologist who wants to determine whether a video on domestic violence will prompt viewers to be less tolerant of domestic violence. The psychologist selects a group of subjects, shows them the video, and measures their attitudes toward domestic violence. A second group does not view the video. Reasoning that age and gender might be relevant to attitudes about domestic violence, the psychologist selects people for the second group who match those in the first group on these characteristics. Problem 7.2 shows subjects’ scores from an instrument designed to measure attitudes about domestic violence and the matched-pairs t solution.

Try It!: #4 What does it mean to say that the within-subjects test has more power than the independent t test?


The absolute value of t is less than the critical value from Table 5.1 (or Table B.2 in Appendix B) for df = 9. The difference is not statistically significant. There are probably several ways to explain the outcome, but we will explore just three.

1. The most obvious explanation is that the video was ineffective. Subjects’ attitudes were not significantly altered as a result of the viewing.

2. Another explanation has to do with the matching. Perhaps age and gender are not related to individuals’ attitudes. Prior experience with domestic violence may be the most important characteristic, a factor left uncontrolled in the pairing.

3. Another explanation is related to sample size. Small samples tend to be more variable than larger samples, and variability is what the denominator in the t ratio reflects. Perhaps if this had been a larger sample, the SEMd would have had a smaller value and the t would have been significant.
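The third explanation can be made concrete with a hypothetical calculation. The sketch below assumes, purely for illustration, that a larger sample would have produced the same Md (1.125) and sd (2.092) reported in Problem 7.2; real data would not behave so neatly. The helper name `t_for_sample_size` and the alternative sample sizes are not from the text.

```python
import math


def t_for_sample_size(mean_diff, sd_diff, n_pairs):
    """t = Md / SEMd, where SEMd = sd / sqrt(number of pairs)."""
    sem_d = sd_diff / math.sqrt(n_pairs)
    return mean_diff / sem_d


# Md and sd from Problem 7.2, held fixed while the number of pairs grows.
t_values = {n: t_for_sample_size(1.125, 2.092, n) for n in (10, 20, 40)}

for n_pairs, t in t_values.items():
    print(f"{n_pairs} pairs: df = {n_pairs - 1}, t = {t:.3f}")
```

Under this assumption, quadrupling the number of pairs halves SEMd and doubles t, while the critical value from Table B.2 shrinks as the degrees of freedom grow.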

The second explanation points out the disadvantage of matched-pairs designs compared to repeated-measures designs. The individual conducting the study must be in a position to know which characteristics of the participants are most relevant to explaining the dependent variable so that they can be matched in both groups. Otherwise it is impossible to know whether a nonsignificant outcome reflects an inadequate match, control of the wrong variables, or a treatment that just does not affect the DV.

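Problem 7.2: Calculating a matched-pairs t test

| Subject | Viewed | Did not view | d |
|---|---|---|---|
| 1 | 1.5 | 3 | −1.5 |
| 2 | 4 | 0 | 4 |
| 3 | 3 | 2 | 1 |
| 4 | 0 | 0 | 0 |
| 5 | 2 | 0 | 2 |
| 6 | 4.5 | 4 | 0.5 |
| 7 | 6 | 2 | 4 |
| 8 | 0 | 1 | −1.0 |
| 9 | 5.25 | 2 | 3.25 |
| 10 | 2 | 3 | −1.0 |

Verify that

$$M_d = 1.125$$

$$s_d = 2.092$$

$$SEM_d = \frac{s_d}{\sqrt{n_p}} = \frac{2.092}{\sqrt{10}} = 0.662$$

$$t = \frac{M_d}{SEM_d} = \frac{1.125}{0.662} = 1.700$$

$$t_{0.05(9)} = 2.262$$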


Comparing the Dependent-Samples t Test to the Independent t Test

To compare the dependent-samples t test and the independent t more directly, we will apply both tests to the same data to illustrate how each test deals with error variance. Before beginning, a necessary caution: Once data are collected, there is no situation where someone can choose which test to use. Either the groups are independent, or they are not. Our comparison is purely an academic exercise.

A university program encourages students to take a service-learning class that emphasizes the importance of community service as a part of the students’ educational experience. Data are gathered on the number of hours former students spend in community service per month after they complete the course and graduate from the university.

• For the independent t test, the students are divided between those who took a service-learning class and graduates of the same year who did not.

• For the dependent-groups t test, those who took the service-learning class are matched to a student with the same major, age, and gender who did not take the class.

The data and the solutions to both tests are listed in Problem 7.3.

Problem 7.3: The before/after t versus the independent t test

| Student | Class | No class | d |
|---|---|---|---|
| 1 | 4.000 | 3.000 | 1.00 |
| 2 | 3.000 | 2.000 | 1.00 |
| 3 | 3.000 | 2.000 | 1.00 |
| 4 | 2.000 | 2.000 | 0.00 |
| 5 | 3.000 | 2.500 | 0.50 |
| 6 | 4.000 | 3.000 | 1.00 |
| 7 | 1.000 | 2.000 | −1.00 |
| 8 | 5.000 | 4.000 | 1.00 |
| 9 | 6.000 | 5.000 | 1.00 |
| 10 | 4.000 | 3.000 | 1.00 |
| M | 3.500 | 2.850 | 0.650 |
| s | 1.434 | 1.001 | 0.669 |
| SEM | 0.453 | 0.316 | 0.211 |

For an independent t test, the results show:

$$SE_d = \sqrt{SEM_1^2 + SEM_2^2} = \sqrt{0.453^2 + 0.316^2} = 0.553$$

$$t = \frac{M_1 - M_2}{SE_d} = \frac{3.500 - 2.850}{0.553} = 1.175; \quad t_{0.05(18)} = 2.101$$

The result is not significant.

For a matched-pairs t test, the results show:

$$t = \frac{M_d}{SEM_d} = \frac{0.650}{0.211} = 3.081; \quad t_{0.05(9)} = 2.262$$

The result is significant.


Because the differences between the scores are quite consistent, as they tend to be when participants are matched effectively, very little variation exists between the individuals in each pair. Minimal variation results in a comparatively small standard deviation of difference scores and a small standard error of the mean for the difference scores. The small standard deviation and standard error of the mean make it more likely that t ratios with even relatively small numerators will be statistically significant. Since the independent t test does not assume that the two groups are related, error variance is based on the differences within the groups of raw scores, rather than between the individuals in each pair, and the denominator is large enough that in that test, the t value is not significant.
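Both calculations in Problem 7.3 can be reproduced from the raw scores. The following Python sketch is offered as a check on the arithmetic rather than as the text's own method; the variable names are illustrative. Because the code carries unrounded intermediate values, its dependent-groups t can differ in the third decimal place from a hand calculation that rounds SEMd first.

```python
import math
import statistics

# Hours of community service per month, from Problem 7.3.
took_class = [4, 3, 3, 2, 3, 4, 1, 5, 6, 4]
no_class = [3, 2, 2, 2, 2.5, 3, 2, 4, 5, 3]
n = len(took_class)

# Independent t: the error term comes from variability within each group.
sem1 = statistics.stdev(took_class) / math.sqrt(n)
sem2 = statistics.stdev(no_class) / math.sqrt(n)
se_d = math.sqrt(sem1 ** 2 + sem2 ** 2)
t_independent = (statistics.mean(took_class) - statistics.mean(no_class)) / se_d

# Dependent-groups t: the error term comes from variability of the
# difference scores within each matched pair.
d = [a - b for a, b in zip(took_class, no_class)]
sem_d = statistics.stdev(d) / math.sqrt(n)
t_dependent = statistics.mean(d) / sem_d

print(f"independent: SEd  = {se_d:.3f}, t = {t_independent:.3f} (critical 2.101)")
print(f"dependent:   SEMd = {sem_d:.3f}, t = {t_dependent:.3f} (critical 2.262)")
```

The numerator is 0.650 in both ratios; only the denominator changes, which is why one result falls short of its critical value and the other exceeds it.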

Computing the Dependent-Groups t Test Using Excel

To use Excel to complete Problem 7.3 as a dependent-groups test, follow this procedure:

1. Create the data file in Excel.
2. a. Label column A “Class” to indicate those who had the service-learning class, and label column B “No Class.”
   b. Enter the data, beginning with cell A2 for the first group and cell B2 for the second group.
3. Click the Data tab at the top of the page.
4. At the extreme right, choose Data Analysis.
5. In the Analysis Tools window, select t-Test: Paired Two Sample for Means and click OK.
6. In the blanks for Variable 1 Range and Variable 2 Range, enter A2:A11 for the data in the first (Class) group (cells A2 to A11), and enter B2:B11 for the No Class data (cells B2 to B11).
7. Indicate that the hypothesized mean difference is 0. This reflects the value for the mean of the distribution of difference scores.
8. Indicate A13 for the output range so that the results do not overlay the data scores.
9. Click OK.

Widen column A so that all the output is readable. Figure 7.2 shows the resulting screenshot.

In the Excel solution, t = 3.074 rather than the 3.081 from the manually calculated solution. Excel calculates the correlation between scores to find a solution, rather than determining the difference between scores as we did; the two approaches are algebraically equivalent, so the very minor difference, 0.007, between the solution shown in Problem 7.3 and the Excel solution in Figure 7.2 traces to rounding SEMd to 0.211 in the manual calculation and is not relevant to the outcome. The Excel output also indicates results for one-tailed and two-tailed tests. At p = 0.05, the outcome is statistically significant in either case.
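The equivalence of the correlation route and the difference-score route can be checked directly. The sketch below is not Excel's internal algorithm, which the text does not spell out; it simply applies the standard identity that the variance of the difference scores equals s1² + s2² − 2·r·s1·s2, where r is the Pearson correlation between the two sets of scores. Variable names are illustrative, and the data are those of Problem 7.3.

```python
import math
import statistics

took_class = [4, 3, 3, 2, 3, 4, 1, 5, 6, 4]
no_class = [3, 2, 2, 2, 2.5, 3, 2, 4, 5, 3]
n = len(took_class)

m1, m2 = statistics.mean(took_class), statistics.mean(no_class)
s1, s2 = statistics.stdev(took_class), statistics.stdev(no_class)

# Pearson correlation between the paired scores, computed by hand.
covariance = sum((a - m1) * (b - m2)
                 for a, b in zip(took_class, no_class)) / (n - 1)
r = covariance / (s1 * s2)

# Correlation route: sd of the differences from s1, s2, and r.
s_d_from_r = math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)
t_from_r = (m1 - m2) / (s_d_from_r / math.sqrt(n))

# Difference-score route, as in Formula 7.1.
d = [a - b for a, b in zip(took_class, no_class)]
t_from_d = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

print(f"r = {r:.3f}")
print(f"t via correlation       = {t_from_r:.3f}")
print(f"t via difference scores = {t_from_d:.3f}")
```

The identity also shows why a higher correlation between the two sets of scores lowers the error term: the larger r is, the more is subtracted from the combined variance.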


Figure 7.2: Excel output for the dependent-samples t test using data from Problem 7.3

Source: Microsoft Excel. Used with permission from Microsoft.

Comparing the Two Dependent t Tests

The before/after and matched-pairs approaches to calculating a dependent-groups t test have their individual advantages. The before/after design provides the greatest control over the extraneous variables that can confound the results in a matched-pairs design. The matching approach always has the chance that subjects in Group 2 are not matched closely enough on some relevant variable to minimize the error variance. In the service-learning example, students were matched according to age, major, and gender. But if marital status affects students’ willingness to be involved in community service and that variable is not controlled, an imbalance of married/not-married students could confound results. The before/after procedure involves the same subjects, and unless their status on some important variable changes between measures (a rash of marriages between the first and second measurement, for example), that approach will better control error variance.

Note that the matched-pairs approach relies on a large sample from which to draw to select participants who match those in the first group. As the number of variables on which participants must be matched increases, so must the size of the sample from which to draw to find participants with the correct combination of characteristics.


Apply It! Repeated Measures

A research team is investigating the impact of fixed-ratio reinforcement on laboratory rats. Initially, the rats receive food reinforcers each time they make a correct turn in a maze. The control rats receive no reinforcement. The dependent variable is the amount of time in seconds it takes each rat to complete the maze. Table 7.1 shows the results of the investigation.

Table 7.1: Impact of fixed-ratio reinforcement on laboratory rats

| Rat | Time (s) with reinforcement | Time (s) without reinforcement |
|---|---|---|
| A | 112 | 120 |
| B | 85 | 82 |
| C | 103 | 116 |
| D | 154 | 168 |
| E | 65 | 75 |
| F | 52 | 51 |
| G | 85 | 96 |
| H | 72 | 79 |
| I | 167 | 178 |
| J | 123 | 141 |
| K | 142 | 153 |

Table 7.2 shows the Excel solution to the t test.


The advantage of the matched-pairs design, on the other hand, is that it takes less time to execute. The treatment group and the control group can both be involved in the study at the same time. By way of a summary, note the comparisons among t tests in Table 7.3.

Table 7.3: Comparing the t tests

| | Independent t | Before/after | Matched-pairs |
|---|---|---|---|
| Groups | Independent groups | One group measured twice | Two groups: each subject from the first group matched to one in the second |
| Denominator/error term | Within-groups and between-groups variability | Within-groups variability only | Within-groups variability only |

Table 7.2: Summary statistics from the Excel t test

| | Variable 1 | Variable 2 |
|---|---|---|
| Mean | 105.45 | 114.45 |
| Variance | 1428.67 | 1736.27 |
| Observations | 11 | 11 |
| Pearson Correlation | 0.99 | |
| Hypothesized Mean Difference | 0.00 | |
| df | 10 | |
| t Stat | −4.817 | |
| P(T<=t) one-tail | 0.0003 | |
| t Critical one-tail | 1.8125 | |
| P(T<=t) two-tail | 0.0007 | |
| t Critical two-tail | 2.2281 | |

The magnitude of the calculated value of t = −4.817 exceeds the critical two-tail value from the table of tcrit = 2.23 for the 11 pairs of scores in Table 7.1 (df = 10). The result indicates that providing reinforcement for correct decisions has a statistically significant effect on the time it takes a rat to complete the maze.

Apply It! boxes written by Shawn Murphy


7.3 The Within-Subjects F

Sometimes two measures of the same group are not enough to track changes in the dependent variable. Maybe the researchers conducting the service-learning study want to compare how much time students devoted to community service the year they graduated, one year later, and then two years after graduation. The within-subjects F is a dependent-groups procedure for two or more groups of scores when the DV is interval or ratio scale. Just as the dependent-groups t test is the repeated-measures equivalent of the independent t test, the within-subjects F is the repeated-measures or matched-pairs equivalent of the one-way ANOVA. The same Ronald Fisher who developed analysis of variance also developed this test, which is a form of ANOVA, and the test statistic is still F.

Here too, the dependent groups can be formed either by repeatedly measuring the same group or by matching separate groups of participants on the relevant variables. When more than two groups are involved, matching becomes increasingly problematic, however. Although it is theoretically possible to match the participants across any number of groups, to match more than one or two relevant variables across more than two or three groups of subjects is a highly complex undertaking. Imagine the difficulty, for example, of matching subjects on some measure of aptitude, their income, and their level of optimism in three or more different groups. Even matching these variables for two groups might prove quite difficult. For this reason, repeatedly measuring the same participants is much more common than matching across several groups.

Managing Error Variance in the Within-Subjects F

Recall from Chapter 6 that when Fisher developed ANOVA, he shifted away from calculating score variability with the standard deviation, standard error of the mean, and so on, and used sums of squares instead. The particular sums of squares computed are the key to the strength of this procedure.

If a researcher measures a group of participants in a study on a dependent variable at three different intervals and records their scores in parallel columns, the result is a data sheet similar to Table 7.4.

• The column scores for the first, second, and third measures are treated the way scores from three different groups were treated in a one-way ANOVA; the differences from column to column reflect the effect of the IV, the treatment.

• The participant-to-participant differences, which are like the within-group differences in a one-way ANOVA, are reflected in the differences in the scores from row to row. Those differences are error variance, just as they were in the one-way ANOVA.

• The within-subjects F calculates the variability between rows (the within-groups variance), and then, because that variance comes from participant-to-participant differences that will be the same in each group, eliminates it from further analysis.

• The only error variance that remains is that which does not stem from initial person-to-person differences. It will be from such sources as inaccurate measures of the DV, mistakes in coding the DV, or differences in how sensitive the subjects are to the DV that change from treatment to treatment.

Table 7.4: A data sheet

| | 1st measure | 2nd measure | 3rd measure |
|---|---|---|---|
| Participant 1 | . | . | . |
| Participant 2 | . | . | . |

In the dependent-samples t test, the within-subjects variance (error variance) is reduced either by using two groups of subjects that are highly similar to begin with or by measuring the same people before and after a treatment. In either case, initial between-groups differences, an important source of variance, are minimized, and attributing differences to the effect of the independent variable becomes easier.

In the within-subjects F, the variability within groups is calculated and then simply discarded so that it is no longer a part of the analysis. That cannot be done in the one-way ANOVA because the amount of variability within groups is different for each group, and there is no way to separate it from the balance of the error variance in the problem.
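The contrast between the two error terms can be written compactly. The following is a sketch for orientation only: it uses SSbet and SSwith from Chapter 6 together with the column, row, and residual sums of squares that this section goes on to define, and it assumes both analyses are applied to the same set of scores, so that SSbet and SScol are equal.

```latex
\begin{align*}
\text{One-way ANOVA:} \quad
  & SS_{\text{tot}} = SS_{\text{bet}} + SS_{\text{with}} \\
\text{Within-subjects } F\text{:} \quad
  & SS_{\text{tot}} = SS_{\text{col}} + SS_{\text{rows}} + SS_{\text{resid}} \\
\text{Same scores, } SS_{\text{bet}} = SS_{\text{col}}\text{:} \quad
  & SS_{\text{with}} = SS_{\text{rows}} + SS_{\text{resid}}
\end{align*}
```

Read this way, the within-subjects F keeps only the residual portion of what the one-way ANOVA would treat as error.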

A Within-Subjects F Example

A psychologist is studying practice effect in connection with the ability of 12-year-olds to solve a series of puzzles involving logic and reasoning. The study has five subjects, who solve as many puzzles as they can during a 30-minute period. The psychologist conducts three trials an hour apart. Although the puzzles are similar, each trial involves different puzzles. The researcher wants to determine whether greater familiarity with the puzzles is associated with solving more puzzles correctly. Table 7.5 shows the study’s results.

Table 7.5: Data from puzzle-solving study

Number of puzzles solved

| | 1st trial | 2nd trial | 3rd trial |
|---|---|---|---|
| Diego | 2 | 5 | 4 |
| Harold | 4 | 7 | 7 |
| Wilma | 3 | 6 | 5 |
| Carol | 4 | 5 | 6 |
| Moua | 5 | 8 | 9 |


The independent variable (the IV, the treatment) is the particular trial. The dependent variable (the DV) is the number of puzzles successfully solved. The research question is whether the second or third trials will result in significantly more puzzles solved than in the first trial. In Chapter 6, the sum of squares between (SSbet) measured the variability related to the IV. This study gauges the same source of variance, except that it is called the sum of squares between columns (SScol).

The Components of the Within-Subjects F

Calculating the within-subjects F begins just as the one-way ANOVA begins, by determining all variability from all sources with the sum of squares total (SStot). It is calculated the same way as it was in Chapter 6:

1. The formula for the sum of squares total is

$$SS_{\text{tot}} = \sum (x - M_G)^2$$

a. Subtract the mean of all the scores from all the groups (MG) from each score (x),
b. square the difference, and then
c. sum the squared differences.

The balance of the problem is completed with the following steps:

2. The equation for the sum of squares between columns (SScol) is much like SSbet in the one-way ANOVA. The scores in each column are treated the same way the different groups were treated in the one-way ANOVA. For columns 1, 2, and through k:

Formula 7.2

$$SS_{\text{col}} = (M_{\text{col}\,1} - M_G)^2 n_{\text{col}\,1} + (M_{\text{col}\,2} - M_G)^2 n_{\text{col}\,2} + \cdots + (M_{\text{col}\,k} - M_G)^2 n_{\text{col}\,k}$$

a. calculate the mean for each column of scores (Mcol),
b. subtract the mean for all the data (MG) from each column mean,
c. square the result, and
d. multiply the squared result by the number of scores in the column (ncol).

3. The sum of squares between rows is also like the SSbet from the one-way problem except that it treats the scores for each row as a separate group. For rows 1, 2, and through i:

Formula 7.3

$$SS_{\text{rows}} = (M_{\text{row}\,1} - M_G)^2 n_{\text{row}\,1} + (M_{\text{row}\,2} - M_G)^2 n_{\text{row}\,2} + \cdots + (M_{\text{row}\,i} - M_G)^2 n_{\text{row}\,i}$$

a. calculate the mean for each row of scores (Mrow),
b. subtract the mean for all the data (MG) from each row mean,
c. square the result, and
d. multiply the squared result by the number of scores in the row.


4. The residual sum of squares is the error term in the within-subjects F. It is the equivalent of SSwith or the SSerr in the one-way ANOVA. With the within-subjects F, the person-to-person differences within each measure are calculated and eliminated since they are the same for each set of measures. Unexplained variance is what remains after the treatment effect (the effect of the IV) and the person-to-person differences within each group are eliminated:

Formula 7.4

$$SS_{\text{resid}} = SS_{\text{tot}} - SS_{\text{col}} - SS_{\text{rows}}$$

a. If from all variance from all sources (SStot),
b. the treatment effect (SScol) is subtracted
c. and the person-to-person differences (SSrows) are subtracted,
d. what remains is unexplained variance, error.
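Steps 1 through 4 can be sketched in a few lines of Python. This is an illustration only, assuming the scores are arranged as in Table 7.5 with one row per participant and one column per measure; the function name `sums_of_squares` is hypothetical.

```python
# Puzzles solved by each participant (rows) on each trial (columns), from Table 7.5.
scores = [
    [2, 5, 4],  # Diego
    [4, 7, 7],  # Harold
    [3, 6, 5],  # Wilma
    [4, 5, 6],  # Carol
    [5, 8, 9],  # Moua
]


def sums_of_squares(rows):
    """Partition total variability into column, row, and residual sums of squares."""
    n_rows, n_cols = len(rows), len(rows[0])
    grand_mean = sum(sum(r) for r in rows) / (n_rows * n_cols)
    col_means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]
    row_means = [sum(r) / n_cols for r in rows]

    # Step 1: squared deviation of every score from the grand mean.
    ss_tot = sum((x - grand_mean) ** 2 for r in rows for x in r)
    # Step 2 (Formula 7.2): each column holds n_rows scores.
    ss_col = sum(n_rows * (m - grand_mean) ** 2 for m in col_means)
    # Step 3 (Formula 7.3): each row holds n_cols scores.
    ss_rows = sum(n_cols * (m - grand_mean) ** 2 for m in row_means)
    # Step 4 (Formula 7.4): what is left over is the error term.
    ss_resid = ss_tot - ss_col - ss_rows
    return ss_tot, ss_col, ss_rows, ss_resid


ss_tot, ss_col, ss_rows, ss_resid = sums_of_squares(scores)
print(round(ss_tot, 3), round(ss_col, 3), round(ss_rows, 3), round(ss_resid, 3))
```

For the puzzle data the four printed values are 49.333, 22.533, 23.333, and 3.467.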

Completing the Within-Subjects F Calculations

Just as with one-way problems, the mean square values are calculated by dividing the sums of squares by their degrees of freedom. The degrees of freedom values are as follows:

• df total = N − 1
• df columns = number of columns − 1
• df rows = number of rows − 1
• df residual = df columns × df rows

Although we listed the degrees of freedom values for total and rows, as well as for columns and residual, there are no MS values for total and rows. The df values for those two variance measures are listed because the df values for columns, rows, and residual must sum to the df for total, which allows for a quick check of the df values. The next step is to complete the ANOVA table, including the calculation of F. We can determine the test statistic, F, in the within-subjects ANOVA by dividing the treatment effect (MScol) by the error term (MSresid): F = MScol / MSresid.

Problem 7.4 shows the calculations and the ANOVA table for the practice-effect study.

As with one-way ANOVA, the first step is to calculate the SStot. It is the sum of the squared differences between each individual score (x) and the grand mean (MG). The SStot is followed by the SS for the differences between columns (SScol). It is the sum of the squared differences between each column mean (Mcol1, for example) and the grand mean (MG), times the number of scores in the column (ncol1, for example). Next, calculate the SS for the differences from row to row. For each row, square the difference between the row mean (Mr1, for example) and the grand mean (MG), and then multiply the squared difference by the number of scores in the row (nr1, for example). Finally, find the error term, the residual sum of squares, which is what remains from SStot − SScol − SSrows.


The calculated value of F exceeds the critical value of F from the table. The number of puzzles completed is significantly different for the different trials. The significant F indicates that differences of this magnitude are unlikely to have occurred by chance.
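The degrees of freedom, mean squares, and F ratio follow mechanically from the sums of squares. Here is a small sketch that starts from the SScol and SSresid values reported in Problem 7.4; the function name `within_subjects_f` is illustrative, and the critical value is simply the tabled 4.46 quoted in the problem rather than something the code computes.

```python
def within_subjects_f(ss_col, ss_resid, n_cols, n_rows):
    """Return (df_col, df_resid, ms_col, ms_resid, F) for a within-subjects design."""
    df_col = n_cols - 1
    df_rows = n_rows - 1
    df_resid = df_col * df_rows
    ms_col = ss_col / df_col        # treatment effect
    ms_resid = ss_resid / df_resid  # error term
    return df_col, df_resid, ms_col, ms_resid, ms_col / ms_resid


# Sums of squares from Problem 7.4: three trials (columns), five participants (rows).
df_col, df_resid, ms_col, ms_resid, f_ratio = within_subjects_f(
    ss_col=22.533, ss_resid=3.467, n_cols=3, n_rows=5
)

F_CRIT = 4.46  # tabled critical value for 2 and 8 degrees of freedom at p = .05
print(df_col, df_resid, round(ms_col, 3), round(ms_resid, 3), round(f_ratio, 1))
print("significant" if f_ratio > F_CRIT else "not significant")
```

Because the function takes the sums of squares as inputs, the same call works for any of the within-subjects problems in this chapter once their SS values are known.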

Problem 7.4: A within-subjects F example

Puzzles completed

| | 1st trial | 2nd trial | 3rd trial | Row means |
|---|---|---|---|---|
| Diego | 2 | 5 | 4 | 3.667 |
| Harold | 4 | 7 | 7 | 6.0 |
| Wilma | 3 | 6 | 5 | 4.667 |
| Carol | 4 | 5 | 6 | 5.0 |
| Moua | 5 | 8 | 9 | 7.333 |
| Column means | 3.60 | 6.20 | 6.20 | |

Grand mean (MG) = 5.333

1. The sum of squares total.

$$SS_{\text{tot}} = \sum (x - M_G)^2 = (2 - 5.333)^2 + (4 - 5.333)^2 + \cdots + (9 - 5.333)^2 = 49.333$$

2. The sum of squares between columns.

$$SS_{\text{col}} = (M_{\text{col}\,1} - M_G)^2 n_{\text{col}\,1} + (M_{\text{col}\,2} - M_G)^2 n_{\text{col}\,2} + \cdots + (M_{\text{col}\,k} - M_G)^2 n_{\text{col}\,k}$$

$$= (3.6 - 5.333)^2 \cdot 5 + (6.2 - 5.333)^2 \cdot 5 + (6.2 - 5.333)^2 \cdot 5 = 22.533$$

3. The sum of squares between rows.

$$SS_{\text{rows}} = (M_{r1} - M_G)^2 n_{r1} + (M_{r2} - M_G)^2 n_{r2} + \cdots + (M_{ri} - M_G)^2 n_{ri}$$

$$= (3.667 - 5.333)^2 \cdot 3 + (6.0 - 5.333)^2 \cdot 3 + (4.667 - 5.333)^2 \cdot 3 + (5.0 - 5.333)^2 \cdot 3 + (7.333 - 5.333)^2 \cdot 3 = 23.333$$

4. The residual sum of squares.

$$SS_{\text{resid}} = SS_{\text{tot}} - SS_{\text{col}} - SS_{\text{rows}} = 49.333 - 22.533 - 23.333 = 3.467$$

The ANOVA table

| Source | SS | df | MS | F | Fcrit |
|---|---|---|---|---|---|
| Total | 49.333 | 14 | | | |
| Columns | 22.533 | 2 | 11.267 | 26.0 | 4.46 |
| Rows | 23.333 | 4 | | | |
| Residual | 3.467 | 8 | 0.433 | | |

Completing the Post Hoc Test

Ordinarily, the calculation of F leaves unanswered the question of which set of measures is significantly different from which. However, in this particular problem there is only one possibility. Because both the second trial and the third trial measures have the same mean (M = 6.20), they must both be significantly different from the only other group of measures in the problem, the first trial measures, for which M = 3.6. As a demonstration of how we would determine which groups were significantly different from which were it otherwise, the honestly significant difference (HSD) test is completed anyway.

The HSD procedure is the same as for the one-way test, except that the error term is now MSresid. Substituting MSresid for MSwith in the formula provides

$$HSD = x\sqrt{\frac{MS_{\text{resid}}}{n}}$$

where x is a value from Table B.4 in Appendix B. It is based on the number of means, which is the same as the number of groups of measures, 3 in the example, and the df for MSresid, which is 8. n = the number of scores in any one measure, 5 in this instance.

For the puzzle-solving study,

$$HSD = 4.04\sqrt{\frac{0.433}{5}} = 1.19$$

A difference of 1.19 or greater between any pair of means is statistically significant.

Using the same approach used in Chapter 6, the matrix in Table 7.6 indicates how the difference between each pair of means helps us determine which differences are statistically significant.

Table 7.6: Matrix of differences of means

| | 1st trial (3.6) | 2nd trial (6.2) | 3rd trial (6.2) |
|---|---|---|---|
| 1st trial (3.6) | diff = 0 | diff = 2.6* | diff = 2.6* |
| 2nd trial (6.2) | | | diff = 0.00 |
| 3rd trial (6.2) | | | |

*Indicates a significant difference

The first trial measures are significantly different from the second and third measures. Because the mean values for the second and third trial measures are the same, neither of those two is significantly different from the other. For these 12-year-old subjects working with this kind of logic/reasoning puzzle, practice effect is greatest from first to subsequent trials.
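The HSD comparison can be expressed as a short routine. The sketch below assumes the tabled value x = 4.04, MSresid = 0.433, and n = 5 from the puzzle study; the function name `tukey_hsd` and the dictionary layout are illustrative choices, not part of the chapter.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(x_value, ms_resid, n):
    """Smallest difference between two means that counts as significant."""
    return x_value * sqrt(ms_resid / n)


# Trial means from the puzzle study.
means = {"1st trial": 3.6, "2nd trial": 6.2, "3rd trial": 6.2}

hsd = tukey_hsd(x_value=4.04, ms_resid=0.433, n=5)

# Flag every pair of means whose absolute difference reaches the HSD value.
significant = {
    (a, b): abs(means[a] - means[b]) >= hsd
    for a, b in combinations(means, 2)
}

print(round(hsd, 2))
for pair, flag in significant.items():
    print(pair, "significant" if flag else "not significant")
```

Only the two comparisons that involve the first trial are flagged, which matches the asterisks in Table 7.6.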

Calculating the Effect Size

The final question for a significant F is the question of the practical importance of the result. Using eta-squared as the measure of effect size produces the following:

$$\eta^2 = \frac{SS_{\text{col}}}{SS_{\text{tot}}}$$

with SScol taking the place of SSbet in the one-way ANOVA.

Try It!: #5 How is the error term in the within-subjects F different from that in the one-way ANOVA?

For the problem just completed, SScol = 22.533 and SStot = 49.333, so

$$\eta^2 = \frac{22.533}{49.333} = 0.457$$

The eta-squared value indicates that approximately 46% of the variance in the number of puzzles solved successfully by these subjects can be explained by whether it was the first or some subsequent trial.

Apply It! The Meditation Pilot Program Revisited

Recall Chapter 5’s example of the middle school that adopted a meditation program in an effort to relieve stress among students, increase their test scores, and improve student behavior. In the earlier chapter, we used a one-sample t test to determine that a statistically significant increase in GPAs occurred among participating students. Now, we will use a within-subject F test to see if their stress levels have decreased over successive intervals.

Ten randomly chosen students selected for the program filled out questionnaires about their stress levels. Scores ranged from 1 to 10, with 10 indicating the most stress. The survey was given before the start of the program and at three-month intervals. The time elapsed represents the independent variable, the treatment effect that drives this analysis. The dependent variable is the stress score. This example includes four groups of DV scores.

Results of the stress questionnaires appear in Table 7.7.

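Table 7.7: Stress over time for 10 students

| Student | 0 months | 3 months | 6 months | 9 months |
|---|---|---|---|---|
| 1 | 7 | 6 | 6 | 6 |
| 2 | 9 | 6 | 5 | 5 |
| 3 | 7 | 5 | 5 | 4 |
| 4 | 5 | 3 | 3 | 2 |
| 5 | 7 | 6 | 4 | 4 |
| 6 | 8 | 5 | 7 | 5 |
| 7 | 5 | 4 | 4 | 3 |
| 8 | 7 | 5 | 6 | 5 |
| 9 | 6 | 6 | 4 | 4 |
| 10 | 7 | 5 | 5 | 5 |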

Table 7.8 shows results of the within-subject F test calculations.


Table 7.8: Within-subject F test calculations for changes in stress over time

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Total | 82.000 | 39 | | |
| Columns | 34.475 | 3 | 11.492 | 26.36 |
| Subjects | 35.725 | 9 | | |
| Residual | 11.775 | 27 | 0.436 | |

F.05(3, 27) = 2.96

The F value of 26.36 is greater than the critical F value of 2.96, so differences of this size are unlikely to have occurred by chance. It seems clear that the length of time during which students practice meditation has a significant effect on stress levels.

The significant value of F indicates the need for a post hoc test to determine which group(s) of stress measures are significantly different from which others. Recall that the HSD formula is as follows:

$$HSD = x\sqrt{\frac{MS_{\text{resid}}}{n}}$$

Entering the MSresid value from the ANOVA table and the relevant value of x from Tukey’s table gives us

$$HSD = 3.875\sqrt{\frac{0.436}{10}} = 0.81$$

A difference of 0.81 or greater between any two means indicates that the difference between those intervals is statistically significant. A matrix that shows the difference between each pair of means makes interpreting the HSD value easier, as in Table 7.9.

Table 7.9: Detecting significant differences among multiple groups

| | 0 month (6.8) | 3 months (5.1) | 6 months (4.9) | 9 months (4.3) |
|---|---|---|---|---|
| 0 month (6.8) | | diff = 1.7* | diff = 1.9* | diff = 2.5* |
| 3 months (5.1) | | | diff = 0.2 | diff = 0.8 |
| 6 months (4.9) | | | | diff = 0.6 |
| 9 months (4.3) | | | | |

*Indicates a significant difference

Comparing the means reveals that the greatest decrease in stress occurs during the first three months of the meditation program, a difference between the means of 1.7. It is also apparent that the stress scores for any interval are significantly different from the stress recorded before the experiment began.

To determine the practical importance of the decline in stress measures requires an effect-size calculation. Once again, we will use eta squared. For the problem just completed, SScol = 34.475, and SStot = 82.000. Therefore,

$$\eta^2 = \frac{34.475}{82.000} = 0.42$$

About 42% of the variance in stress can be explained by how long the student has been enrolled in the meditation program.

The within-subjects F test allowed analysis of students’ stress levels at multiple times throughout the year and showed that the program was reducing stress levels by significant amounts from the stress recorded among subjects before the program began.
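The whole analysis for the meditation data can be retraced from the raw scores. The following is a sketch under two assumptions: the scores are typed in exactly as they appear in Table 7.7, and the Tukey value 3.875 is taken from the text rather than computed. All variable names are illustrative. Results should agree closely with Tables 7.8 and 7.9; small differences in the last decimal place come from rounding intermediate values.

```python
from itertools import combinations
from math import sqrt

# Stress scores from Table 7.7: one row per student; columns are months 0, 3, 6, 9.
stress = [
    [7, 6, 6, 6], [9, 6, 5, 5], [7, 5, 5, 4], [5, 3, 3, 2], [7, 6, 4, 4],
    [8, 5, 7, 5], [5, 4, 4, 3], [7, 5, 6, 5], [6, 6, 4, 4], [7, 5, 5, 5],
]
months = [0, 3, 6, 9]
n_subj, n_times = len(stress), len(months)

grand_mean = sum(map(sum, stress)) / (n_subj * n_times)
col_means = [sum(row[j] for row in stress) / n_subj for j in range(n_times)]
row_means = [sum(row) / n_times for row in stress]

# Sums of squares: total, columns (time), subjects (rows), and residual.
ss_tot = sum((x - grand_mean) ** 2 for row in stress for x in row)
ss_col = sum(n_subj * (m - grand_mean) ** 2 for m in col_means)
ss_subj = sum(n_times * (m - grand_mean) ** 2 for m in row_means)
ss_resid = ss_tot - ss_col - ss_subj

# Mean squares and the F ratio.
df_col = n_times - 1
df_resid = df_col * (n_subj - 1)
ms_col = ss_col / df_col
ms_resid = ss_resid / df_resid
f_ratio = ms_col / ms_resid

# Post hoc test and effect size.
hsd = 3.875 * sqrt(ms_resid / n_subj)  # 3.875 is the tabled value used in the text
eta_squared = ss_col / ss_tot
significant_pairs = [
    (months[i], months[j])
    for i, j in combinations(range(n_times), 2)
    if abs(col_means[i] - col_means[j]) >= hsd
]

print([round(m, 1) for m in col_means])
print(round(ss_col, 3), round(ss_subj, 3), round(ss_resid, 3))
print(round(f_ratio, 2), round(hsd, 2), round(eta_squared, 2))
print(significant_pairs)
```

The list of significant pairs contains only comparisons with month 0, which is the pattern of asterisks in Table 7.9.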



Comparing the Within-Subjects F and the One-Way ANOVA

In the one-way ANOVA, within-group variance is different for each group because each group is made up of different participants. With no way to distinguish the subject-to-subject variability within groups from other sources of error variance, the subject-to-subject variance cannot be calculated and eliminated from further analysis, as it can be in the within-subjects F. The smaller error term that is the result in the within-subjects test (which, remember, is the divisor in the F ratio) allows relatively small differences between sets of measures to be statistically significant.

The effect of eliminating some sources of error is illustrated by using the same data in the study of practice effect on problem solving. If those same data were treated as the number of problems solved by separate groups, rather than by the same group over time, the researcher would analyze them using a one-way ANOVA instead of the within-subjects F. We caution that this approach is for illustration only because groups are either independent or dependent, and one set of data cannot fit both scenarios. We use it here to allow us to compare the error terms for each approach.

The SStot and the SSbet will be the same as the SStot and the SScol in the within-subjects problem.

$$SS_{\text{tot}} = 49.333$$

$$SS_{\text{bet}} = 22.533$$

But with no way to isolate the participant-to-participant differences from the balance of the error variance in the one-way ANOVA, the SSwith amount in a one-way ANOVA ends up the same as SSrows + SSresid in the within-subjects F in Problem 7.4.

$$SS_{\text{with}} = \sum (x_a - M_a)^2 + \sum (x_b - M_b)^2 + \sum (x_c - M_c)^2 = (2 - 3.60)^2 + (4 - 3.60)^2 + \cdots + (9 - 6.20)^2 = 26.80$$

Table 7.10: The within-subjects F example repeated as a one-way ANOVA

The ANOVA table

| Source | SS | df | MS | F | Fcrit |
|---|---|---|---|---|---|
| Total | 49.333 | 14 | | | |
| Between | 22.533 | 2 | 11.267 | 5.045 | 3.89 |
| Within | 26.800 | 12 | 2.233 | | |

From Table 7.10, we can make the following observations:

• The number of degrees of freedom for “within” changes from the 8 for residual to 12, which results in a smaller critical value for the independent-groups test, but that adjustment does not compensate for the additional error in the term.

• Note that the sum of squares for the error term jumps from 3.467 in the within-subjects test to 26.80 in the independent-groups test.

• The F value is reduced from 26.0 in the within-subjects problem to 5.045 in the one-way problem, about one-fifth of its former size.

Although calculating both one-way ANOVA and within-subjects F results for the same data is not realistic, the comparison illustrates what can be gained by setting up a dependent-groups test. That is an option that researchers do have at the planning level.
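The comparison can be reproduced directly from the puzzle scores. This sketch, for illustration only like the comparison itself, computes the one-way error term from deviations around each column mean and confirms that it equals SSrows + SSresid; the variable names are assumptions made for the example.

```python
# Puzzle scores from Problem 7.4: rows are participants, columns are trials.
scores = [[2, 5, 4], [4, 7, 7], [3, 6, 5], [4, 5, 6], [5, 8, 9]]
n_rows, n_cols = len(scores), len(scores[0])

grand_mean = sum(map(sum, scores)) / (n_rows * n_cols)
col_means = [sum(row[j] for row in scores) / n_rows for j in range(n_cols)]
row_means = [sum(row) / n_cols for row in scores]

ss_tot = sum((x - grand_mean) ** 2 for row in scores for x in row)
ss_col = sum(n_rows * (m - grand_mean) ** 2 for m in col_means)
ss_rows = sum(n_cols * (m - grand_mean) ** 2 for m in row_means)
ss_resid = ss_tot - ss_col - ss_rows

# One-way error term: each score's deviation from its own column (group) mean.
ss_with = sum(
    (row[j] - col_means[j]) ** 2 for row in scores for j in range(n_cols)
)

df_col = n_cols - 1
df_resid = df_col * (n_rows - 1)   # 8 for the within-subjects F
df_with = n_rows * n_cols - n_cols  # 12 for the one-way ANOVA

f_within = (ss_col / df_col) / (ss_resid / df_resid)
f_one_way = (ss_col / df_col) / (ss_with / df_with)

print(round(ss_with, 3), round(ss_rows + ss_resid, 3))
print(round(f_within, 1), round(f_one_way, 3))
```

The numerator of both F ratios is identical; only the divisor changes, which is why the two results differ so sharply.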

Another Within-Subjects F Example

A psychologist working at a federal prison is interested in the relationship between the amount of time a prisoner is incarcerated and the number of violent acts in which the prisoner is involved. Inmates provide self-reported data by responding anonymously to a questionnaire administered one month, three months, six months, and nine months after incarceration. Problem 7.5 shows the data and the solution.

The results (F) indicate that there are significant differences in the number of violent acts documented for the inmate related to the length of time the inmate has been incarcerated. The HSD results indicate that those incarcerated for one month are involved in a significantly different number of violent acts than those who have been in for three or six months. Those who have been in for six months are involved in a significantly different number of violent acts than those who have been in for nine months. The eta squared value indicates that about 37% of the variance in number of violent acts is a function of how long the inmate has been incarcerated.

Try It!: #6 How do the eta squared values compare for the one-way ANOVA/within-subjects F problem?

Problem 7.5: Another within-subjects F example: Violent acts and time of incarceration

Number of violent acts

| Inmate | 1 month | 3 months | 6 months | 9 months | Row means |
|---|---|---|---|---|---|
| 1 | 4 | 3 | 2 | 5 | 3.50 |
| 2 | 5 | 4 | 3 | 4 | 4.0 |
| 3 | 3 | 1 | 1 | 2 | 1.750 |
| 4 | 4 | 2 | 1 | 3 | 2.50 |
| 5 | 2 | 1 | 2 | 3 | 2.0 |
| Column means | 3.60 | 2.20 | 1.80 | 3.40 | |

MG = 2.750


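Verify that

1. $$SS_{\text{tot}} = \sum (x - M_G)^2 = 31.750$$

2. $$SS_{\text{col}} = (M_{\text{col}\,1} - M_G)^2 n_{\text{col}\,1} + (M_{\text{col}\,2} - M_G)^2 n_{\text{col}\,2} + (M_{\text{col}\,3} - M_G)^2 n_{\text{col}\,3} + (M_{\text{col}\,4} - M_G)^2 n_{\text{col}\,4}$$

$$= (3.6 - 2.75)^2 \cdot 5 + (2.2 - 2.75)^2 \cdot 5 + (1.8 - 2.75)^2 \cdot 5 + (3.4 - 2.75)^2 \cdot 5 = 11.750$$

3. $$SS_{\text{subj}} = (M_{r1} - M_G)^2 n_{r1} + (M_{r2} - M_G)^2 n_{r2} + (M_{r3} - M_G)^2 n_{r3} + (M_{r4} - M_G)^2 n_{r4} + (M_{r5} - M_G)^2 n_{r5}$$

$$= (3.5 - 2.75)^2 \cdot 4 + (4.0 - 2.75)^2 \cdot 4 + (1.75 - 2.75)^2 \cdot 4 + (2.5 - 2.75)^2 \cdot 4 + (2.0 - 2.75)^2 \cdot 4 = 15.0$$

4. $$SS_{\text{resid}} = SS_{\text{tot}} - SS_{\text{col}} - SS_{\text{subj}} = 31.75 - 11.75 - 15 = 5.0$$

The ANOVA table

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Total | 31.75 | 19 | | |
| Columns | 11.75 | 3 | 3.917 | 9.393 |
| Subjects | 15.00 | 4 | | |
| Residual | 5.0 | 12 | 0.417 | |

F0.05(3, 12) = 3.49. F is significant.

The post hoc test:

$$HSD = x_{0.05}\sqrt{\frac{MS_{\text{resid}}}{n}} = 4.20\sqrt{\frac{0.417}{5}} = 1.213$$

| | M1 = 3.6 | M2 = 2.2 | M3 = 1.8 | M4 = 3.4 |
|---|---|---|---|---|
| M1 = 3.6 | | 1.4* | 1.8* | 0.2 |
| M2 = 2.2 | | | 0.4 | 1.2 |
| M3 = 1.8 | | | | 1.6* |
| M4 = 3.4 | | | | |

*The differences marked with an asterisk are significant.

$$\eta^2 = \frac{SS_{\text{col}}}{SS_{\text{tot}}} = \frac{11.75}{31.75} = 0.370$$

About 37% of the variance in the number of violent acts is related to how long the inmate has been incarcerated.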
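The "Verify that" steps of Problem 7.5 can be checked with a short script. This is a sketch that assumes the inmate scores are entered row by row as in the problem and that the Tukey value 4.20 is taken from the text; the names are illustrative. Working from unrounded mean squares gives F of about 9.40 and HSD of about 1.212, which differ slightly from the 9.393 and 1.213 shown above because those use mean squares rounded to three decimals.

```python
from itertools import combinations
from math import sqrt

# Violent acts reported by each inmate (rows) at 1, 3, 6, and 9 months (columns).
acts = [
    [4, 3, 2, 5],
    [5, 4, 3, 4],
    [3, 1, 1, 2],
    [4, 2, 1, 3],
    [2, 1, 2, 3],
]
labels = ["1 month", "3 months", "6 months", "9 months"]
n_rows, n_cols = len(acts), len(acts[0])

grand_mean = sum(map(sum, acts)) / (n_rows * n_cols)
col_means = [sum(row[j] for row in acts) / n_rows for j in range(n_cols)]
row_means = [sum(row) / n_cols for row in acts]

ss_tot = sum((x - grand_mean) ** 2 for row in acts for x in row)
ss_col = sum(n_rows * (m - grand_mean) ** 2 for m in col_means)
ss_subj = sum(n_cols * (m - grand_mean) ** 2 for m in row_means)
ss_resid = ss_tot - ss_col - ss_subj

ms_col = ss_col / (n_cols - 1)
ms_resid = ss_resid / ((n_cols - 1) * (n_rows - 1))
f_ratio = ms_col / ms_resid

hsd = 4.20 * sqrt(ms_resid / n_rows)  # 4.20 is the tabled value used in the problem
eta_squared = ss_col / ss_tot
significant_pairs = [
    (labels[i], labels[j])
    for i, j in combinations(range(n_cols), 2)
    if abs(col_means[i] - col_means[j]) >= hsd
]

print(round(ss_tot, 2), round(ss_col, 2), round(ss_subj, 2), round(ss_resid, 2))
print(round(f_ratio, 2), round(hsd, 3), round(eta_squared, 2))
print(significant_pairs)
```

The three flagged pairs correspond to the three asterisks in the matrix of mean differences.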


Computing Within-Subjects F Using Excel

In spite of the important increase in power that is available compared to independent-groups tests, a dependent-groups ANOVA is not one of the more common tests. Excel does not offer it as an option in the list of Data Analysis Tools, for example. However, like many statistical procedures, the dependent-groups ANOVA involves a number of repetitive calculations, which Excel can simplify. We will complete the second problem as an example.

1. Set the data up in four columns just as they appear in Problem 7.5, but insert a blank column to the right of each column of data. With a row at the top for the labels, begin entering data in cell A2.

2. Calculate the row and column means as well as a grand mean as follows:

a. For the column means, place the cursor in cell A7 just beneath the last value in the first column and enter the formula =average(A2:A6), then press Enter.

b. To repeat this for the other columns, left click on the solution that is now in A7, drag the cursor across to G7, and release the mouse button. In the Home tab, click Fill and then Right. This will repeat the column-means calculations for the other columns. Delete the entries that populate cells B7, D7, and F7, because columns B, D, and F are still empty at this point.

c. For the row means, place the cursor in cell I2 and enter the formula =average(A2, C2, E2, G2) followed by Enter.

d. To repeat this for the other rows, left click on the solution that is now in I2, drag the cursor down to I6, and release the mouse button. In the Home tab, click Fill and then Down. This will repeat the calculation of means for the other rows.

e. For the grand mean, place the cursor in cell I7 and enter the formula =average(I2:I6) followed by Enter (the mean of the row means will be the same as the grand mean—the same could have been done with the column means).

3. To determine the SStot:

a. In cell B2, enter the formula =(A2-2.75)^2 and press Enter. This will square the difference between the value in A2 and the grand mean. To repeat this for the other data in the column, left-click the cursor in cell B2, and drag down to cell B6. Click Fill and Down. Place the cursor in cell B7, click the summation sign (∑) at the upper right of the screen, and press Enter. Repeat these steps for columns D, F, and H.

b. Place the cursor in H9, type SStot=, and click Enter. In cell I9, enter the formula =Sum(B7,D7,F7,H7) and press Enter. The value will be 31.75, which is the total sum of squares.

4. For the SScol:

a. In cell A8, enter the formula =(3.6-2.75)^2*5 and press Enter. This will square the difference between the column mean and the grand mean and multiply the result by the number of measures in the column, 5. In cells C8, E8, G8, repeat this for each of the other columns, substituting the mean for each column for the 3.60 that was the column 1 mean.

b. With the cursor in H10, type in SScol= and click Enter. In cell I10, enter the formula =Sum(A8,C8,E8,G8) and press Enter. The value will be 11.75, which is the sum of squares for the columns.


5. For the SSrows:

a. In cell J2, enter the formula =(I2-2.75)^2*4 and press Enter. Repeat this in rows J3–J6 by left-clicking on what is now J2 and dragging the cursor down to cell J6. Click Fill and Down.

b. With the cursor in H11, type SSrow= and click Enter. In cell I11, enter the formula =Sum(J2:J6) and press Enter. The value will be 15.0, which is the sum of squares for the participants.

6. For the SSresid, in cell H12, enter SSresid= and click Enter. In cell I12, enter the formula =I9-I10-I11. The resulting value will be 5.0.

We used Excel to determine all the sums-of-squares values. Now, the mean squares are determined by dividing the sums of squares for columns and for residual by their degrees of freedom:

$$MS_{\text{col}} = \frac{11.75}{3} = 3.917$$

$$MS_{\text{resid}} = \frac{5}{12} = 0.417$$

$$F = \frac{MS_{\text{col}}}{MS_{\text{resid}}} = \frac{3.917}{0.417} = 9.393$$

which agrees with the earlier calculations done by hand.

To create the ANOVA table, enter the following data:

• Beginning in cell A10, type in Source; in B10 SS; df in C10; MS in D10; F in E10; and Fcrit in F10.

• Beginning in cell A11 and working down, type in total, columns, rows, residual.

For the sum-of-squares values:

• In cell B11, enter =I9.
• In cell B12, enter =I10.
• In cell B13, enter =I11.
• In cell B14, enter =I12.

For the degrees of freedom:

• In cell C11, enter 19 for total degrees of freedom.
• In cell C12, enter 3 for columns degrees of freedom.
• In cell C13, enter 4 for rows degrees of freedom.
• In cell C14, enter 12 for residual degrees of freedom.

For the mean squares:

• In cell D12, enter =B12/C12. The result is MScol.
• In cell D14, enter =B14/C14. The result is MSresid.

For the F value in cell E12, enter =D12/D14.

In cell F12, enter the critical value of F for 3 and 12 degrees of freedom, which is 3.49.


The list of commands looks intimidating, but mostly because every keystroke has been included. With some practice, using Excel in this way will become second nature. Figure 7.3 shows a screenshot of the result of the calculations.

Figure 7.3: Screenshot of a within-subjects F problem

Source: Microsoft Excel. Used with permission from Microsoft.

Writing Up Statistics

Because of some of the strengths noted earlier, repeated-measures designs are a fixture in psychological research. Lambert-Lee et al. (2015) used a before/after t test to evaluate autistic children’s basic language progress during a 12-month period. They concluded that an applied behavior analysis approach to teaching basic-language skills to autistic children results in a statistically significant improvement in their language skills. One of the difficulties in a study such as this, however, is knowing whether factors other than the treatment (applied behavior analysis in this case) might have prompted the significant improvement. There is always the possibility, particularly with younger subjects, that simply the passage of time explains the change.

Sometimes when using the within-subjects F, the dependent variable measure is the amount of difference between the various measures, called “change scores,” rather than the raw scores upon which the researcher ordinarily relies. One of the criticisms of repeated-measures designs is that change scores—the amount of improvement between measures—tend to be unreliable. In a measurement context, this unreliability means that the scores may not be repeatable; someone replicating the experiment with new subjects under similar conditions might find substantially different amounts of score improvement. Thomas and Zumbo (2012) examined this criticism of change scores using a within-subjects F (also called a repeated measures ANOVA) and found the criticism unwarranted.

Summary and Resources

Chapter Summary Any statistical procedure has advantages and disadvantages. The downside of the different independent-groups designs is that subjects within the individual groups often respond to the independent variable differently. Those differences are a source of error variance that is unique to each group. Even with random selection and fairly large groups, there will be dif- ferences in the way that people in the same group respond to whatever stimulus is offered. The before/after t and within-subjects F tests eliminate that source of error variance by either using the same people repeatedly or by matching subjects on the most important characteristics. Controlling error variance results in a test that is more likely to detect a sig- nificant difference (Objectives 1 and 5).

In dependent-groups designs, using the same group repeatedly means that fewer participants are needed (Objectives 1, 2, 3, 4, and 6). One of the downsides to repeated-measures designs, however, is that they take more time to complete. Unless subjects are matched across measures, the different levels of the independent variable cannot be administered concurrently as they can in independent-groups tests. More time increases the potential for attrition. If one of the participants drops out of a repeated-measures study, all the measures of the dependent variable for that subject are lost (Objectives 2 and 4).

Another potential problem stems from the “practice effect.” In an experiment where a group is measured multiple times, each time with an increasing amount of the IV, early exposure may change the way subjects respond later. Dependent-groups designs also present the related problem of carry-over effects. Exposure to one level of the independent variable may alter the way the subject responds later to a different level of that same variable. For example, exposure to a modest amount of positive reinforcement may affect the way the same individual responds to a substantial amount of positive reinforcement later, an effect that is not a problem for studies involving independent groups.

Independent-groups and dependent-groups tests have important, underlying consistencies. Whether the test is independent t, before/after t, one-way ANOVA, or a within-subjects F, in each case the independent variable is nominal scale, and the dependent variable is interval or ratio scale (Objective 2). Furthermore, all of these procedures test for significant differences. In the formal language of statistics, they “test the hypothesis of difference.” Sometimes, however, the research question concerns the strength of an association rather than a difference. That question introduces correlation, which is the focus of Chapter 8.

Key Terms

before/after t test A dependent-groups application of the t test in which one group is measured before and after a treatment.

confounding variables Variables that influence an outcome but are uncontrolled in the analysis and obscure the effects of other variables. If a psychologist is interested in gender-related differences in problem-solving ability but does not control for age differences, differences in gender may be confounded by differences that are actually age-related.

dependent-groups designs Statistical procedures in which the groups are related, either because multiple measures are taken of the same participants, or because each participant in a particular group is matched on characteristics relevant to the analysis to a participant in the other groups with the same characteristics. Dependent-groups designs minimize error variance because they reduce score variation due to factors unrelated to the independent variable.

matched-pairs t test A dependent-groups application of the t test in which each participant in the second group is paired to a participant in the first group with the same characteristics, so as to limit the error variance that would otherwise stem from using dissimilar groups.

within-subjects F The dependent-groups equivalent of the one-way ANOVA. In this procedure, either participants in each group are paired on the relevant characteristics with participants in the other groups, or one group is measured repeatedly after different levels of the independent variable are introduced.

Review Questions

Answers to the odd-numbered questions are provided in Appendix A.

1. A group of clients is being treated for a compulsive behavior disorder. The number of times in an hour that each one manifests the compulsivity is gauged before and after a mild sedative is administered. The data are as follows:

Client   Before   After
1        5        4
2        6        4
3        4        3
4        9        5
5        5        6
6        7        3
7        4        2
8        5        5

a. What is the standard deviation of the difference scores?

b. What is the standard error of the mean for the difference scores?
c. What is the calculated value of t?
d. Are the differences statistically significant?

2. A researcher is examining the impact that a political ad has on potential donors’ willingness to contribute. The data indicate the amount (in dollars) each is willing to donate before viewing that advertisement and after viewing the advertisement.

Potential donor   Before   After
1                 0        10
2                 20       20
3                 10       0
4                 25       50
5                 0        0
6                 50       75
7                 10       20
8                 0        20
9                 50       60
10                25       35

a. Do the amounts represent significant differences?
b. What is the value of t if this study is an independent t test?
c. Explain the difference between before/after and independent t tests.

3. Participants attend three consecutive sessions in a business seminar. The first has no reinforcement when participants respond to the session moderator’s questions. In the second, those who respond are provided with verbal reinforcers. In the third session, responders receive pieces of candy as reinforcers. The dependent variable is the number of times the participants respond in each session.

Participant   None   Verbal   Token
1             2      4        5
2             3      5        6
3             3      4        7
4             4      6        7
5             6      6        8
6             2      4        5
7             1      3        4
8             2      5        7

a. Are the column-to-column differences significant? If so, which groups are significantly different from which?
b. Of what data scale is the dependent variable?
c. Calculate and explain the effect size.

4. In the calculations for Question 3, what step is taken to minimize error variance?

a. What is the source of that error variance?
b. If Question 3 had been a one-way ANOVA, what would have been the degrees of freedom for the error term?
c. How does the change in degrees of freedom for the error term in the within-subjects F affect the value of the test statistic?

5. Because SScol in the within-subjects F contains the treatment effect and measurement error, if there is no treatment effect, what will be the value of F?

6. Why is matching uncommon in within-subjects F analyses?

7. A group of nursing students is approaching the licensing test. Their level of anxiety is measured at 8 weeks prior to the test, then 4 weeks, 2 weeks, and 1 week before the test. Assuming that anxiety is measured on an interval scale, are there significant differences?

Student   8 weeks   4 weeks   2 weeks   1 week
1         5         8         9         9
2         4         7         8         10
3         4         4         4         5
4         2         3         5         5
5         4         6         6         8
6         3         5         7         9
7         4         5         5         4
8         2         3         6         7

a. Is anxiety related to the time interval?
b. Which groups are significantly different from which?
c. How much of anxiety is a function of test proximity?

8. A psychology department sponsors a study of the relationship between participation in a particular internship opportunity and students’ final grades. Eight students in their second year of graduate study are matched to eight students in the same year by grade. Those in the first group participate in the internship. The study compares students’ grades after the second year.

Student   Internship   No Internship
1         3.6          3.2
2         2.8          3.0
3         3.3          3.0
4         3.8          3.2
5         3.2          2.9
6         3.3          3.1
7         2.9          2.9
8         3.1          3.4

a. Are the differences statistically significant?
b. Two separate groups are involved, so why should the study be completed as a dependent-samples t test?

9. A team of researchers associated with an accrediting body studies the amount of time professors devote to their scholarship before and after they receive tenure. Scores represent hours per week.

Professor   Before tenure   After tenure
1           12              5
2           10              3
3           5               6
4           8               5
5           6               5
6           12              10
7           9               8
8           7               7

a. Are the differences statistically significant?
b. What is t if the groups had been independent?
c. What is the primary reason for the difference in the two t values?

10. A supervisor is monitoring the number of sick days employees take by month. For 7 people, these numbers are as follows:

Employee   Oct   Nov   Dec
1          2     4     3
2          0     0     0
3          1     5     4
4          2     5     3
5          2     7     7
6          1     3     4
7          2     3     2

a. Are the month-to-month differences significant?
b. What is the scale of the independent variable in this analysis?
c. How much of the variance does the month explain?

11. If the people in each month of the Question 10 data were different, the study would have been a one-way ANOVA.

a. Would the result have been significant?
b. Because total variance (SStot) is the same in either 10 or 11, and the SScol (10) is the same as SSbet (11), why are the F values different?

Answers to Try It! Questions

1. Small samples tend to be platykurtic because the data in small samples are often highly variable, which translates into relatively large standard deviations and large error terms.

2. If groups are created by random sampling, they will differ from the population from which they were drawn only by chance. That means that error can occur with random sampling, but its potential to affect research results diminishes as the sample size grows.

3. The before/after t and the matched-pairs t differ only in that the before/after test uses the same group twice, while the matched-pairs test matches each subject in the first group with one in the second group who has similar characteristics. The calculation and interpretation of the t value are the same in both procedures.

4. The within-subjects test will detect a significant difference more readily than an independent t test. Power in statistical testing is the likelihood of detecting significance.

5. Because the same subjects are involved in each set of measures, the within-subjects test allows us to calculate the amount of score variability due to individual differences in the group and eliminate it because it is the same for each group. This source of error variance is eliminated from the analysis, leaving a smaller error term.

6. The eta squared value would be the same in either problem. Note that in a one-way ANOVA, eta squared is the ratio of SSbet to SStot. In the within-subjects F, it is SScol to SStot. Because SSbet and SScol both measure the same variance, and the SStot values will be the same in either case, the eta squared values will likewise be the same. What changes is the error term. Ordinarily, SSresid will be much smaller than SSwith, but those values show up in the F ratio by virtue of their respective MS values, not in eta squared.
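The reasoning in answer 6 can be checked numerically. The Python sketch below partitions the sums of squares for one small table of scores two ways: as a one-way ANOVA (treating the columns as independent groups) and as a within-subjects F (treating each row as one subject). The scores are hypothetical and invented for illustration, the function names are assumptions, and the name ss_subj for the row-to-row (individual differences) sum of squares is a label chosen here that may not match the chapter's notation.

```python
def sum_sq(values, center):
    """Sum of squared deviations of values from center."""
    return sum((v - center) ** 2 for v in values)


def partition(scores):
    """Partition variance for a table with one row per subject
    and one column per level of the independent variable."""
    n_subj = len(scores)
    n_cond = len(scores[0])
    all_scores = [x for row in scores for x in row]
    grand_mean = sum(all_scores) / len(all_scores)

    col_means = [sum(row[j] for row in scores) / n_subj for j in range(n_cond)]
    row_means = [sum(row) / n_cond for row in scores]

    ss_tot = sum_sq(all_scores, grand_mean)
    ss_col = n_subj * sum_sq(col_means, grand_mean)   # same quantity as SSbet
    ss_subj = n_cond * sum_sq(row_means, grand_mean)  # individual differences (assumed name)
    ss_with = ss_tot - ss_col                         # one-way ANOVA error term
    ss_resid = ss_tot - ss_col - ss_subj              # within-subjects error term

    df_col = n_cond - 1
    ms_col = ss_col / df_col
    ms_with = ss_with / (n_subj * n_cond - n_cond)    # df = N - k
    ms_resid = ss_resid / (df_col * (n_subj - 1))     # df = (k - 1)(n - 1)

    return {
        "ss_tot": ss_tot,
        "ss_with": ss_with,
        "ss_resid": ss_resid,
        "eta_sq_oneway": ss_col / ss_tot,   # SSbet / SStot
        "eta_sq_within": ss_col / ss_tot,   # SScol / SStot, the same ratio
        "f_oneway": ms_col / ms_with,
        "f_within": ms_col / ms_resid,
    }


# Hypothetical scores: 4 subjects (rows) by 3 conditions (columns)
scores = [[2, 4, 5], [3, 5, 7], [1, 2, 4], [4, 6, 8]]
result = partition(scores)
for name, value in result.items():
    print(name, round(value, 3))
```

For these invented scores, eta squared is about .51 under either analysis, but F rises from roughly 4.6 in the one-way ANOVA to roughly 88 in the within-subjects F, because removing the individual-differences variance leaves a much smaller error term.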
