ANOVA
chapter 7
Dependent t-Tests and Repeated Measures Analysis of Variance
Learning Objectives
After reading this chapter, you will be able to. . .
1. describe the impact that initial between-groups differences have on test results when using the t-test or analysis of variance.
2. compare the independent t-test to the dependent-groups t-test.
3. complete a dependent-(paired/repeated-)samples t-test.
4. explain what power means in statistical testing.
5. compare the one-way ANOVA to the repeated-measures ANOVA.
6. complete a repeated-measures ANOVA.
7. interpret results and draw conclusions of within-group designs.
8. present within-group analysis results in APA format.
9. employ Wilcoxon signed-ranks W-test and Friedman’s nonparametric ANOVA.
suk85842_07_c07.indd 235 10/23/13 1:29 PM
CHAPTER 7Section 7.1 Reconsidering the t and F Ratios
Tests of significant difference, such as the t-test and analysis of variance, are of two kinds: tests involving independent (or between) groups and those that employ related, or dependent (or within) groups. The tests covered to this point in the book have involved only independent groups tests. However, there are important advantages related to the dependent groups procedures, and they are used frequently in data analysis.
In this chapter, the focus will be on the dependent-groups equivalents of the independent t-test and the one-way ANOVA. When the same group is measured across times or treatments, the design is called a dependent-groups or within-groups design; matched or equivalent groups can be employed as an alternative design. Collectively, these are known as repeated-measures designs. Although repeated-measures designs answer the same questions as their independent-groups equivalents (i.e., are there significant differences within groups, across times/treatments, or between matched/equivalent groups), under particular circumstances they can do so with greater economy and more statistical power.
7.1 Reconsidering the t and F Ratios
The scores produced in both the independent t and the one-way ANOVA are ratios. In the case of the t-test, the ratio is the result of dividing the difference between the means of the groups by the standard error of the difference:
$$t = \frac{M_1 - M_2}{SE_d}$$

With ANOVA, the F ratio is the mean square between divided by the mean square within:

$$F = \frac{MS_{bet}}{MS_{with}}$$
With either t or F, the denominator in the ratio reflects how much scores vary within (rather than between) the groups of subjects involved in the study. These differences are easiest to see in the way the standard error of the difference is calculated for a t-test. When group sizes are equal, the formula is
$$SE_d = \sqrt{(SEM_1)^2 + (SEM_2)^2}$$

with

$$SEM = \frac{s}{\sqrt{n}}$$

and s, of course, a measure of score variation in any group.
So the standard error of the difference is based on the standard error of the mean, which in turn is based on the standard deviation. These connections make it clear that the within-group score variance in a t-test has its root in the standard deviation of each group of scores. Working from the standard deviation forward to the standard error of the difference, note the following:
• When scores vary substantially in a group, it is reflected in a large standard deviation.
• When the standard deviation is relatively large, the standard error of the mean must likewise be large because the standard deviation is the numerator in the formula for SEM.
• A large standard error of the mean results in a large standard error of the difference because that statistic is the square root of the sum of the squared standard errors of the mean.
• When the standard error of the difference is large, the difference between the means has to be correspondingly larger in order for the result to be statistically significant. The table of critical values indicates that, at α = .05, no t ratio (the difference between the means divided by the standard error of the difference) smaller than 1.96 can be significant in a two-tailed test, and none smaller than 1.645 can be significant in a one-tailed test. Those are the critical values for very large samples; with fewer degrees of freedom, the critical values are larger still.
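The chain described in the bullets above (standard deviation, then standard error of the mean, then standard error of the difference, then t) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `independent_t` and the four small score lists are hypothetical and do not come from the chapter, and the formula for SEd is the equal-group-size version given earlier.

```python
import math
import statistics

def independent_t(group1, group2):
    """Independent t ratio for two equal-sized groups, built from s to SEM to SEd."""
    if len(group1) != len(group2):
        raise ValueError("this form of SEd assumes equal group sizes")
    n = len(group1)
    # Standard error of the mean for each group: s / sqrt(n)
    sem1 = statistics.stdev(group1) / math.sqrt(n)
    sem2 = statistics.stdev(group2) / math.sqrt(n)
    # Standard error of the difference: sqrt(SEM1^2 + SEM2^2)
    se_d = math.sqrt(sem1 ** 2 + sem2 ** 2)
    t = (statistics.mean(group1) - statistics.mean(group2)) / se_d
    return t, se_d

# Hypothetical scores: the group means are 10 and 8 in both comparisons,
# but the scores are far more spread out in the second comparison.
tight_a, tight_b = [9, 10, 10, 11], [7, 8, 8, 9]
wide_a, wide_b = [4, 8, 12, 16], [2, 6, 10, 14]

t_tight, se_tight = independent_t(tight_a, tight_b)
t_wide, se_wide = independent_t(wide_a, wide_b)
print(f"tight groups: SEd = {se_tight:.3f}, t = {t_tight:.3f}")
print(f"wide groups:  SEd = {se_wide:.3f}, t = {t_wide:.3f}")
```

The mean difference is identical in the two comparisons; only the within-group spread changes, and with it the size of the denominator and of t.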
Error Variance
The point of this is that the value of t in the t-test (and it is the same for F in an ANOVA) is greatly affected by the amount of variability within the groups involved. When the variability within those groups is extensive, the values of t and F are correspondingly diminished and less likely to be statistically significant than when there is relatively little variability within the groups.
These differences within groups stem from differences in the way individuals within the samples react to whatever treatment is the independent variable; different people respond differently to the same stimulus. These differences represent error variance, which is what occurs whenever scores differ for reasons not related to the influence of the IV.
Other Sources of Error Variance
Within-group differences are not the only source of error variance in the calculation of t and F. Both the t-test and ANOVA are based on the assumption that the groups involved are equivalent before the independent variable is introduced. In a t-test where the impact of relaxation therapy on clients’ anxiety is the issue, the assumption is that before the therapy is introduced, the group that will receive the therapy and the control group that will not begin with equivalent levels of anxiety. That assumption is the key to attributing any differences after the treatment to the therapy, the IV.
Confounding Variables
In comparisons such as this, however, the initial equivalence of the groups can be a problem. Maybe there were differences in anxiety before the therapy was introduced. There might be differences in the employment circumstances of each group, and perhaps those threatened with unemployment are more anxious than the others. Maybe there are age-related differences. These other influences that are not controlled in an experiment are sometimes called confounding variables.
If a psychologist wants to examine the impact that a substance abuse program has on addicts’ behavior, a study might be set up as follows. Two groups of the same number of addicts are selected, and the substance abuse program is provided to one group. After the program, the psychologist measures the level of substance abuse in both groups to see whether there is a difference.
Try It! A: If the size of the standard deviation is related to the size of the group, in a t-test, what is the relationship between sample size and error?
The problem is that the presence or absence of the program is not the only thing that might prompt subjects to respond differently. Perhaps subjects’ background experiences are different. Maybe there are ethnic group differences, age differences, or social class differences. If any of those differences affect substance abuse behavior, there is an opportunity to confuse the influence of those factors with the impact of the substance abuse program, which is the IV. If those other differences are not controlled and they affect the dependent variable, they contribute to error variance. There is error variance any time the dependent variable (DV) scores fluctuate for reasons unrelated to the IV.
Therefore, there is error variance reflected in the variability within groups, and there is error variance represented in any difference between groups that is not related to the IV. Test results can be meaningful only when the score variance that is related to the independent variable is substantially greater than the error variance; what is controlled must contribute more to score values than what is left uncontrolled. This makes it important to look for ways to control error variance so that it is not confused with the variability in scores that stems from the independent variable. Controlling for confounding variables is a necessary research activity. A confounding variable can affect the IV-DV relationship, thereby lowering the internal validity and the statistical conclusion validity of the findings. Failing to take confounding variables into account can result in misleading data and erroneous conclusions, to the detriment of the researcher’s reputation. In other words, be wary of research findings and sweeping general statements, as there may be several confounding elements. That said, controlling for extraneous confounding variables can be done in several ways.
7.2 Dependent-Groups Designs
Ideally, any before-the-treatment differences between the groups in a study will be minimal. Recall that random selection occurs when every member of a population has an equal chance of being selected. The logic behind random selection is that when groups are randomly drawn from the same population, they will differ only by chance. They will still differ, however, because no sample can represent the population with complete fidelity, and occasionally the chance differences will affect the way subjects respond to the IV.
One way to reduce error variance is to adopt what are called dependent-groups designs. The independent t-test and the one-way ANOVA required independent groups. Members of one group cannot also be members of other groups in the same study. However, in the case of the t-test, if the same group is measured, exposed to the treatment, and then measured again, an important source of error variance is controlled. Using the same group twice makes initial equivalence no longer a concern. Any scoring variability between the first and second measure should more accurately reflect the impact of the independent variable.
The Dependent-Samples t-Tests
Try It! B: How does random selection attempt to control error variance in statistical testing?
One dependent-groups test where the same group is measured twice is called the before/after t-test, also known as the pre/post t-test. An alternative is called the matched-pairs or dependent-samples t-test, where each participant in the first group is matched to someone in the second group who has a similar characteristic. Yet a third alternative that is basically the same as a before/after design is the within-treatment design, where each participant is used across two treatment conditions (usually given at two different times, which makes it the same as the before/after t-test). In this last option the participant acts as his or her own control, and one of the treatments may be a placebo. All three types of dependent-samples t-tests have the same objective, which is controlling the error variance that is due to initial between-groups differences. Following are examples of each test.
• The before/after design: A researcher is interested in the impact that positive reinforcement has on employees’ sales productivity. Besides the sales commission, the researcher introduces a rewards program that can result in increased vacation time. The researcher gauges sales productivity for a month, introduces the rewards program, and gauges sales productivity during the second month for the same people. The researcher will explore differences in employee productivity before and after the positive reinforcement intervention (the rewards program). If the result is significant, the researcher can reject the null hypothesis (i.e., there is no difference in employee productivity after the introduction of the rewards program) and find support for the alternative hypothesis (i.e., there is a significant increase in employee productivity after the introduction of the rewards program).
• The matched-pairs design: A school counselor is interested in the impact that verbal reinforcement has on students’ reading achievement. To eliminate between-groups differences, the counselor selects 30 students for the treatment group and matches each of them to a student in the control group who has a similar reading score on a standardized test. The counselor then introduces the verbal reinforcement program to those in the treatment group for a specified period and, over time, compares the performance of students in the two groups as well as their performance within each group. The matched-pairs design is similar to an independent-groups design with one major exception: the groups are matched (made equivalent) as closely as possible on a particular measure, in this case reading scores on a standardized test.
• The within-treatment design: A psychiatrist measures each study participant after taking a placebo and again after taking the actual drug for depression, to test for significant differences across the two treatments (placebo versus drug). Here a counterbalancing design may be employed to minimize the order effects that plague repeated-measures designs. Specifically, the order in which treatments are given can influence the outcome. Therefore, the treatments are given to the groups in different orders, as depicted in Figure 7.1.
Figure 7.1: Counterbalance design
Group 1: Treatment A → Treatment B → Posttest
Group 2: Treatment B → Treatment A → Posttest
Source: Oskar Blakstad (May 8, 2009). Counterbalanced Measures Design by Martyn Shuttleworth. Retrieved Aug 22, 2013 from Explorable.com: http://explorable.com/counterbalanced-measures-design.
Although there are differences in how the tests are set up, calculating the t-statistic is the same in each case. The differences between the approaches are conceptual, not mathematical. All of the approaches have the same purpose: to control for any score variation stemming from nonrelevant factors. They reduce the error variance that comes from using nonequivalent groups. Therefore, testing for homogeneity of variance with Levene’s test is moot here, as we are not dealing with differences between independent groups. On the other hand, we are dealing with variance within groups and across pairs of treatments. If the variances of the differences between pairs of treatments differ significantly, the result is what is described as a violation of sphericity, which will be discussed in more depth in Section 7.3 when we examine repeated-measures ANOVA.
Calculating t in a Dependent-Groups Design
Although the differences between before/after, matched-pairs, and within-treatment t-tests are not math-related, there are several approaches to calculating the t statistic in the dependent-groups tests. Whatever their differences, they all take into account the fact that the two sets of scores are related. One approach is to calculate the correlation between the two sets of scores and then to use the strength of the correlation to reduce the error variance: the higher the correlation between the two sets of scores, the lower the error variance. Rather than correlations, which come up later in the book, we will rely on “difference scores.” But whether we use correlation values or difference scores, the result is the same.
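To see why the correlation route and the difference-score route agree, it may help to write the variance of the difference scores in terms of the two sets of scores. The identity below is a standard result rather than something stated in the chapter, and the symbols are notation assumed here for illustration: s_1 and s_2 are the standard deviations of the two sets of scores, r is the correlation between them, and n is the number of pairs.

```latex
s_d^2 = s_1^2 + s_2^2 - 2\,r\,s_1 s_2
\qquad\Longrightarrow\qquad
SEM_d = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{s_1^2 + s_2^2 - 2\,r\,s_1 s_2}{n}}
```

When r = 0, the expression reduces to the square root of the sum of the two squared standard errors of the mean, which is the error term of the independent t-test with equal group sizes. As r increases toward 1, the subtracted term grows and the error term shrinks.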
The distribution of difference scores was discussed in Chapter 5 when the independent t-test was introduced. The point of that distribution is to determine the point at which the difference between a pair of sample means (M1 − M2) is so great that the most probable explanation is that the samples were not drawn from populations with the same means.
That same distribution also provides the theoretical underpinning for the dependent-groups tests, but rather than the difference between the means of the two groups (M1 − M2), the difference score in the dependent-groups tests is based on the mean of the differences between pairs of individual scores. That is, the differences between each pair of related scores will be determined, and then the mean of those differences will become the numerator in the t ratio. If the mean of the difference scores is sufficiently different from the mean of the distribution of difference scores (which, recall, is 0), the t value will be statistically significant.
The denominator in the t ratio is another standard error of the mean value, but in this case, it is the standard error of the mean for those difference scores. The mechanics of checking for significance are similar to what was done for the independent t:
• A critical value from the t table defines the point at which the t ratio is statistically significant.
• The critical value is dependent upon the degrees of freedom for the problem. For the dependent-samples t, the degrees of freedom are the number of pairs of scores minus 1 (n − 1).
The dependent-groups t-test statistic has this form:
$$t = \frac{M_d}{SEM_d} \qquad \text{(Formula 7.1)}$$

Where
Md = the mean of the difference scores
SEMd = the standard error of the mean for the difference scores
Try It! C: How are the before/after t-test and the matched-pairs t-test different?
The steps for completing the test follow:
1. From the two scores for each subject, subtract the second from the first to determine the difference score, d, for each pair.
2. Determine the mean of the d scores:
$$M_d = \frac{\sum d}{\text{number of pairs}}$$
3. Calculate the standard deviation of the d values, sd.
4. Calculate the standard error of the mean for the difference scores, SEMd, by dividing the result of Step 3 by the square root of the number of pairs of scores:
$$SEM_d = \frac{s_d}{\sqrt{\text{number of pairs}}}$$
5. Divide Md by SEMd to determine t:
$$t = \frac{M_d}{SEM_d}$$
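The five steps can also be sketched as a short Python function that uses only the standard library. The function name `dependent_t` and the before/after scores are hypothetical, introduced only for illustration. The sketch assumes the difference scores are not all identical, since a standard deviation of zero would make the ratio undefined.

```python
import math
import statistics

def dependent_t(first, second):
    """Return (Md, sd, SEMd, t, df) for paired scores, following the five steps."""
    if len(first) != len(second):
        raise ValueError("every score needs a partner")
    # Step 1: difference score for each pair (first minus second)
    d = [a - b for a, b in zip(first, second)]
    n_pairs = len(d)
    # Step 2: mean of the difference scores
    m_d = statistics.mean(d)
    # Step 3: sample standard deviation of the difference scores
    s_d = statistics.stdev(d)
    # Step 4: standard error of the mean for the difference scores
    sem_d = s_d / math.sqrt(n_pairs)
    # Step 5: the t ratio; degrees of freedom are the number of pairs minus 1
    t = m_d / sem_d
    return m_d, s_d, sem_d, t, n_pairs - 1

# Hypothetical before/after scores for five people (not from the text)
before = [5, 7, 6, 8, 4]
after = [7, 8, 6, 11, 6]
m_d, s_d, sem_d, t, df = dependent_t(before, after)
print(f"Md = {m_d:.3f}, sd = {s_d:.3f}, SEMd = {sem_d:.3f}, t = {t:.3f}, df = {df}")
```

The sign of t depends only on the order of subtraction in Step 1; for a two-tailed test, it is the absolute value that is compared with the critical value.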
Following is an example to illustrate the steps to calculating the dependent measures t-test. A psychologist is investigating the impact that verbal reinforcement has on the number of questions students ask in a seminar.
• Ten upper-level students participate in two seminars where a presentation is followed by students’ questions.
• In the first seminar, no feedback is provided by the instructor after a student asks the presenter a question.
• In the second seminar, the instructor offers feedback after each question, such as “That’s an excellent question,” “Very interesting question,” or “Yes, that had occurred to me as well.”
• The psychologist will test the following hypothesis:
H0: There is no significant mean difference in the number of student questions asked from seminar 1 to seminar 2

$$H_0: \mu_{\text{seminar\_1\_questions}} = \mu_{\text{seminar\_2\_questions}}$$

• By rejecting H0, the psychologist will find support for the alternative hypothesis:
Ha: There is a significant mean difference in the number of student questions asked from seminar 1 to seminar 2

$$H_a: \mu_{\text{seminar\_1\_questions}} \neq \mu_{\text{seminar\_2\_questions}}$$

Is there a significant difference between the number of questions students ask in the first seminar compared to the number of questions students ask in the second seminar? The number of questions asked by each student in both seminars and the solution to the problem are in Figure 7.2.
Figure 7.2: Calculating the before/after and within-treatment t

Student   Seminar 1   Seminar 2   d
1         1           3           −2
2         0           2           −2
3         3           4           −1
4         0           0            0
5         2           3           −1
6         1           1            0
7         3           5           −2
8         2           4           −2
9         1           3           −2
10        2           1            1
                                  Σd = −11

1. Determine the difference between each pair of scores, d, by subtraction.
2. Determine the mean of the differences, the d values (Md).

$$M_d = \frac{\sum d}{10} = \frac{-11}{10} = -1.1$$

3. Calculate the standard deviation of the d values (sd).
Verify that sd = 1.101
4. Just as the standard error of the mean in the earlier tests was s/√n, determine the standard error of the mean for the difference scores (SEMd) by dividing the result of step 3 by the square root of the number of pairs.
Verify that

$$SEM_d = \frac{s_d}{\sqrt{n_p}} = \frac{1.101}{\sqrt{10}} = 0.348$$

5. Divide Md by SEMd to determine t.

$$t = \frac{M_d}{SEM_d} = \frac{-1.1}{0.348} = -3.161$$

6. As noted earlier, the degrees of freedom for the critical value of t for this test are the number of pairs of scores minus 1, np − 1.
t0.05(9) = 2.262
The absolute value of the calculated t (3.161) exceeds the critical value from Table 5.1 (which is also Table B in the Appendix), so the result is statistically significant. Note that it is the absolute value of the calculated t in which we are interested. Because the question was whether there is a significant difference in the number of questions, it is a two-tailed test, and it does not matter whether the count for seminar 1 is larger than the count for seminar 2 or the other way around. The negative sign of t simply reflects the order of subtraction: the students asked significantly more questions in the second seminar, where questions were followed by feedback, than they did in the first seminar, when the instructor offered no feedback.
The Degrees of Freedom, the Dependent-Groups Test, and Power
With Md = −1.1, there is comparatively little difference between the two sets of scores. What makes such a small mean difference statistically significant? The answer is in the amount of error variance in this problem. When the error variance is also very small (the standard error of the mean for the difference scores is just .348), comparatively small mean differences can be statistically significant. The rationale for using dependent-groups tests as opposed to independent-groups designs is that the former are comparatively more powerful; there is less error to contend with, thereby increasing the probability of rejecting the null hypothesis. This brings us to the discussion of power in statistical testing.
Table B in the Appendix, the critical values of t, indicates that critical values decline as degrees of freedom increase. That occurs not only in the critical values for t but also for F in analysis of variance and in fact for most tables of critical values for statistical tests. For the dependent-groups t-test, the degrees of freedom are based on
• the number of pairs of related scores, minus 1.
For the independent-groups t-test, the degrees of freedom are based on
• the number of scores in both groups, minus 2 (Chapter 5).
This means that critical values are larger in a dependent-groups test for the same number of raw scores involved. But even a test with a larger critical value can produce significant results when there is more control of error variance. This is what the dependent-groups test provides. The central point is this: When each pair of scores comes from the same participant, or from a matched pair of participants, the random variability from nonequivalent groups is minimal because scores tend to vary similarly for each pair, resulting in relatively little error variance. The small SEMd value that results more than compensates for the fewer degrees of freedom and the associated larger critical value connected to dependent-groups tests.
In statistical testing, power is defined as the likelihood of detecting a significant difference when it is present. The more powerful statistical test is the one that will most readily detect a significant difference. As long as the sets of scores are closely related, the dependent-measures test is more powerful than the independent-groups equivalent.
Try It! D: What does it mean to say that the within-subjects test has more power than the independent t-test?
A Matched-Pairs Example
Another form of the dependent-groups t-test is the matched-pairs design. In this approach, rather than measure the same people repeatedly, each participant in one group is paired with a participant in the other group who is similar.
For example, consider a market analyst who wants to determine whether a television commercial will induce consumers to spend more on a breakfast cereal. The analyst selects a group of consumers entering a grocery store, induces them to view the television commercial, and then tracks their expenditures on breakfast cereal. A second group is selected, and they also shop, but they do not view the television commercial. The analyst selects people for the second group who match the age and gender characteristics of those in the first group. This controls for age and gender because those characteristics might affect spending for the particular product. Each individual from Group 1 has a companion in Group 2 of the same age and sex. The expenditures in dollars for the members of each group and the solution to the problem are in Figure 7.3.
Figure 7.3: Calculating a matched-pairs t-test

Pair   Viewed   Did not view   d
1      1.5      3              −1.5
2      4        0               4
3      3        2               1
4      0        0               0
5      2        0               2
6      4.5      4               0.5
7      6        2               4
8      0        1              −1
9      5.25     2               3.25
10     2        3              −1

Verify that Md = 1.125
sd = 2.092

$$SEM_d = \frac{s_d}{\sqrt{n_p}} = \frac{2.092}{\sqrt{10}} = 0.662$$

$$t = \frac{M_d}{SEM_d} = \frac{1.125}{0.662} = 1.700$$

t0.05(9) = 2.262
The absolute value of t is less than the critical value from Table 5.1 (or Appendix Table B) for df = 9. The difference is not statistically significant. There are probably several ways to explain the outcome, but we will explore just three.
• The most obvious explanation is that the commercial did not work. The shoppers who viewed the commercial were not induced to spend significantly more than those who did not view it.
• Another explanation has to do with the matching. Perhaps age and gender are not related to how much people spend shopping for the particular product. Perhaps the shopper’s level of income is the most important characteristic, and that was not controlled in the pairing.
• Another explanation is related to sample size. Small samples tend to be more variable than larger samples, and variability is what the denominator in the t-ratio reflects. Perhaps if this had been a larger sample, the SEMd would have had a smaller value, and the t would have been significant.
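The sample-size explanation can be made concrete with a short what-if calculation. The sketch below holds Md and sd at the values reported in Figure 7.3 and changes only the number of pairs; that is an assumption made purely for illustration, because a larger sample would in practice produce somewhat different values for both. The critical values are the two-tailed, α = .05 entries from a standard t table.

```python
import math

# Values reported in Figure 7.3
m_d = 1.125   # mean of the difference scores
s_d = 2.092   # standard deviation of the difference scores

# Two-tailed critical values of t at alpha = .05, keyed by degrees of freedom
critical_t = {9: 2.262, 19: 2.093, 29: 2.045}

results = {}
for n_pairs in (10, 20, 30):
    df = n_pairs - 1
    sem_d = s_d / math.sqrt(n_pairs)   # error term shrinks as n grows
    t = m_d / sem_d
    results[n_pairs] = (t, t > critical_t[df])
    print(f"pairs = {n_pairs:2d}  SEMd = {sem_d:.3f}  t = {t:.3f}  "
          f"critical = {critical_t[df]:.3f}  significant = {t > critical_t[df]}")
```

Under that assumption, the same mean difference that falls short of significance with 10 pairs would exceed the critical value with 20 or 30 pairs, because the denominator shrinks and the critical value declines as the degrees of freedom increase.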
The second explanation points out the disadvantage of matched-pairs designs compared to repeated-measures designs. The individual conducting the study must be in a position to know which characteristics of the participants are most relevant to explaining the dependent variable so that they can be matched in both groups. Otherwise, it is impossible to know whether a nonsignificant outcome reflects an inadequate match, control of the wrong variables, or a treatment that just does not affect the DV.
Comparing the Dependent-Samples t-Test to the Independent t
In order to compare the dependent-samples t-test and the independent t more directly, we are going to apply both tests to the same data. This will illustrate how each test deals with error variance; however, a caution is necessary before beginning: Once data is collected, there really is no situation where someone can choose which test to use because either the groups are independent, or they are not. Therefore, we proceed purely as an academic exercise, recognizing that such a situation is not going to happen in the ordinary course of events.
As an example, a university program encourages students to take a service learning class that emphasizes the importance of community service as a part of the students’ educational experience. Data is gathered on the number of hours former students spend in community service per month after they complete the course and graduate from the university.
• For the independent t-test, the students are divided between those who took a service learning class and graduates of the same year who did not.
• For the dependent-groups t-test, those who took the service learning class are matched to a student with the same major, age, and gender who did not take the class.
The data and the solutions to both tests are in Figure 7.4.
Figure 7.4: The before/after t-test versus the independent t-test

Pair   Class   No Class   d
1      4       3           1
2      3       2           1
3      3       2           1
4      2       2           0
5      3       2.5         0.5
6      4       3           1
7      1       2          −1
8      5       4           1
9      6       5           1
10     4       3           1
M      3.50    2.850       0.650
s      1.434   1.001       0.669
SEM    0.453   0.316       0.211

As an independent t-test we have,

$$SE_d = \sqrt{SEM_1^2 + SEM_2^2} = \sqrt{0.453^2 + 0.316^2} = 0.553$$

$$t = \frac{M_1 - M_2}{SE_d} = \frac{3.50 - 2.850}{0.553} = 1.175; \qquad t_{0.05(18)} = 2.101$$

The result is not significant.

As a matched pairs t-test the results are,

$$t = \frac{M_d}{SEM_d} = \frac{0.650}{0.211} = 3.081; \qquad t_{0.05(9)} = 2.262$$

The result is significant.

Because the differences between the scores are quite consistent, as they tend to be when participants are matched effectively, there is very little variance in the difference scores. This results in a comparatively small standard deviation of difference scores and a small standard error of the mean for the difference scores. This allows for t ratios with even relatively small numerators to be statistically significant. Because the independent t-test makes no assumption that the two groups are related, its error variance is based on the differences within the groups of raw scores, and the denominator is large enough that the t value is not significant.
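For readers who want to check the comparison in Figure 7.4, the sketch below runs both tests on the same service learning scores using only the Python standard library. The variable names are chosen here for illustration. Because the script carries unrounded intermediate values, its results can differ from the longhand figures in the third decimal place.

```python
import math
import statistics

# Hours of community service per month, from Figure 7.4
class_hours = [4, 3, 3, 2, 3, 4, 1, 5, 6, 4]
no_class_hours = [3, 2, 2, 2, 2.5, 3, 2, 4, 5, 3]
n = len(class_hours)

# Independent t: the error term comes from variability within each group
sem1 = statistics.stdev(class_hours) / math.sqrt(n)
sem2 = statistics.stdev(no_class_hours) / math.sqrt(n)
se_d = math.sqrt(sem1 ** 2 + sem2 ** 2)
t_independent = (statistics.mean(class_hours) - statistics.mean(no_class_hours)) / se_d

# Dependent (matched-pairs) t: the error term comes from the difference scores
d = [a - b for a, b in zip(class_hours, no_class_hours)]
sem_d = statistics.stdev(d) / math.sqrt(n)
t_dependent = statistics.mean(d) / sem_d

print(f"independent: SEd  = {se_d:.3f}, t = {t_independent:.3f}, df = {2 * n - 2}")
print(f"dependent:   SEMd = {sem_d:.3f}, t = {t_dependent:.3f}, df = {n - 1}")
```

The numerator is 0.65 in both tests. Only the denominator changes, which is why the same data are significant as matched pairs and not significant as independent groups.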
The Dependent-Groups t-Test on Excel
If the problem in Figure 7.4 is completed in Excel as a dependent-groups test, the procedure is as follows:
• Create the data file in Excel.
Column A is labeled Class to indicate those who had the service learning class, and column B is labeled No Class.
Enter the data, beginning with cell A2 for the first group and cell B2 for the second group.
• Click the Data tab at the top of the page.
• At the extreme right, choose Data Analysis.
• In the Analysis Tools window, select t-Test: Paired Two Sample for Means and click OK.
• There are two blanks near the top of the window for Variable 1 Range and Variable 2 Range. In the first, enter A2:A11 indicating that the data for the first (Class) group is in cells A2 to A11. In the second, enter B2:B11 for the No Class group.
• Indicate that the hypothesized mean difference is 0. This reflects the value for the mean of the distribution of difference scores.
• Indicate A13 for the output range, so that the results do not overlay the data scores.
• Click OK.
Widen column A so that all the output is readable. The result is the screenshot that is Figure 7.5.
Figure 7.5: The Excel output for the dependent-samples t-test using the data from Figure 7.4
In the Excel solution, t = 3.074 rather than the 3.081 from the longhand solution. Excel reaches its solution by way of the correlation between the two sets of scores, rather than by the difference scores we used; note that the Pearson correlation (which will be explained in Chapter 8) is reported as .91. The very minor difference of .007 between the solution in Figure 7.4 and the Excel solution in Figure 7.5 is attributable to rounding in the longhand calculation and is not relevant to the outcome. The Excel output also reports results for both one-tailed and two-tailed tests at the .05 level; the outcome is statistically significant.
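A short script can illustrate both points: the correlation route and the difference-score route give the same t, and the gap between 3.074 and 3.081 appears once SEMd is rounded to three decimals. This is a sketch, not a description of Excel's internal algorithm. The correlation-based formula for the error term is a standard identity assumed here, and the variable names are illustrative.

```python
import math
import statistics

# Service learning scores from Figure 7.4
class_hours = [4, 3, 3, 2, 3, 4, 1, 5, 6, 4]
no_class_hours = [3, 2, 2, 2, 2.5, 3, 2, 4, 5, 3]
n = len(class_hours)

m1, m2 = statistics.mean(class_hours), statistics.mean(no_class_hours)
s1, s2 = statistics.stdev(class_hours), statistics.stdev(no_class_hours)

# Pearson correlation computed from its definition (covariance / product of SDs)
cov = sum((x - m1) * (y - m2) for x, y in zip(class_hours, no_class_hours)) / (n - 1)
r = cov / (s1 * s2)

# Route 1: error term from the correlation between the two sets of scores
sem_d_corr = math.sqrt((s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2) / n)
t_corr = (m1 - m2) / sem_d_corr

# Route 2: error term from the difference scores
d = [x - y for x, y in zip(class_hours, no_class_hours)]
t_diff = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

# Longhand version with SEMd rounded to three decimals, as in Figure 7.4
t_rounded = 0.650 / 0.211

print(f"r = {r:.2f}")
print(f"t via correlation = {t_corr:.3f}, t via difference scores = {t_diff:.3f}")
print(f"t with rounded SEMd = {t_rounded:.3f}")
```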
Apply It!
Repeated Measures
Diabetes is a group of metabolic diseases in which the body cannot properly regulate blood sugar. Management of this disease is achieved by keeping glucose in the blood at normal levels for as much of the time as possible. This requires an accurate, portable glucose monitor for home use.
A medical device company has developed a new portable glucose monitor and wishes to compare it against a laboratory standard. This will produce a data set in which two different monitors measure the glucose level of 11 randomly chosen diabetes patients. Although the two monitors take the blood samples at the same time, this can be considered an example of the before/after dependent-samples t-test because the same group is measured twice. By using the same set of patients for both monitors, each patient is his or her own control. Obtaining two measurements for each patient reduces measurement variability compared to using two independent sets of patients. Choosing a level of significance of α = .05, we use the paired-sample t-test to test the null hypothesis that there is no difference in measurements between the two monitors.
• H0: $\mu_{\text{portable monitor}} = \mu_{\text{lab monitor}}$
By rejecting H0 the company will find support for the alternative hypothesis that there is a significant mean difference in the glucose readings of the two monitors.
• Ha: $\mu_{\text{portable monitor}} \neq \mu_{\text{lab monitor}}$
The glucose readings from each of the two monitors are measured in milligrams per deciliter and are shown in the following table. There is a large variability within each column because each patient is different, and the readings were taken at various times of the day.
Patient Portable Monitor Laboratory Standard
A 112 120
B 85 82
C 103 116
D 154 168
E 65 75
F 52 51
G 85 96
H 72 79
I 167 178
J 123 141
K 142 153
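As a cross-check on a spreadsheet solution, the paired-samples t statistic can be computed directly from the difference scores, following the longhand approach. The following is a minimal Python sketch rather than part of the original analysis; the function name `paired_t` and the variable names are illustrative assumptions. With 11 pairs of readings, the test has 11 - 1 = 10 degrees of freedom.

```python
import math

# Glucose readings (mg/dL) from the table above, patients A through K.
portable = [112, 85, 103, 154, 65, 52, 85, 72, 167, 123, 142]
laboratory = [120, 82, 116, 168, 75, 51, 96, 79, 178, 141, 153]


def paired_t(first, second):
    """Return (mean difference, t, df) for a dependent-samples t-test."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the difference scores (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    # Standard error of the mean difference.
    std_error = math.sqrt(var_diff / n)
    return mean_diff, mean_diff / std_error, n - 1


mean_diff, t_value, df = paired_t(portable, laboratory)
print(f"mean difference = {mean_diff:.2f}")
print(f"t = {t_value:.3f}, df = {df}")
```

The sign of t is negative only because the portable readings are subtracted from the laboratory readings in that order; the magnitude is what is compared with the critical value.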
The Excel solution follows:
Variable 1 Variable 2
Mean 105.45 114.45
Variance 1428.67 1736.27
Observations 11 11
Pearson Correlation 0.99
Hypothesized Mean Difference 0.00
df 10
t Stat -4.817
P(T<=t) one-tail 0.0003
tcrit one-tail 1.8125
P(T<=t) two-tail 0.0007
tcrit two-tail 2.2281
The magnitude of the calculated value, t = -4.817, exceeds the critical two-tailed value from the table, tcrit = 2.23 (df = 10, because 11 patients were measured). The result is statistically significant, so we reject the null hypothesis that the means are the same. On average, the portable monitor measures glucose levels lower than the laboratory standard.
Based on results of this test, the company continued research on the portable monitor until they could devise a solution that would more accurately replicate laboratory standard results.
Apply It! boxes written by Shawn Murphy.
Comparing the Three Dependent t-Tests With the Independent t-Test
The before/after and matched-pairs approaches to calculating a dependent-groups t-test each have advantages. The before/after design provides the greatest control over the extraneous variables that can confound the results in a matched-pairs design. When using the matching approach, there is always a chance that subjects in Group 2 are not matched closely enough on some relevant variable and the resulting mismatches create error variance. In the service learning example, students were matched according to age, major, and gender. But if marital status affects students’ willingness to be involved in community service and it is not controlled, there could be an imbalance of married/not-married students that confounds results. The before/after procedure involves the same students, and unless their status on some important variable changes between measures (a rash of marriages between the first and second measurement, for example), there is going to be better control of error variance with that approach.
Note that the matched-pairs and the within-treatments approach also assume a large sample from which to draw in order to select participants who match those in the first group. As the number of variables on which participants must be matched increases, so must the size of the sample from which to draw in order to find participants with the correct combination of characteristics.
The advantage of the matched-pairs design, on the other hand, is that it takes less time to execute. The treatment group and the control group can both be involved in the study at the same time. By way of a summary, note the comparisons in Table 7.1.
Table 7.1: Comparing the t-tests
 | Independent t | Before/After | Matched-Pairs | Within-Treatments
Groups | Independent groups | One group measured twice | Two groups: each subject from the first group matched to one in the second | One group measured twice for two treatments
Denominator/error term | Within-groups variability plus between-groups variability | Only within-groups variability | Only within-groups variability | Only within-groups variability
7.3 The Within-Subjects F
Sometimes two measures of the same group are not enough to track changes in the DV. Maybe the researchers running the service learning study want to compare how much time students devoted to community service the year they graduated, one year later, and then two years after graduation. The within-subjects F is a dependent-groups procedure for two or more groups of scores when the dependent variable is interval or ratio scale. Just as the dependent-groups t-test is the repeated-measures equivalent of the independent t-test, the within-subjects F is the repeated-measures or matched-pairs equivalent of the one-way ANOVA. The same Ronald A. Fisher who developed analysis of variance also developed this test, which is a form of ANOVA, and the test statistic is still F.
Here too, the dependent groups can be formed by either repeatedly measuring the same group or by matching separate groups of participants on the relevant variables. When there are more than two groups, matching becomes increasingly problematic, however, and although it is theoretically possible to match any number of participant groups, it is a highly complex undertaking to match all the relevant variables across more than two or three measures. Repeatedly measuring the same participants is much more common than matching.
Managing Error Variance in the Within-Subjects F
Recall from Chapter 6 that when Fisher developed ANOVA, he shifted away from calculating score variability with the standard deviation, standard error of the mean, and so on and used sums of squares instead. The particular sums of squares computed are the key to the strength of this procedure.
If a group of participants in a study is measured on a dependent variable at three different intervals and their scores are recorded in parallel columns, the researcher will have a data sheet similar to the following:
First Measure Second Measure Third Measure
Participant 1 . . .
Participant 2 . . .
• The column scores are the equivalent of scores from the different groups in a one-way ANOVA, and any differences from column to column reflect the effect of the IV, the treatment.
• The participant-to-participant differences, the within-group differences, are reflected in the differences in the scores from row to row. Those differences are error variance just as they are with the one-way ANOVA.
• The within-subjects F approach is to calculate the variability between rows (the within-groups variance), and then, because it comes from participant-to-participant differences that are the same in each group, to eliminate it from further analysis.
• The only error variance that remains is that which does not stem from the person-to-person differences.
In the dependent-samples t-test, the within-subjects variance is managed by reducing the denominator in the t ratio according to how highly correlated the two sets of measures are (the Excel approach) or by the longhand approach of using the standard deviation of the difference scores, which is relatively small when scores are related.
In the within-subjects F, the variability within groups is calculated and then adjusted if the variances of the differences between pairs of treatments are too unequal. Detecting this problem relies on Mauchly’s test of sphericity (W), developed by John W. Mauchly in 1940. If the W-test is significant (p < .05), sphericity is violated, which means that the variance of the difference scores differs too much from one pair of times/treatments to another (see Table 7.2). When sphericity is violated, the degrees of freedom (df) are adjusted using either the Greenhouse-Geisser or the Huynh-Feldt correction, each based on its own epsilon (ε) value (discussed more in Chapter 8). Of the two options, the Greenhouse-Geisser is more conservative: it makes the null hypothesis harder to reject and so lowers the probability of a type I error. The Huynh-Feldt correction is based on a bias-corrected value and is not as conservative.
One final note regarding error variance: because sphericity is assessed by comparing the variability across pairs of treatments, the W-test is not necessary for the dependent-samples t-test, where there is only one pair of measures. In addition, the test for sphericity cannot be done in the one-way ANOVA because the amount of variability within groups is different for each group, and there is no way to separate it from the balance of the error variance in the problem, which can be a severe limitation on the power of between-group designs. An example and interpretation of sphericity will be shown in the SPSS example later in the chapter.
Table 7.2: The concept of sphericity
Patient | Tx A | Tx B | Tx C | Tx A - Tx B | Tx A - Tx C | Tx B - Tx C
1 | 30 | 27 | 20 | 3 | 10 | 7
2 | 35 | 30 | 28 | 5 | 7 | 2
3 | 25 | 30 | 20 | -5 | 5 | 10
4 | 15 | 15 | 12 | 0 | 3 | 3
5 | 9 | 12 | 7 | -3 | 2 | 5
Variance | | | | 17 | 10.3 | 10.3
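The variances in the bottom row of Table 7.2 are the sample variances of the difference scores for each pair of treatments. A short Python sketch, offered only as an illustration (the dictionary and variable names are assumptions, and this is not Mauchly's test itself), shows how those three values arise:

```python
from itertools import combinations
from statistics import variance

# Scores from Table 7.2, one list per treatment, patients 1 through 5.
treatments = {
    "A": [30, 35, 25, 15, 9],
    "B": [27, 30, 30, 15, 12],
    "C": [20, 28, 20, 12, 7],
}

diff_variances = {}
for first, second in combinations(treatments, 2):
    # Difference score for each patient on this pair of treatments.
    diffs = [x - y for x, y in zip(treatments[first], treatments[second])]
    # Sample variance (n - 1 in the denominator) of the difference scores.
    diff_variances[(first, second)] = variance(diffs)
    print(f"Tx {first} - Tx {second}: differences = {diffs}, "
          f"variance = {diff_variances[(first, second)]:.1f}")
```

Sphericity holds when these variances are roughly equal; a formal decision about whether 17 differs too much from 10.3 is what the W-test provides.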
A Within-Subjects F Example
An industrial/organizational psychologist is conducting a study of employees who assemble electronic components. The study examines how productivity changes with the length of time employed. The psychologist identifies five workers hired in the same month and then gauges the number of assembled components each employee averages per hour one week, one month, and then two months after beginning work. Is there a relationship between the number of completed components and the length of time employed? The data for the five employees follow:
Products Assembled per Hour
1 week 1 month 2 months
Diego 2 5 4
Harold 4 7 7
Wilma 3 6 5
Carol 4 5 6
Moua 5 8 9
• The independent variable (the IV, the treatment) is the time elapsed.
• The dependent variable (the DV) is the number of components assembled.
• The issue is whether there are significant differences in the measures from column to column (over time).
In Chapter 6 the variability related to the IV was measured in the sum of squares between (SSbet). The same source of variance is gauged here, except that it is called the sum of squares between columns (SScol).
The Components of the Within-Subjects F
Calculating the within-subjects F begins just as the one-way ANOVA begins, by determining all variability from all sources with the sum of squares total (SStot). It is even calculated the same way as it was in Chapter 6:
1. The sum of squares total.
$SS_{tot} = \sum (x - M_G)^2$
a. Find the difference between each score and the mean of all the scores from all the groups (MG),
b. square the difference, and then
c. sum the squared differences.
The balance of the problem is completed with the following steps:
2. The sum of squares between columns (SScol). This equation is much like SSbet in the one-way ANOVA. The scores in each column are treated the same way the individual groups are treated in the one-way ANOVA. For columns 1, 2, through k,
$SS_{col} = (M_{col\,1} - M_G)^2 n_{col\,1} + (M_{col\,2} - M_G)^2 n_{col\,2} + \dots + (M_{col\,k} - M_G)^2 n_{col\,k}$ (Formula 7.2)
a. calculate the mean for each column of scores,
b. subtract the mean for all the data (MG) from each column mean,
c. square the result, and
d. multiply the squared result by the number of scores in the column.
3. The sum of squares between rows. This, too, is like the SSbet from the one-way problem except that it treats the scores for each row as a separate group. For rows 1, 2, through i,
$SS_{rows} = (M_{row\,1} - M_G)^2 n_{row\,1} + (M_{row\,2} - M_G)^2 n_{row\,2} + \dots + (M_{row\,i} - M_G)^2 n_{row\,i}$ (Formula 7.3)
a. calculate the mean for each row of scores,
b. subtract the mean for all the data from each row mean,
c. square the result, and
d. multiply the squared result by the number of scores in the row.
4. The residual sum of squares is the error term in the within-subjects F. It is the equivalent of SSwith in the one-way ANOVA. With the within-subjects F, the variability in scores due to person-to-person differences within the same measure is calculated, and because it is the same for each set of measures, it is eliminated. This will result in a reduced error term. It is determined as follows:
$SS_{resid} = SS_{tot} - SS_{col} - SS_{rows}$ (Formula 7.4)
a. Take all variance from all sources (SStot),
b. subtract from it the treatment effect (SScol), and
c. subtract the person-to-person differences (SSrows).
The Within-Subjects F Calculations
When the sums of squares values are completed, the next step is to complete the ANOVA table. The degrees-of-freedom values are as follows:
• dftot = N - 1
• dfcol = number of columns - 1
• dfrows = number of rows - 1
• dfresid = dfcol × dfrows
Just as with one-way problems, the mean square values are calculated by dividing the sums of squares by their degrees of freedom. The components of the F value and the only MS values required are the MScol, which includes the treatment effect, and the MSresid, which is the error term. The MS is not determined for total or for rows. The F value in the within-subjects ANOVA is then MScol ÷ MSresid.
The calculations and the table for the products-assembled-per-hour problem are in Figure 7.6.
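The steps above can also be sketched in a few lines of Python for readers who want to check the arithmetic. This is an illustrative sketch only; the function name `within_subjects_f` and the dictionary keys are assumptions made for the example, and the data are the products-assembled-per-hour scores given earlier.

```python
# Components assembled per hour; each row is one employee and each
# column one measure (1 week, 1 month, 2 months).
scores = [
    [2, 5, 4],  # Diego
    [4, 7, 7],  # Harold
    [3, 6, 5],  # Wilma
    [4, 5, 6],  # Carol
    [5, 8, 9],  # Moua
]


def within_subjects_f(rows):
    """Return sums of squares, df, mean squares, and F for a rows-by-columns table."""
    n_rows, n_cols = len(rows), len(rows[0])
    all_scores = [x for row in rows for x in row]
    grand_mean = sum(all_scores) / len(all_scores)
    col_means = [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]
    row_means = [sum(row) / n_cols for row in rows]

    ss_tot = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_col = sum((m - grand_mean) ** 2 * n_rows for m in col_means)   # Formula 7.2
    ss_rows = sum((m - grand_mean) ** 2 * n_cols for m in row_means)  # Formula 7.3
    ss_resid = ss_tot - ss_col - ss_rows                              # Formula 7.4

    df_col, df_rows = n_cols - 1, n_rows - 1
    df_resid = df_col * df_rows
    ms_col, ms_resid = ss_col / df_col, ss_resid / df_resid
    return {
        "ss_tot": ss_tot, "ss_col": ss_col, "ss_rows": ss_rows, "ss_resid": ss_resid,
        "df_col": df_col, "df_rows": df_rows, "df_resid": df_resid,
        "ms_col": ms_col, "ms_resid": ms_resid, "F": ms_col / ms_resid,
    }


result = within_subjects_f(scores)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

The critical value of F is not computed here; it still comes from the F table for dfcol and dfresid degrees of freedom.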
E How is the error term in the within-subjects F different from that in the one-way ANOVA?
Try It!
Figure 7.6: A within-subjects F example
The calculated value of F exceeds the critical value of F from the table. The number of products assembled per hour is significantly different according to the amount of time the employee has been on the job. The significant F indicates that this much difference between measures is unlikely to have occurred by chance.
The Products Assembled per Hour
Employee | 1 week | 1 month | 2 months | Row Means
Diego | 2 | 5 | 4 | 3.667
Harold | 4 | 7 | 7 | 6.0
Wilma | 3 | 6 | 5 | 4.667
Carol | 4 | 5 | 6 | 5.0
Moua | 5 | 8 | 9 | 7.333
Column Means | 3.60 | 6.20 | 6.20 | Grand Mean (MG) = 5.333
1. $SS_{tot} = \sum (x - M_G)^2 = (2 - 5.333)^2 + (4 - 5.333)^2 + \dots + (9 - 5.333)^2 = 49.333$
2. $SS_{col} = (M_{col\,1} - M_G)^2 n_{col\,1} + (M_{col\,2} - M_G)^2 n_{col\,2} + \dots + (M_{col\,k} - M_G)^2 n_{col\,k} = (3.6 - 5.333)^2(5) + (6.2 - 5.333)^2(5) + (6.2 - 5.333)^2(5) = 22.533$
3. $SS_{rows} = (M_{r1} - M_G)^2 n_{r1} + (M_{r2} - M_G)^2 n_{r2} + \dots + (M_{ri} - M_G)^2 n_{ri} = (3.667 - 5.333)^2(3) + (6.0 - 5.333)^2(3) + (4.667 - 5.333)^2(3) + (5.0 - 5.333)^2(3) + (7.333 - 5.333)^2(3) = 23.333$
4. The residual sum of squares.
$SS_{resid} = SS_{tot} - SS_{col} - SS_{rows} = 49.333 - 22.533 - 23.333 = 3.467$
The ANOVA Table
Source | SS | df | MS | F | Fcrit
Total | 49.333 | 14 | | |
Columns | 22.533 | 2 | 11.267 | 26.0 | 4.46
Rows | 23.333 | 4 | | |
Residual | 3.467 | 8 | 0.433 | |
Completing the Post Hoc Test
Ordinarily, the calculation of F leaves unanswered the question of which set of measures is significantly different from which. However, in this particular problem there is only one possibility. Because both the 1-month and the 2-month groups of measures have the same mean (M = 6.20), they must both be significantly different from the only other group of measures in the problem, the 1-week-on-the-job measures for which M = 3.6. As a demonstration, HSD is completed anyway.
The HSD procedure is the same as for the one-way test, except that the error term is now MSresid. Substituting MSresid for MSwith in the formula provides
$HSD = x\sqrt{\dfrac{MS_{resid}}{n}}$
Where
x is a value from Appendix Table D. It is based on the number of means, which is the same as the number of groups of measures, 3 in the example, and the df for MSresid, which are 8.
n = the number of scores there are in any one measure, 5 in this instance.
For the number-of-products-assembled-per-hour study,
$HSD = 4.04\sqrt{\dfrac{.433}{5}} = 1.19$
A difference of 1.19 or greater between any pair of means is statistically significant. Using the same approach used in Chapter 6, a matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.
1 week (3.6) 1 month (6.2) 2 months (6.2)
1 week (3.6) diff = 0 diff = 2.6* diff = 2.6*
1 month (6.2) diff = 0
2 months (6.2)
*Indicates a significant difference.
The 1-week measures of productivity are significantly different from the 1-month and 2-month measures of productivity. Because the mean values of the 1- and 2-month measures are the same, neither of the last two measures is significantly different from the other. The largest increase in productivity comes between the first week and first month of employment.
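The HSD comparison can be sketched in Python as follows. The studentized range value (4.04), MSresid (.433), n (5), and the three means are taken from the worked example; the variable names are illustrative assumptions, and the critical value is entered by hand from the table rather than computed.

```python
import math
from itertools import combinations

x_crit = 4.04     # tabled value for 3 means and 8 residual df at p = .05
ms_resid = 0.433  # error term from the within-subjects ANOVA table
n = 5             # number of scores in any one measure
means = {"1 week": 3.6, "1 month": 6.2, "2 months": 6.2}

# HSD = x * sqrt(MSresid / n)
hsd = x_crit * math.sqrt(ms_resid / n)
print(f"HSD = {hsd:.2f}")

significant = []
for first, second in combinations(means, 2):
    diff = abs(means[first] - means[second])
    # A pair differs significantly when its mean difference reaches HSD.
    if diff >= hsd:
        significant.append((first, second))
    flag = "*" if diff >= hsd else ""
    print(f"{first} vs. {second}: diff = {diff:.1f}{flag}")
```

Only the two comparisons involving the 1-week measure are flagged, which matches the matrix.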
Calculating the Effect Size
The final question for a significant F is the question of the practical importance of the result. Using partial-eta squared as the measure of effect size yields the following formula:
$\text{partial-}\eta^2 = \dfrac{SS_{col}}{SS_{resid} + SS_{col}}$
For the problem just completed, SScol = 22.533 and SSresid = 3.467, so
$\text{partial-}\eta^2 = \dfrac{22.533}{26} = 0.87$
Approximately 87% of the variance in productivity can be explained by how long the individual has been on the job.
Apply It!
Pilot Program Revisited
Let us return to the example of the middle school that adopted a meditation program known as quiet time to relieve stress, increase test scores, and improve student behavior. In Chapter 5, we used a one-sample t-test to determine that a statistically significant increase in GPAs occurred among participating students. Now, we will use a within-subjects F test to see if their stress levels have decreased.
Ten randomly chosen students who participated in the program filled out questionnaires about their stress levels. The aggregate score was from 1 to 10, with 10 indicating the most stress. The survey was given before the start of the program and at 3-month intervals. The time elapsed represents the independent variable, the treatment effect that drives this analysis. The dependent variable is the stress score. The within-subjects F is a dependent-groups procedure for two or more groups of scores for which the dependent variable is interval or ratio scale. In this example, we have four groups of scores.
Results of the stress questionnaires follow.
0 Months 3 Months 6 Months 9 Months
Student 1 7 6 6 6
Student 2 9 6 5 5
Student 3 7 5 5 4
Student 4 5 3 3 2
Student 5 7 6 4 4
Student 6 8 5 7 5
Student 7 5 4 4 3
Student 8 7 5 6 5
Student 9 6 6 4 4
Student 10 7 5 5 5
The following table shows results of the within-subject F test calculations.
Source | SS | df | MS | F
Total | 81.975 | 39 | |
Columns | 34.475 | 3 | 11.492 | 26.36
Subjects | 35.725 | 9 | |
Residual | 11.775 | 27 | 0.436 |
F.05(3,27) = 2.96
The F value of 26.36 is greater than the critical F value of 2.96, so the results are statistically significant.
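The table values can be checked from the raw questionnaire scores. The sketch below is illustrative only and uses the computational (raw-score) form of each sum of squares, in which the correction term is the squared grand total divided by N; the variable names are assumptions. Small differences from the printed table reflect rounding of the mean squares.

```python
# Stress scores from the table above: one row per student,
# columns are 0, 3, 6, and 9 months.
stress = [
    [7, 6, 6, 6],
    [9, 6, 5, 5],
    [7, 5, 5, 4],
    [5, 3, 3, 2],
    [7, 6, 4, 4],
    [8, 5, 7, 5],
    [5, 4, 4, 3],
    [7, 5, 6, 5],
    [6, 6, 4, 4],
    [7, 5, 5, 5],
]
n_rows, n_cols = len(stress), len(stress[0])

grand_total = sum(sum(row) for row in stress)
correction = grand_total ** 2 / (n_rows * n_cols)  # (sum of all scores)^2 / N

ss_tot = sum(x ** 2 for row in stress for x in row) - correction
col_totals = [sum(row[j] for row in stress) for j in range(n_cols)]
ss_col = sum(t ** 2 for t in col_totals) / n_rows - correction
ss_subj = sum(sum(row) ** 2 for row in stress) / n_cols - correction
ss_resid = ss_tot - ss_col - ss_subj

df_col = n_cols - 1
df_resid = (n_cols - 1) * (n_rows - 1)
f_ratio = (ss_col / df_col) / (ss_resid / df_resid)

print(f"SStot = {ss_tot:.3f}, SScol = {ss_col:.3f}, "
      f"SSsubj = {ss_subj:.3f}, SSresid = {ss_resid:.3f}")
print(f"F({df_col},{df_resid}) = {f_ratio:.2f}")
```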
Because the calculation of F did not identify the measures that were significantly different from the others, we calculate HSD using the following formula:
$HSD = x\sqrt{\dfrac{MS_{resid}}{n}}$
$HSD = 3.875\sqrt{\dfrac{0.436}{10}} = 0.81$
A difference of 0.81 or greater between any pair of means is statistically significant. A matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.
0 months (6.8) 3 months (5.1) 6 months (4.9) 9 months (4.3)
0 months (6.8) diff = 1.7* diff = 1.9* diff = 2.5*
3 months (5.1) diff = 0.2 diff = 0.8
6 months (4.9) diff = 0.6
9 months (4.3)
The differences marked with an asterisk are significant. The largest decrease in stress occurs during the first 3 months of the program.
To determine the practical importance of these numbers, partial-eta squared is used.
For the problem just completed, SScol = 34.475 and SSresid = 11.775, so
$\text{partial-}\eta^2 = \dfrac{34.475}{46.25} = 0.75$
About 75% of the variance in stress can be explained by how long the student has been enrolled in the program.
The within-subjects F test allowed analysis of students’ stress levels at multiple times throughout the year and showed that the program was reducing stress levels by significant amounts.
Apply It! boxes written by Shawn Murphy.
Comparing the Within-Subjects F and the One-Way ANOVA
In the one-way ANOVA, within-group variance is different for each group because each group is made up of different participants. There is no way to eliminate the error variance as it was eliminated for the within-subjects F because that source of error variance cannot be separated from the balance of the error variance. The smaller error term in the within-subjects test (which is the divisor in the F ratio) allows relatively small differences between the sets of measures to result in a significant F.
This is illustrated by using the same data as the example of the workers who assemble electronic components, except that here we calculate a one-way ANOVA instead of the within-subjects F. This is for illustration only: groups are either independent or dependent, so in practice there is no question about which approach is appropriate once a study has been designed.
The SStot and the SSbet will be the same as the SStot and the SScol are in the within-subjects problem.
SStot = 49.333
SSbet = 22.533
$SS_{with} = \sum (x_a - M_a)^2 + \sum (x_b - M_b)^2 + \sum (x_c - M_c)^2$ (Formula 6.3)
$= (2 - 3.60)^2 + (4 - 3.60)^2 + \dots + (9 - 6.20)^2 = 26.80$
The value for the SSwith in a one-way ANOVA is the same as SSrows + SSresid in the within-subjects F in Figure 7.6. It has to be because in the one-way ANOVA, there is no way to separate the participant-to-participant differences from the balance of the error variance because they are different for each group. With the SSrows added back into the error term, note in Table 7.3 the changes made to the ANOVA table and to F in particular.
• The degrees of freedom for “within” change to 12 from the 8 for residual, which results in a smaller critical value for the independent-groups test, but that adjustment does not compensate for the additional error in the term.
• Note that the sum of squares for the error term jumps from 3.467 in the within-subjects test to 26.80 in the independent-groups test.
• The F value is reduced from 26.0 in the within problem to 5.045 in the one-way problem, a factor of about 1/5.
Because groups are either independent or not, the example is not realistic. Nevertheless, the calculations illustrate the advantage to statistical power of setting up a dependent-groups test, an option researchers have at the planning level.
Table 7.3: The within-subjects F example repeated as a one-way ANOVA
The ANOVA table
Source SS df MS F Fcrit
Between 22.533 2 11.267 5.045 3.89
Within 26.800 12 2.233
Total 49.333 14
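The relationship between the two error terms can be shown with a few lines of arithmetic. The following Python sketch is an illustration only; it starts from the sums of squares and degrees of freedom already reported for the products-assembled-per-hour problem, and the variable names are assumptions.

```python
# Values from the within-subjects ANOVA table (Figure 7.6).
ss_col, ss_rows, ss_resid = 22.533, 23.333, 3.467
df_col, df_rows, df_resid = 2, 4, 8

# Within-subjects F: the person-to-person (rows) variability is
# removed, so only the residual remains in the error term.
f_within = (ss_col / df_col) / (ss_resid / df_resid)

# One-way ANOVA: the rows variability cannot be separated, so it
# stays in the error term along with its degrees of freedom.
ss_with = ss_rows + ss_resid
df_with = df_rows + df_resid
f_oneway = (ss_col / df_col) / (ss_with / df_with)

print(f"SSwith = {ss_with:.3f} on {df_with} df")
print(f"within-subjects F = {f_within:.2f}")
print(f"one-way F = {f_oneway:.3f}")
```

The numerator is identical in both ratios; the entire drop in F comes from the larger error term.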
Another Within-Subjects F Example
A psychologist working at a federal prison is interested in the relationship between the amount of time a prisoner is incarcerated and the number of violent acts in which the prisoner is involved. Using self-reported data, inmates respond anonymously to a questionnaire administered 1 month, 3 months, 6 months, and 9 months after incarceration. The data and the solution are in Figure 7.7.
The results (F) indicate that there are significant differences in the number of violent acts documented for the inmate related to the length of time the inmate has been incarcerated. The HSD results indicate that those incarcerated for 1 month are involved with a significantly different number of violent acts than those who have been in for 3 or 6 months. Those who have been in for 6 months are involved with a significantly different number of violent acts than those who have been in for 9 months. The eta-squared value indicates that about 37% of the variance in number of violent acts is a function of how long the inmate has been incarcerated.
F We compared a one-way ANOVA to a within-subjects F using the same data. How would the eta-squared values for the two problems compare?
Try It!
Figure 7.7: Another within-subjects F: Violence and the time of incarceration
Inmate | 1 month | 3 months | 6 months | 9 months | Row Means
1 | 4 | 3 | 2 | 5 | 3.50
2 | 5 | 4 | 3 | 4 | 4.0
3 | 3 | 1 | 1 | 2 | 1.750
4 | 4 | 2 | 1 | 3 | 2.50
5 | 2 | 1 | 2 | 3 | 2.0
Column Means | 3.60 | 2.20 | 1.80 | 3.40 | MG = 2.750
Verify that,
1. $SS_{tot} = \sum (x - M_G)^2 = 31.750$
2. $SS_{col} = (M_{col\,1} - M_G)^2 n_{col\,1} + (M_{col\,2} - M_G)^2 n_{col\,2} + (M_{col\,3} - M_G)^2 n_{col\,3} + (M_{col\,4} - M_G)^2 n_{col\,4} = (3.6 - 2.75)^2(5) + (2.2 - 2.75)^2(5) + (1.8 - 2.75)^2(5) + (3.4 - 2.75)^2(5) = 11.750$
3. $SS_{subj} = (M_{r1} - M_G)^2 n_{r1} + (M_{r2} - M_G)^2 n_{r2} + (M_{r3} - M_G)^2 n_{r3} + (M_{r4} - M_G)^2 n_{r4} + (M_{r5} - M_G)^2 n_{r5} = (3.5 - 2.75)^2(4) + (4.0 - 2.75)^2(4) + (1.75 - 2.75)^2(4) + (2.5 - 2.75)^2(4) + (2.0 - 2.75)^2(4) = 15.0$
4. $SS_{resid} = SS_{tot} - SS_{col} - SS_{subj} = 31.75 - 11.75 - 15 = 5.0$
The ANOVA Table
Source | SS | df | MS | F
Total | 31.75 | 19 | |
Columns | 11.75 | 3 | 3.917 | 9.393
Subjects | 15.0 | 4 | |
Residual | 5.0 | 12 | 0.417 |
F.05(3,12) = 3.49. F is sig.
The post hoc test: $HSD = x_{.05}\sqrt{\dfrac{MS_{resid}}{n}} = 4.20\sqrt{\dfrac{0.417}{5}} = 1.213$
 | M1 = 3.6 | M2 = 2.2 | M3 = 1.8 | M4 = 3.4
M1 = 3.6 | | 1.4* | 1.8* | 0.2
M2 = 2.2 | | | 0.4 | 1.2
M3 = 1.8 | | | | 1.6*
M4 = 3.4 | | | |
$\eta^2 = \dfrac{SS_{col}}{SS_{tot}} = \dfrac{11.75}{31.75} = 0.370$; about 37% of the variance in violence witnessed is related to how long the inmate has been incarcerated.
A Within-Subjects F in Excel
In spite of the important increase in power that is available compared to independent-groups tests, a dependent-groups ANOVA is not one of the more common tests. It is not one of the options Excel offers in the list of Data Analysis Tools, for example. However, like many statistical procedures, there are a number of repetitive calculations involved and Excel can simplify these. We will complete the second problem as an example.
1. Set the data up in four columns just as they are in Figure 7.8, but create a blank column to the right of each column of data. With a row at the top for the labels, data begins in cell A2.
2. Calculate the row and column means as well as a grand mean as follows:
a. For the column means, place the cursor in cell A7 just beneath the last value in the first column and enter the formula =AVERAGE(A2:A6) followed by Enter. To repeat this for the other columns, left click on the solution that is now in A7, drag the cursor across to G7, and release the mouse button. In the Home tab, click Fill and then Right. This will repeat the column-means calculations for the other columns. Delete the entries this makes in cells B7, D7, and F7, whose columns are still empty at this point.
b. For the row means, place the cursor in cell I2 and enter the formula =AVERAGE(A2,C2,E2,G2) followed by Enter. To repeat this for the other rows, left click on the solution that is now in I2, drag the cursor down to I6, and release the mouse button. In the Home tab, click Fill and then Down. This will repeat the calculation of means for the other rows.
c. For the grand mean, place the cursor in cell I7 and enter the formula =AVERAGE(I2:I6) followed by Enter (the mean of the row means will be the same as the grand mean; the same could have been done with the column means).
3. To determine the SStot:
a. In cell B2, enter the formula =(A2-2.75)^2 and press Enter. This will square the difference between the value in A2 and the grand mean. To repeat this for the other data in the column, left-click the cursor in cell B2 and drag down to cell B6. Click Fill and Down. With the cursor in cell B7, click the summation sign (Σ) at the upper right of the screen and press Enter. Repeat these steps for columns D, F, and H.
b. With the cursor in H9, type in SStot= and press Enter. In cell I9, enter the formula =SUM(B7,D7,F7,H7) and press Enter. The value will be 31.75, which is the total sum of squares.
4. For the SScol:
a. In cell A8, enter the formula =(3.6-2.75)^2*5 and press Enter. This will square the difference between the column mean and the grand mean and multiply the result by the number of measures in the column, 5. In cells C8, E8, and G8, repeat this for each of the other columns, substituting the mean for each column for the 3.60 that was the column 1 mean.
b. With the cursor in H10, type in SScol= and press Enter. In cell I10, enter the formula =SUM(A8,C8,E8,G8) and press Enter. The value will be 11.75, which is the sum of squares for the columns.
5. For the SSrows:
a. In cell J2, enter the formula =(I2-2.75)^2*4 and press Enter. Repeat this in cells J3 through J6 by left clicking on what is now in J2 and dragging the cursor down to cell J6. Click Fill and Down.
b. With the cursor in H11, type in SSrow= and press Enter. In cell I11, enter the formula =SUM(J2:J6) and press Enter. The value will be 15.0, which is the sum of squares for the participants.
6. For the SSresid, in cell H12, enter SSresid= and press Enter. In cell I12, enter the formula =I9-I10-I11, which subtracts SScol and SSrows from SStot. The resulting value will be 5.0.
We used Excel to determine all the sum-of-squares values. Now, the mean squares are determined by dividing the sums of squares for columns and residual by their degrees of freedom:
$MS_{col} = \dfrac{11.75}{3} = 3.917$
$MS_{resid} = \dfrac{5}{12} = .417$
$F = \dfrac{MS_{col}}{MS_{resid}} = \dfrac{3.917}{.417} = 9.393$, which agrees with the earlier calculations done by hand.
To create the ANOVA table,
• Beginning in cell A10, type in Source; in B10, SS; in C10, df; in D10, MS; in E10, F; and in F10, Fcrit.
• Beginning in cell A11 and working down, type in total, columns, rows, residual.
For the sum-of-squares values:
• In cell B11, enter =I9.
• In cell B12, enter =I10.
• In cell B13, enter =I11.
• In cell B14, enter =I12.
For the degrees of freedom:
• In cell C11, enter 19 for total degrees of freedom.
• In cell C12, enter 3 for columns degrees of freedom.
• In cell C13, enter 4 for rows degrees of freedom.
• In cell C14, enter 12 for residual degrees of freedom.
For the mean squares:
• In cell D12, enter =B12/C12. The result is MScol.
• In cell D14, enter =B14/C14. The result is MSresid.
For the F value in cell E12, enter =D12/D14.
In cell F12, enter the critical value of F for 3 and 12 degrees of freedom, which is 3.49.
The list of commands looks intimidating, but mostly because every keystroke has been included. With some practice, this will become second nature. Figure 7.8 is a screenshot of the result of the calculations.
Figure 7.8: A within-subjects F problem in Excel
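For readers who prefer to check the spreadsheet outside Excel, the same sequence of steps can be mirrored in Python. This is a sketch under the assumption that the data are the violent-acts scores from Figure 7.7, entered one inmate per row; the variable names are illustrative and the comments point to the corresponding worksheet steps.

```python
# Violent acts per inmate (rows) at 1, 3, 6, and 9 months (columns).
acts = [
    [4, 3, 2, 5],
    [5, 4, 3, 4],
    [3, 1, 1, 2],
    [4, 2, 1, 3],
    [2, 1, 2, 3],
]
n_rows, n_cols = len(acts), len(acts[0])

# Step 2: column means, row means, and the grand mean.
col_means = [sum(row[j] for row in acts) / n_rows for j in range(n_cols)]
row_means = [sum(row) / n_cols for row in acts]
grand_mean = sum(row_means) / n_rows

# Step 3: squared deviations from the grand mean, summed column by column
# (the worksheet totals in B7, D7, F7, and H7), then added together.
col_sq_sums = [sum((row[j] - grand_mean) ** 2 for row in acts) for j in range(n_cols)]
ss_tot = sum(col_sq_sums)

# Step 4: squared deviation of each column mean, times scores per column.
ss_col = sum((m - grand_mean) ** 2 * n_rows for m in col_means)

# Step 5: squared deviation of each row mean, times scores per row.
ss_rows = sum((m - grand_mean) ** 2 * n_cols for m in row_means)

# Step 6: the residual is what remains of the total.
ss_resid = ss_tot - ss_col - ss_rows

df_col, df_resid = n_cols - 1, (n_cols - 1) * (n_rows - 1)
f_ratio = (ss_col / df_col) / (ss_resid / df_resid)
print(f"SStot = {ss_tot:.2f}, SScol = {ss_col:.2f}, "
      f"SSrows = {ss_rows:.2f}, SSresid = {ss_resid:.2f}")
print(f"F({df_col},{df_resid}) = {f_ratio:.3f}")
```

Because the mean squares are not rounded before dividing, this F comes out at 9.400 rather than the 9.393 obtained from the rounded values 3.917 and .417.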
7.4 Presenting Results
Using the data from Figure 7.8 and analyzing it in Excel, we see the output table highlighted in yellow. Reading from left to right, the table's columns are the sum of squares (SS), degrees of freedom (df), mean squares (MS), F ratio (F), and F critical value (Fcrit). Interpreting the results, we can see that the F ratio is MScol/MSresid, which is 3.92/.417 = 9.4. This value is larger than the F critical value, indicating significance at the p < .05 level. Recall that a psychologist has collected data on a group of inmates over a 9-month span, recording the number of violent acts in which they were involved. Upon analyzing the findings, we see that as time elapses from 1 month to 3 months to 6 months to 9 months there is a significant change in the number of violent acts being committed. However, the F ratio alone cannot tell us where the significant differences occurred, since data were captured at four points in time (1 month, 3 months, 6 months, and 9 months). As a result, post hoc tests will be needed to indicate where these differences lie.
With regard to the hypotheses of the repeated-measures ANOVA, the comparison is of mean differences across time. Therefore,
H0: $\mu_{1\,month} = \mu_{3\,months} = \mu_{6\,months} = \mu_{9\,months}$
The null hypothesis states there is no significant difference between the mean number of violent incidents from 1 month to 3 months to 6 months to 9 months. Keep in mind that the ANOVA is an omnibus test, so we are testing for any overall difference among the months. The differences may lie between only some pairs of months rather than among all of them, which is why we can follow up with paired comparisons.
Ha: $\mu_{1\,month} \neq \mu_{3\,months} \neq \mu_{6\,months} \neq \mu_{9\,months}$
The alternative (or research) hypothesis states there is a significant difference between the mean number of violent incidents from 1 month to 3 months to 6 months to 9 months. The alternative can also be a directional prediction, such as an increase in the mean number of violent incidents from 1 month to 3 months to 6 months to 9 months.
Ha: $\mu_{1\,month} < \mu_{3\,months} < \mu_{6\,months} < \mu_{9\,months}$
To analyze and present results using SPSS, let us first look at an example of a paired sample/dependent t-test then a repeated-measures ANOVA example.
SPSS Example 1: Steps for a Paired (Matched)-Samples t-Test
From the data set provided (Figure 7.9), a college professor wants to look at mean differences in scores over the first two quizzes of his statistics class. With his scores in SPSS, go to Analyze → Compare Means → Paired-Samples T Test. Input Score 1 in the first box and Score 2 in the second box, as seen in Figure 7.10. Then click OK. The resulting SPSS output tables are provided in Figure 7.11.
Figure 7.9: Data set for quiz scores
Figure 7.10: SPSS steps in performing a paired-samples t-test
Figure 7.11: SPSS results of a paired-samples t-test
SPSS Example 2: Steps for a Repeated-Measures ANOVA
This example uses data gathered from the SPSS (PASW) On-Line Training Workshop (1999), available at the following link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
The data track cancer treatments over time (see Figure 7.12):
TOTALCIN = oral condition at the initial stage
TOTALCW2 = oral condition at the end of week 2
TOTALCW4 = oral condition at the end of week 4
TOTALCW6 = oral condition at the end of week 6
Go to Analyze → General Linear Model → Repeated Measures. As shown in Figure 7.13, type in the Within-Subject Factor Name: CW_Times, Number of Levels: 4, and the Measure Name: CW; then click Define. As shown in Figure 7.14, place the four variables (TOTALCIN, TOTALCW2, TOTALCW4, and TOTALCW6) in sequential order in the Within-Subjects Variables box, click Plots, and move CW_Times into the Horizontal Axis. Then click Options and move CW_Times into Display Means for, check Compare Main Effects, and select Sidak from the dropdown box just below. Then check Descriptive statistics and Estimates of effect size. Click Continue and OK.
Paired Samples Statistics (Figure 7.11)

Pair 1     Mean      N    Std. Deviation   Std. Error Mean
Score_1    42.893    14   4.3992           1.1757
Score_2    38.7857   14   7.15949          1.91345

Paired Samples Test (Figure 7.11)

Pair 1: Score_1 - Score_2
Paired Differences
  Mean                                                4.10714
  Std. Deviation                                      7.00598
  Std. Error Mean                                     1.87243
  95% Confidence Interval of the Difference, Lower    .06201
  95% Confidence Interval of the Difference, Upper    8.15228
t                                                     2.193
df                                                    13
Sig. (2-tailed)                                       .047
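The entries in the Paired Samples Test table can be reproduced from the mean and standard deviation of the difference scores. This is a minimal sketch; the function name is hypothetical, and the critical t of 2.160 (df = 13, two-tailed, alpha = .05) is taken from a standard t table rather than from the SPSS output.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (standard error, t, df) for a paired-samples t-test."""
    se = sd_diff / math.sqrt(n)   # standard error of the mean difference
    t = mean_diff / se
    return se, t, n - 1


# Summary values from the Paired Samples Test table
se, t, df = paired_t_from_summary(4.10714, 7.00598, 14)

# Assumed critical value from a t table (df = 13, two-tailed, alpha = .05)
T_CRIT_DF13 = 2.160
lower = 4.10714 - T_CRIT_DF13 * se
upper = 4.10714 + T_CRIT_DF13 * se

print(f"SE = {se:.5f}, t({df}) = {t:.3f}")
print(f"95% CI = [{lower:.3f}, {upper:.3f}]")
```

The interval comes out marginally different from the SPSS bounds in the third decimal place because the critical value is rounded to three decimals here.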
Figure 7.12: Data set of cancer treatments over time
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Figure 7.13: Repeated-measures steps (part 1)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Figure 7.14: Repeated-measures steps (part 2)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Figure 7.15: SPSS results of cancer treatments over time
Tests of Within-Subjects Effects (Measure: CW; Imputation Number: 1)

Source            Correction           Type III Sum of Squares   df       Mean Square   F        Sig.   Partial Eta Squared
CW_Times          Sphericity Assumed   220.340                   3        73.447        13.760   .000   .364
CW_Times          Greenhouse-Geisser   220.340                   2.194    100.428       13.760   .000   .364
CW_Times          Huynh-Feldt          220.340                   2.424    90.913        13.760   .000   .364
CW_Times          Lower-Bound          220.340                   1.000    220.340       13.760   .000   .364
Error(CW_Times)   Sphericity Assumed   384.324                   72       5.338
Error(CW_Times)   Greenhouse-Geisser   384.324                   52.656   7.299
Error(CW_Times)   Huynh-Feldt          384.324                   58.167   6.607
Error(CW_Times)   Lower-Bound          384.324                   24.000   16.013

Mauchly's Test of Sphericity (a) (Measure: CW; Imputation Number: 1)

Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.   Epsilon (b): Greenhouse-Geisser   Huynh-Feldt   Lower-Bound
CW_Times                 .596          11.752               5    .039   .731                              .808          .333

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. Design: Intercept. Within Subjects Design: CW_Times.
b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.

Descriptive Statistics (Imputation Number: 1)

            Mean    Std. Deviation   N
TOTALCIN    6.52    1.531            25
TOTALCW2    8.28    2.542            25
TOTALCW4    10.36   3.475            25
TOTALCW6    9.76    3.566            25
Figure 7.15: SPSS results of cancer treatments over time (continued)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Figure 7.16: SPSS output graph of cancer treatments over time
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Pairwise Comparisons (Measure: CW; Imputation Number: 1)

(I) CW_Times   (J) CW_Times   Mean Difference (I-J)   Std. Error   Sig. (b)   95% CI Lower Bound (b)   95% CI Upper Bound (b)
1              2              -1.760*                 .504         .011       -3.205                   -.315
1              3              -3.840*                 .694         .000       -5.830                   -1.850
1              4              -3.244*                 .755         .001       -5.407                   -1.082
2              1              1.760*                  .504         .011       .315                     3.205
2              3              -2.080*                 .709         .043       -4.113                   -.047
2              4              -1.484                  .717         .262       -3.539                   .571
3              1              3.840*                  .694         .000       1.850                    5.830
3              2              2.080*                  .709         .043       .047                     4.113
3              4              .596                    .489         .800       -.806                    1.997
4              1              3.244*                  .755         .001       1.082                    5.407
4              2              1.484                   .717         .262       -.571                    3.539
4              3              -.596                   .489         .800       -1.997                   .806

Based on estimated marginal means.
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Sidak.
Based on the results (Figures 7.15 and 7.16), the Descriptive Statistics table shows that there are differences in the means; how significant those differences are is determined by the ANOVA and the subsequent post hoc tests. Next, Mauchly's test of sphericity shows a significant value based on the χ² distribution (p < .05), indicating a violation of the sphericity assumption. To reiterate, this indicates that the variances of the differences between pairs of treatments or measures are not equal for the group. Because the variability for some pairs of treatments differs significantly from that for other pairs, a violation has occurred. Therefore, in the Tests of Within-Subjects Effects table, sphericity cannot be assumed, and a df adjustment is made by using the Greenhouse-Geisser or the Huynh-Feldt calculations. As seen in the F value (13.760) and the adjusted df, the adjustment does not change the conclusion: there are significant differences across CW_Times (p < .05). The Pairwise Comparisons table shows the between-treatment differences: all treatment times are significantly different from one another except times 2 and 4 (p = .262) and times 3 and 4 (p = .800). The line graph also indicates a trend in differences between the first, second, and third treatment times but not much difference from the third to the fourth treatment.
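The df adjustment and the effect size in Figure 7.15 follow from simple arithmetic on the table entries. The sketch below uses hypothetical function and constant names; it multiplies both df values by the Greenhouse-Geisser epsilon and computes partial eta-squared as the effect sum of squares divided by the effect plus error sums of squares.

```python
SS_EFFECT = 220.340            # CW_Times, Type III sum of squares
SS_ERROR = 384.324             # Error(CW_Times)
DF_EFFECT, DF_ERROR = 3, 72    # sphericity-assumed df
GG_EPSILON = 0.731             # Greenhouse-Geisser epsilon (as displayed)


def corrected_df(df_effect, df_error, epsilon):
    """Scale both df values by the same epsilon."""
    return df_effect * epsilon, df_error * epsilon


def partial_eta_squared(ss_effect, ss_error):
    """Proportion of effect-plus-error variability due to the effect."""
    return ss_effect / (ss_effect + ss_error)


df1, df2 = corrected_df(DF_EFFECT, DF_ERROR, GG_EPSILON)
# Epsilon cancels in the ratio of mean squares, so F itself is unchanged
f_ratio = (SS_EFFECT / df1) / (SS_ERROR / df2)
effect_size = partial_eta_squared(SS_EFFECT, SS_ERROR)

print(f"corrected df = ({df1:.3f}, {df2:.3f})")
print(f"F = {f_ratio:.2f}, partial eta-squared = {effect_size:.3f}")
```

Because epsilon is rounded to three decimals in the output, the corrected df computed here differ slightly from the 2.194 and 52.656 that SPSS reports. Only the df, and therefore the p value, change under the correction; the F ratio does not.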
7.5 Interpreting Results
Refer to the most recent edition of the APA manual for specific detail on formatting statistics, but Table 7.4 may be used as a quick guide in presenting the statistics covered in this chapter.
Table 7.4: Guide to APA formatting of F statistic results

Abbreviation or Term   Description
F                      F test statistic score
Partial η²             Partial eta-squared: a measure of effect size for ANOVA
W                      Mauchly's test of sphericity
χ²                     Distribution used for nonparametric tests such as Mauchly's test of sphericity and Friedman's ANOVA
SS                     Sum of squares
MS                     Mean square
Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.
Try It!
Access the data and the accompanying video via the links below to perform this analysis yourself. Both resources are provided by Central Michigan University.
Data link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Video: http://calcnet.mth.cmich.edu/org/spss/V16_materials/Video_Clips_v16/19repeated_measures/19repeated_measures.swf
Using the results from SPSS Example 1, Figure 7.11, we could present the results in the following way:
• There was a significant difference between the scores on quiz 1 (M = 42.89) and quiz 2 (M = 38.79); mean scores decreased significantly from the first quiz to the second, t(13) = 2.19, p < .05.
Using the results from the SPSS Example 2, Figures 7.15 and 7.16, we could present the results in the following way:
• The overall difference between CW_Times was significant using the Greenhouse-Geisser results, F(2.19, 52.66) = 13.76, p < .001, partial η² = .364.
• Based on the Sidak pairwise comparisons, the CW_1 time (M = 6.52, SD = 1.53) was significantly different from all the other times. CW_2 (M = 8.28, SD = 2.54) and CW_3 (M = 10.36, SD = 3.48) were also significantly different from each other.
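Assembling a results string by hand invites slips such as copying the wrong column into the p value. As a rough aid, the sketch below builds the string from the table entries; the helper names are hypothetical, and the conventions encoded (no leading zero, p reported as p < .001 when SPSS displays .000) are a simplification to be checked against the APA manual.

```python
def apa_p(p):
    """Format a p value without a leading zero; floor at p < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_f(df1, df2, f_value, p, partial_eta_sq):
    """Build an F-test results string from SPSS table entries."""
    eta = f"{partial_eta_sq:.3f}".lstrip("0")
    return (f"F({df1:.2f}, {df2:.2f}) = {f_value:.2f}, "
            f"{apa_p(p)}, partial eta-squared = {eta}")


# Greenhouse-Geisser row of Figure 7.15; Sig. is displayed as .000
print(apa_f(2.194, 52.656, 13.760, 0.000, 0.364))
# Sig. (2-tailed) from the paired-samples t-test in Figure 7.11
print(apa_p(0.047))
```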
7.6 Nonparametric Tests
You may have noticed that for every parametric test, there is a nonparametric equivalent. The rationale behind nonparametric tests is to obtain a conservative estimate of significance when parametric assumptions have been violated. Such violations include nonlinearity, non-normal distributions, and small data sets.
The nonparametric equivalent of the dependent-samples t-test is the Wilcoxon signed-ranks test (not to be confused with Chapter 5's Wilcoxon rank-sum test, the equivalent of the independent-samples t-test). Frank Wilcoxon proposed both of these in a single paper published in 1945. The Wilcoxon signed-ranks W-test is known as the Wilcoxon t-test for dependent samples (not independent ones). In brief, the steps in the calculation of W are calculating the differences between scores, taking the absolute value (removing the +/- sign), ranking the absolute values, reassigning the original (+/-) sign, and then summing the ranks.
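The steps just listed can be sketched in plain Python. The function name and the scores are hypothetical; dropping zero differences and giving tied absolute values the average of the ranks they occupy are common conventions that are assumed here.

```python
def signed_rank_sums(before, after):
    """Return (sum of positive ranks, sum of negative ranks).

    Steps: difference -> absolute value -> rank -> restore sign -> sum.
    """
    # Step 1: differences, dropping pairs with no change
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Steps 2-3: order by absolute value and assign average ranks to ties
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j + 1) / 2
        for idx in range(i, j + 1):
            ranks[idx] = average_rank
        i = j + 1
    # Steps 4-5: restore the signs and sum the ranks separately
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return positive, negative


# Hypothetical before/after scores for six participants
before = [10, 12, 9, 14, 11, 13]
after = [12, 11, 13, 14, 8, 18]
print(signed_rank_sums(before, after))
```

The smaller of the two sums is the value usually compared against a critical value or converted to Z.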
The nonparametric equivalent of the parametric repeated-measures ANOVA is Friedman's ANOVA. Essentially the analysis looks at the difference in the mean ranks (instead of means) across time, treatments, or matched/equivalent groups. By analyzing differences in the mean ranks, the analysis in effect reduces the influence of extreme points, or outliers, in the distribution. As noted, sensitivity to outliers is the disadvantage of using the mean. Again, nonparametric tests are conservative, distribution-free analyses that are used when parametric violations have occurred. As a result, it is more difficult to find significance; on the other hand, there is a lower probability of a type I error. One important point to note is that even though mean ranks are used to calculate significant differences between times, treatments, or matched/equivalent groups, the results are reported in terms of the median differences, as will be shown in the next example.
Friedman’s nonparametric ANOVA dates back to 1937 and is based on ranked (ordinal) data and the comparison of medians. An alternative test that is nonparametric and similar to Friedman’s test is Cochran’s Q-test, which is used for dichotomous data (i.e., only two response choices as in yes/no).
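To make the role of the mean ranks concrete, the sketch below ranks each participant's scores across conditions and computes the Friedman chi-square from the rank sums. The function name and the scores are hypothetical, and no tie correction is applied, so software that corrects for ties can report a slightly larger value.

```python
def friedman_chi_square(rows):
    """Return (mean ranks, chi-square) for a subjects-by-conditions table.

    Scores are ranked within each row; tied scores share an average rank.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, x in enumerate(row):
            smaller = sum(1 for y in row if y < x)
            equal = sum(1 for y in row if y == x)   # includes x itself
            rank_sums[j] += smaller + (equal + 1) / 2
    mean_ranks = [s / n for s in rank_sums]
    chi_square = (12 / (n * k * (k + 1)) * sum(s ** 2 for s in rank_sums)
                  - 3 * n * (k + 1))
    return mean_ranks, chi_square


# Hypothetical scores: 4 participants, 3 measurement occasions
scores = [[40, 44, 47], [38, 45, 41], [42, 46, 39], [35, 37, 43]]
mean_ranks, chi_square = friedman_chi_square(scores)
print(mean_ranks, chi_square)
```

The statistic is referred to the chi-square distribution with k - 1 degrees of freedom, where k is the number of conditions.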
Worked Example of the Friedman’s Nonparametric ANOVA and Wilcoxon Signed-Ranks W Using SPSS
To perform the Friedman's nonparametric ANOVA using the data set in Figure 7.17, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples. Place Score_1, Score_2, and Score_3 into the Test Variables box (see Figure 7.18). Click on Statistics, check the Quartiles box, and then click Continue and OK.
Figure 7.17: Data set for the Friedman’s ANOVA test
Figure 7.18: Steps in SPSS for the Friedman’s nonparametric ANOVA
Figure 7.19: Results of the Friedman’s nonparametric ANOVA
Descriptive Statistics

           N    25th Percentile   50th (Median)   75th Percentile
Score_1    14   39.000            44.500          46.125
Score_2    14   35.2500           40.0000         43.1250
Score_3    14   30.3750           40.2500         44.7500

Ranks

           Mean Rank
Score_1    2.43
Score_2    1.86
Score_3    1.71

Test Statistics (a)

N              14
Chi-Square     4.148
df             2
Asymp. Sig.    .126

a. Friedman Test
Interpreting Results
Based on the results of the Friedman's nonparametric ANOVA in Figure 7.19, there is no significant difference between quiz scores over time, χ²(2, N = 14) = 4.15, p = .126. In other words, the students did not change significantly over testing times. Since there were no significant differences in the scores, some researchers will conclude that no post hoc tests are needed. However, performing a post hoc Wilcoxon signed-ranks W-test using software requires minimal additional effort, and the researcher can then verify whether any significant differences between pairs of scores exist.
The steps for performing the W-test (Figure 7.20) are Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples. Input Score_1 and Score_2 in Row 1, Score_2 and Score_3 in Row 2, and then Score_1 and Score_3 in Row 3. Click on Options, check Quartiles, and click Continue and OK.
Figure 7.20: Steps in SPSS for the Wilcoxon signed-ranks W-test
Figure 7.21: Results of the Wilcoxon signed-ranks W-test
Descriptive Statistics

           N    25th Percentile   50th (Median)   75th Percentile
Score_1    14   39.000            44.500          46.125
Score_2    14   35.2500           40.0000         43.1250
Score_3    14   30.3750           40.2500         44.7500

Test Statistics (a)

                          Score_2 - Score_1   Score_3 - Score_2   Score_3 - Score_1
Z                         -2.001 (b)          -.126 (b)           -1.821 (b)
Asymp. Sig. (2-tailed)    .045                .900                .069

a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.

Ranks

                                      N        Mean Rank   Sum of Ranks
Score_2 - Score_1   Negative Ranks    9 (a)    7.17        64.50
                    Positive Ranks    3 (b)    4.50        13.50
                    Ties              2 (c)
                    Total             14
Score_3 - Score_2   Negative Ranks    8 (d)    6.81        54.50
                    Positive Ranks    6 (e)    8.42        50.50
                    Ties              0 (f)
                    Total             14
Score_3 - Score_1   Negative Ranks    10 (g)   8.15        81.50
                    Positive Ranks    4 (h)    5.88        23.50
                    Ties              0 (i)
                    Total             14

a. Score_2 < Score_1
b. Score_2 > Score_1
c. Score_2 = Score_1
d. Score_3 < Score_2
e. Score_3 > Score_2
f. Score_3 = Score_2
g. Score_3 < Score_1
h. Score_3 > Score_1
i. Score_3 = Score_1
Looking at the results (Figure 7.21) of the Wilcoxon signed-ranks W-test, we see that using it as a post hoc test to the Friedman's ANOVA does have benefits, as it identifies a significant difference between Scores 1 and 2 that was not detected using Friedman's ANOVA. This is an important point in the previously noted debate over whether to run post hoc tests only when the omnibus test statistic is significant. As a result, the conclusion here, based on the W-test, is that there is a significant difference between Score_1 (Mdn = 44.50) and Score_2 (Mdn = 40.00), Z(14) = -2.00, p < .05, while there were no significant differences between Score_2 (Mdn = 40.00) and Score_3 (Mdn = 40.25), Z(14) = -0.126, p = .90, or between Score_1 (Mdn = 44.50) and Score_3 (Mdn = 40.25), Z(14) = -1.821, p = .069.
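The Z values in Figure 7.21 can be approximated from the Ranks table alone. The sketch below applies the usual normal approximation for the signed-ranks statistic without a tie correction; the function name is hypothetical, and the inputs are the positive-rank sums and the numbers of non-tied pairs read from the figure.

```python
import math


def wilcoxon_z(rank_sum, n_nonzero):
    """Normal approximation for a signed-ranks sum (no tie correction)."""
    mean = n_nonzero * (n_nonzero + 1) / 4
    sd = math.sqrt(n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 24)
    return (rank_sum - mean) / sd


# (sum of positive ranks, number of non-tied pairs) from Figure 7.21
z_21 = wilcoxon_z(13.50, 12)   # Score_2 - Score_1: 14 pairs minus 2 ties
z_32 = wilcoxon_z(50.50, 14)   # Score_3 - Score_2
z_31 = wilcoxon_z(23.50, 14)   # Score_3 - Score_1

print(round(z_21, 3), round(z_32, 3), round(z_31, 3))
```

These values agree with the SPSS output to about two decimal places; any remaining gap presumably reflects the adjustment for tied ranks that the sketch omits.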
Summary
Any statistical procedure has advantages and disadvantages. The downside of the different independent-groups designs is that subjects within groups often respond to the independent variable differently. Those differences are a source of error variance that is different for each group. No matter how carefully a researcher randomly selects the groups to be used in a study, there are going to be differences in the way that people in the same group respond to whatever stimulus is offered. Both the before/after t-test and within-subjects F test eliminate that source of error variance by either using the same people repeatedly, or matching subjects on the most important characteristics. Controlling error variance results in a test that is more likely to detect a significant difference (Objectives 1 and 5).
In dependent-groups designs, using the same group repeatedly allows for the number of participants involved to be fewer (Objectives 1, 2, 3, 4, and 6). One of the downsides to repeated-measures designs is that they take more time to complete. Unless subjects are matched across measures, the different levels of the independent variable cannot be administered concurrently as they can in independent-groups tests. With more time, there is an increased potential for attrition. If one of the participants drops out of a repeated-measures study, the data is lost from all the measures of the dependent variable for that subject (Objectives 2 and 4).
Having noted some of the differences between dependent-groups designs and their independent-groups equivalents, it is important to note their consistencies as well. Whether the test is independent t, before/after t, one-way ANOVA, or a within-subjects F, in each case the independent variable is nominal scale, and the dependent variable is interval or ratio scale (Objective 2).
In addition, two repeated-measures designs were performed (i.e., t-tests and ANOVAs) where we presented several scenarios to test their respective null hypotheses to find support for alternative ones (Objectives 3 and 6). Results and conclusions were presented, interpreted, and reported in APA format (Objectives 7 and 8). Finally, the Wilcoxon signed-ranks W-test and the Friedman's nonparametric ANOVA were discussed with an appropriate example (Objective 9).
There is something else that all the tests in this chapter have in common. They all test the hypothesis of difference. Like the z-test and the one-sample t-test, they are about significant differences. Sometimes, however, the question involves the strength of the relationships between variables. Those discussions will introduce correlation and the hypothesis of association that are the focus of Chapter 8.
Key Terms
before/after t-test A dependent-groups application of the t-test, also known as a pre/post t-test. In this particular application, one group is measured before and after a treatment.
confounding variables Variables that influence an outcome but are uncontrolled in the analysis and obscure the effects of other variables. For example, if a psychologist is interested in gender-related differences in problem-solving ability but doesn't control for age differences, differences in gender may be confounded by differences that are actually age-related.
dependent-groups designs Statistical procedures in which the groups are related, either because multiple measures are taken of the same participants or because each participant in a particular group is matched on characteristics relevant to the analysis to a participant in the other groups with the same characteristics. Dependent-groups designs reduce error variance because they reduce score variation due to factors unrelated to the independent variable.
matched-pairs or dependent-samples t-test A dependent-groups application of the t-test. In this particular application, each participant in the second group is paired to a participant in the first group with the same characteristics in order to limit the error variance that would otherwise stem from using dissimilar groups.
sphericity Equality of the variances of the differences in the dependent variable across all pairs of treatments or times for the participants in the group. When these variances do not differ significantly, sphericity may be assumed. Significantly different variances between pairs of treatments are a violation of sphericity. Such violations are detected using Mauchly's sphericity (W) test.
within-subjects F The dependent-groups equivalent of the one-way ANOVA. In this procedure either participants in each group are paired on the relevant characteristics with participants in the other groups, or one group is measured repeatedly after different levels of the independent variable are introduced.
Chapter Exercises
Answers to Try It! Questions
The answers to all Try It! questions introduced in this chapter are provided below.
A. Small samples tend to be platykurtic because the data in small samples is often highly variable. This translates into relatively large standard deviations and large error terms.
B. If groups are created by random sampling, they will differ from the population from which they were drawn only by chance. That means that with random sampling, there can be error, but its potential to affect research results diminishes as the sample size grows.
C. The before/after t-test and the matched-pairs t-test differ only in that the before/after test uses the same group twice and the matched-pairs test matches each subject in the first group with one in the second group who has similar characteristics. The calculation and interpretation of the t value are the same in both procedures.
D. The within-subjects test will detect a significant difference more readily than an independent t-test. Power in statistical testing is the likelihood of detecting significance.
E. Because the same subjects are involved in each set of measures, the within-subjects test allows us to calculate the amount of score variability due to individual differences in the group and eliminate it because it is the same for each group. This source of error variance is eliminated from the analysis, leaving a smaller error term.
F. The eta-squared value would be the same in either problem. Note that in a one-way ANOVA, eta-squared is the ratio of SSbet to SStot. In the within-subjects F, it is SScol to SStot. Because SSbet and SScol both measure the same variance and the SStot values will be the same in either case, the eta-squared values will likewise be the same. What changes, of course, is the error term. Ordinarily, SSresid will be much smaller than SSwith, but those values show up in the F ratio by virtue of their respective MS values, not in eta-squared.
Review Questions
The answers to the odd-numbered items can be found in the answers appendix.
1. A group of clients is being treated for a compulsive behavior disorder. The number of times in an hour that each one manifests the compulsivity is gauged before and after a mild sedative is administered. The data is as follows:

      Before   After
1.    5        4
2.    6        4
3.    4        3
4.    9        5
5.    5        6
6.    7        3
7.    4        2
8.    5        5

a. What is the standard deviation of the difference scores?
b. What is the standard error of the mean for the difference scores?
c. What is the calculated value of t?
d. Are the differences statistically significant?
2. A researcher is examining the impact that a political ad has on potential donors’ willingness to contribute. The data indicates the amount (in dollars) each is willing to donate before viewing that advertisement and after viewing the advertisement.

      Before   After
1.    0        10
2.    20       20
3.    10       0
4.    25       50
5.    0        0
6.    50       75
7.    10       20
8.    0        20
9.    50       60
10.   25       35

a. Are there significant differences in the amount?
b. What is the value of t if this is done as an independent t-test?
c. Explain the difference between before/after and independent t-tests.
3. Participants attend three consecutive sessions in a business seminar. In the first, there is no reinforcement for responding to the session moderator's questions. In the second, those who respond are provided with verbal reinforcers. In the third, responders receive bits of candy as reinforcers. The dependent variable is the number of times the participants respond in each session.

      None   Verbal   Token
1.    2      4        5
2.    3      5        6
3.    3      4        7
4.    4      6        7
5.    6      6        8
6.    2      4        5
7.    1      3        4
8.    2      5        7

a. Are the column-to-column differences significant? If so, which groups are significantly different from which?
b. Of what data scale is the dependent variable?
c. Calculate and explain the effect size.
4. In the calculations for Exercise 3, what step is taken to minimize error variance?
a. What is the source of that error variance?
b. If Exercise 3 had been a one-way ANOVA, what would have been the degrees of freedom for the error term?
c. How does the change in degrees of freedom for the error term in the within-subjects F affect the value of the test statistic?
5. Because SScol in the within-subjects F contains the treatment effect and measurement error, if there is no treatment effect, what will be the value of F?
6. Why is matching uncommon in within-subjects F analyses?
7. A group of nursing students is approaching the licensing test. The level of anxiety for each student is measured at 8 weeks prior to the test, then 4 weeks, 2 weeks, and 1 week before the test. Assuming that anxiety is measured on an interval scale, are there significant differences?

Student Number   8 weeks   4 weeks   2 weeks   1 week
1.               5         8         9         9
2.               4         7         8         10
3.               4         4         4         5
4.               2         3         5         5
5.               4         6         6         8
6.               3         5         7         9
7.               4         5         5         4
8.               2         3         6         7

a. Is anxiety related to the time interval?
b. Which groups are significantly different from which?
c. How much of anxiety is a function of test proximity?
8. A psychology department sponsors a study of the relationship between participation in a particular internship opportunity and students' final grades. Eight students in their second year of graduate study are matched to eight students in the same year by grade. Those in the first group participate in the internship. Students' grades after the second year are compared.

Student Pair Number   Internship   No Internship
1.                    3.6          3.2
2.                    2.8          3.0
3.                    3.3          3.0
4.                    3.8          3.2
5.                    3.2          2.9
6.                    3.3          3.1
7.                    2.9          2.9
8.                    3.1          3.4

a. Are the differences statistically significant?
b. Two separate groups are involved, so why should this be done as a dependent-samples t-test?
9. A team of researchers associated with an accrediting body studies the amount of time professors devote to their scholarship before and after they receive tenure. Scores are hours per week.

Professor Number   Before Tenure   After Tenure
1.                 12              5
2.                 10              3
3.                 5               6
4.                 8               5
5.                 6               5
6.                 12              10
7.                 9               8
8.                 7               7

a. Are the differences statistically significant?
b. What is t if the groups had been independent?
c. What is the primary reason for the difference in the two t values?
10. A supervisor is monitoring the number of sick days employees take by month. For seven people, they are as follows:

Employee Number   Oct   Nov   Dec
1.                2     4     3
2.                0     0     0
3.                1     5     4
4.                2     5     3
5.                2     7     7
6.                1     3     4
7.                2     3     2

a. Are the month-to-month differences significant?
b. What is the scale of the independent variable in this analysis?
c. How much of the variance does the month explain?
11. If the people in each month of the Exercise 10 data were different, it would have been a one-way ANOVA.
a. Would the result have been significant?
b. Because total variance (SStot) is the same in either Exercise 10 or 11, and the SScol (Exercise 10) is the same as SSbet (Exercise 11), why are the F values different?
Analyzing the Research
Review the article abstracts provided below. You can then access the full articles via your university's online library portal to answer the critical thinking questions. Answers can be found in the answers appendix.
Using Repeated Measures ANOVA for a Stress Management Study
Elo, A., Ervasti, J., Kuosma, E., & Mattila, P. (2008). Evaluation of an organizational stress management program in a municipal public works organization. Journal of Occupational Health Psychology, 13(1), 10–23.
Article Abstract
The aim of this study was to investigate the effects of employee participation in an organizational stress management program consisting of several interventions aiming to improve psychosocial work environment and well-being. Pre- and postintervention questionnaires were used to measure the outcomes with a 2-year interval. This article describes the background of the program, results of previously published effect studies, and a qualitative evaluation of the program. The authors also tested the effects of level of participation in all interventions among the employees of the service production units by 2 (time) × 3 (group) repeated measures ANOVAs (n = 625). “Active participation” (more than 5.5 days) had a positive effect on feedback from supervisor and flow of information. Work climate remained on a permanent level while it decreased in the categories of moderate and nonparticipation. The level of participation did not improve individual well-being or other aspects of psychosocial work environment as postulated by the work stress models. The qualitative evaluation and practical conclusions drawn by the management of the Organization provided a positive impression of the impact of the program.
Critical Thinking Questions
1. What is the independent variable for which subjects are being tested, under all treatment levels?
2. Explain the importance of power in relation to this within-group design.
3. What is the disadvantage of testing the subject groups before and after participation in the program intervention?
4. Does this study need to worry about sphericity when conducting the repeated-measures ANOVA? Why or why not?
Using ANOVA for a Personality Disorder Scales Study
Wise, E. A. (1995). Personality disorder correspondence among the MMPI, MBHI, and MCMI. Journal of Clinical Psychology, 51(6), 790–798.
Article Abstract
MMPI, MBHI, and MCMI personality disorder scales were analyzed for convergent and discriminant validity. Friedman’s ANOVA indicated that there were no significant differences among the sample’s averaged scale scores. Further analyses of the data, however, demonstrated that the Millon instruments classified significantly more of the sample as personality disordered when compared to Morey’s MMPI personality disorder scales. In addition, codetype correspondence among the three instruments was only 4 to 6%. When the instruments were analyzed in a pair-wise fashion, codetype correspondence increased to approximately 10 to 20%. These data indicate that these personality disorder scales do not demonstrate construct equivalence, particularly at the level of the individual profile.
Critical Thinking Questions
1. Why did this study run a Friedman’s Nonparametric ANOVA?
2. The Friedman’s Nonparametric ANOVA showed no significant differences among the tests by scale means. What was the significance level for this to be true?
3. What is reported in Friedman’s Nonparametric ANOVA? Please label what each piece is from the Friedman output when comparing MBHI and MCMI compared to MMPI.
4. Suppose the Friedman’s ANOVA was significant. Would we run a post hoc test? What type of post hoc test?