ANOVA


06CH_Sukal_Statistics.pdf



chapter 6

Analysis of Variance (ANOVA)

Learning Objectives

After reading this chapter, you will be able to. . .

1. explain why it is a mistake to analyze the differences between more than two groups with multiple t-tests.

2. relate sum of squares to other measures of data variability.

3. compare and contrast the t-test with ANOVA.

4. demonstrate how to determine which groups differ significantly in an ANOVA with more than two groups.

5. explain the use of eta-squared in ANOVA.

6. present statistics based on ANOVA results in APA format.

7. interpret the results of an ANOVA and draw conclusions from them.

8. discuss the nonparametric Kruskal-Wallis H-test and compare it with the ANOVA.


suk85842_06_c06.indd 183 10/23/13 1:40 PM

CHAPTER 6Section 6.1 One-Way Analysis of Variance

Ronald A. Fisher was present at the creation of modern statistical analysis. During the early part of the 20th century, Fisher worked at an agricultural research station in rural southern England. In his work analyzing the effect of pesticides and fertilizers on crop yields, he was stymied by the limitations of Gosset’s independent t-test, which allowed him to compare only one pair of samples at a time. In an effort to develop a more comprehensive approach, Fisher created analysis of variance (ANOVA).

Like Gosset, he felt that his work was important enough to publish, and like Gosset in his effort to publish the t-test, Fisher met opposition. In Fisher’s case, the opposition came from a fellow statistician, Karl Pearson, the same man who created the first department of statistical analysis at University College, London. In Chapters 9 and 11 you will study some of Pearson’s work with correlations as well as Spearman’s rho (ρ) and chi-square (χ²), which are used to analyze categorical (nominal and ordinal) data. Pearson also founded what is probably the most prominent journal for statisticians, Biometrika. Pearson was an advocate of making one comparison at a time and of using the largest groups possible to make those comparisons.

When Fisher submitted his work to Pearson’s journal with procedures suggesting that samples can be small and many comparisons can be made in the same analysis, Pearson rejected the manuscript. So began a long and increasingly acrimonious relationship between two men who would become giants in the field of statistical analysis and end up in the same department at University College. Interestingly, Gosset also gravitated to the department and managed to get along with both of them.

Fisher’s contributions extend beyond this chapter. Besides developing the ANOVA, he is credited with the concept of statistical significance as well as the hypothesis testing discussed in Chapter 5. Note that although significance testing is ubiquitous, it is not universally accepted by other statisticians. One such critic was William (Bill) Kruskal, who co-developed the nonparametric counterpart of the ANOVA, the Kruskal-Wallis H-test, which is discussed in this chapter. Despite these philosophical and statistical differences, R. A. Fisher made an enormous contribution to the field of quantitative analysis, as did his nemesis, Karl Pearson, with additional contributions by William Sealy Gosset and Bill Kruskal.

6.1 One-Way Analysis of Variance

In any experiment, scores and measurements vary for many reasons. If a researcher is interested in whether children will emulate the videotaped behavior of adults whom they have watched, any differences in the children’s behavior from before they see the adults to after are attributed primarily to the adults’ behaviors. But even if all of the children watch with equal attentiveness, it is likely there will be differences in their behaviors after the video. Some of those differences might stem from age differences among the children. Perhaps the amount of exposure children otherwise have to television will prompt differences in their behavior. Probably differences in their background experiences will also affect the way they behave.

In an analysis of how behavior changes as a result of watching the video, the independent variable (IV) is whether or not the children have seen the video. Changes in their behavior, the dependent variable (DV), reflect the effect of the IV, but they also reflect all the other factors that prompt the children to behave differently. An IV is also referred to as a factor, particularly in procedures that involve more than one IV. Behavior changes that are not related to the IV reflect error variance, which is attributable to other factors known as confounding variables.

When researchers work with human subjects, some level of error variance is inescapable. Even under tightly controlled conditions where all members of a sample receive exactly the same treatment, the subjects are unlikely to respond the same way. There are just too many confounding variables that also affect their behavior. Fisher’s approach was to calculate the total variability in a problem and then analyze it, hence the name analysis of variance.

Any number of IVs can be included in an ANOVA. Here, we are interested primarily in ANOVA in its simplest form, a procedure called one-way ANOVA. The “one” in one-way ANOVA indicates that there is just one IV in this model. In that regard, one-way ANOVA is similar to the independent-samples t-test discussed in Chapter 5. Both tests have one IV and one DV. The difference is that the IV in the independent t-test has just two groups, whereas the IV in ANOVA can have any number of groups, generally more than two. In fact, a one-way ANOVA with just two groups is equivalent to an independent-samples t-test: the statistic calculated in ANOVA, F, is equal to t². This is addressed and illustrated in Section 6.5.
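As a quick preview of the F = t² relationship, the following minimal Python sketch (standard library only; the helper names pooled_t and one_way_f are hypothetical, not from this chapter) computes a pooled-variance independent-samples t and a one-way ANOVA F for the same two groups. It borrows the town and suburb scores from the social isolation example in this section and assumes equal population variances, as the pooled t-test does.

```python
from statistics import mean


def pooled_t(a, b):
    """Independent-samples t using the pooled variance (equal variances assumed)."""
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    df = len(a) + len(b) - 2
    pooled_var = (ss_a + ss_b) / df
    se_diff = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5  # standard error of the difference
    return (ma - mb) / se_diff


def one_way_f(*groups):
    """One-way ANOVA F = MS_between / MS_within for any number of groups."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_bet = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_with = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_bet = len(groups) - 1
    df_with = len(scores) - len(groups)
    return (ss_bet / df_bet) / (ss_with / df_with)


town = [3, 4, 4, 3]
suburb = [6, 6, 7, 8]
t = pooled_t(town, suburb)
f = one_way_f(town, suburb)
print(f"t = {t:.3f}, t squared = {t ** 2:.3f}, F = {f:.3f}")
```

Because squaring removes the sign of t, F carries no information about which of the two group means is larger.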

The ANOVA Advantage

The ANOVA and the t-test both answer the same question: Are there significant differences between groups? So why bother with another test when we have the t-test? Suppose someone has developed a group therapy program for people with anger management problems, and the question is whether there are significant differences in the behavior of clients who spend (a) 8, (b) 16, and (c) 24 hours in therapy over a period of weeks. Why not answer the question by performing three t-tests as follows?

1. Compare the 8-hour group to the 16-hour group.

2. Compare the 16-hour group to the 24-hour group.

3. Compare the 8-hour group to the 24-hour group.

A What does the “one” in one-way ANOVA refer to?

Try It!


The Problem of Multiple Comparisons

These three tests represent all possible comparisons, but there are two problems with this approach. First, making all possible comparisons is a good deal more manageable with three groups than with, say, five groups. If there were five groups, labeled a through e, note the number of comparisons needed to cover all possible comparisons:

1. a to b

2. a to c

3. a to d

4. a to e

5. b to c

6. b to d

7. b to e

8. c to d

9. c to e

10. d to e

Covering all possible comparisons among five groups therefore requires the 10 tests listed above.
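The list of pairs can be generated mechanically. This short Python sketch (standard library only; all_pairs is a hypothetical helper name) enumerates every unordered pair of groups with itertools.combinations and confirms that the count equals k(k − 1)/2, the value math.comb(k, 2) returns.

```python
from itertools import combinations
from math import comb


def all_pairs(labels):
    """Every unordered pair of group labels, in the same order as the list above."""
    return list(combinations(labels, 2))


groups = ["a", "b", "c", "d", "e"]
pairs = all_pairs(groups)
for number, (first, second) in enumerate(pairs, start=1):
    print(f"{number}. {first} to {second}")

# The number of pairwise tests grows roughly with the square of the number of groups
for k in (3, 5, 10):
    print(k, "groups:", comb(k, 2), "pairwise tests")
```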

Family-Wise Error

The other problem is inflated error in hypothesis testing when multiple tests are performed, known as family-wise error. Recall that the potential for type I error (α) is determined by the level at which the test is conducted. At α = .05, a true null hypothesis will be wrongly rejected, a type I error, an average of 5% of the time. However, that level of error assumes that each test is conducted with new data. When statistical testing is done repeatedly with the same data, the potential for type I error does not remain fixed at .05 (or whatever the level of the testing) but grows, increasing the family-wise error rate (FWER). In fact, if 10 tests are conducted in succession with the same data, as with the groups labeled a, b, c, d, and e mentioned earlier, by the time the 10th test is completed the probability of at least one alpha error is FWER = .40, a 40% error probability, as the following procedure illustrates:

$$P_\alpha = 1 - (1 - p_\alpha)^n$$

Where

Pα = the probability of alpha error overall (the FWER)

pα = the probability of alpha error for each individual test (the testing level, such as .05)

n = the number of tests conducted

$$P_\alpha = 1 - (1 - .05)^{10} = 1 - .599 = .401$$

FWER = .401

The probability of at least one type I error at this point is about 4 in 10, or 40%!
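A minimal Python sketch of the formula above (familywise_error is a hypothetical helper name). Like the formula, it treats the tests as independent, which tests run on the same data are not exactly, so read the output as an approximation of how quickly the overall risk grows.

```python
def familywise_error(alpha, n_tests):
    """P_alpha = 1 - (1 - p_alpha) ** n: chance of at least one type I error
    across n independent tests, each conducted at level alpha."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 3, 10):
    print(f"{n:>2} tests at .05 -> FWER = {familywise_error(0.05, n):.3f}")
```

With just the three t-tests proposed for the therapy example, the rate is already about .14 rather than .05.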


The business of raising $(1 - p_\alpha)$ to the 10th power (or however many comparisons there are) is tedious, but the more important problem is that the probability of a type I error does not remain fixed when the same data are tested repeatedly. Therefore, using multiple t-tests is never a good option.

In the end, running one test in an overall ANOVA will control for inflated FWER. An ANOVA is therefore termed an omnibus test, as it will test the overall significance of the research model based on the differences between sample means. It will not tell you which two means are significantly different, which is why follow-up post hoc comparisons are executed. These concepts will be discussed in further detail throughout the chapter.

The Variance in Analysis of Variance (ANOVA)

To analyze variance, Fisher began by calculating total variability from all sources. He recognized that when scores vary in a research study, they do so for two reasons. They vary because the independent variable (the “treatment”) has had an effect, and they vary because of factors beyond the control of the researcher, producing the error variance referred to earlier.

The test statistic in ANOVA is the F ratio (named for Fisher), which is treatment variance (variance in the DV that can be explained by the IV) divided by error variance (variance in the DV that cannot be explained by the IV and is attributed to confounding variables). When F is large, it indicates that the difference between at least two of the group means is unlikely to be random, that is, that at least two group means differ significantly. When the F ratio is small (close to a value of 1), it indicates that the IV has not had enough impact to overcome error variability, and the differences between groups are not significant. We will return to the F ratio when we discuss Formula 6.4.

Variance Between and Within Groups

If three groups of the same size are all selected from one population, they could be represented by three distributions, as shown in Figure 6.1. They do not have exactly the same mean, but that is because even when they are selected from the same population, samples are rarely identical. Those initial differences between sample means indicate some degree of sampling error.

Figure 6.1: Three groups drawn from the same population


The reason that each of the three distributions has width is that there are differences within each of the groups. Even if the sample means were the same, individuals selected to the same sample will rarely manifest precisely the same level of whatever is measured. If a population is identified (for example, a population of the academically gifted) and a sample is drawn from that population, the individuals in the sample will not all have the same level of ability. Because they are all members of the population of the academically gifted, they will probably all be higher than the norm for academic ability, but there will still be differences in the subjects’ academic ability within the sample. These within-group differences are sources of error variance.

The treatment effect is indicated in how the IV affects the way the DV is manifested. For example, suppose three groups of subjects are administered different levels of a mild stimulant (the IV) to see the effect on level of attentiveness. The issue in ANOVA is whether the IV, the treatment, creates enough additional between-groups variability to exceed any error variance. Ultimately, the question is whether, as a result of the treatment, the samples still represent populations with the same mean, or whether, as is suggested by the distributions in Figure 6.2, they may represent populations with different means.

Figure 6.2: Three groups after the treatment

The within-groups variability in these three distributions is the same as it was in the distributions in Figure 6.1. It is the between-groups variability that has changed in Figure 6.2. More particularly, it is the difference between the group means that has changed. Although there was some between-groups variability before the treatment, it was comparatively minor and probably reflected sampling variability. After the treatment, the differences between means are much greater. What F indicates is whether the group differences are great enough to be statistically significant, that is, unlikely to be due to chance alone.

The Statistical Hypotheses in One-Way ANOVA

The hypotheses are very much like they were for the independent t-test, except that they accommodate more groups. For the t-test, the null hypothesis is written $H_0: \mu_1 = \mu_2$. It indicates that the two samples involved were drawn from populations with the same means. For a one-way ANOVA with three groups, the null hypothesis has this form:

$$H_0: \mu_1 = \mu_2 = \mu_3$$

B If a psychologist is interested in the impact that 1 hour, 5 hours, or 10 hours of therapy have on client behavior, how are behavior differences related to gender explained?

Try It!


It indicates that the three samples were drawn from populations with the same means.

Things have to change for the alternate hypothesis, however, because with three groups, there is not just one possible alternative. Note that each of the following is possible:

a. $H_a: \mu_1 \neq \mu_2 = \mu_3$. Sample 1 represents a population with a mean value different from the mean of the population represented by Samples 2 and 3.

b. $H_a: \mu_1 = \mu_2 \neq \mu_3$. Samples 1 and 2 represent a population with a mean value different from the mean of the population represented by Sample 3.

c. $H_a: \mu_1 = \mu_3 \neq \mu_2$. Samples 1 and 3 represent a population with a mean value different from the mean of the population represented by Sample 2.

d. $H_a: \mu_1 \neq \mu_2 \neq \mu_3$. All three samples represent populations with different means.

Because the number of possible alternative outcomes multiplies rapidly as the number of groups increases, a more general alternate hypothesis is used. Either all the groups involved come from populations with the same means, or at least one of them does not. So the form of the alternate hypothesis for an ANOVA with any number of groups is simply

Ha: At least one of the means is different from the other means.

Also remember that hypotheses are either nondirectional, in that there is no prediction of which mean will be higher than the others:

Nondirectional alternative hypothesis: $H_a: \mu_1 \neq \mu_2 \neq \mu_3$

or directional, in that there is a prediction of which mean will be higher than the other means. In the directional alternative hypothesis below, the prediction is that μ3 will be higher than μ2, which in turn will be higher than μ1.

Directional alternative hypothesis: $H_a: \mu_1 < \mu_2 < \mu_3$

Researchers should consider the value of making a prediction in a one-tailed test versus making no prediction in a two-tailed test, as discussed in Chapter 5.

Measuring Data Variability in the One-Way ANOVA

We have discussed several different measures of data variability to this point, including the standard deviation (s), the variance (s²), the standard error of the mean (SEM), the standard error of the difference (SEd), and the range. For ANOVA, Fisher added one more, the sum of squares (SS). The sum of squares is the sum of the squared differences between scores and one of several mean values.

C How many t-tests would it take to make all possible comparisons in a procedure with six groups?

Try It!

In ANOVA,

• One sum-of-squares value involves the differences between individual scores and the mean of all the scores in all the groups (the grand mean). This is called the sum of squares total (SStot) because it measures all variability from all sources.

• A second sum-of-squares value indicates the difference between the means of the individual groups and the grand mean. This is the sum of squares between (SSbet). It measures the effect of the IV, the treatment effect, as well as any differences that existed between the groups before the study began.

• A third sum-of-squares value measures the difference between the scores in each sample and the mean of that sample. This sum of squares within (SSwith) reflects the differences in the way subjects respond to the same stimulus. Because this value is entirely error variance, it is also called the sum of squares error (SSerr) or the sum of squares residual (SSres).

All Variability From All Sources: The Sum of Squares Total (SStot)

There are multiple formulas for SStot. They all provide the same answer, but some make more sense to look at than others. Formula 6.1 makes it clear that at the heart of SStot is the difference between each individual score (x) and the mean of all scores, or the grand mean, for which the notation is MG.

$$SS_{tot} = \sum (x - M_G)^2 \qquad \text{(Formula 6.1)}$$

Where

x = each score in all groups

MG = the mean of all data from all groups, the grand mean

To calculate SStot, follow these steps:

1. Sum all scores from all groups and divide by the number of scores to determine the grand mean, MG.

2. Subtract MG from each score (x) in each group, and then square the difference: $(x - M_G)^2$

3. Sum all the squared differences: $\sum (x - M_G)^2$
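The three steps translate directly into code. Here is a minimal Python sketch (ss_total is a hypothetical helper name, not from this chapter), applied to the social isolation scores introduced later in this section; it should print 41.67.

```python
def ss_total(groups):
    """Formula 6.1: squared deviations of every score from the grand mean, summed."""
    scores = [x for group in groups for x in group]
    grand_mean = sum(scores) / len(scores)              # step 1: M_G
    return sum((x - grand_mean) ** 2 for x in scores)   # steps 2 and 3


# Example input: the three groups of ALONE scores used later in this section
groups = [[3, 4, 4, 3], [6, 6, 7, 8], [6, 7, 7, 9]]
print(round(ss_total(groups), 2))
```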

The Treatment Effect: The Sum of Squares Between (SSbet)

The between-groups variance, the sum of squares between (SSbet), contains the variability due to the independent variable, the treatment effect. It will also contain any initial differences between the groups, which of course is error variance. For three groups labeled a, b, and c, the formula is

$$SS_{bet} = (M_a - M_G)^2 n_a + (M_b - M_G)^2 n_b + (M_c - M_G)^2 n_c \qquad \text{(Formula 6.2)}$$


Where

Ma = the mean of the scores in the first group (a)

MG = the same grand mean used in SStot

na = the number of scores in the first group (a)

To calculate SSbet, follow these steps:

1. Determine the mean for each group: Ma, Mb, and so on.

2. Subtract MG from each sample mean and square the difference: $(M_a - M_G)^2$

3. Multiply the squared differences by the number in the group: $(M_a - M_G)^2 n_a$

4. Repeat for each group.

5. Sum ($\sum$) the results across groups.

The value that results from Formula 6.2 represents the differences between the group means and the mean of all the data, each weighted by the number of scores in its group.
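A minimal Python sketch of Formula 6.2, following the five steps above (ss_between is a hypothetical helper name). With the social isolation scores used later in this section, it should print 33.17.

```python
def ss_between(groups):
    """Formula 6.2: (group mean - grand mean)^2 * group size, summed over groups."""
    scores = [x for group in groups for x in group]
    grand_mean = sum(scores) / len(scores)
    total = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)                  # step 1
        total += (group_mean - grand_mean) ** 2 * len(group)  # steps 2 to 4
    return total                                              # step 5: the sum


groups = [[3, 4, 4, 3], [6, 6, 7, 8], [6, 7, 7, 9]]
print(round(ss_between(groups), 2))
```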

The Error Term: The Sum of Squares Within (SSwith)

When a group receives the same treatment but individuals within the group respond differently, their differences constitute error, that is, unexplained variability. Maybe subjects’ age differences are the cause, or perhaps the circumstances of their family lives, but for some reason not analyzed in the particular study, subjects in the same group often respond differently to the same stimulus. The amount of this unexplained variance within the groups is calculated with the SSwith, for which we have Formula 6.3:

$$SS_{with} = \sum (x_a - M_a)^2 + \sum (x_b - M_b)^2 + \sum (x_c - M_c)^2 \qquad \text{(Formula 6.3)}$$

Where

SSwith = the sum of squares within

xa = each of the individual scores in Group a

Ma = the mean of the scores in Group a

To calculate SSwith, follow these steps:

1. Take the mean for each of the groups; these are available from calculating the SSbet earlier.

2. From each score in each group,

   a. subtract the mean of the group,

   b. square the difference, and

   c. sum the squared differences within each group.

3. Repeat this for each group.

4. Sum the results across the groups.
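The same steps as a minimal Python sketch of Formula 6.3 (ss_within is a hypothetical helper name). For the social isolation scores used later in this section, the result is 8.5.

```python
def ss_within(groups):
    """Formula 6.3: squared deviations of each score from its own group mean, summed."""
    total = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)                # step 1
        total += sum((x - group_mean) ** 2 for x in group)  # step 2 (a to c)
    return total                                            # steps 3 and 4


groups = [[3, 4, 4, 3], [6, 6, 7, 8], [6, 7, 7, 9]]
print(ss_within(groups))
```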

D When will the sum-of-squares values be negative?

Try It!


The SSwith (or the SSerr) measures the degree to which scores vary due to factors not controlled in the study, fluctuations that constitute error variance.

Because the SStot consists of the SSbet and the SSwith, once the SStot and the SSbet are known, the SSwith can be determined by subtraction:

$$SS_{tot} - SS_{bet} = SS_{with}$$

However, there are two reasons not to determine the SSwith by simple subtraction. First, if there is an error in the SSbet, it is only perpetuated with the subtraction. Second, calculating the value with Formula 6.3 helps clarify that what is being determined is a measure of how much variation in scores there is within each group. For the few problems done entirely by hand, we will take the “high road” and use the conceptual formula.

Conceptual formulas (6.1, 6.2, and 6.3) clarify the logic involved, but in the case of analysis of variance, they also require a good deal of tiresome subtracting and then squaring of numbers. To minimize the tedium, the data sets here are all relatively small. When larger studies are done by hand, people often shift to the “calculation formulas” for simpler arithmetic, but there is a sacrifice to clarity. Happily, you will seldom find yourself doing manual ANOVA calculations, and after a few simple longhand problems, this chapter will explain how you can use Excel or SPSS for help with the larger data sets.

Calculating the Sums of Squares

A researcher is interested in the level of social isolation people feel in small towns (a), suburbs (b), and cities (c). Participants randomly selected from each of those three settings take the Assessment List of Nonnormal Environments (ALONE), for which the following scores are available:

a. 3, 4, 4, 3

b. 6, 6, 7, 8

c. 6, 7, 7, 9

We know we are going to need the mean of all the data (MG) as well as the mean for each group (Ma, Mb, Mc), so we will start there. Verify that

$\sum x = 70$ and $N = 12$, so that $M_G = 5.833$.

For the small-town subjects, $\sum x_a = 14$ and $n_a = 4$, so $M_a = 3.50$.

For the suburban subjects, $\sum x_b = 27$ and $n_b = 4$, so $M_b = 6.750$.

For the city subjects, $\sum x_c = 29$ and $n_c = 4$, so $M_c = 7.250$.

For the sum of squares total, the formula is

$$SS_{tot} = \sum (x - M_G)^2$$

SStot = 41.67

The calculations are in Table 6.1.

Table 6.1: Calculating the sum of squares total (SStot)

$SS_{tot} = \sum (x - M_G)^2$, $M_G = 5.833$

For the Town Data

x − MG                  (x − MG)²
3 − 5.833 = −2.833      8.026
4 − 5.833 = −1.833      3.360
4 − 5.833 = −1.833      3.360
3 − 5.833 = −2.833      8.026

For the Suburb Data

x − MG                  (x − MG)²
6 − 5.833 = 0.167       0.028
6 − 5.833 = 0.167       0.028
7 − 5.833 = 1.167       1.362
8 − 5.833 = 2.167       4.696

For the City Data

x − MG                  (x − MG)²
6 − 5.833 = 0.167       0.028
7 − 5.833 = 1.167       1.362
7 − 5.833 = 1.167       1.362
9 − 5.833 = 3.167       10.030

SStot = 41.668


For the sum of squares between, the formula is

$$SS_{bet} = (M_a - M_G)^2 n_a + (M_b - M_G)^2 n_b + (M_c - M_G)^2 n_c$$

The SSbet involves three groups rather than the 12 individuals required for SStot. The SSbet is as follows:

$$\begin{aligned}
SS_{bet} &= (3.5 - 5.833)^2(4) + (6.75 - 5.833)^2(4) + (7.25 - 5.833)^2(4) \\
&= 21.772 + 3.364 + 8.032 \\
&= 33.17
\end{aligned}$$

The SSwith indicates the error variance by determining the differences between individual scores in a group and their means. The formula is

$$SS_{with} = \sum (x_a - M_a)^2 + \sum (x_b - M_b)^2 + \sum (x_c - M_c)^2$$

SSwith = 8.50

The calculations are in Table 6.2.

Because we calculated the SSwith directly instead of determining it by subtraction, we can now check for accuracy by adding its value to the SSbet. If the calculations are correct, SSwith + SSbet = SStot. For the isolation example, we have

$$8.504 + 33.168 = 41.672$$

In the initial calculation (Table 6.1), SStot = 41.668. The difference of .004 is round-off error and is unimportant.
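The .004 discrepancy comes only from rounding intermediate values to three decimals. A minimal Python sketch that carries full precision through the same 12 scores (standard library only) shows the partition SSbet + SSwith = SStot holding exactly, up to floating-point error.

```python
from statistics import fmean

# Social isolation (ALONE) scores from the example
groups = {"town": [3, 4, 4, 3], "suburb": [6, 6, 7, 8], "city": [6, 7, 7, 9]}

scores = [x for g in groups.values() for x in g]
grand_mean = fmean(scores)
group_means = {name: fmean(g) for name, g in groups.items()}

ss_tot = sum((x - grand_mean) ** 2 for x in scores)
ss_bet = sum(len(g) * (group_means[name] - grand_mean) ** 2 for name, g in groups.items())
ss_with = sum((x - group_means[name]) ** 2 for name, g in groups.items() for x in g)

print(f"SSbet = {ss_bet:.3f}, SSwith = {ss_with:.3f}, SStot = {ss_tot:.3f}")
# Without rounded intermediate values, the partition holds up to floating-point error
print("SSbet + SSwith == SStot:", abs(ss_bet + ss_with - ss_tot) < 1e-9)
```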

Although they were not called sums of squares, we have calculated an equivalent statistic since Chapter 1. At the heart of the standard deviation calculation are those repetitive x − M differences for each score in the sample. The difference values are then squared and summed much as they are for calculating SSwith and SStot. Further, the denominator in the standard deviation calculation is n − 1, which should look suspiciously like some of the degrees of freedom values we will discuss in the next section.
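To see the connection concretely, this Python sketch (standard library only) computes the sum of squares for the town scores and shows that dividing it by n − 1 reproduces the sample variance returned by statistics.variance, whose square root is statistics.stdev.

```python
from statistics import stdev, variance

town = [3, 4, 4, 3]
n = len(town)
m = sum(town) / n
ss = sum((x - m) ** 2 for x in town)  # the same x - M differences, squared and summed

print("SS =", ss)
print("SS / (n - 1) =", ss / (n - 1), " statistics.variance:", variance(town))
print("square root  =", (ss / (n - 1)) ** 0.5, " statistics.stdev:", stdev(town))
```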

Interpreting the Sums of Squares

The different sums-of-squares values are measures of data variability, which makes them like the standard deviation, variance measures, the standard error of the mean, and so on. But there is an important difference between SS and the other statistics. In addition to data variability, the magnitude of the SS value reflects the number of scores included. Because sums of squares are in fact sums of squared values, the more values there are, the larger the value becomes. With statistics like the standard deviation, adding more values near the mean of the distribution actually shrinks the statistic. But this cannot happen with the sum of squares. Additional scores, whatever their value, will never decrease the sum of squares, and any score that is not exactly at the mean will increase it.

E What will SStot − SSwith yield?

Try It!
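The claim that extra scores can shrink the standard deviation but never the sum of squares is easy to check. In this Python sketch (standard library only; sum_of_squares is a hypothetical helper name), one score placed exactly at the town mean and one placed near it are each added to the town scores.

```python
from statistics import stdev


def sum_of_squares(scores):
    """Sum of squared deviations from the scores' own mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


town = [3, 4, 4, 3]
for extra in (3.5, 4):  # one score exactly at the mean, one score near it
    bigger = town + [extra]
    print(f"add {extra}: SS {sum_of_squares(town):.3f} -> {sum_of_squares(bigger):.3f}, "
          f"s {stdev(town):.3f} -> {stdev(bigger):.3f}")
```

Adding 3.5 leaves SS at 1.0, while adding 4 raises it to 1.2; in both cases s goes down.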

Table 6.2: Calculating the sum of squares within (SSwith)

$SS_{with} = \sum (x_a - M_a)^2 + \sum (x_b - M_b)^2 + \sum (x_c - M_c)^2$

Town (a): 3, 4, 4, 3

Suburb (b): 6, 6, 7, 8

City (c): 6, 7, 7, 9

Ma = 3.50, Mb = 6.750, Mc = 7.250

For the Town Data

x − M                   (x − M)²
3 − 3.50 = −0.50        0.250
4 − 3.50 = 0.50         0.250
4 − 3.50 = 0.50         0.250
3 − 3.50 = −0.50        0.250

For the Suburb Data

x − M                   (x − M)²
6 − 6.750 = −0.750      0.563
6 − 6.750 = −0.750      0.563
7 − 6.750 = 0.250       0.063
8 − 6.750 = 1.250       1.563

For the City Data

x − M                   (x − M)²
6 − 7.250 = −1.250      1.563
7 − 7.250 = −0.250      0.063
7 − 7.250 = −0.250      0.063
9 − 7.250 = 1.750       3.063

SSwith = 8.504


This characteristic makes the sum of squares difficult to interpret. What constitutes much or little variability depends not just on how much difference there is between the scores and the mean to which they are compared but also on how many scores there are. Fisher turned the sum-of-squares values into a “mean measure of variability” by dividing each sum-of-squares value by its degrees of freedom. The SS ÷ df operation creates what is called the mean square (MS).

In the one-way ANOVA, there is an MS value associated with both the SSbet and the SSwith (SSerr). No mean square total appears in the ANOVA table. It could be calculated by dividing SStot (SSbet + SSwith) by its degrees of freedom, N − 1, where N is the number of scores in the entire data set treated as a single sample. This would provide a mean level of overall variability, but it would not help answer questions about the ratio of between-groups variance to within-groups variance.

The degrees of freedom for each of the sums of squares calculated for the one-way ANOVA are as follows:

• Degrees of freedom total (dftot) = N − 1, where N is the total number of scores

• Degrees of freedom for between (dfbet) = k − 1, where k is the number of groups

$$SS_{bet} \div df_{bet} = MS_{bet}$$

• Degrees of freedom for within (dfwith) = N − k

$$SS_{with} \div df_{with} = MS_{with}$$
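A minimal Python sketch of these degrees-of-freedom and mean-square calculations (mean_squares is a hypothetical helper name), using the sums of squares from the social isolation example with N = 12 and k = 3. Note that dfbet + dfwith = dftot holds by construction here, since (k − 1) + (N − k) = N − 1; the check is useful mainly for hand-entered values.

```python
def mean_squares(ss_bet, ss_with, n_total, k_groups):
    """Return MS_bet, MS_with, and the (df_bet, df_with, df_tot) tuple for a one-way ANOVA."""
    df_bet = k_groups - 1          # k - 1
    df_with = n_total - k_groups   # N - k
    df_tot = n_total - 1           # N - 1
    return ss_bet / df_bet, ss_with / df_with, (df_bet, df_with, df_tot)


# Sums of squares from the social isolation example (N = 12, k = 3)
ms_bet, ms_with, dfs = mean_squares(33.17, 8.50, n_total=12, k_groups=3)
print(dfs, round(ms_bet, 3), round(ms_with, 3))
```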

Although there is no MStot, we need the sum of squares for total (SStot) and the degrees of freedom for total (dftot) because they provide an accuracy check:

a. The sums of squares between and within should equal total sum of squares:

$$SS_{bet} + SS_{with} = SS_{tot}$$

b. The sum of degrees of freedom between and within should equal degrees of freedom total:

$$df_{bet} + df_{with} = df_{tot}$$

Remembering these relationships can help reveal calculation errors. Note also that in ANOVA, error refers to unexplained or unsystematic variance within groups (SSwith), which is considered variance not caused by the experimental manipulation, as opposed to explained or systematic variance between groups (SSbet), which is attributed to the experimental manipulation.

The F Ratio

The mean squares for between and within are the components of F, and the F ratio is the test statistic in ANOVA. As noted earlier in this chapter, the F is a ratio:

$$F = \frac{MS_{bet}}{MS_{with}} \qquad \text{(Formula 6.4)}$$


The issue is whether the MSbet, which contains the treatment effect and some error, is substantially greater than the MSwith, which contains only error. This is illustrated in Figure 6.3 by comparing the distance from the mean of the first distribution to the mean of the second distribution, the A variance, to the B and C variances, which indicate the differences within groups.

If the MSbet/MSwith ratio is large (it must be substantially greater than 1), the difference between groups is likely to be significant. When that ratio is small (close to 1), F is likely to be nonsignificant. How large F must be to be significant depends on the degrees of freedom for the problem, just as it did for the t-tests.

Figure 6.3: The F-ratio: comparing variance between groups (A) to variance within groups (B + C)

The ANOVA Table

With the sums of squares and the degrees of freedom for the different values in hand, the ANOVA results are presented in a table, often called a source table, that lists

• the source of the variance,

• the sums-of-squares values,

• the degrees of freedom:

   for total degrees of freedom, dftot = N − 1 (because N = 12, dftot = 11),

   for between degrees of freedom, dfbet = k − 1 (because k, the number of groups, = 3, dfbet = 2),

   for within degrees of freedom, dfwith = N − k (because N = 12 and k = 3, dfwith = 9),

• the mean square values, which are SS/df, and

• the F value, which is MSbet/MSwith.


For the social isolation problem, the ANOVA table is

Source      SS      df      MS       F
Between     33.17    2      16.58    17.55
Within       8.50    9        .94
Total       41.67   11
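The whole source table can be produced from raw scores. This minimal Python sketch (standard library only; one_way_anova_table is a hypothetical helper name) computes every entry for the social isolation data. Because it never rounds intermediate values, it gives F ≈ 17.56; the 17.55 in the table above reflects the rounded sums of squares used in the hand calculation.

```python
from statistics import fmean


def one_way_anova_table(groups):
    """Return source-table rows (source, SS, df, MS, F) for a one-way ANOVA."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand_mean = fmean(scores)
    ss_bet = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_with = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ss_tot = sum((x - grand_mean) ** 2 for x in scores)
    df_bet, df_with, df_tot = k - 1, n_total - k, n_total - 1
    ms_bet, ms_with = ss_bet / df_bet, ss_with / df_with
    return [
        ("Between", ss_bet, df_bet, ms_bet, ms_bet / ms_with),
        ("Within", ss_with, df_with, ms_with, None),
        ("Total", ss_tot, df_tot, None, None),
    ]


rows = one_way_anova_table([[3, 4, 4, 3], [6, 6, 7, 8], [6, 7, 7, 9]])
print(f"{'Source':<8}{'SS':>8}{'df':>4}{'MS':>8}{'F':>8}")
for source, ss, df, ms, f in rows:
    ms_text = f"{ms:8.2f}" if ms is not None else " " * 8
    f_text = f"{f:8.2f}" if f is not None else ""
    print(f"{source:<8}{ss:8.2f}{df:4d}{ms_text}{f_text}")
```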

The table makes it easy to check some of the results for accuracy. Check that

$$SS_{bet} + SS_{with} = SS_{tot}$$

Also verify that

$$df_{bet} + df_{with} = df_{tot}$$

In the course of checking results, note that the sums-of-squares values can never be negative. Because the SS values are literally sums of squares, a negative number indicates a calculation error somewhere; there is no such thing as negative variability (Chapter 1). The smallest a sum-of-squares value can be is 0, and this can happen only if all scores in the sum-of-squares calculation have the same value.

Understanding F

The larger F is, the more likely it is to be statistically significant, but how large is large enough? In the preceding ANOVA table, F 5 17.551, which seems like a comparatively large value.

• The fact that F is determined by dividing MSbet by MSwith indicates that whatever the value of F is indicates the number of times MSbet is greater than MSwith.

• Here MSbet is 17.551 times greater than MSwith, which seems promising, but to be sure, it must be compared to a value from the critical values of F (Table 6.3, which is repeated in the Appendix as Table C).

As with the t-test, as degrees of freedom increase, the critical values decline. The difference is that with F two df values are involved: one for the MSbet and the other for the MSwith.

• In Table 6.3 (also Table C in the Appendix), the critical value is identified by moving across the top of the table to the dfbet (the df numerator) and then moving down that column to the dfwith (the df denominator). According to the social isolation ANOVA table above, these are

$df_{bet} = 2$ and

$df_{with} = 9$.


• The intersection of the 2 at the top and the 9 along the left side of the table leads to two critical values: one in regular type, which is for α = .05 and is the default, and one in bold type, which is the value for testing at α = .01.

• The critical value when testing at p = .05 is 4.26.

• The critical value indicates that any ANOVA test with 2 and 9 df that has an F value equal to or greater than 4.26 is statistically significant.

The social isolation differences between the three groups are probably not due to sampling variability, so the statistical decision is to reject H0. The relatively large value of F (more than four times the critical value) indicates that much more of the variability in social isolation is probably related to where respondents live than to error variance.
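Table 6.3 is the usual route, but a critical value can also be computed. In general this requires the inverse F distribution (for example, `scipy.stats.f.ppf(0.95, 2, 9)` in SciPy). When the numerator df is exactly 2, as in the social isolation problem, the upper tail of F has a simple closed form, $P(F \ge x) = (1 + 2x/df_{with})^{-df_{with}/2}$, so a dependency-free sketch is possible. The function names below are illustrative, and the shortcut is valid only for $df_{bet} = 2$.

```python
def f_upper_tail_df2(f_value: float, df_with: int) -> float:
    """P(F >= f_value) when the numerator df (df_bet) is exactly 2.

    Uses the closed form (1 + 2x/df_with) ** (-df_with / 2); it is NOT valid
    for any other numerator df.
    """
    return (1 + 2 * f_value / df_with) ** (-df_with / 2)


def f_critical_df2(alpha: float, df_with: int) -> float:
    """Critical F for (2, df_with) df: the F value whose upper-tail area is alpha.

    Obtained by solving f_upper_tail_df2(x, df_with) == alpha for x.
    """
    return (df_with / 2) * (alpha ** (-2 / df_with) - 1)


crit_05 = f_critical_df2(0.05, 9)
crit_01 = f_critical_df2(0.01, 9)
print(round(crit_05, 2), round(crit_01, 2))      # 4.26 8.02 (as in Table 6.3)

f_observed = 17.551
print(f_observed >= crit_05)                     # True: reject H0
print(f_upper_tail_df2(f_observed, 9) < 0.001)   # True
```

The same two functions reproduce the rest of the df-numerator-2 column of Table 6.3, for example 3.49 for (2, 20) df.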

Table 6.3: The critical values of F

In each cell, the first value (regular type in the printed table) is the critical value for p = .05, and the second value (bold type in the printed table) is the critical value for p = .01. Rows give the df denominator; columns give the df numerator.

df denominator   df numerator
     1            2            3            4            5            6            7            8            9            10
2    18.51/98.49  19.00/99.01  19.16/99.17  19.25/99.25  19.30/99.30  19.33/99.33  19.35/99.36  19.37/99.38  19.38/99.39  19.40/99.40
3    10.13/34.12  9.55/30.82   9.28/29.46   9.12/28.71   9.01/28.24   8.94/27.91   8.89/27.67   8.85/27.49   8.81/27.34   8.79/27.23
4    7.71/21.20   6.94/18.00   6.59/16.69   6.39/15.98   6.26/15.52   6.16/15.21   6.09/14.98   6.04/14.80   6.00/14.66   5.96/14.55
5    6.61/16.26   5.79/13.27   5.41/12.06   5.19/11.39   5.05/10.97   4.95/10.67   4.88/10.46   4.82/10.29   4.77/10.16   4.74/10.05
6    5.99/13.75   5.14/10.92   4.76/9.78    4.53/9.15    4.39/8.75    4.28/8.47    4.21/8.26    4.15/8.10    4.10/7.98    4.06/7.87
7    5.59/12.25   4.74/9.55    4.35/8.45    4.12/7.85    3.97/7.46    3.87/7.19    3.79/6.99    3.73/6.84    3.68/6.72    3.64/6.62
8    5.32/11.26   4.46/8.65    4.07/7.59    3.84/7.01    3.69/6.63    3.58/6.37    3.50/6.18    3.44/6.03    3.39/5.91    3.35/5.81
9    5.12/10.56   4.26/8.02    3.86/6.99    3.63/6.42    3.48/6.06    3.37/5.80    3.29/5.61    3.23/5.47    3.18/5.35    3.14/5.26
10   4.96/10.04   4.10/7.56    3.71/6.55    3.48/5.99    3.33/5.64    3.22/5.39    3.14/5.20    3.07/5.06    3.02/4.94    2.98/4.85
11   4.84/9.65    3.98/7.21    3.59/6.22    3.36/5.67    3.20/5.32    3.09/5.07    3.01/4.89    2.95/4.74    2.90/4.63    2.85/4.54
12   4.75/9.33    3.89/6.93    3.49/5.95    3.26/5.41    3.11/5.06    3.00/4.82    2.91/4.64    2.85/4.50    2.80/4.39    2.75/4.30
13   4.67/9.07    3.81/6.70    3.41/5.74    3.18/5.21    3.03/4.86    2.92/4.62    2.83/4.44    2.77/4.30    2.71/4.19    2.67/4.10
14   4.60/8.86    3.74/6.51    3.34/5.56    3.11/5.04    2.96/4.69    2.85/4.46    2.76/4.28    2.70/4.14    2.65/4.03    2.60/3.94
15   4.54/8.68    3.68/6.36    3.29/5.42    3.06/4.89    2.90/4.56    2.79/4.32    2.71/4.14    2.64/4.00    2.59/3.89    2.54/3.80
16   4.49/8.53    3.63/6.23    3.24/5.29    3.01/4.77    2.85/4.44    2.74/4.20    2.66/4.03    2.59/3.89    2.54/3.78    2.49/3.69
17   4.45/8.40    3.59/6.11    3.20/5.19    2.96/4.67    2.81/4.34    2.70/4.10    2.61/3.93    2.55/3.79    2.49/3.68    2.45/3.59
18   4.41/8.29    3.55/6.01    3.16/5.09    2.93/4.58    2.77/4.25    2.66/4.01    2.58/3.84    2.51/3.71    2.46/3.60    2.41/3.51
19   4.38/8.18    3.52/5.93    3.13/5.01    2.90/4.50    2.74/4.17    2.63/3.94    2.54/3.77    2.48/3.63    2.42/3.52    2.38/3.43
20   4.35/8.10    3.49/5.85    3.10/4.94    2.87/4.43    2.71/4.10    2.60/3.87    2.51/3.70    2.45/3.56    2.39/3.46    2.35/3.37
21   4.32/8.02    3.47/5.78    3.07/4.87    2.84/4.37    2.68/4.04    2.57/3.81    2.49/3.64    2.42/3.51    2.37/3.40    2.32/3.31
22   4.30/7.95    3.44/5.72    3.05/4.82    2.82/4.31    2.66/3.99    2.55/3.76    2.46/3.59    2.40/3.45    2.34/3.35    2.30/3.26
23   4.28/7.88    3.42/5.66    3.03/4.76    2.80/4.26    2.64/3.94    2.53/3.71    2.44/3.54    2.37/3.41    2.32/3.30    2.27/3.21
24   4.26/7.82    3.40/5.61    3.01/4.72    2.78/4.22    2.62/3.90    2.51/3.67    2.42/3.50    2.36/3.36    2.30/3.26    2.25/3.17
25   4.24/7.77    3.39/5.57    2.99/4.68    2.76/4.18    2.60/3.85    2.49/3.63    2.40/3.46    2.34/3.32    2.28/3.22    2.24/3.13
26   4.23/7.72    3.37/5.53    2.98/4.64    2.74/4.14    2.59/3.82    2.47/3.59    2.39/3.42    2.32/3.29    2.27/3.18    2.22/3.09
27   4.21/7.68    3.35/5.49    2.96/4.60    2.73/4.11    2.57/3.78    2.46/3.56    2.37/3.39    2.31/3.26    2.25/3.15    2.20/3.06
28   4.20/7.64    3.34/5.45    2.95/4.57    2.71/4.07    2.56/3.75    2.45/3.53    2.36/3.36    2.29/3.23    2.24/3.12    2.19/3.03
29   4.18/7.60    3.33/5.42    2.93/4.54    2.70/4.04    2.55/3.73    2.43/3.50    2.35/3.33    2.28/3.20    2.22/3.09    2.18/3.00
30   4.17/7.56    3.32/5.39    2.92/4.51    2.69/4.02    2.53/3.70    2.42/3.47    2.33/3.30    2.27/3.17    2.21/3.07    2.16/2.98

Source: Richard Lowry, VassarStats (www.vassarstats.net). Retrieved from http://vassarstats.net/textbook/apx_d.html

Try It!

F. If the F in an ANOVA is 4.0 and MSwith = 2.0, what will be the value of MSbet?


6.2 Identifying the Difference: Post Hoc Tests and Tukey’s HSD

A significant t from an independent t-test is simpler to interpret than a significant F from an ANOVA with three or more groups. A significant t indicates that the two groups probably belong to populations with different means. A significant F indicates that at least one group is significantly different from at least one other group in the study, but unless there are only two groups in the ANOVA, it does not indicate which group differs from which. If the null hypothesis is rejected, there are a number of possible alternatives, as we noted when we listed all the possible HA outcomes earlier.

The point of a post hoc ("after this") test conducted following an ANOVA is to determine which groups are significantly different from each other. So when F is significant, a post hoc test is the next step. Statisticians debate whether to run a post hoc test when F is not significant, because there may be instances in which the overall F is nonsignificant yet the post hoc tests detect a significant difference between two groups. Because the analysis is easy to run in Excel or SPSS, researchers may run post hoc tests regardless, to determine whether there are significant differences in means between pairs of groups. When specific pairwise differences are of interest from the outset, however, a planned comparison is the more prudent way to detect them. Whether to use a planned comparison or a post hoc test should be decided by the purpose of the study. If the goal is to test the omnibus null hypothesis that the means are not significantly different, then a significant omnibus F should come first. On the other hand, if the study was designed to detect specific differences between particular means, then the F result is not necessary, and going straight to those comparisons, as in a planned comparison between means, is appropriate.

There are many post hoc tests that are used for different purposes and based on their own assumptions and calculations (18 of them in SPSS, named after their respective authors). Each of them has particular strengths, but one of the more common in the psychological disciplines, and also one of the easiest to calculate, is John Tukey’s HSD test, for “honestly significant difference.”

Many statisticians use the terms liberal and conservative to describe post hoc tests. A liberal test has a greater chance of finding a significant difference between means but also a higher chance of a type I error. Fisher's least significant difference (LSD) test is an example of a liberal test. Liberal tests are seldom used, precisely because of this concern about committing a type I error. Conversely, a conservative post hoc test has a lower chance of finding a significant difference between means but also a lower chance of a type I error. Bonferroni's post hoc test is one such conservative test. Because they guard against type I errors, conservative post hoc tests are more widely used.

Formula 6.5 produces a value that is the smallest difference between the means of any two samples that can be statistically significant:

$$HSD = x\sqrt{\frac{MS_{with}}{n}} \qquad \text{(Formula 6.5)}$$


Where

x = a table value (the studentized range value, q, in Table 6.4) indexed to the number of groups (k) in the problem and the degrees of freedom within (dfwith) from the ANOVA table

MSwith = the value from the ANOVA table

n = the number in one group when group sizes are equal.

In order to compute Tukey’s HSD, follow these steps:

1. From Table 6.4, locate the value of x by moving across the top of the table to the number of groups/treatments (k = 3) and then down the left side to the within degrees of freedom (dfwith = 9). The intersecting values are 3.95 and 5.43. The smaller of the two is the value for p = .05, the level used in our test. The post hoc test is always conducted at the same probability level as the ANOVA; in this case, p = .05.

2. The calculation is 3.95 times the square root of .945 (the MSwith) divided by 4 (n):

$$HSD = 3.95\sqrt{\frac{.945}{4}} = 1.920$$

3. This value is the minimum difference between the means of two significantly different samples. The sign of the difference does not matter; it is the absolute value we need.

The means for social isolation in the three groups are the following:

$M_a = 3.50$ for small town respondents

$M_b = 6.750$ for suburban respondents

$M_c = 7.250$ for city respondents

Small towns minus suburbs:

$M_a - M_b = 3.50 - 6.75 = -3.25$; the absolute difference, 3.25, exceeds 1.92 and is significant.

Small towns minus cities:

$M_a - M_c = 3.50 - 7.25 = -3.75$; the absolute difference, 3.75, exceeds 1.92 and is significant.

Suburbs minus cities:

$M_b - M_c = 6.75 - 7.25 = -0.50$; the absolute difference, 0.50, is less than 1.92 and is not significant.
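The pairwise comparisons can be automated, which matters more as the number of groups grows (k groups give k(k − 1)/2 pairs). This is a minimal sketch that assumes equal group sizes and takes the table value x (the studentized range q from Table 6.4) as an input rather than computing it. The names `tukey_hsd` and `means` are illustrative.

```python
import math
from itertools import combinations

def tukey_hsd(q: float, ms_with: float, n: int) -> float:
    """Formula 6.5: HSD = q * sqrt(MS_with / n), for equal group sizes."""
    return q * math.sqrt(ms_with / n)

means = {"small towns": 3.50, "suburbs": 6.75, "cities": 7.25}
hsd = tukey_hsd(q=3.95, ms_with=0.945, n=4)  # q for k = 3, df_with = 9, p = .05
print(f"HSD = {hsd:.3f}")                    # HSD = 1.920

significant = []
for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
    diff = mean_a - mean_b
    # Only the absolute value of the difference is compared with HSD
    is_sig = abs(diff) >= hsd
    print(f"{a} - {b} = {diff:+.2f}  significant: {is_sig}")
    if is_sig:
        significant.append((a, b))
```

For the social isolation data, this flags the two comparisons involving small towns and not the suburbs-versus-cities comparison.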

When several groups are involved, it is sometimes helpful to create a table that presents all the differences between pairs of means. Table 6.5, which is repeated in the Appendix as Table D, presents the Tukey's HSD results for the social isolation problem.

Formula 6.5 is used when group sizes are equal. For the more adventurous, there is an alternate formula for unequal group sizes:

$$HSD = x\sqrt{\left(\frac{MS_{with}}{2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

with a separate HSD value computed for each pair of means in the problem.
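When $n_1 = n_2 = n$, the unequal-n version reduces to Formula 6.5, because $\frac{MS_{with}}{2}\cdot\frac{2}{n} = \frac{MS_{with}}{n}$. The sketch below checks this numerically. The function names are illustrative. The second call uses made-up group sizes purely to show the arithmetic; in a real unequal-n problem, dfwith, and therefore x, would also change.

```python
import math

def hsd_equal_n(q, ms_with, n):
    """Formula 6.5: equal group sizes."""
    return q * math.sqrt(ms_with / n)

def hsd_unequal_n(q, ms_with, n1, n2):
    """Unequal-n version: one HSD value per pair of groups with sizes n1 and n2."""
    return q * math.sqrt((ms_with / 2) * (1 / n1 + 1 / n2))

# Equal sizes: both formulas give the same value (about 1.920 here)
print(math.isclose(hsd_unequal_n(3.95, 0.945, 4, 4), hsd_equal_n(3.95, 0.945, 4)))  # True

# Hypothetical pair with 4 and 6 members: the larger group lowers the threshold
print(hsd_unequal_n(3.95, 0.945, 4, 6) < hsd_equal_n(3.95, 0.945, 4))  # True
```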


Table 6.4: Tukey’s HSD critical values: q (Alpha, k, df)

* In each cell, the first value is the critical value for q corresponding to alpha = .05 and the second is the critical value for alpha = .01 (top and bottom, respectively, in the printed table).

df   k = Number of Treatments
     2           3           4           5           6           7           8           9           10
5    3.64/5.70   4.60/6.98   5.22/7.80   5.67/8.42   6.03/8.91   6.33/9.32   6.58/9.67   6.80/9.97   6.99/10.24
6    3.46/5.24   4.34/6.33   4.90/7.03   5.30/7.56   5.63/7.97   5.90/8.32   6.12/8.61   6.32/8.87   6.49/9.10
7    3.34/4.95   4.16/5.92   4.68/6.54   5.06/7.01   5.36/7.37   5.61/7.68   5.82/7.94   6.00/8.17   6.16/8.37
8    3.26/4.75   4.04/5.64   4.53/6.20   4.89/6.62   5.17/6.96   5.40/7.24   5.60/7.47   5.77/7.68   5.92/7.86
9    3.20/4.60   3.95/5.43   4.41/5.96   4.76/6.35   5.02/6.66   5.24/6.91   5.43/7.13   5.59/7.33   5.74/7.49
10   3.15/4.48   3.88/5.27   4.33/5.77   4.65/6.14   4.91/6.43   5.12/6.67   5.30/6.87   5.46/7.05   5.60/7.21
11   3.11/4.39   3.82/5.15   4.26/5.62   4.57/5.97   4.82/6.25   5.03/6.48   5.20/6.67   5.35/6.84   5.49/6.99
12   3.08/4.32   3.77/5.05   4.20/5.50   4.51/5.84   4.75/6.10   4.95/6.32   5.12/6.51   5.27/6.67   5.39/6.81
13   3.06/4.26   3.73/4.96   4.15/5.40   4.45/5.73   4.69/5.98   4.88/6.19   5.05/6.37   5.19/6.53   5.32/6.67
14   3.03/4.21   3.70/4.89   4.11/5.32   4.41/5.63   4.64/5.88   4.83/6.08   4.99/6.26   5.13/6.41   5.25/6.54
15   3.01/4.17   3.67/4.84   4.08/5.25   4.37/5.56   4.59/5.80   4.78/5.99   4.94/6.16   5.08/6.31   5.20/6.44
16   3.00/4.13   3.65/4.79   4.05/5.19   4.33/5.49   4.56/5.72   4.74/5.92   4.90/6.08   5.03/6.22   5.15/6.35
17   2.98/4.10   3.63/4.74   4.02/5.14   4.30/5.43   4.52/5.66   4.70/5.85   4.86/6.01   4.99/6.15   5.11/6.27
18   2.97/4.07   3.61/4.70   4.00/5.09   4.28/5.38   4.49/5.60   4.67/5.79   4.82/5.94   4.96/6.08   5.07/6.20
19   2.96/4.05   3.59/4.67   3.98/5.05   4.25/5.33   4.47/5.55   4.65/5.73   4.79/5.89   4.92/6.02   5.04/6.14
20   2.95/4.02   3.58/4.64   3.96/5.02   4.23/5.29   4.45/5.51   4.62/5.69   4.77/5.84   4.90/5.97   5.01/6.09
24   2.92/3.96   3.53/4.55   3.90/4.91   4.17/5.17   4.37/5.37   4.54/5.54   4.68/5.69   4.81/5.81   4.92/5.92
30   2.89/3.89   3.49/4.45   3.85/4.80   4.10/5.05   4.30/5.24   4.46/5.40   4.60/5.54   4.72/5.65   4.82/5.76
40   2.86/3.82   3.44/4.37   3.79/4.70   4.04/4.93   4.23/5.11   4.39/5.26   4.52/5.39   4.63/5.50   4.73/5.60

Source: Tukey’s HSD critical values (n.d.). Retrieved from http://www.stat.duke.edu/courses/Spring98/sta110c/qtable.html.


Table 6.5: Presenting Tukey’s HSD results in a table

$$HSD = x\sqrt{\frac{MS_{with}}{n}} = 3.95\sqrt{\frac{.945}{4}} = 1.920$$

Any difference between pairs of means of 1.920 or greater is a statistically significant difference. Mean differences in orange (marked here with *) are statistically significant.

                        Small towns M = 3.500   Suburbs M = 6.750   Cities M = 7.250
Small towns M = 3.500                           Diff = 3.250*       Diff = 3.750*
Suburbs M = 6.750                                                   Diff = 0.500
Cities M = 7.250

The values entered in the cells in Table 6.5 indicate the differences between each pair of means in the study. Comparing the mean scores from each of the three groups indicates that the respondents from small towns expressed a significantly lower level of social isolation than those in either the suburbs or cities. Comparing the mean scores from the suburban and city groups indicates that social isolation scores are higher in the city, but the difference is not large enough to be statistically significant.

The significant F from the ANOVA indicated that at least one group had a significantly different level of social isolation from at least one other group, but that is all a significant F can reveal. Unless there are only two groups, the result does not indicate which group is significantly different from which other group. The post hoc test indicates which pairs of groups are significantly different from each other, and Table 6.5 is an example of how to illustrate the significant and the nonsignificant differences. One caveat in using Tukey’s HSD is that it assumes equality of variances (homogeneity) between groups, which can be checked with Levene’s test (Chapter 5). If homogeneity is violated, a post hoc test that adjusts for inequality of variances (heterogeneity) should be used instead. In SPSS, for instance, there are four such options under the Equal Variances Not Assumed heading when conducting a post hoc test for ANOVA. One of these is the Games-Howell post hoc test, which is run by checking that box on the SPSS Post Hoc tests tab for ANOVA.


Apply It!

ANOVA and Product Development

A product development specialist in a major computer company decides that it would be a significant improvement to keyboards if they were designed to fit the shape of human hands. Instead of being flat, the new keyboard would curve like the surface of a football. Before the company executives are willing to expend the resources necessary to produce and distribute such a product, they need to know whether it will sell and what the most comfortable curvature of the keyboard would be.

The company produces prototypes for four different keyboards, labeled Prototype A through D (see Table 6.6). Prototype A is a standard flat keyboard, and the others each have varying amounts of curve. Everything else about the keyboards is the same, so this is a one-way ANOVA. Forty different users are randomly assigned to test one of the four keyboards and rate its comfort on a 100-point scale. The ratings are shown in Table 6.6.

Table 6.6: Prototype A–D data set

Prototype A Prototype B Prototype C Prototype D

49 57 77 65

57 53 82 61

73 69 77 73

68 65 85 81

65 61 93 89

62 73 79 77

61 57 73 81

45 69 89 77

53 73 82 69

61 77 85 77

Next, the test results are analyzed in Excel, which produces the information in Figure 6.4.


Figure 6.4: Excel results of comparison means and ANOVA of prototypes

Summary
Groups        Count   Sum   Average   Variance
Prototype A   10      594   59.4      73.82
Prototype B   10      654   65.4      65.60
Prototype C   10      822   82.2      36.40
Prototype D   10      750   75.0      68.44

ANOVA
Source of Variation   SS       df   MS        F       p-value    Fcrit
Between Groups        3063.6    3   1021.20   16.72   5.71E-07   2.87
Within Groups         2198.4   36   61.07
Total                 5262     39

The null hypothesis is that there is no difference among the four keyboards. From Figure 6.4, we see that the F value is 16.72, which is larger than the critical value of F = 2.87 at α = .05. Therefore, the null hypothesis is rejected at p < .05: at least one of the prototypes is significantly different from at least one other prototype.

Because there is a significant F, the marketers next compute HSD:

$$HSD = x\sqrt{\frac{MS_{with}}{n}}$$

Where

x = 3.81 (based on k = 4, dfwith = 36, and p = .05; Table 6.4 lists df = 30 and df = 40, and 3.81 falls between their values of 3.85 and 3.79)

MSwith = 61.07, the value from the ANOVA table

n = 10, the number in one group when group sizes are equal

$$HSD = 3.81\sqrt{\frac{61.07}{10}} = 9.42$$
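The summary and ANOVA values in Figure 6.4 can be reproduced from the raw ratings in Table 6.6. The following is a minimal pure-Python sketch that computes each sum of squares directly from its definition. The function name `one_way_anova` is illustrative. SciPy users could call `scipy.stats.f_oneway` instead, which returns F and its p value. The sketch does not compute a p value, so the decision still relies on the tabled critical value of 2.87.

```python
def one_way_anova(groups):
    """One-way ANOVA from raw scores; `groups` is a list of lists of scores."""
    all_scores = [x for g in groups for x in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    means = [sum(g) / len(g) for g in groups]
    # SS_bet: each group mean's squared distance from the grand mean, weighted by n
    ss_bet = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SS_with: each score's squared distance from its own group mean
    ss_with = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # SS_tot: each score's squared distance from the grand mean
    ss_tot = sum((x - grand_mean) ** 2 for x in all_scores)
    df_bet, df_with = k - 1, n_total - k
    ms_bet, ms_with = ss_bet / df_bet, ss_with / df_with
    return {"SS": (ss_bet, ss_with, ss_tot),
            "df": (df_bet, df_with, n_total - 1),
            "MS": (ms_bet, ms_with),
            "F": ms_bet / ms_with}

prototypes = {
    "A": [49, 57, 73, 68, 65, 62, 61, 45, 53, 61],
    "B": [57, 53, 69, 65, 61, 73, 57, 69, 73, 77],
    "C": [77, 82, 77, 85, 93, 79, 73, 89, 82, 85],
    "D": [65, 61, 73, 81, 89, 77, 81, 77, 69, 77],
}
result = one_way_anova(list(prototypes.values()))
print([round(ss, 1) for ss in result["SS"]])  # [3063.6, 2198.4, 5262.0]
print(result["df"], round(result["F"], 2))    # (3, 36, 39) 16.72
```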


This value is the minimum difference between the means of two significantly different samples. The differences in means between the groups are shown below:

$A - B = -6.0$, $A - C = -22.8$, $A - D = -15.6$, $B - C = -16.8$, $B - D = -9.6$, $C - D = 7.2$

The differences in comfort between Prototypes A and B and between Prototypes C and D are not statistically significant, because their absolute values are less than the Tukey’s HSD value of 9.42. However, the differences in comfort between all the other pairs of prototypes are statistically significant.
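With four prototypes there are six pairs, which makes a loop worth having. The sketch below reuses the values from the box (x = 3.81, MSwith = 61.07, n = 10). The variable names are illustrative. It flags each pair whose absolute mean difference is at least the HSD.

```python
import math
from itertools import combinations

means = {"A": 59.4, "B": 65.4, "C": 82.2, "D": 75.0}
q, ms_with, n = 3.81, 61.07, 10     # values used in the Apply It! box
hsd = q * math.sqrt(ms_with / n)    # Formula 6.5
print(f"HSD = {hsd:.2f}")           # HSD = 9.42

results = {}
for a, b in combinations(means, 2):
    diff = means[a] - means[b]
    results[f"{a}-{b}"] = abs(diff) >= hsd   # the sign of the difference does not matter
    print(f"{a} - {b} = {diff:+.1f}  significant: {results[f'{a}-{b}']}")
```

Note that B − D (−9.6) only narrowly clears the 9.42 threshold, so rounding x or MSwith differently could change that one decision.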

Based on the one-way ANOVA, the marketing team decides to produce and sell the keyboard configuration of Prototype C. It had the highest mean comfort rating and was rated significantly more comfortable than the standard flat keyboard (Prototype A).

Apply It! boxes written by Shawn Murphy

6.3 Determining the Results’ Practical Importance

Three questions can come up in an ANOVA. The second and third questions depend upon the answer to the first:

1. Are any of the differences statistically significant? The answer depends upon how the calculated F value compares to the critical value from the table.

2. If the F is significant, which groups are significantly different from each other? That question is answered by completing a post hoc test such as Tukey’s HSD.

3. If F is significant, how important is the result? The answer comes by calculating an effect size.

After addressing the first two questions, we now turn our attention to the third question, effect size. With the t-test in Chapter 5, Cohen’s d answered the question about how important the result was. Several effect-size statistics have been used to explain the importance of a significant ANOVA result. Omega squared (ω²) and partial eta-squared (partial η²) are both quite common in the social science research literature, but the one we will use is called eta-squared (η²). The Greek letter eta (η, pronounced like “ate a” as in “ate a grape”) is the equivalent of the letter h. Because some of the variance in scores is unexplained and is therefore error variance, eta-squared answers this question: How much of the score variance can be attributed to the independent variable?


In the social isolation problem, the question was whether residents of small towns, suburban areas, and cities differ in the amount of social isolation they indicate. The respondents’ location is the IV. Eta-squared estimates how much of the difference in social isolation is related to where respondents live.

There are only two values involved in the η² calculation, both retrievable from the ANOVA table. Formula 6.6 shows the eta-squared calculation:

$$\eta^2 = \frac{SS_{bet}}{SS_{tot}} \qquad \text{(Formula 6.6)}$$

Eta-squared is the ratio of between-groups variability to total variability. If there were no error variance, all variance would be due to the independent variable, and the sums of squares for between-groups variability and for total variability would have the same values; the effect size would be 1.0. With human subjects, this never happens because scores fluctuate for reasons other than the IV, but it is important to know that 1.0 is the “upper bound” for this effect size. The lower bound is 0, of course: none of the variance is explained. But we also never see eta-squared values of 0, because the effect size is calculated only when F is significant, and that can happen only when the effect of the IV is great enough that the ratio of MSbet to MSwith exceeds the critical value.

For the social isolation problem, $SS_{bet} = 33.168$ and $SS_{tot} = 41.672$, so

$$\eta^2 = \frac{33.168}{41.672} = 0.796.$$

According to this data, about 80% (79.6% to be exact) of the variance in social isolation scores is related to whether the respondent lives in a small town, a suburb, or a city. (Note that this amount of variance is unrealistically high, which can happen when numbers are contrived.)
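Formula 6.6 is simple enough to check in one line. The sketch below also guards the bounds discussed above (0 to 1.0). The function name is illustrative. As a second example, it applies the formula to the keyboard values from Figure 6.4. That η² of about .58 is computed here and does not appear in the Apply It! box.

```python
def eta_squared(ss_bet: float, ss_tot: float) -> float:
    """Formula 6.6: proportion of score variance related to the IV."""
    if not 0 <= ss_bet <= ss_tot:
        raise ValueError("need 0 <= SS_bet <= SS_tot, so eta-squared stays between 0 and 1")
    return ss_bet / ss_tot

print(round(eta_squared(33.168, 41.672), 3))  # 0.796 (social isolation)
print(round(eta_squared(3063.6, 5262), 3))    # 0.582 (keyboard prototypes)
```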

Try It!

G. If the F in an ANOVA is not significant, should the post hoc test or the effect-size calculation be made?

Apply It!

Using ANOVA to Test Effectiveness

A pharmaceutical company has developed a new medicine to treat a skin condition. This medicine has been proven effective in previous tests, but now the company is trying to decide the best method to deliver the medicine. The options are

1. pills that are taken orally,

2. a cream that is rubbed into the affected area, or

3. drops that are placed on the affected area.


To test the application methods, the company uses 24 volunteers who suffer from this skin condition. Each of the volunteers is randomly assigned to one of the three treatment methods. Note that each volunteer tests only one of the delivery methods. This satisfies the requirement that the categories of the IV must be independent. This is a one-way ANOVA test with the delivery method being the only independent variable.

To evaluate the effectiveness of each delivery method, three different dermatologists examine each patient after the course of treatment. They then rate the skin condition on a scale of 1 through 20, with 20 being a total absence of the condition. The scores from the three doctors are then averaged.

The null hypothesis is that all three delivery methods are equally effective:

$$H_0: \mu_{pills} = \mu_{cream} = \mu_{drops}$$

The null hypothesis indicates that the three treatments were drawn from populations with the same mean. The alternate hypothesis for the ANOVA test is

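$$H_a: \mu_{pills} \neq \mu_{cream} \neq \mu_{drops}$$

That is, at least one of the three population means differs from the others.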

Data from the trial is shown in Table 6.7.

Table 6.7: Data from trial of skin treatment conditions

Pills Cream Drops

14 18 13

13 15 15

19 16 16

18 18 15

15 17 14

16 13 17

12 17 13

12 18 16


Figure 6.5: Analysis of the data that was performed in Excel

Summary
Groups   Count   Sum   Average   Variance
Pills    8       119   14.88     6.98
Cream    8       132   16.50     3.14
Drops    8       119   14.88     2.13

ANOVA
Source of Variation   SS      df   MS     F      p-value   Fcrit
Between Groups        14.08    2   7.04   1.72   0.20      3.47
Within Groups         85.75   21   4.08
Total                 99.83   23

Figure 6.5 shows that the value of F is 1.72, which is less than the Fcrit value of 3.47 when testing at p = .05. Therefore, the null hypothesis is not rejected. We cannot say that the different delivery methods come from populations with different means. The p value generated by Excel indicates that, if the null hypothesis were true, there would be about a 20% probability of obtaining differences among the means at least this large by chance alone. Because the null hypothesis is not rejected, there is no need to perform either a Tukey’s HSD test or an η² calculation.

The pharmaceutical company decides to offer the medicine as a cream because this is generally its preferred delivery method. The ANOVA gives the company no evidence that either of the two alternative methods would provide a more effective delivery option. In other words, the data do not support the alternative hypothesis, although failing to reject H0 does not prove that the three methods are equally effective.

Apply It! boxes written by Shawn Murphy
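As a cross-check on Figure 6.5, this self-contained sketch recomputes F from Table 6.7 and then the p value. Because there are three delivery methods, $df_{bet} = 2$, so the sketch can use the closed-form upper tail of F with two numerator degrees of freedom, $P(F \ge x) = (1 + 2x/df_{with})^{-df_{with}/2}$. That shortcut does not apply to other numerator df. The function names are illustrative.

```python
def one_way_f(groups):
    """Return (F, df_bet, df_with) for a one-way ANOVA on raw scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_bet = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_with = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_bet, df_with = len(groups) - 1, len(scores) - len(groups)
    return (ss_bet / df_bet) / (ss_with / df_with), df_bet, df_with

def f_upper_tail_df2(f_value, df_with):
    """P(F >= f_value); this closed form is valid only when df_bet == 2."""
    return (1 + 2 * f_value / df_with) ** (-df_with / 2)

pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]

f, df_bet, df_with = one_way_f([pills, cream, drops])
if df_bet != 2:
    raise ValueError("the closed-form tail requires exactly three groups")
p = f_upper_tail_df2(f, df_with)
print(round(f, 2), round(p, 2))                         # 1.72 0.2
print("reject H0" if p < 0.05 else "do not reject H0")  # do not reject H0
```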


6.4 Conditions for the One-Way ANOVA

As we saw with the t-tests, any statistical test requires that certain conditions (also referred to as assumptions) are met. The conditions might be characteristics such as the scale of the data, the way the data is distributed, the relationships between the groups in the analysis, and so on. In the case of the one-way ANOVA, the name indicates one of the conditions.

• This particular test can accommodate just one independent variable.

   • That one variable can have any number of categories, but there can be just one IV.

In the example of small-town, suburban, and city isolation, the IV was the location of the respondents’ residence. We might have added more categories, such as semirural, small town, large town, suburbs of small cities, suburbs of large cities, and so on, all of which relate to the respondents’ place of residence. But as with the independent t-test, there is no way to add another variable, such as the respondents’ gender, in a one-way ANOVA.

• The categories of the IV must be independent.

   • Like the independent t-test, the groups involved must be independent. Those who are members of one group cannot also be members of another group involved in the same analysis.

• The IV must be nominal scale.

   • Because the IV must be nominal scale, sometimes data of some other scale is reduced to categorical data to complete the analysis. If someone is interested in whether there are differences in social isolation related to age, age must be changed from ratio to nominal data prior to the analysis. Rather than using each person’s age in years as the independent variable, ages are grouped into categories such as 20s, 30s, and so on. This is not ideal, because by reducing ratio data to nominal or even ordinal scale, the differences in social isolation between, for example, 20- and 29-year-olds are lost.

• The DV must be interval or ratio scale.

   • Technically, social isolation would need to be measured with something like the number of verbal exchanges that one has daily with neighbors or co-workers, rather than by asking respondents to indicate on a scale of 1–10 how isolated they feel, which probably produces ordinal data.

• The groups in the analysis must be similarly distributed. The technical description for this similarity of distribution is homogeneity of variance. For example, this condition means that the groups should all have reasonably similar standard deviations. This was discussed in Chapter 5, where Levene’s test was used to test equality of variances.

• Finally, using ANOVA assumes that the samples are drawn from a normally distributed population.

It may seem difficult to meet all these conditions. However, keep in mind that normality and homogeneity of variance in particular represent ideals more than practical necessities. As it turns out, Fisher’s procedure can tolerate a certain amount of deviation from these requirements; this test is quite robust.


6.5 ANOVA and the Independent t-Test

The one-way ANOVA and the independent t-test share several assumptions, although they employ distinct statistics: ANOVA uses sums of squares, whereas the t-test uses the standard error of the difference. Despite this difference, when there are two groups, both tests lead the analyst to the same conclusion. This consistency can be illustrated by completing ANOVA and the independent t-test for the same data.

Suppose an industrial psychologist is interested in how people from two separate divisions of a company differ in their work habits. The dependent variable is the amount of work completed after-hours at home per week for supervisors in marketing versus supervisors in manufacturing. The data is as follows:

Marketing: 3, 4, 5, 7, 7, 9, 11, 12

Manufacturing: 0, 1, 3, 3, 4, 5, 7, 7

Calculating some of the basic statistics yields the following:

                M      s      SEM
Marketing:      7.25   3.240  1.146
Manufacturing:  3.75   2.550  0.901

$SE_d = 1.458$ and $M_G = 5.50$ (the grand mean of all 16 scores)

First, the t-test:

$$t = \frac{M_1 - M_2}{SE_d} = \frac{7.25 - 3.75}{1.458} = 2.401; \qquad t_{.05(14)} = 2.145$$

The difference is significant. Those in marketing (M1) take significantly more work home than those in manufacturing (M2).

Now the ANOVA:

• $SS_{tot} = \sum (x - M_G)^2 = 168$

   • Verify that the result of subtracting $M_G$ from each score in both groups, squaring the differences, and summing the squares is 168.

• $SS_{bet} = (M_a - M_G)^2 n_a + (M_b - M_G)^2 n_b$

   • This one is not too lengthy to do here: $(7.25 - 5.50)^2(8) + (3.75 - 5.50)^2(8) = 24.5 + 24.5 = 49$.

• $SS_{with} = \sum (x_a - M_a)^2 + \sum (x_b - M_b)^2$

   • Verify that the result of subtracting the group means from each score in the particular group, squaring the differences, and summing the squares is 119.

• Check that $SS_{with} + SS_{bet} = SS_{tot}$: $119 + 49 = 168$.


Source SS df MS F Fcrit

Between 49 1 49 5.765 $F_{.05(1,14)} = 4.60$

Within 119 14 8.5

Total 168 15

Like the t-test, ANOVA indicates that the amount of work completed at home differs significantly between the two groups, so both tests draw the same conclusion about whether the result is significant. But the similarity goes further than this.

• Note that the calculated value of t = 2.401, and the calculated value of F = 5.765.

• If the value of t is squared, it equals the value of F:

$$2.401^2 = 5.765$$

• The same is true for the critical values:

$$t_{.05(14)} = 2.145, \qquad F_{.05(1,14)} = 4.60, \qquad 2.145^2 = 4.60$$

Gosset’s and Fisher’s tests draw exactly equivalent conclusions when there are two groups. The ANOVA tends to be more work, and researchers ordinarily use the t-test for two groups, but the point is that the two tests are entirely consistent.
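The equivalence can be checked directly on the marketing and manufacturing data. This minimal sketch computes $SE_d$ from the two standard errors of the mean, $\sqrt{SEM_1^2 + SEM_2^2}$, as in the table above. That form matches the pooled calculation here because the groups are the same size. The helper names are illustrative.

```python
import math

marketing = [3, 4, 5, 7, 7, 9, 11, 12]
manufacturing = [0, 1, 3, 3, 4, 5, 7, 7]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Sample variance (n - 1 in the denominator)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Independent t-test: t = (M1 - M2) / SE_d, with SE_d = sqrt(SEM1^2 + SEM2^2)
se_d = math.sqrt(sample_var(marketing) / len(marketing)
                 + sample_var(manufacturing) / len(manufacturing))
t = (mean(marketing) - mean(manufacturing)) / se_d

# One-way ANOVA on the same two groups
groups = (marketing, manufacturing)
scores = marketing + manufacturing
grand = mean(scores)
ss_bet = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_with = sum((x - mean(g)) ** 2 for g in groups for x in g)
df_with = len(scores) - len(groups)
f = (ss_bet / (len(groups) - 1)) / (ss_with / df_with)

print(round(t, 3), round(f, 3), round(t ** 2, 3))  # 2.401 5.765 5.765
```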

6.6 Completing ANOVA with Excel

Completing an ANOVA by longhand involves enough means, subtractions, squared differences, and so on that doing the ANOVA in Excel is worthwhile. A researcher is comparing the level of optimism indicated by people in different vocations during an economic recession. The data is from laborers, clerical staff in professional offices, and the professionals in those offices. The data for the three groups follows:

Laborers: 33, 35, 38, 39, 42, 44, 44, 47, 50, 52

Clerical staff: 27, 36, 37, 37, 39, 39, 41, 42, 45, 46

Professionals: 22, 24, 25, 27, 28, 28, 29, 31, 33, 34

1. Create the data file in Excel. Enter Laborers, Clerical staff, and Professionals in cells A1, B1, and C1, respectively.

2. In the columns below those labels, enter the optimism scores, beginning in cell A2 for the laborers, B2 for the clerical workers, and C2 for the professionals. Once the data is entered and checked for accuracy, proceed with the following steps.

3. Click the Data tab at the top of the page.

H. What is the relationship between the values of t and F if both tests are performed on the same two groups?

Try It!


4. At the extreme right, choose Data Analysis.

5. In the Analysis Tools window, select Anova: Single Factor and click OK.

6. Indicate where the data is located in the Input Range. In the example here, the range is A2:C11.

7. Note that the default is "Grouped by Columns." If the data were arranged in rows instead of columns, this would need to be changed.

Because we designated A2 instead of A1 as the point where the data begins, there is no need to indicate that labels are in the first row.

8. Select Output Range and enter a cell location where you wish the display of the output to begin. In the example in Figure 6.6, the location is A13.

9. Click OK.

Widen column A to make the output easier to read. It will look like the screenshot in Figure 6.6.

Figure 6.6: Performing an ANOVA on Excel


As you have already seen in the two Apply It! boxes, the results appear in two tables. The first provides descriptive statistics. The second table looks like the longhand table of results for the social isolation example, except that

• the figures shown for the total follow those for between and within instead of preceding them, and

• the P-value column gives the probability of obtaining an F this large by chance alone, that is, if the null hypothesis of equal population means were true.

Note that the P-value is 4.31E-06. The "E-06" is scientific notation, a shorthand for multiplying by 10 to the negative sixth power: the actual value is p = .00000431, that is, 4.31 with the decimal point moved six places to the left. This probability is far below the p = .05 standard, so the result is statistically significant.
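To cross-check the Excel output, the sketch below runs the same one-way ANOVA on the optimism data in plain Python. It is an illustration, not part of the Excel procedure. A general p-value needs an F-distribution routine from a statistics library; to stay dependency-free, the sketch uses the fact that when df_between = 2, the right-tail area of F has the closed form (1 + 2F/df_within)^(−df_within/2). That shortcut applies only to three-group designs.

```python
# Optimism scores from the Excel example.
groups = {
    "Laborers": [33, 35, 38, 39, 42, 44, 44, 47, 50, 52],
    "Clerical staff": [27, 36, 37, 37, 39, 39, 41, 42, 45, 46],
    "Professionals": [22, 24, 25, 27, 28, 28, 29, 31, 33, 34],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

ss_between = 0.0
ss_within = 0.0
for scores in groups.values():
    m = sum(scores) / len(scores)
    ss_between += len(scores) * (m - grand_mean) ** 2  # group size x squared mean deviation
    ss_within += sum((x - m) ** 2 for x in scores)      # deviations from the group's own mean

k = len(groups)
n_total = len(all_scores)
df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

# Right-tail probability of F; this closed form holds ONLY when df_between == 2.
assert df_between == 2
p_value = (1 + 2 * f_ratio / df_within) ** (-df_within / 2)

print(f"SSbet = {ss_between:.3f}, SSwith = {ss_within:.3f}, SStot = {ss_between + ss_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}, p = {p_value:.2E}")
```

The output, F(2, 27) = 20.213 with p = 4.31E-06, matches the P-value in the Excel table.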

6.7 Presenting Results

The previous analyses all used Excel, so we will now shift to SPSS for executing these steps and interpreting the results. We will first use the data in Table 6.7 and then proceed with actual data gathered from published research. You will see that the steps are the same regardless of sample size, and that software such as Excel and SPSS makes hand calculations unnecessary. While hand calculations are instructive, they are also laborious and more prone to errors, especially with large data sets.

SPSS Example 1: Steps for ANOVA

After setting up the data in SPSS as seen in Figure 6.7 (data from Table 6.7), the steps in executing this analysis are as follows:

Analyze → Compare Means → One-Way ANOVA. Place Treatment into the Factor box and Skin Condition into the Dependent List. Click Post Hoc and check Tukey and Games-Howell; then click Options and check Descriptive and Homogeneity of variance test. Click Continue and OK. (Note that the three treatment groups in the data set in Figure 6.7 are numerically coded: Pills = 1, Creams = 2, and Drops = 3.)


Figure 6.7: Data set in SPSS


Figure 6.8: SPSS output from trial of skin treatment conditions

Test of Homogeneity of Variances: SkinCondition
Levene Statistic   df1   df2   Sig.
1.822              2     21    .186

ANOVA: SkinCondition
Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   14.083           2    7.042         1.724   .203
Within Groups    85.750           21   4.083
Total            99.833           23

Descriptives: SkinCondition
Group    N    Mean    Std. Deviation   Std. Error   95% CI for Mean (Lower, Upper)
Pills    8    16.50   1.773            .627         15.02, 17.98
Creams   8    14.88   1.458            .515         13.66, 16.09
Drops    8    14.88   2.642            .934         12.67, 17.08
Total    24   15.42   2.083            .425         14.54, 16.30

Multiple Comparisons: SkinCondition (each pair shown once; the reversed comparisons have the same values with opposite signs)
Method         (I)      (J)      Mean Difference (I−J)   Std. Error   Sig.    95% CI (Lower, Upper)
Tukey HSD      Pills    Creams   1.625                   1.010        .264    −.92, 4.17
Tukey HSD      Pills    Drops    1.625                   1.010        .264    −.92, 4.17
Tukey HSD      Creams   Drops    .000                    1.010        1.000   −2.55, 2.55
Games-Howell   Pills    Creams   1.625                   .811         .149    −.51, 3.76
Games-Howell   Pills    Drops    1.625                   1.125        .350    −1.37, 4.62
Games-Howell   Creams   Drops    .000                    1.067        1.000   −2.89, 2.89


As seen in the SPSS output (Figure 6.8), the ANOVA results are the same as when the analysis was executed in Excel earlier in the chapter. SPSS produces the descriptive statistics, the test of homogeneity of variance, the ANOVA table, the post hoc tests, and line graphs in a single run using the steps outlined earlier. The Descriptives table shows that each group has the same number of participants (n = 8) and that Pills has the highest mean of the three treatments (M = 16.50). The Test of Homogeneity of Variances gives a favorable result: it is not significant (p > .05), specifically p = .186, so the variances of the three treatments do not differ significantly and equal variances can be assumed. As you recall from earlier chapters, if variances are unequal across groups, an adjustment is needed to compare them. Next, the ANOVA table shows a nonsignificant F statistic, p = .203. Because F is not significant, we do not need to interpret the post hoc tests. As noted earlier in the chapter, this point is debatable: post hoc tests are easy to run, so the analyst can inspect them regardless of the F result, and occasionally two groups differ significantly even though the overall F is not significant. That is rare, and in this example none of the post hoc comparisons is significant.
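Because the groups are equal in size, the ANOVA table in Figure 6.8 can be rebuilt from the Descriptives table alone, which is a useful way to confirm you are reading the output correctly. The sketch below does this in Python under two stated assumptions: the Creams and Drops means are taken as 14.875 (the Descriptives show 14.88, and the post hoc table reports a Pills difference of exactly 1.625 from each), and the p-value uses the df_between = 2 closed form for the F right tail rather than a general statistics routine. Small differences from SPSS come from the rounded standard deviations.

```python
# Summary statistics read from the SPSS Descriptives table (n = 8 per group).
n = 8
means = {"Pills": 16.50, "Creams": 14.875, "Drops": 14.875}
sds = {"Pills": 1.773, "Creams": 1.458, "Drops": 2.642}

k = len(means)
grand_mean = sum(means.values()) / k  # equal n, so the grand mean is the mean of the means
ss_between = sum(n * (m - grand_mean) ** 2 for m in means.values())
ss_within = sum((n - 1) * s ** 2 for s in sds.values())  # (n - 1) x variance, per group

df_between, df_within = k - 1, k * n - k
f_ratio = (ss_between / df_between) / (ss_within / df_within)

# Right-tail area of F; this closed form is valid only when df_between == 2.
p_value = (1 + 2 * f_ratio / df_within) ** (-df_within / 2)

print(f"SSbet = {ss_between:.3f}, SSwith = {ss_within:.2f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}, p = {p_value:.3f}")
```

The result, F(2, 21) ≈ 1.72 with p ≈ .203, agrees with the SPSS ANOVA table.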

SPSS Example 2: Steps for ANOVA

Using public data about higher education and housing from Pew Research (2010), Social and Demographic Trends, the steps in executing this analysis are as follows: Analyze → Compare Means → One-Way ANOVA. Place schl (currently enrolled in school) into the Factor box and age into the Dependent List. Click Post Hoc and check Tukey and Games-Howell; then click Options and check Descriptive, Homogeneity of variance test, and Means plot. Click Continue and OK.


Figure 6.9: SPSS output from Pew research social and demographic trends (2010) education data set

Test of Homogeneity of Variances: AGE What is your age?
Levene Statistic   df1   df2    Sig.
44.884             5     1692   .000

ANOVA: AGE What is your age?
Source           Sum of Squares   df     Mean Square   F        Sig.
Between Groups   72748.597        5      14549.719     90.395   .000
Within Groups    272338.706       1692   160.957
Total            345087.303       1697

Descriptives: AGE What is your age?
Group                                                         N      Mean    Std. Deviation   Std. Error   95% CI for Mean (Lower, Upper)
Yes, in High School                                           33     19.64   4.801            .836         17.93, 21.34
Yes, in College (undergraduate, including 2-year colleges)   212    24.81   8.433            .579         23.66, 25.95
Yes, in Technical, trade, or vocational school                33     36.03   15.503           2.699        30.53, 41.53
Yes, in Graduate School                                       81     31.38   10.692           1.188        29.02, 33.75
No                                                            1336   42.05   13.399           .367         41.33, 42.77
Don't know/Refused (VOL.)                                     3      29.33   6.429            3.712        13.36, 45.30
Total                                                         1698   38.82   14.260           .346         38.14, 39.49


Figure 6.9: SPSS output from Pew research social and demographic trends (2010) education data set (continued)

Multiple Comparisons, Tukey HSD. Dependent Variable: AGE What is your age?
Groups: High School = Yes, in High School; Technical = Yes, in Technical, trade, or vocational school; College = Yes, in College (undergraduate, including 2-year colleges); Graduate = Yes, in Graduate School; DK/Refused = Don't know/Refused (VOL.).
Mean differences (I − J), each pair shown once (the reversed comparisons have opposite signs):

High School − Technical: −16.394*
High School − College: −5.170
High School − Graduate: −11.746*
High School − No: −22.417*
High School − DK/Refused: −9.697
Technical − College: 11.224*
Technical − Graduate: 4.648
Technical − No: −6.023
Technical − DK/Refused: 6.697
College − Graduate: −6.576*
College − No: −17.247*
College − DK/Refused: −4.527
Graduate − No: −10.670*
Graduate − DK/Refused: 2.049
No − DK/Refused: 12.720

* The mean difference is significant at the 0.05 level.


Figure 6.9: SPSS output from Pew research social and demographic trends (2010) education data set (continued)

Source: Data from Pew Research: Social and Demographic Trends. (2011). Higher Education/Housing. Retrieved from http://www.pewsocialtrends.org/category/datasets/.

Multiple Comparisons, Games-Howell. Dependent Variable: AGE What is your age?
Groups: High School = Yes, in High School; Technical = Yes, in Technical, trade, or vocational school; College = Yes, in College (undergraduate, including 2-year colleges); Graduate = Yes, in Graduate School; DK/Refused = Don't know/Refused (VOL.).
Mean differences (I − J), each pair shown once (the reversed comparisons have opposite signs):

High School − Technical: −16.394*
High School − College: −5.170*
High School − Graduate: −11.746*
High School − No: −22.417*
High School − DK/Refused: −9.697
Technical − College: 11.224*
Technical − Graduate: 4.648
Technical − No: −6.023
Technical − DK/Refused: 6.697
College − Graduate: −6.576*
College − No: −17.247*
College − DK/Refused: −4.527
Graduate − No: −10.670*
Graduate − DK/Refused: 2.049
No − DK/Refused: 12.720

* The mean difference is significant at the 0.05 level.


Figure 6.10: SPSS output graph from Pew research social and demographic trends (2010) education data set

Source: Data from Pew Research: Social and Demographic Trends. (2011). Higher Education/Housing. Retrieved from http://www.pewsocialtrends.org/category/datasets/.

The Descriptives table in Figure 6.9 shows that the groups have unequal numbers of participants; the No (not in school) group is the largest (n = 1,336) and has the highest mean age (M = 42.05). The Test of Homogeneity of Variance gives an unfavorable result: it is significant (p < .05), indicating that the variances of the six education groups differ significantly (heterogeneity of variance). Next, the ANOVA table shows a significant F statistic (p < .05). Because homogeneity of variance is violated, we should not rely on a post hoc test that assumes equal variances; instead, we interpret the equal-variances-not-assumed post hoc test, Games-Howell. There, the Don't know/Refused group does not differ significantly from any of the other education groups, whereas several pairs do differ significantly, such as Yes, in High School and Yes, in Technical, trade, or vocational school. All other comparisons can be read the same way from the significance values in the Multiple Comparisons table. The line graph (means plot) in Figure 6.10 shows the mean age of each group, with the No group highest and the Yes, in High School group lowest.
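Why can Games-Howell flag a difference that Tukey does not? Tukey's HSD builds every standard error from the pooled MSwith, while Games-Howell keeps each group's own variance. The Python sketch below computes both standard errors for two pairs using values from Figure 6.9. Assumption: the group sizes of 33 for the High School and Technical groups are inferred from SD ÷ SE in the Descriptives table (for example, 4.801 / .836 ≈ 5.74 ≈ √33). The sketch covers only the standard errors, not the studentized-range p-values.

```python
import math

# Values read from Figure 6.9: MS_within from the ANOVA table, n and SD from Descriptives.
ms_within = 160.957
groups = {
    # name: (n, standard deviation)
    "High School": (33, 4.801),
    "Technical": (33, 15.503),
    "College": (212, 8.433),
}

def pooled_se(g1, g2):
    """Tukey HSD standard error: assumes equal variances, so it uses MS_within."""
    n1, n2 = groups[g1][0], groups[g2][0]
    return math.sqrt(ms_within * (1 / n1 + 1 / n2))

def games_howell_se(g1, g2):
    """Games-Howell standard error: each group keeps its own variance."""
    (n1, s1), (n2, s2) = groups[g1], groups[g2]
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

for pair in [("High School", "Technical"), ("High School", "College")]:
    print(pair, f"Tukey SE = {pooled_se(*pair):.3f}", f"Games-Howell SE = {games_howell_se(*pair):.3f}")
```

For High School versus College, the pooled SE (about 2.374) is more than twice the Games-Howell SE (about 1.017) because MSwith is dominated by the large, highly variable No group. With the smaller SE, the 5.170-year difference is significant under Games-Howell but not under Tukey.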


6.8 Interpreting Results

Though you should refer to the most recent edition of the APA manual for specific detail on formatting statistics, the following may be used as a quick guide in presenting the statistics covered in this chapter.

Table 6.8: Guide to APA formatting of F statistic results

Abbreviation or Term Description

F F test statistic score

η² Eta-squared: an effect size

ω² Omega-squared: an effect size

HSD Honestly significant difference: a Tukey’s post hoc test

SS Sum of squares

MS Mean square

Note that all of the terms in Table 6.8 except HSD are italicized. The following are some examples of how to present results using these abbreviations, though you may use different combinations of results.

Using the data from SPSS Examples 1 and 2 (Figures 6.8 through 6.10), we could present the results in the following way:

• Skin condition did not differ significantly across the three treatments, F(2, 21) = 1.724, p = .203. (Note that both df values are reported: the between-groups df followed by the within-groups df from the ANOVA table.)

• Age differed significantly across the school-enrollment groups, F(5, 1692) = 90.39, p < .05.

• The No [school] group was significantly older (M = 42.05, SD = 13.40) than the Yes, in High School group (M = 19.64, SD = 4.80), the Yes, in College . . . group (M = 24.81, SD = 8.43), and the Yes, in Graduate School group (M = 31.38, SD = 10.69), whereas it did not differ significantly from the Yes, in Technical, trade . . . group (M = 36.03, SD = 15.50) or the Don't know/Refused group (M = 29.33, SD = 6.43).
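Table 6.8 lists η² and ω² as effect sizes that can accompany an F, but this section does not show how to obtain them. A minimal Python sketch, assuming the standard one-way formulas η² = SSbet / SStot and ω² = (SSbet − dfbet × MSwith) / (SStot + MSwith), applied to the ANOVA tables in Figures 6.8 and 6.9. The skin-treatment values illustrate the arithmetic only, because that F was not significant and an effect size is not meaningful there.

```python
# Sums of squares, df, and MS_within from the ANOVA tables in Figures 6.8 and 6.9.
anova_tables = {
    "Skin treatment (Figure 6.8)": {
        "ss_between": 14.083, "ss_total": 99.833, "df_between": 2, "ms_within": 4.083},
    "School and age (Figure 6.9)": {
        "ss_between": 72748.597, "ss_total": 345087.303, "df_between": 5, "ms_within": 160.957},
}

def eta_squared(t):
    """Proportion of total variability associated with the IV: SSbet / SStot."""
    return t["ss_between"] / t["ss_total"]

def omega_squared(t):
    """Less biased one-way estimate: (SSbet - dfbet * MSwith) / (SStot + MSwith)."""
    return (t["ss_between"] - t["df_between"] * t["ms_within"]) / (t["ss_total"] + t["ms_within"])

for name, t in anova_tables.items():
    print(f"{name}: eta^2 = {eta_squared(t):.3f}, omega^2 = {omega_squared(t):.3f}")
```

For the school-and-age example, η² ≈ .21 (ω² ≈ .21), so an APA-style report could append η² = .21 to F(5, 1692) = 90.39.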

6.9 Nonparametric Test: Kruskal-Wallis H-test

The nonparametric equivalent of the one-way ANOVA is the Kruskal-Wallis H-test, also known as the Kruskal-Wallis ANOVA. Like the Mann-Whitney U-test, the Kruskal-Wallis H-test is based on ranked (ordinal) data. It is used as an alternative to its parametric counterpart when the ANOVA's assumptions have been violated. Interestingly, Kruskal was not a proponent of significance testing; Bradburn (2007) quotes him as saying, "I am thinking these days about the many senses in which relative importance gets considered. Of these senses, some seem reasonable and others not so. Statistical significance is low on my ordering." That said, his rank-based equivalent of a parametric technique is very apropos.

As in the Mann-Whitney U-test, all scores are ranked together and the ranks within each group are summed. H is then calculated from each group's squared rank sum divided by that group's sample size.

$$H = \frac{12}{N(N+1)} \sum_{g} \frac{T_g^2}{n_g} - 3(N+1) \qquad \text{(Formula 6.7)}$$

where

$N$ = total sample size

$T_g$ = sum of the ranks in group $g$

$n_g$ = sample size of group $g$

To illustrate the calculation of the H-test, we will use the data from Table 6.7 with a few modifications, as seen in Table 6.9. First, all values are ranked across treatments, with 1 being the lowest rank (the Initial Rank columns). Tied values receive the average of the ranks they occupy. For instance, in the Pills column the two values of 12 have initial ranks of 1 and 2, so each receives the average, 1.5, as seen in the Rank column. Likewise, the four values of 13 occupy ranks 3 through 6, so each receives the average rank of 4.5, and so on for the other ties. Once all of the ranks are assigned, they are summed for each group, as seen in the last row of the table.

Table 6.9: Data from trial of skin treatment conditions

Pills  Initial Rank  Rank  Creams  Initial Rank  Rank  Drops  Initial Rank  Rank

14 7 7.5 18 21 21.5 13 3 4.5

13 4 4.5 15 10 10.5 15 11 10.5

19 24 24 16 14 14.5 16 15 14.5

18 20 21.5 18 22 21.5 15 12 10.5

15 9 10.5 17 17 18 14 8 7.5

16 13 14.5 13 5 4.5 17 19 18

12 1 1.5 17 18 18 13 6 4.5

12 2 1.5 18 23 21.5 16 16 14.5

Sum of ranks: 85.5 (Pills), 130 (Creams), 84.5 (Drops)

There are several websites that can help with these calculations. One widely used statistical calculator for analyses such as the Kruskal-Wallis H-test is available at the VassarStats website via the link below. Use the data provided in this section to see whether you get the same results.

http://vassarstats.net/index.html

Try It!


Next, each rank sum is squared and divided by its group's sample size, completing Formula 6.7.

$$H = \frac{12}{24(24+1)}\left[\frac{(85.5)^2}{8} + \frac{(130)^2}{8} + \frac{(84.5)^2}{8}\right] - 3(24+1)$$

$$H = \frac{12}{600}\left[\frac{7{,}310.25}{8} + \frac{16{,}900}{8} + \frac{7{,}140.25}{8}\right] - 75$$

$$H = 0.02\,(913.78 + 2{,}112.50 + 892.53) - 75$$

$$H = 0.02\,(3{,}918.81) - 75$$

$$H = 3.38$$

The H statistic approximates a chi-square (χ²) distribution, which will be discussed in Chapter 11, with k − 1 degrees of freedom, where k is the number of comparison groups. Here k = 3, so df = 3 − 1 = 2. The chi-square distribution in Table 6.10 gives the critical values by degrees of freedom; for df = 2 at the α = .05 level, χ²critical = 5.991. Our observed value, H = 3.38, is less than χ²critical = 5.991, meaning that there is no significant difference between groups. Given the nonsignificant ANOVA conducted earlier in the chapter, a nonsignificant outcome was expected. Nonparametric tests are more conservative than parametric ones in that they have a lower probability of finding a significant outcome than their parametric counterparts. This also means a lower probability of a type I error.
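The ranking and summing steps are mechanical enough to script. The following standard-library Python sketch applies Formula 6.7 to the Table 6.9 data. It is an illustration rather than a packaged routine. Tied values share their average rank, as in the table, and no correction for ties is applied, so software that does correct for ties will report a slightly larger value (about 3.45 here). The critical value uses the fact that a chi-square with 2 df has right-tail area e^(−x/2), so χ².05(2) = −2 ln(.05). Other df require Table 6.10 or a statistics library.

```python
import math

def average_ranks(values):
    """Rank values from 1 (lowest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Formula 6.7 without a tie correction. Returns H and each group's rank sum."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    sum_term = sum(t ** 2 / len(g) for t, g in zip(rank_sums, groups))
    h = 12 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)
    return h, rank_sums

pills = [14, 13, 19, 18, 15, 16, 12, 12]
creams = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]

h, rank_sums = kruskal_wallis_h([pills, creams, drops])
df = 3 - 1  # k - 1
critical = -2 * math.log(0.05)  # chi-square critical value, valid for df = 2 only
print(rank_sums, f"H = {h:.2f}", f"critical chi-square({df}) = {critical:.3f}")
print("significant" if h > critical else "not significant")
```

It prints the rank sums [85.5, 130.0, 84.5], H = 3.38, and a critical value of 5.991, so the result is not significant.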

Table 6.10: Chi-square distribution

Rows: degrees of freedom. Columns: area to the right of the critical value.

df   0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01

1 — 0.001 0.004 0.016 2.706 3.841 5.024 6.635

2 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210

3 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345

4 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277

5 0.554 0.831 1.145 1.610 9.236 11.071 12.833 15.086

6 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812

7 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475

8 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090

9 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666

10 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209

11 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725

12 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217


13 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688

14 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141

15 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578

16 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000

17 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409

18 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805

19 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191

20 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566

21 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932

22 9.542 10.982 12.338 14.042 30.813 33.924 36.781 40.289

23 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638

24 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980

25 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314

26 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642

27 12.879 14.573 16.151 18.114 36.741 40.113 43.194 46.963

28 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278

29 14.257 16.047 17.708 19.768 39.087 42.557 45.722 49.588

30 14.954 16.791 18.493 20.599 40.256 43.773 46.979 50.892

As you will see in the next section, when this analysis is performed in SPSS, a χ² value is reported rather than an H value per se.

SPSS Steps for the Kruskal-Wallis H-test

Reexamining the data set used in Figure 6.6, but rearranged as depicted in Figure 6.11, the employee groups (Position) are categorically coded as 1 = Laborers, 2 = Clerical, and 3 = Professional. To execute the test, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples. As shown in Figure 6.12, move Optimism (DV) into the Test Variable List box and Position (IV) into the Grouping Variable box, then click the Define Range button just below it to enter the range of codes for the Position variable: 1 and 3 for the minimum and maximum codes, respectively. Then click OK.


Figure 6.11: Data set in SPSS

Figure 6.12: The Kruskal-Wallis H-test steps in SPSS


Interpreting Results

The output in Figure 6.13 shows the results of the Kruskal-Wallis H-test. The Test Statistics table reports KW χ²(2) = 17.17, p < .05, indicating an overall statistically significant difference in optimism among the three employee groups. The Ranks table shows that Laborers have the highest mean rank (MR = 21.80) and Professionals the lowest (MR = 6.30). Unlike the ANOVA, this procedure does not readily provide post hoc tests, so follow-up Mann-Whitney U (equivalently, Wilcoxon rank-sum) tests of all possible pairs have to be performed (see Chapter 5 for these procedures).

The conclusion to these results would read as follows:

Based on the Kruskal-Wallis H-test, there is a significant difference in the level of optimism among the three groups (KW χ²(2) = 17.17, p < .05). Laborers reported the highest level of optimism (MR = 21.80), followed by Clerical positions (MR = 18.40) and then Professionals (MR = 6.30), who reported the lowest level of optimism.
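Applying Formula 6.7 to the optimism data gives H ≈ 17.13, slightly below the 17.166 in Figure 6.13. The difference appears to come from a correction for tied scores: dividing H by 1 − Σ(t³ − t)/(N³ − N), where t is the number of scores sharing each tied value, reproduces the SPSS figure. The standard-library Python sketch below is an illustration, not SPSS's own code. It computes the mean ranks, the uncorrected H, and the tie-corrected H.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1 (lowest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

laborers = [33, 35, 38, 39, 42, 44, 44, 47, 50, 52]
clerical = [27, 36, 37, 37, 39, 39, 41, 42, 45, 46]
professionals = [22, 24, 25, 27, 28, 28, 29, 31, 33, 34]
groups = [laborers, clerical, professionals]

pooled = [x for g in groups for x in g]
ranks = average_ranks(pooled)
n_total = len(pooled)

mean_ranks, sum_term, start = [], 0.0, 0
for g in groups:
    rank_sum = sum(ranks[start:start + len(g)])
    mean_ranks.append(rank_sum / len(g))
    sum_term += rank_sum ** 2 / len(g)
    start += len(g)

# Formula 6.7 (no tie correction).
h = 12 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)

# Tie correction: each tied value shared by t scores contributes t^3 - t.
tie_sizes = Counter(pooled).values()
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
h_corrected = h / correction

print("mean ranks:", [round(m, 2) for m in mean_ranks])
print(f"H = {h:.3f}, tie-corrected H = {h_corrected:.3f}")
```

The mean ranks (21.8, 18.4, 6.3) match the Ranks table, and the tie-corrected value (17.166) matches the Chi-Square in the Test Statistics table.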

Figure 6.13: The Kruskal-Wallis H-test output

Ranks: Optimism
Position        N    Mean Rank
Laborers        10   21.80
Clerical        10   18.40
Professionals   10   6.30
Total           30

Test Statistics (a, b): Optimism
Chi-Square    17.166
df            2
Asymp. Sig.   .000
a. Kruskal Wallis Test
b. Grouping Variable: Position


Summary

This chapter is the natural extension of Chapters 4 and 5. Like the z- and t-tests, analysis of variance is a test of significant differences. Also like the z- and t-tests, the IV in ANOVA is nominal and the DV is interval or ratio. With each procedure, whether z, t, or F, the test statistic is a ratio of the differences between groups to the differences within groups (Objective 3).

There are differences between ANOVA and the earlier procedures, of course. The variance statistics are sums of squares and mean squares. But perhaps the most important difference is that ANOVA can accommodate any number of groups (Objectives 2 and 3). Remember that comparing multiple groups with repeated t-tests on the same data inflates the type I error rate, making it increasingly likely that some comparison will appear statistically significant by chance. One-way ANOVA lifts the limitation of one-pair-at-a-time comparisons (Objective 1).
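To see how quickly that type I error mounts, a rough Python sketch: with k groups there are k(k − 1)/2 pairwise t-tests, and if each is run at α = .05 the chance of at least one type I error is about 1 − (1 − .05)^c for c tests. That formula assumes the tests are independent. Pairwise tests on the same data are not, so treat the numbers as an approximation of the problem rather than an exact rate.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Number of pairwise t-tests among k groups, and the approximate chance of at least
    one type I error, assuming (unrealistically) that the tests are independent."""
    c = comb(k, 2)  # k(k - 1) / 2 pairwise comparisons
    return c, 1 - (1 - alpha) ** c

for k in (3, 4, 6):
    c, rate = familywise_error(k)
    print(f"{k} groups: {c} t-tests, familywise error about {rate:.3f}")
```

With six groups, the 15 t-tests carry roughly a 54% chance of at least one false positive, compared with 5% for a single omnibus F.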

The other side of multiple comparisons, however, is the difficulty of determining which comparisons are statistically significant when F is significant. This problem is solved with the post hoc test. In this chapter, we used Tukey's HSD (Objective 4). There are other post hoc tests, each with its own strengths and drawbacks, but HSD is one of the most widely used.

Years ago, the emphasis in the scholarly literature was on whether a result was statistically significant. Today, the focus is on measuring the effect size of a significant result, a statistic that in the case of analysis of variance can indicate how much of the variability in the dependent variable can be attributed to the effect of the independent variable. We answered that question with eta-squared (η²). But neither the post hoc test nor eta-squared is relevant if the F is not significant (Objective 5). Then, further ANOVAs were executed in SPSS, and the results were presented (Objective 6) in APA format and interpreted accordingly (Objective 7). Finally, the nonparametric equivalent of ANOVA, the Kruskal-Wallis H-test, was discussed as an alternative method and compared to its parametric equivalent, the ANOVA. The same data set was used to compare outcomes. In addition, an appropriate example in SPSS was provided (Objective 8).

The independent t-test and the one-way ANOVA both require that groups be independent. What if they are not? What if we wish to measure one group twice over time, or perhaps more than twice? Such dependent-groups procedures are the focus of Chapter 7. They call less for new thinking than for an elaboration of familiar concepts. For this reason, consider reviewing Chapter 5 and the independent t-test discussion before starting Chapter 7.

The one-way ANOVA dramatically broadens the kinds of questions the researcher can ask. The procedures in Chapter 7 for nonindependent groups represent the next incremental step.


Key Terms

analysis of variance Fisher’s test that allows one to detect significant differences among any number of groups. The acronym is ANOVA.

error variance The variability in a measure unrelated to the variables being analyzed.

Eta-squared A measure of effect size for ANOVA. It estimates the amount of variability in the DV explained by the IV.

F ratio The test statistic calculated in an analysis of variance problem. It is the ratio of the variance between the groups to the variance within the groups.

factor Refers to an IV, particularly in procedures that involve more than one.

family-wise error An inflated type I error rate that results from performing multiple tests on the same data as though each were a separate test on a different data set. Specifically, it arises when multiple groups are compared two at a time with a series of t-tests instead of one omnibus ANOVA.

homogeneity of variance When the variances of multiple groups of data are similar (not significantly different).

mean square The sum of squares divided by its degrees of freedom. This division allows the mean square to reflect a mean, or average, amount of variability from a source.

omnibus test A test of the overall significance of the model based on differences between sample means when there are more than two groups to compare. The test will not tell you which two means are significantly different, which is why follow-up post hoc comparisons are executed.

one-way ANOVA The ANOVA in its simplest form, this model has only one independent variable.

post hoc test A test conducted after a significant ANOVA or some similar test that identifies which among multiple possibilities is statistically significant.

sum of squares (SS) The variance measure in analysis of variance. They are literally the sum of squared deviations between a set of scores and their mean.

sum of squares between The variability related to the independent variable and any measurement error that may occur.

sum of squares total Total variance from all sources.

sum of squares within Variability stemming from different responses from individuals in the same group. It is exclusively error variance. It is also referred to as the sum of squares error or the sum of squares residual.

Chapter Exercises

Answers to Try It! Questions

The answers to all Try It! questions introduced in this chapter are provided below.

A. The “one” in one-way ANOVA refers to the fact that this test accommodates just one independent variable.

B. There is no gender variable in the analysis and consequently, gender-related variance emerges as error variance. The same would be true for any variability in scores stemming from any variable not being analyzed in the study.


C. It would take 15 comparisons! The answer is the number of groups (6) times the number of groups minus 1 (5), with the product divided by 2: 6 × 5 = 30, and 30/2 = 15.

D. The only way SS values can be negative is if there has been a calculation error. Because the values are all squared values, if they have any value other than 0, they have to be positive.

E. The difference between SStot and SSwith is SSbet.

F. If F = 4 and MSwith = 2, then MSbet = 8 because F = MSbet ÷ MSwith.

G. The answer is neither. If F is not significant, there is no question of which group is significantly different from which other group because any variability may be nothing more than sampling variability. By the same token, there is no effect to calculate because, as far as we know, the IV does not have any effect on the DV.

H. F = t²

Review Questions

The answers to the odd-numbered items can be found in the answers appendix.

1. Several people selected at random are given a story problem to solve. They take 3.5, 3.8, 4.2, 4.5, 4.7, 5.3, 6.0, and 7.5 minutes. What is the total sum of squares for this data?

2. Identify the following symbols and statistics in a one-way ANOVA:

a. The statistic that indicates the mean amount of difference between groups.

b. The symbol that indicates the total number of participants.

c. The symbol that indicates the number of groups.

d. The mean amount of uncontrolled variability.

3. The theory is that there are differences by gender in manifested aggression. With data from Measuring Expressed Aggression Numbers (MEAN), a researcher has the following:

Males: 13, 14, 16, 16, 17, 18, 18, 18

Females: 11, 12, 12, 14, 14, 14, 14, 16

Complete the problem as an ANOVA. Is the difference statistically significant?

4. Complete Exercise 3 as an independent t-test and demonstrate the relationship between t2 and F.

5. Even with a significant F, a post hoc test is never needed in a two-group ANOVA. Why?

6. A researcher completes an ANOVA in which the number of years of education completed is analyzed by ethnic group. If η² = .36, how should that be interpreted?

7. Three groups of clients involved in a program for substance abuse attend weekly sessions for 8, 12, and 16 weeks. The DV is the number of days drug free.

8 weeks: 0, 5, 7, 8, 8

12 weeks: 3, 5, 12, 16, 17

16 weeks: 11, 15, 16, 19, 22

a. Is F significant? b. What is the location of the significant difference? c. What does the effect size indicate?

8. Regarding Exercise 7, a. what is the IV? b. what is the scale of the IV? c. what is the DV? d. what is the scale of the DV?

9. For an ANOVA problem, k = 4 and n = 8. If SSbet = 24.0 and SSwith = 72,

a. what is F? b. is the result significant?

10. Consider this partially completed ANOVA table:

Source    SS    df    MS    F     Fcrit
Total     94
Between         2
Within    63          3

a. What must be the value of N − k? b. What must be the value of k? c. What must be the value of N? d. What must SSbet be? e. Determine MSbet. f. Determine F. g. What is Fcrit?
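Several of the review questions above call for the same few computations. The Python sketch below (standard library only) may help you check hand calculations for Exercises 1, 3 and 4, and 7. The helper names (`one_way_anova`, `pooled_t`, `tukey_hsd`) are illustrative, not from the chapter. The t-test uses the pooled-variance form, for which t² = F holds exactly with two groups. Tukey's HSD is assumed as the post hoc procedure, and the q value of 3.77 is an assumed table entry that you should verify against your own studentized range table. The code computes no p values, so compare each F with the critical value from an F table.

```python
import math
from itertools import combinations


def mean(xs):
    return sum(xs) / len(xs)


def sum_of_squares(xs):
    """Sum of squared deviations of the scores from their own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def one_way_anova(groups):
    """One-way ANOVA for a dict {label: scores} of independent groups."""
    all_scores = [x for g in groups.values() for x in g]
    k, n_total = len(groups), len(all_scores)
    ss_tot = sum_of_squares(all_scores)
    ss_with = sum(sum_of_squares(g) for g in groups.values())
    ss_bet = ss_tot - ss_with  # SStot = SSbet + SSwith
    df_bet, df_with = k - 1, n_total - k
    ms_with = ss_with / df_with
    return {
        "SSbet": ss_bet,
        "SSwith": ss_with,
        "df": (df_bet, df_with),
        "MSwith": ms_with,
        "F": (ss_bet / df_bet) / ms_with,
        "eta_sq": ss_bet / ss_tot,  # share of total variance tied to the IV
    }


def pooled_t(a, b):
    """Independent-samples t with the pooled (equal-variance) error term."""
    pooled_var = (sum_of_squares(a) + sum_of_squares(b)) / (len(a) + len(b) - 2)
    se_diff = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se_diff


def tukey_hsd(q_crit, ms_with, n_per_group):
    """Tukey's HSD = q * sqrt(MSwith / n), for equal group sizes."""
    return q_crit * math.sqrt(ms_with / n_per_group)


# Exercise 1: total sum of squares
minutes = [3.5, 3.8, 4.2, 4.5, 4.7, 5.3, 6.0, 7.5]
print(f"Ex. 1: SStot = {sum_of_squares(minutes):.3f}")

# Exercises 3 and 4: two-group ANOVA and the t-test on the same data
aggression = {
    "males": [13, 14, 16, 16, 17, 18, 18, 18],
    "females": [11, 12, 12, 14, 14, 14, 14, 16],
}
ex3 = one_way_anova(aggression)
t = pooled_t(aggression["males"], aggression["females"])
print(f"Ex. 3-4: F{ex3['df']} = {ex3['F']:.3f}, t = {t:.3f}, t^2 = {t ** 2:.3f}")

# Exercise 7: three groups, effect size, and Tukey's HSD
sessions = {
    "8 weeks": [0, 5, 7, 8, 8],
    "12 weeks": [3, 5, 12, 16, 17],
    "16 weeks": [11, 15, 16, 19, 22],
}
ex7 = one_way_anova(sessions)
# Assumed tabled q for k = 3, df_within = 12, alpha = .05; check your own table.
hsd = tukey_hsd(3.77, ex7["MSwith"], n_per_group=5)
print(f"Ex. 7: F{ex7['df']} = {ex7['F']:.2f}, eta^2 = {ex7['eta_sq']:.2f}, HSD = {hsd:.2f}")
for a, b in combinations(sessions, 2):
    diff = abs(mean(sessions[a]) - mean(sessions[b]))
    verdict = "exceeds HSD" if diff > hsd else "does not exceed HSD"
    print(f"  {a} vs {b}: |difference| = {diff:.2f} ({verdict})")
```

Because `one_way_anova` builds SSwith from each group's own sum of squares and obtains SSbet by subtraction, it mirrors the identity in Try It! answer E. Only group pairs whose mean difference exceeds HSD would be reported as significantly different.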

Analyzing the Research

Review the article abstracts provided below. You can then access the full articles via your university’s online library portal to answer the critical thinking questions. Answers can be found in the answers appendix.

Using ANOVA for an Emotions Study

Carolan, L. A., & Power, M. J. (2011). What basic emotions are experienced in bipolar disorder? Clinical Psychology & Psychotherapy, 18(5), 366–378.

Article Abstract

Aims: The aims of this study were to investigate the basic emotions experienced within and between episodes of bipolar disorder and, more specifically, to test the predictions made by the Schematic, Propositional, Analogical and Associative Representation Systems (SPAARS) model that mania is predominantly characterized by the coupling of happiness with anger whereas depression (unipolar and bipolar) primarily comprises a coupling between sadness and disgust.

Design: A cross-sectional design was employed to examine the differences within and between the bipolar, unipolar and control groups in the emotional profiles. Data were analyzed using one-way ANOVAs.

Method: Psychiatric diagnoses in the clinical groups were confirmed using the Structured Clinical Interview for DSM-IV (SCID). It was not administered in the control group. Current mood state was measured using the Beck Depression Inventory-II, the State–Trait Anxiety Inventory and the Bech–Rafaelsen Mania Scale. The Basic Emotions Scale was used to explore the emotional profiles.

Results: The results confirmed the predictions made by the SPAARS model about emotions in mania and depression. Outwith these episodes, individuals with bipolar disorder experienced elevated levels of disgust.

Discussion: Evidence was found in support of the proposal of SPAARS that there are five basic emotions, which form the basis for both normal emotional experience and emotional disorders. Disgust is an important feature of bipolar disorder. Strengths and limitations are discussed, and suggestions for future research are explored.

Critical Thinking Questions

1. Why does this study use a one-way ANOVA instead of a t-test?

2. What means are being compared in the bipolar group in this study?

3. According to the following ANOVA results between bipolar and unipolar groups, which result(s) showed significance?

F(1, 46) = 0.00; p = .93

F(1, 19.22) = 9.81; p = .005

F(1, 45) = 1.26; p = .26

F(1, 44) = 0.02; p = .87

F(1, 45) = 0.13; p = .71

4. What type of post hoc test did the paper use as a follow-up to the F statistic?

Using ANOVA for a Health and Physical Activity Study

Bize, R., & Plotnikoff, R. C. (2009). The relationship between a short measure of health status and physical activity in a workplace population. Psychology, Health & Medicine, 14(1), 53–61.

Article Abstract

Many interventions promoting physical activity (PA) are effective in preventing disease onset, and although studies have found a positive relationship between health-related quality of life (HRQL) and PA, most of these studies have focused on older adults and those with chronic conditions. Less is known regarding the association between PA level and HRQL among healthy adults. Our objective was to analyse the relationship between PA level and HRQL among a sample of 573 employees aged 20–68 taking part in a workplace intervention to promote PA. Measures included HRQL (using a single item) and PA (i.e., Godin Leisure-Time Questionnaire). The Modified Canadian Aerobic Fitness Test (MCAFT) was also completed by 10% of the employees. MET-minute scores (assessing energy expenditure over one week) were compared across HRQL categories using ANOVA. A multiple linear regression analysis was conducted to further examine the relationship between HRQL and PA, controlling for potential covariates. Participants in the higher health status categories were found to report higher levels of energy expenditure (one-way ANOVA, p < 0.001). In the multiple linear regression model, each unit increase in health status level translated in a mean increase of 356 MET-minutes in energy expenditure (p < 0.001). This single-item assessment of health status explained six percent of the variance in energy expenditure. The study concludes that higher energy expenditure through PA among an adult workplace population is positively associated with increased health status, and it also suggests that a single-item HRQL measure is suitable for community- and population-based studies, reducing response burden and research costs.

Critical Thinking Questions

1. Why did this study use a Kruskal-Wallis H-test?

2. The authors state that participants in the higher health status categories reported higher mean energy expenditure in the one-way ANOVA, and that the Kruskal-Wallis test yielded similar results. For that to be plausible, what would the significance level of the Kruskal-Wallis test have been?

3. Figure 1 of the article shows that higher health status goes with higher energy expenditure. Given this information, should the authors have run a post hoc test? Why or why not?
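The questions above turn on the Kruskal-Wallis H-test, the rank-based alternative to a one-way ANOVA. As a sketch of how H is computed (Python, standard library only), the code below ranks all scores together, gives tied scores the average of their ranks, and applies H = 12 / (N(N + 1)) × Σ(R²/n) − 3(N + 1), with the common tie correction as an option; the chapter's hand procedure may omit that correction. The function names are hypothetical, and the Exercise 7 data are reused purely for illustration; they are not data from the Bize and Plotnikoff study. H is evaluated against a chi-square critical value with k − 1 degrees of freedom.

```python
from itertools import chain


def average_ranks(values):
    """Rank all scores from 1..N, giving tied scores the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups, correct_ties=True):
    """Kruskal-Wallis H for a list of independent samples."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    # Sum each group's ranks, in the same order the groups were pooled
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    if correct_ties:
        counts = {}
        for x in pooled:
            counts[x] = counts.get(x, 0) + 1
        ties = sum(c ** 3 - c for c in counts.values())
        h /= 1 - ties / (n_total ** 3 - n_total)
    return h


groups = [[0, 5, 7, 8, 8], [3, 5, 12, 16, 17], [11, 15, 16, 19, 22]]
print(f"H = {kruskal_wallis_h(groups, correct_ties=False):.3f}")
print(f"H (tie-corrected) = {kruskal_wallis_h(groups):.3f}")
```

Because H uses only ranks, it is less sensitive than F to skewed scores or outliers, which is one reason a study might report it alongside a one-way ANOVA.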
