Chapter 11: Tests on Three or More Means Using a One-Way ANOVA
In Chapter 10, we considered various techniques used by researchers when they apply inferential statistics within studies focusing on one or two means. I now wish to extend that discussion by considering the main inferential technique used by researchers when their studies involve three or more means. The popular technique used in these situations is called analysis of variance and it is abbreviated as ANOVA. As I pointed out in Chapter 10, the analysis of variance can be used to see if there is a significant difference between two sample means. Hence, this particular statistical technique is quite versatile. It can be used when a researcher wants to compare two means, three means, or any number of means. It is also versatile in ways that will become apparent in Chapters 13, 14, and 19. The analysis of variance is an inferential tool that is widely used in many disciplines. Although a variety of statistical techniques have been developed to help applied researchers deal with three or more means, ANOVA ranks first in popularity. Moreover, there is a big gap between ANOVA and whatever ranks second! In the current chapter, we will focus our attention on the simplest version of ANOVA, something called a one-way analysis of variance. I begin with a discussion of the statistical purpose of a one-way ANOVA, followed by a clarification of how a one-way ANOVA differs from other kinds of ANOVA. Then, we turn our attention to the way researchers present the results of their one-way ANOVAs, with examples to show how the Bonferroni adjustment technique is used in conjunction with one-way ANOVAs, how the assumptions underlying a one-way ANOVA are occasionally tested, and how researchers sometimes concern themselves with power analyses and effect size. Finally, I offer a few tips that should serve to make you better able to decipher and critique the results of one-way ANOVAs.
The Purpose of a One-Way ANOVA

When a study has been conducted in which the focus is centered on three or more groups of scores, a one-way ANOVA permits the researcher to use the data in the samples for the purpose of making a single inferential statement concerning the means of the study's populations. Regardless of how many samples are involved, there is just one inference that extends from the set of samples to the set of populations. This single inference deals with the question, “Are the means of the various populations equal to one another?” Figure 11.1 illustrates what is going on in a one-way ANOVA. There are three things to notice about this picture. First, our picture illustrates the specific situation where there are three comparison groups in the study; additional samples and populations can be added to parallel studies that have four, five, or more comparison groups. Second, there is a single inference made from the full set of sample data to the group of populations. Finally, the focus of the inference is on the population means, even though each sample is described in terms of M, SD, and n. Although you may never come across a journal article that contains a picture like that presented in Figure 11.1, I hope my picture helps you understand what is going on when researchers talk about having applied a one-way ANOVA to their data.

Consider, for example, Excerpt 11.1, which comes from a study focused on the use of multimedia technology to assist students as they read an electronic essay. All 69 of the research participants (college freshmen) individually read the same essay. However, the technology-based learning aids differed across three groups (n = 23 in each). Students in one group could click on any of 42 highlighted vocabulary words and see a definition. Members of the second group had the same words highlighted, but a click on any of them brought forth a picture along with the definition. Those in the third group, if they clicked a highlighted word, saw the definition plus a brief 10–15 second video. After reading the essay, all students were given a 34-item reading comprehension test covering the material in the essay.

EXCERPT 11.1 • Typical Data Used in a One-Way ANOVA
TABLE 4 Means and standard deviations for reading comprehension test
Group                        Mean    SD
Definition (D)               15.30   3.57
Definition & picture (DP)    17.43   3.69
Definition & movie           16.09   3.55
Source: Akbulut, Y. (2007). Effects of multimedia annotations on incidental vocabulary learning and reading comprehension of advanced learners of English as a foreign language. Instructional Science, 35(6), 499–517.

With Figure 11.1 fresh in your mind, you should be able to look at Excerpt 11.1 and discern what the researcher was trying to accomplish by using a one-way ANOVA. Each row of data in this excerpt, of course, corresponds to one of the study's three samples. Connected to each sample was an abstract population (i.e., a larger group of students, like the ones in the study, who theoretically could be given the essay and the reading comprehension test). The researcher's goal was to use the data from all three samples to make a single inference concerning the means of those populations. The statistical question dealt with by the one-way ANOVA could be stated as: “In light of the empirical information available in the samples, is it reasonable to think that the mean score on the reading comprehension test is the same across the three populations?” As you can see, the sample means in Excerpt 11.1 turned out to be different from each other. Based on the fact that the Ms in this study were dissimilar, you might be tempted to think that there was an easy answer to the inferential question being posed. However, the concept of sampling error makes it impossible to simply look at the sample means, see differences, and then conclude that the population means are also different. Possibly, the population means are identical, with the sample means being dissimilar simply because of sampling error. Or, maybe the discrepancy between the Ms is attributable to dissimilarities among the population means.
A one-way ANOVA helps researchers decide, in a scientific manner, whether their sample means are far enough apart to place their eggs into the second of these two proverbial baskets.
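The role of sampling error can be made concrete with a small simulation. The sketch below is not part of the original study: it draws three samples from normal populations that are identical by construction, using a hypothetical population mean of 16, SD of 3.6, and n = 23 per group chosen only to resemble Excerpt 11.1. Even though the null hypothesis is true here, the three sample means will almost surely differ from one another.

```python
import random
import statistics


def simulate_sample_means(pop_mean, pop_sd, n_per_group, n_groups, seed=None):
    """Draw n_groups samples from identical normal populations; return each sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_groups):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n_per_group)]
        means.append(statistics.fmean(sample))
    return means


# Hypothetical populations: every population mean is 16, so H0 is true by construction.
means = simulate_sample_means(pop_mean=16.0, pop_sd=3.6, n_per_group=23, n_groups=3, seed=1)
print([round(m, 2) for m in means])  # the sample means differ purely because of sampling error
```

Rerunning with different seeds shows how far apart sample means can drift when the populations do not differ at all; a one-way ANOVA asks whether the observed spread is larger than that.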
The Distinction between a One-Way ANOVA and Other Kinds of ANOVA

In this chapter, we are focusing our attention on the simplest kind of ANOVA, the kind that is referred to as a one-way ANOVA, as a one-factor ANOVA, or as a simple ANOVA. Because there are many different kinds of analysis of variance, it is important to clarify the difference between the kind that we are considering in this chapter and the more complex kinds of ANOVA that are discussed in Chapters 13, 14, and 19. (Some of the more complex kinds of analysis of variance have the labels two-way ANOVA, repeated measures ANOVA, and multivariate ANOVA.) Although all ANOVAs are alike in that they focus on means, they differ in three main respects: the number of independent variables, the number of dependent variables, and whether the samples are independent or correlated. In terms of these distinguishing characteristics, a one-way ANOVA has one independent variable, it focuses on one dependent variable, and it involves samples that are independent. It is worthwhile to consider each of these defining elements of a one-way ANOVA because researchers sometimes use the term ANOVA by itself without the clarifying adjective one-way.

When we say that there is just one independent variable, this means that the comparison groups differ from one another, prior to the collection and analysis of any data, in one manner that is important to the researcher. The comparison groups can differ in terms of a qualitative variable (e.g., favorite TV show) or in terms of a quantitative variable (e.g., number of siblings), but there can be only one characteristic that defines how the comparison groups differ. Because the terms factor and independent variable mean the same thing within the context of analysis of variance, this first way in which a one-way ANOVA differs from other ANOVAs can be summed up in this manner: A one-way ANOVA has a single factor (i.e., one independent variable).
Excerpt 11.2 comes from a study in which 904 school children in grades 3 through 6 completed a personality inventory called the SEARS-C. This instrument had been developed to measure a child's opinion of his or her social–emotional strengths, with Likert items such as “I make friends easily” and “I stay calm when there is an argument or a problem.” A one-way ANOVA was used to see if differences existed among the four grade levels in terms of the children's total score on the SEARS-C.

EXCERPT 11.2 • The Independent and Dependent Variables in a One-Way ANOVA
Data on the SEARS-C were analyzed using a one-way, between-subjects analysis of variance to estimate the degree of difference across grade levels. . . . Grade was the independent variable with four levels: (a) 3rd grade, (b) 4th grade, (c) 5th grade, and (d) 6th grade. Total sum score on the SEARS-C was the dependent variable.
Source: Cohn, B., Merrell, K. W., Felver-Grant, J., Tom, K., & Endrulat, N. R. (2009). Strength-based assessment of social and emotional functions. Paper presented at the meeting of the National Association of School Psychologists, Boston.

As illustrated by Excerpt 11.2, some researchers explicitly identify the independent variable associated with their one-way ANOVA. However, many researchers choose not to do this and instead presume that their readers can figure out what the independent variable was based on a description of the samples used in the study. By the end of this chapter, I believe that you will have little difficulty identifying the independent variable in any one-way ANOVA you encounter.

As you might suspect, a two-way ANOVA has two independent variables, a three-way ANOVA has three independent variables, and so on. Beginning in Chapter 13, we consider some of these more complex ANOVAs. In this chapter, however, we restrict our focus to the kind of ANOVA that has a single independent variable.

Even if there is just one independent variable within a study in which the analysis of variance is applied, the ANOVA may or may not be a one-way ANOVA. The second criterion that distinguishes one-way ANOVAs from many other kinds of ANOVAs has to do with the number of dependent variables involved in the analysis. With a one-way ANOVA, there is always just one dependent variable. (If there are two or more dependent variables involved in the same analysis, then you are likely to see the analysis described as a multivariate ANOVA, or MANOVA.) The dependent variable corresponds to the measured characteristic of people, animals, or things from whom or from which data are gathered. For example, in the study from which Excerpt 11.1 was taken, the dependent variable was the students' reading comprehension as measured by the 34-item test administered after the students read the multimedia essay.
In that excerpt, the table's title lets us know what the dependent variable was. In Excerpt 11.2, the researchers came right out and told us what their dependent variable was.

The third distinguishing feature of a one-way ANOVA concerns the fact that the comparison groups are independent (rather than correlated) in nature. As you may recall from the discussion in Chapter 10 of independent versus correlated samples, this means that (1) the people or animals who provide the scores in any given group are different from those who provide data in any other comparison group, and (2) there is no connection across comparison groups because of matching or because several triplets or litters were split up (with one member of each family being put into each of the comparison groups). It is possible for an ANOVA to be applied to data that come from correlated samples, but I delay my discussion of that form of analysis until Chapter 14.

As indicated in Excerpt 11.2, researchers sometimes refer to their study's independent variable as being the between-subjects variable. The adjective between-subjects is used to clarify that comparisons are being made with data that have come from independent samples. (Chapter 14 discusses one-way ANOVAs used in studies where the data come from correlated samples, and you will see that the independent variables in those studies are considered to be within-subjects variables.) Because each of the one-way ANOVAs discussed in this chapter involves data collected from separate groups of individuals who have not been matched in any way, every independent variable we encounter here is a between-subjects independent variable. Now, we turn our attention to the specific components of a one-way ANOVA, and we begin with a consideration of the one-way ANOVA's null and alternative hypotheses.
The One-Way ANOVA’s Null and Alternative Hypotheses
The null hypothesis of a one-way ANOVA is always set up to say that the mean score on the dependent variable is the same in each of the populations associated with the study. The null hypothesis is usually written by putting equal signs between a set of $\mu$s, with each $\mu$ representing the mean score within one of the populations. For example, if there were four comparison groups in the study, the null hypothesis would be $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$.

If you recall my claim (Chapter 7) that every null hypothesis must contain a pinpoint parameter, you may now be wondering how the symbolic statement at the end of the preceding paragraph qualifies as a legitimate null hypothesis because it does not contain a pinpoint number. In reality, there is a pinpoint number contained in that $H_0$, but it is simply hidden from view. If the population means are all equal to one another, then there is no variability among those means. Therefore, we can bring $H_0$'s pinpoint number into plain view by rewriting the null hypothesis as $H_0: \sigma^2_\mu = 0$. As I said earlier, however, you are more likely to see $H_0$ written with Greek mus and equal signs and no pinpoint number (e.g., $H_0: \mu_1 = \mu_2 = \mu_3$) rather than with a sigma squared (with $\mu$ as a subscript) set equal to zero.

In Excerpts 11.3 and 11.4, we see examples of one-way ANOVA null hypotheses that have appeared in research summaries. Notice that these two excerpts are similar in that each null hypothesis deals with its study's population means. Moreover, both null hypotheses have been set up to say that there are no differences among the population means. These excerpts differ, however, in that the first study involved three populations, whereas there were four in the second study. The researchers associated with Excerpts 11.3 and 11.4 deserve high praise for taking the time to articulate the null hypothesis associated with their one-way ANOVAs. The vast majority of researchers do not do this.
They tell us about the data they collected and what happened in terms of results, but they skip over the important first step of hypothesis testing. Perhaps they assume that readers will know what the null hypothesis was. In hypothesis testing, of course, the null hypothesis must be accompanied by an alternative hypothesis. This always says that at least two of the population means differ. Using symbols to express this thought, we get $H_a$: not all $\mu_j$ are equal. Unfortunately, the alternative hypothesis is rarely included in technical discussions of research studies. Again, researchers evidently presume that their readers are familiar enough with the testing procedure being applied and familiar enough with what goes on in a one-way ANOVA to know what $H_a$ is without being told.
EXCERPTS 11.3–11.4 • The Null Hypothesis in a One-Way ANOVA
The null hypothesis is constructed as $H_0: \mu_1 = \mu_2 = \mu_3$ (the improvement are equal), which essentially implies that the three feedback strategies are identically effective.
Source: Shen, Y. (2010). Evaluation of an eye tracking device to increase error recovery by nursing students using human patient simulation. Unpublished master's thesis, University of Massachusetts, Amherst.
ANOVA test was used to identify the relationship between BMI categories and age. The null-hypothesis is ($H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$) was tested.
Source: Rosnah, M. Y., Mohd R. H., & Sharifah-Norazizan, S. A. R. (2009). Anthropometry dimensions of older Malaysians: Comparison of age, gender and ethnicity. Asian Social Science, 5(6), 133–140.
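The equivalence between the two ways of writing the null hypothesis can be spelled out using the usual definition of the variance of the k population means (here the k means are treated as the complete set, so the divisor is k). Because every term in the sum is a square, that variance is zero only when every $\mu_j$ equals the overall mean, and the alternative hypothesis corresponds to a variance greater than zero:

```latex
\begin{aligned}
\bar{\mu} &= \frac{1}{k}\sum_{j=1}^{k}\mu_j, \qquad
\sigma^2_\mu = \frac{1}{k}\sum_{j=1}^{k}\left(\mu_j - \bar{\mu}\right)^2 \\
H_0 &: \sigma^2_\mu = 0 \iff \mu_1 = \mu_2 = \cdots = \mu_k \\
H_a &: \sigma^2_\mu > 0 \iff \text{not all } \mu_j \text{ are equal}
\end{aligned}
```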
Presentation of Results
The outcome of a one-way ANOVA is presented in one of two ways. Researchers may elect to talk about the results within the text of their report and to present an ANOVA summary table. However, they may opt to exclude the table from the report and simply describe the outcome in a sentence or two of the text. (At times, a researcher wants to include the table in the report but is told by the journal editor to delete it due to limitations of space.) Once you become skilled at deciphering the way results are presented within an ANOVA summary table, I am confident that you will have no difficulty interpreting results presented within a “tableless” report. For this reason, I begin the next section with a consideration of how the results of one-way ANOVAs are typically presented in tables. I divide this discussion into two sections because some reports contain the results of a single one-way ANOVA, whereas other reports present the results of many one-way ANOVAs.

Results of a Single One-Way ANOVA

In Excerpt 11.5, we see the ANOVA summary table for the data presented earlier in Excerpt 11.1. As you may recall, that earlier excerpt, and now this new one, come from a study wherein the researchers wanted to know if different kinds of multimedia “learning aids” differ in their ability to help students with their reading comprehension when confronted with an electronic essay. You might want to take a quick look at Excerpt 11.1 before proceeding.
EXCERPT 11.5 • Results from a One-Way ANOVA Presented in a Table
TABLE 5 ANOVA summary table for the reading comprehension test
Source            SS        df    MS       F
Between groups    53.420    2     26.710   2.054
Within groups     858.348   66    13.005
Total             911.768   68
The ANOVA summary table is provided in Table 5. As the table suggests, the reading comprehension scores of the groups did not differ significantly (F(2, 66) = 2.054, p = .136).
Source: Akbulut, Y. (2007). Effects of multimedia annotations on incidental vocabulary learning and reading comprehension of advanced learners of English as a foreign language. Instructional Science, 35(6), 499–517.
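The numbers in Table 5 hang together arithmetically: each MS is SS divided by df, F is the ratio of the two MS values, and both p and the .05 critical value come from the F distribution with df of 2 and 66. The sketch below checks this in plain Python. The helper names are hypothetical, and the F-distribution tail area is computed with a standard continued-fraction evaluation of the regularized incomplete beta function; a statistics package would normally do this step for you.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Modified Lentz evaluation of the continued fraction used for I_x(a, b)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), using the continued fraction on whichever side converges quickly."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with df1 and df2 degrees of freedom."""
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical_value(alpha, df1, df2, lo=0.0, hi=1000.0, iters=200):
    """F value whose upper-tail area equals alpha, found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def summarize_one_way_anova(ss_between, df_between, ss_within, df_within, alpha=0.05):
    """Fill in MS, F, p, and the critical F for a one-way ANOVA summary table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f = ms_between / ms_within
    return {
        "MS_between": ms_between,
        "MS_within": ms_within,
        "F": f,
        "p": f_upper_tail(f, df_between, df_within),
        "critical_F": f_critical_value(alpha, df_between, df_within),
    }


# SS and df values copied from Table 5 (Excerpt 11.5)
result = summarize_one_way_anova(53.420, 2, 858.348, 66)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because the calculated F (about 2.05) is smaller than the critical value (about 3.14), p lands above .05, matching the decision reported in the excerpt.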
In Excerpt 11.5, the number 2.054 is the calculated value, and it is positioned in the column labeled F. That calculated value was obtained by dividing the mean square (MS) on the “Between groups” row of the table (26.710) by the mean square on the “Within groups” row of the table (13.005). Each row's MS value was derived by dividing that row's sum of squares (SS) value by its df value. Those SS values came from an analysis of the sample data. The df values, however, came from simply counting the number of groups, the number of people within each group, and the total number of participants, with 1 subtracted from each number to obtain the df values presented in the table.

The first two df values determined the size of the critical value against which 2.054 was compared. (That critical value, at the .05 level of significance, was equal to 3.14.) I am confident that a computer used the df values of 2 and 66 to determine the critical value, and then the computer said that p was equal to .136. This is the standard way of getting a precise p-value like the one reported in Excerpt 11.5. (In a few cases, the researcher uses the two df values to look up the size of the critical value in a statistical table located in the back of a statistics book. Such tables allow researchers to see how big the critical values are at the .05, .01, and .001 levels of significance.) Whereas the df numbers in a one-way ANOVA have a technical and theoretical meaning (dealing with things called central and noncentral F distributions), those df numbers can be useful to you in a very practical fashion. To be more specific, you can use the first and the third df to help you understand the structure of a completed study.
To show you how this is done, let's focus on Excerpts 11.5 and 11.1. By adding 1 to the between groups df, you can determine, or verify, that there were 3 groups in this study. By adding 1 to the total df, you can figure out that there were 69 students who read the multimedia essay and then took the reading comprehension test.

In the ANOVA summary table displayed in Excerpt 11.5, the second row of numbers was labeled “Within groups.” I would be remiss if I did not warn you that a variety of terms are used by different researchers to label this row of a one-way ANOVA summary table. On occasion, you are likely to see this row referred to as Within, Error, Residual, or Subjects within groups. Do not let these alternative labels throw you for a loop. If everything else about the table is similar to the table we have just examined, then you should presume that the table you are looking at is a one-way ANOVA summary table.

Some of the one-way ANOVA summary tables you see are likely to be modified in two ways beyond the label given to the middle row of numbers. Some researchers delete entirely the bottom row (for Total), leaving just the rows for “Between groups” and “Within groups.” Other researchers either switch the location of the df and SS columns or delete the SS column altogether. These variations do not affect the main things being communicated by the ANOVA summary table, nor do they hamper your ability to use such tables as you try to make sense of a study's structure and its results.

Because the calculated F-value and the p-value are considered to be the two most important numbers in a one-way ANOVA summary table, those values are sometimes pulled out of the summary table and included in a table containing the comparison group means and standard deviations. By doing this, space is saved in the research report because only one table is needed rather than two.
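When a report gives only group means, standard deviations, and sample sizes (as in Table 4 of Excerpt 11.1), the SS values and F can be rebuilt approximately from those summaries. The sketch below is illustrative: the function name is hypothetical, it assumes the reported SDs use the n − 1 divisor, and it assumes equal groups of n = 23, which is what 69 students in three groups and a within-groups df of 66 imply. Because the published means and SDs are rounded to two decimals, the rebuilt values come close to, but do not exactly match, the entries in Table 5.

```python
def anova_from_summary_stats(means, sds, ns):
    """Rebuild one-way ANOVA quantities from group means, SDs, and sample sizes."""
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-groups SS: weighted squared distances of group means from the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups SS: each group's variance times (n - 1), summed across groups
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return {"SS_between": ss_between, "SS_within": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f}


# Means and SDs from Table 4 (Excerpt 11.1); n = 23 per group is an assumption
summary = anova_from_summary_stats(
    means=[15.30, 17.43, 16.09], sds=[3.57, 3.69, 3.55], ns=[23, 23, 23])
print(summary)
```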
Had the F- and p-values from Excerpt 11.5 been moved into Table 4 of Excerpt 11.1, a note might have appeared beneath that table saying “F(2, 66) = 2.054, p = .136.” Because this kind of note indicates the df for between groups and within groups, you can use these numbers to determine how many people were involved in the study if the sample sizes are not included in the table. A note like this does not contain SS or MS values, but you really do not need them.

Although the results of a one-way ANOVA are sometimes presented in a table similar to the one we have just considered, more often the outcome of the statistical analysis is discussed in the text of the report, with no table included. In Excerpt 11.6, we see an illustration of a one-way ANOVA being summarized in five sentences.
EXCERPT 11.6 • Results from a One-Way ANOVA Presented Without a Table
The purpose of our experiment was to study how different output modalities of a GPS navigation guide affect drivers and driving performance during real traffic driving. We included three output modalities namely audio, visual, and audio-visual [and used] 30 people ranging between 21–38 years of age. . . . We utilized a between-subject experimental design [and] assigned five GPS system users and five non-users to each of the three configurations (which constitute three groups of ten). . . . Our experiment showed that participants using the audio configuration on average had 8.8 [speeding] violations during the trails whereas visual participants on average had 17.9 violations and audio-visual participants had 19.3 violations. An ANOVA test showed significant difference among the three configurations, F(2, 27) = 6.67, p < 0.01.
Source: Jensen, B. S., Skov, M. B., & Thiruravichandran, N. (2010). Studying driver attention and behaviour for three configurations of GPS navigation in real traffic driving. Proceedings of the 28th International Conference on Human Factors in Computing Systems, 1271–1280.
In the study associated with Excerpt 11.6, the independent variable was the kind of output produced by a car's GPS navigation system. Ten of the study's drivers had a NAV system that just talked, 10 others had a system that provided only visual output, and a third group of 10 had a system that provided both auditory and visual output. The dependent variable was the number of speeding violations recorded on videotape as the drivers drove 16 kilometers through both rural and urban sections of Denmark. The excerpt's final sentence contains the ANOVA's results, with the calculated value (6.67) being shown. The p-statement at the end of the excerpt indicates that the three sample means were further apart than would be expected by chance if the three population means were identical. Accordingly, the null hypothesis was rejected.

Excerpt 11.6 also contains two numbers in parentheses next to the letter F. These are the df values taken from the between groups and within groups rows of the one-way ANOVA summary table. By adding 1 to the first of these df values, you can verify or determine how many groups were compared. To figure out or verify how many people were involved in the study, you must add the two df values together and then add 1 to the sum. Thus, it is possible from the excerpt to determine that 30 drivers provided data for the ANOVA.

As you can see, Excerpt 11.6 ends with the statement “p < 0.01.” This small decimal number (0.01) was not the researchers' level of significance. In their research report, the investigators did not indicate their alpha level; however, I can tell from their full set of results that it was set equal to .05. By reporting p as being less than .01 (rather than by saying p < .05), the researchers wanted others to realize that the three population means, if identical, would have been very unlikely to yield sample means as dissimilar as those actually associated with this study's three comparison groups.
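The df bookkeeping described above can be captured in a tiny helper. This is a sketch with a hypothetical function name, and it only applies to a one-way ANOVA on independent samples, where the between-groups df is the number of groups minus 1 and the within-groups df is the total sample size minus the number of groups.

```python
def study_structure(df_between, df_within):
    """Recover (number of groups, total N) from the df in F(df_between, df_within).

    Assumes a one-way, between-subjects ANOVA.
    """
    n_groups = df_between + 1
    n_total = df_between + df_within + 1
    return n_groups, n_total


print(study_structure(2, 66))  # Excerpt 11.5, F(2, 66) -> (3, 69)
print(study_structure(2, 27))  # Excerpt 11.6, F(2, 27) -> (3, 30)
```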
In some studies, data are collected on more than one dependent variable. Although such data sets can be analyzed in various ways, many researchers choose to conduct a separate one-way ANOVA for each of the multiple dependent variables. The ANOVA results are sometimes presented in tables; however, more often than not the findings are presented only in the text of the research report. Excerpt 11.7 contains such a presentation.

In the study from which Excerpt 11.7 was taken, each respondent to the online survey was put into one of three groups based on his or her engineering experience. Those in the “high” group had engineering degrees and first-hand engineering experience; those in the “intermediate” group were learners of engineering; those in the “low” group were non-engineers who had little or no interest in engineering. The survey instrument assessed one's self-concept for conducting an engineering design task, and it had four scales: self-efficacy, motivation, expectancy, and anxiety. A separate one-way ANOVA was used to compare the means of the three groups on each scale of the self-concept instrument. As you read the passage in Excerpt 11.7, you ought to be able to determine what the independent variable was, how many individuals were in the study, how many null hypotheses were tested, and what decision was made regarding each $H_0$. You also should be able to indicate what level of significance was used and which numbers are the calculated values. Most important, you should be able to explain the meaning of each null hypothesis that was tested.
EXCERPT 11.7 • Results from Several Separate One-Way ANOVAs
A 36-item online instrument was developed and administered to [several] individuals with different levels of engineering experience. . . . A one-way ANOVA was conducted to compare mean scores on self-efficacy, motivation, outcome expectancy, and anxiety toward engineering design for the three groups. There were statistically significant effects on all four task-specific self-concepts at the p < .05 level for the three groups [F_self-efficacy(2, 199) = 79.16, p < 0.001; F_motivation(2, 199) = 71.73, p < 0.001; F_expectancy(2, 199) = 77.91, p < 0.001; F_anxiety(2, 199) = 8.76, p < 0.001]. . . . The results of this study demonstrate [that] engineering design self-concept is highly dependent on engineering experiences. This is evident in significant differences in task-specific self-concepts among high, intermediate, and low engineering experience groups.
Source: Carberry, A. R., Lee, S., & Ohland, M. W. (2010). Measuring engineering design self-efficacy. Journal of Engineering Education, 99(1), 71–79.
If they had been presented, each null hypothesis in the engineering study would have looked the same: $H_0: \mu_1 = \mu_2 = \mu_3$. They would differ, however, with respect to the data represented by the $\mu$s. For the first one-way ANOVA, the three population means would be dealing with self-efficacy; in the second null hypothesis, the three means would be dealing with motivation; and so on.
Thus, the four null hypotheses connected to the passage in Excerpt 11.7 were identical in terms of the number of $\mu$s and the group represented by each $\mu$. Those population means differed, however, based on which of the four dependent variables was under consideration.

Excerpt 11.7 is worth considering for one other reason. Note that the researchers indicate that the level of significance was set at .05, yet the result of each F-test is reported to be p < .001. This excerpt is like Excerpt 11.6 in the sense that both sets of researchers wanted to show that they rejected their null hypotheses “with room to spare.” This is a common practice, but it is not universal. A small number of researchers want their alpha level and their p-statements to be fully congruent. They do this by reporting p < .05 if the null hypothesis is rejected at the .05 level of significance, even if the computer analysis reveals that the data-based p is smaller than .01 (or even smaller than .001).
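The “room to spare” reporting style can be checked against the published F-values. When the between-groups df is 2, as in Excerpts 11.6 and 11.7, the upper-tail area of the F distribution has an exact closed form, $p = (1 + 2F/df_2)^{-df_2/2}$, so no statistical tables are needed. The sketch below uses that formula plus a hypothetical helper that reports p against the smallest conventional cutoff it falls below, which is one way to mimic the practice described above; it is not a rule the original researchers stated.

```python
def f_upper_tail_df1_2(f, df2):
    """Exact P(F > f) for an F distribution whose numerator df is 2."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def conventional_p_statement(p, cutoffs=(0.001, 0.01, 0.05)):
    """Report p against the smallest conventional cutoff it falls below."""
    for cutoff in cutoffs:
        if p < cutoff:
            return f"p < {cutoff}"
    return f"p = {p:.3f}"


p_gps = f_upper_tail_df1_2(6.67, 27)       # Excerpt 11.6
p_anxiety = f_upper_tail_df1_2(8.76, 199)  # Excerpt 11.7: smallest F of the four, so largest p
print(round(p_gps, 4), conventional_p_statement(p_gps))
print(round(p_anxiety, 5), conventional_p_statement(p_anxiety))
```

Both exact p-values sit below the thresholds the researchers reported, which is consistent with their choice to cite .01 and .001 rather than their alpha of .05.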
The Bonferroni Adjustment Technique
In the preceding section, we looked at an example where separate one-way ANOVAs were used to assess the data from multiple dependent variables. In a situation such as this, there is an inflated Type I error risk unless something is done to compensate for the fact that multiple tests are being conducted. In other words, if the data associated with each of several dependent variables are analyzed separately by means of a one-way ANOVA, the probability of incorrectly rejecting at least one of the null hypotheses is greater than the common alpha level used across the set of tests. (If you have forgotten what an inflated Type I error risk is, return to Chapter 8 and read again the little story about the two gamblers, dice, and the bet about rolling, or not rolling, a six.)

Several statistical techniques are available for dealing with the problem of an inflated Type I error risk. Among these, the Bonferroni adjustment procedure appears to be the most popular choice among applied researchers. As you may recall from our earlier consideration of this procedure, the researcher compensates for the fact that multiple tests are being conducted by making the alpha level more rigorous on each of the separate tests. In Excerpts 11.8 and 11.9, we see two cases where the Bonferroni technique was used. In the first of these excerpts, the researchers conducted two one-way ANOVAs, whereas five one-way ANOVAs were used in the second excerpt. In each of these studies, the researchers wanted the Type I error risk for their set of one-way ANOVAs to be no greater than .05. Therefore, they simply divided .05 by the number of tests being conducted. This caused alpha to change from .05 to .025 in Excerpt 11.8 and to .01 in Excerpt 11.9. Each modified $\alpha$ became the criterion against which that study's ANOVA p-values were compared. There is no law, of course, that directs all researchers to deal with the problem of an inflated Type I error risk when multiple one-way ANOVAs are used.
Furthermore, there are circumstances where it would be unwise to take any form of corrective action.

EXCERPTS 11.8–11.9 • Use of the Bonferroni Adjustment
Finally, we conducted one-way ANOVAs using a Bonferroni-corrected $\alpha$ of .025 (.05/2) to test for the two hypothesized rater differences [across the three groups] in mean fundamental and advanced MET adherence.
Source: Martino, S., Ball, S., Nich, C., Frankforter, T. L., & Carroll, K. M. (2009). Correspondence of motivational enhancement treatment integrity ratings among therapists, supervisors, and observers. Psychotherapy Research, 19(2), 181–193.
One-way ANOVA was used to detect differences [among the three gang groups]. The Bonferroni adjustment technique (alpha/number of statistical tests performed) was used to control for type I error due to alpha inflation. This was done for each individual question. Hence, the adjusted level of significance is .05/5 = .01.
Source: Lanier, M. M., Pack, R. P., & Akers, T. A. (2010). Epidemiological criminology: Drug use among African American gang members. Journal of Correctional Health Care, 16(1), 6–16.
Nevertheless, I believe that you should value more highly those reports wherein the researcher either (1) does something (e.g., uses the Bonferroni procedure) to hold down the chances of a Type I error when multiple tests are conducted, or (2) explains why nothing was done to deal with the inflated Type I error risk. If neither of these things is done, you have a right to downgrade your evaluation of the study.
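The Bonferroni arithmetic used in Excerpts 11.8 and 11.9 can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited studies, and the function names are hypothetical. The second function shows why the adjustment is needed: assuming the tests are independent and every null hypothesis is true, the chance of at least one Type I error across m tests is 1 − (1 − α)^m.

```python
# Bonferroni adjustment: divide the desired familywise alpha by the number
# of tests, then compare each test's p-value with that stricter level.

def bonferroni_alpha(familywise_alpha, num_tests):
    """Per-test alpha that keeps the familywise Type I error risk at or
    below familywise_alpha (hypothetical helper name)."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return familywise_alpha / num_tests


def familywise_error(per_test_alpha, num_tests):
    """Chance of at least one Type I error across num_tests tests,
    ASSUMING the tests are independent and every null hypothesis is true."""
    return 1 - (1 - per_test_alpha) ** num_tests


if __name__ == "__main__":
    print(bonferroni_alpha(0.05, 2))             # Excerpt 11.8: 0.025
    print(bonferroni_alpha(0.05, 5))             # Excerpt 11.9: 0.01
    print(round(familywise_error(0.05, 5), 3))   # five unadjusted tests: 0.226
    print(round(familywise_error(0.01, 5), 3))   # five adjusted tests: 0.049
```

With five unadjusted tests at .05, the risk of at least one false rejection climbs to about .23; at the Bonferroni level of .01 it falls back to about .049. When the dependent variables are correlated, the independence formula is only approximate, but the Bonferroni division still keeps the familywise risk at or below the target α.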
Assumptions of a One-Way ANOVA
In Chapter 10, we considered the four main assumptions associated with t-tests, F-tests, and z-tests: independence, randomness, normality, and homogeneity of variance. My earlier comments apply as much now to cases in which a one-way ANOVA is used to compare three or more means as they did to cases in which two means are compared. In particular, I hope you recall the meaning of these four assumptions and my point about how these tests tend to be robust to the equal variance assumption when the sample sizes of the various comparison groups are equal. Many researchers who use a one-way ANOVA seem to pay little or no attention to the assumptions that underlie the F-test comparison of their sample means. Consequently, I encourage you to feel better about research reports that (1) contain discussions of the assumptions, (2) present results of tests that were conducted to check on the testable assumptions, (3) explain what efforts were made to get the data in line with the assumptions, or (4) point out that an alternative test having fewer assumptions
than a regular one-way ANOVA F-test was used. Conversely, I encourage you to lower your evaluation of research reports that do none of these things. Consider Excerpts 11.10 and 11.11 (which deal with boars and bullfrogs, respectively). In both studies, the researchers used a one-way ANOVA. Before doing so, however, they screened their data with respect to the normality and equal variance assumptions. In the first excerpt, the Kolmogorov–Smirnov and Levene tests were applied to the sample data to see if the normality and homoscedasticity (i.e., equal variance) assumptions, regarding the populations, were untenable. Excerpt 11.11 illustrates how these same two assumptions can be dealt with by means of alternative procedures: the Shapiro–Wilk test and the Bartlett test. Whenever researchers use one of these tests to check an assumption, they hope that the test procedure will cause the null hypothesis (concerning the shape of and variability in the study’s populations) to be retained, not rejected. Assumptions should be met, not violated.
EXCERPTS 11.10–11.11• Attending to the Normality and Equal Variance Assumptions
These results were analysed using a one-way analysis of variance (ANOVA) with an independent factor (the treatment) and a dependent variable (a sperm quality parameter). . . . Before the ANOVA test was applied, the data were tested for normality and homoscedasticity using the Kolmogorov–Smirnov and Levene tests. Source: Yeste, Briz, M., Pinart, E., Sancho, S., Bussalleu, E., & Bonet, S. (2010). The osmotic tolerance of boar spermatozoa and its usefulness as sperm quality parameter. Animal Reproduction Science, 119(3–4), 265–274. Normality and homogeneity of variance of the data were tested using Shapiro–Wilk normality test and Bartlett’s test, respectively. Differences of target gene (StAR) expression were analyzed by one way analysis of variance (ANOVA). Source: Paden, N. E., Carr, J. A., Kendall, R. J., Wages, M., & Smith, E. E. (2010). Expression of steroidogenic acute regulatory protein (StAR) in male American bullfrog (Rana catesbeiana) and preliminary evaluation of the response to TNT. Chemosphere, 80(1), 41–45.
The researchers associated with Excerpts 11.10 and 11.11 set a good example by demonstrating a concern for the normality and equal variance assumptions of a one-way ANOVA. Unfortunately, many researchers give no indication that they thought about either of these assumptions. Perhaps they are under the mistaken belief that the F-test is always robust to violations of these assumptions. Or, perhaps they simply are unaware of the assumptions. In any event, I salute the researchers associated with Excerpts 11.10 and 11.11 for checking their data before conducting a one-way ANOVA.
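Levene's test is less mysterious than its name suggests. One common version, sketched here on the assumption that the original mean-centred form is wanted, simply runs a one-way ANOVA on each score's absolute distance from its own group mean. The helper names are hypothetical, and only the statistic W and its df are returned; turning W into a p-value requires the F distribution (or a table of critical F values), which Python's standard library does not provide. Statistical packages differ here: SciPy's `scipy.stats.levene`, for example, centres on the median by default (the Brown–Forsythe variant). The Shapiro–Wilk, Kolmogorov–Smirnov, and Bartlett tests are not sketched here.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def levene_w(groups):
    """Mean-centred Levene statistic: the one-way ANOVA F computed on each
    score's absolute distance from its own group mean."""
    abs_devs = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(abs_devs)


# Hypothetical data: identical spread in every group vs. one much wider group.
similar = [[4, 5, 6, 5, 5], [9, 10, 11, 10, 10], [14, 15, 16, 15, 15]]
unequal = [[4, 5, 6, 5, 5], [9, 10, 11, 10, 10], [5, 10, 15, 20, 25]]
print(levene_w(similar))
print(levene_w(unequal))
```

For the `similar` data the group spreads are identical, so W is 0 and the equal variance null hypothesis is comfortably retained; the `unequal` data produce a much larger W, the kind of result that would push a researcher toward one of the options described next.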
Sometimes, preliminary checks on normality and the equal variance assumption suggest that the populations are not normal or have unequal variances. When this happens, researchers have four options: They can (1) identify and eliminate any existing outliers, (2) transform their sample data in an effort to reduce nonnormality or stabilize the variances, (3) use a special test that has a built-in correction for violations of assumptions, or (4) switch from the one-way ANOVA F-test to some other test that does not have such rigorous assumptions. In Excerpts 11.12, 11.13, and 11.14, we see cases in which researchers took the 2nd, 3rd, and 4th of these courses of action.
EXCERPTS 11.12–11.14• Options When Assumptions Seem Violated
Assumptions of nonnormality were investigated graphically with the Kolmogorov–Smirnov test, and when significant, the distribution of log-transformed variables was also checked in the same way. The data were analyzed by one-way ANOVA. . . . Source: Kim, M-K., Tanaka, K., Kim, M-J., Matsuo, T., Tomita, T., Ohkubo, H., et al. (2010). Epicardial fat tissue: Relationship with cardiorespiratory fitness in men. Medicine and Science in Sports and Exercise, 42(3), 463–469.
Homogeneity of variances was tested by Levene’s test and Welch’s ANOVA was used to compare group means when the group variances were unequal. Source: Shrestha, S., Ehlers, S. J., Lee, J., Fernandez, M., & Koo, S. I. (2009). Dietary Green Tea Extract Lowers Plasma and Hepatic Triglycerides and Decreases the Expression of Sterol Regulatory Element-Binding Protein-1c mRNA and Its Responsive Genes in Fructose-Fed, Ovariectomized Rats. Journal of Nutrition, 139(4), 640–645.
[T]he assumption of homogeneity of variance failed for all variables, as the Levene’s tests turned out to be significant. Because parametric testing was not justified, nonparametric tests (Kruskal-Wallis) were conducted. Source: Terband, H., Maassen, B., Guenther, F. H., & Brumberg, J. (2009). Computational neural modeling of speech motor control in childhood apraxia of speech (CAS). Journal of Speech, Language & Hearing Research, 52(6), 1595–1609.
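To make the third and fourth options a little more concrete, here is a minimal sketch of Welch's F (the statistic behind the "Welch's ANOVA" of Excerpt 11.13) and of the Kruskal–Wallis H statistic (Excerpt 11.14). Both functions are hypothetical helpers written for illustration, not the routines the cited researchers used. Welch's F weights each group by n/s² instead of pooling the variances, and its denominator df is adjusted accordingly; H is computed from the ranks of the pooled scores. The sketch gives tied scores their average rank but omits the tie-correction factor that statistical packages usually apply, and it stops short of p-values (Welch's F is referred to an F distribution with the returned df; H is usually referred to a chi-square distribution with k − 1 df).

```python
from statistics import mean, variance


def welch_f(groups):
    """Welch's one-way ANOVA for unequal variances: returns (F, df1, df2)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    weights = [n / variance(g) for n, g in zip(ns, groups)]  # w_i = n_i / s_i^2
    total_w = sum(weights)
    weighted_grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    numerator = sum(w * (m - weighted_grand_mean) ** 2
                    for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, ns))
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H on pooled ranks (tied scores share their average
    rank; no tie-correction factor is applied)."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    avg_rank = {}
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    rank_term = sum(sum(avg_rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


# Hypothetical scores for three groups.
a, b, c = [1, 2, 3], [2, 3, 4], [6, 7, 8]
print(welch_f([a, b, c]))
print(kruskal_wallis_h([a, b, c]))
```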
Of the four assumptions associated with a one-way ANOVA, the one that is neglected most often is the independence assumption. In essence, this assumption says that a particular person’s (or animal’s) score should not be influenced by the measurement of any other people or by what happens to others while the study is conducted. This assumption would be violated if different groups of students (perhaps different intact classrooms) are taught differently, with each student’s exam score being used in the analysis.
In studies where groups are a necessary feature of the investigation, the recommended way to adhere to the independence assumption is to have the unit of analysis (i.e., the scores that are analyzed) be each group’s mean rather than the scores from the individuals in the group. Many researchers shy away from using the group mean as the unit of analysis because doing this usually causes the sample size to be reduced dramatically. The solution, of course, is to use lots of groups!
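Here is a minimal sketch, using entirely hypothetical classroom data, of what changing the unit of analysis looks like. Each classroom's scores are collapsed to a single mean before any one-way ANOVA is run; the function name and data layout are illustrative assumptions.

```python
from statistics import mean

# Hypothetical data: each key is an intact classroom tagged with the teaching
# method it received; each value holds that classroom's students' exam scores.
classrooms = {
    ("A", "lecture"): [71, 75, 68, 80],
    ("B", "lecture"): [66, 70, 74],
    ("C", "discussion"): [82, 79, 85, 77],
    ("D", "discussion"): [73, 81, 78],
    ("E", "online"): [69, 72, 70, 75],
    ("F", "online"): [64, 71, 68],
}


def classroom_means_by_method(data):
    """Collapse each classroom to its mean so that the classroom, not the
    student, becomes the unit of analysis in the one-way ANOVA."""
    by_method = {}
    for (_room, method), scores in data.items():
        by_method.setdefault(method, []).append(mean(scores))
    return by_method


units = classroom_means_by_method(classrooms)
print({method: len(means) for method, means in units.items()})
```

The 21 student scores become just 6 analyzable units, two per teaching method, which is exactly why the remedy is to recruit many groups.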
Statistical Significance versus Practical Significance
Researchers who use a one-way ANOVA can do either of two things to make their studies more statistically sophisticated than would be the case if they used the crude six-step version of hypothesis testing. The first option involves doing something after the sample data have been collected and analyzed. Here, the researcher can compute an estimate of the effect size. The other option involves doing something on the front end of the study, not the tail end. With this option, the researcher can conduct an a priori power analysis. Unfortunately, not all of the researchers who use a one-way ANOVA take the time to perform any form of analysis designed to address the issue of practical significance versus statistical significance. In far too many cases, researchers simply use the simplest version of hypothesis testing to test their one-way ANOVA’s H0. They collect the amount of data that time, money, or energy allows, and then they anxiously await the outcome of the analysis. If their F-ratios turn out significant, these researchers quickly summarize their studies, with emphasis put on the fact that “significant findings” have been obtained. I encourage you to upgrade your evaluation of those one-way ANOVA research reports in which the researchers demonstrate that they were concerned about practical significance as well as statistical significance. Examples of such concern appear in Excerpts 11.15 and 11.16. In each case, the effect size associated with the ANOVA was estimated. As you can see, the effect size measures used in these studies were eta squared and Cohen’s f.
EXCERPTS 11.15–11.16• Estimating the Effect Size
A one-way between groups ANOVA [indicated] that there were significant differences among marital status groups in the change score for sexual feelings and behavior toward men, The effect size calculated using eta squared was .08, indicating a moderate effect. Source: Karten, E. Y., & Wade, J. C. (2010). Sexual orientation change efforts in men: A client perspective. Journal of Men’s Studies, 18(1), 84–102.
In order to examine the conversational differences among these three groups, a Oneway Analysis of variance was used [and revealed] significant overall differences among the groups on all three of these scales, with very large effect sizes . . . (Pragmatic Behaviors: F = 20.2, p < .0001, f = 4.5; Speech/Prosody Behaviors: F = 14.8, p < .0001, f = 3.9; Paralinguistic Behaviors: F = 34.2, p < .0001, f = 5.8). Source: Paul, R., Orlovski, S. M., Marcinko, H. C., & Volkmar, F. (2009). Conversational behaviors in youth with high-functioning ASD and Asperger Syndrome. Journal of Autism & Developmental Disorders, 39(1), 115–125.
In Excerpts 11.15 and 11.16, notice the terms moderate effect and very large effect sizes. The researchers used these terms in an effort to interpret their effect size estimates. Most likely, they used the relevant information in Table 11.1 as a guide when trying to judge whether the estimated effect size number deserved the label small, medium, or large. (This table does not include any information about d, because this measure of effect size cannot be used when three or more means are compared.) In Excerpts 11.17 and 11.18, we see two cases in which teams of researchers performed a power analysis to determine the needed sample size for their one-way ANOVA. In the first of these excerpts, notice that the researchers specify an unstandardized effect size (10 degrees), a desired power level (80%), the standard deviation (15 degrees), and a level of significance (.05) as the ingredients of their power analysis. Doing this, they determined the needed sample size (35 per group). The information in Excerpt 11.18 is less specific, but it is clear that those researchers also conducted an a priori power analysis to determine the needed sample size for each of the three comparison groups.
TABLE 11.1 Effect Size Criteria for a One-Way ANOVA
Effect Size Measure              Small   Medium   Large
Eta (η)                           .10     .24      .37
Eta Squared (η²)                  .01     .06      .14
Omega Squared (ω²)                .01     .06      .14
Partial Eta Squared (η²p)         .01     .06      .14
Partial Omega Squared (ω²p)       .01     .06      .14
Cohen’s f                         .10     .25      .40
Note: These standards for judging relationship strength are quite general and should be changed to fit the unique goals of any given research investigation.
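The effect size measures in Table 11.1 all come from the same sums of squares as the ANOVA itself. The sketch below assumes the usual textbook definitions: η² = SS_between / SS_total, ω² = (SS_between − (k − 1)MS_within) / (SS_total + MS_within), and Cohen's f = √(η² / (1 − η²)). The function names and cutoff tuples are illustrative; the cutoffs are copied from Table 11.1, whose note warns that they are only general guides.

```python
import math


def effect_sizes(groups):
    """Return (eta squared, omega squared, Cohen's f) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n_total - k)
    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    cohens_f = math.sqrt(eta_sq / (1 - eta_sq))
    return eta_sq, omega_sq, cohens_f


def label(value, cutoffs):
    """Map an estimate onto Table 11.1's small/medium/large benchmarks."""
    small, medium, large = cutoffs
    if value >= large:
        return "large"
    if value >= medium:
        return "medium"
    if value >= small:
        return "small"
    return "below small"


ETA_SQ_CUTOFFS = (0.01, 0.06, 0.14)    # also used for omega squared
COHENS_F_CUTOFFS = (0.10, 0.25, 0.40)

# Excerpt 11.15 reports eta squared = .08; convert it to Cohen's f.
f_from_eta = math.sqrt(0.08 / (1 - 0.08))
print(label(0.08, ETA_SQ_CUTOFFS), round(f_from_eta, 3), label(f_from_eta, COHENS_F_CUTOFFS))
```

Applied to the η² of .08 reported in Excerpt 11.15, the conversion gives an f of about .29, which falls in the medium band on both rows of Table 11.1, consistent with the researchers' "moderate effect" label.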
EXCERPTS 11.17–11.18• An A Priori Power Analysis
A priori power analysis showed that in order to have a power of 80% to detect a difference of as little as 10 degrees at the 0.05 level of significance assuming a standard deviation of 15 degrees, 35 women would be needed in each group. The increased enrolment improved the power of the study. Statistical analysis was performed using the one factor ANOVA. Source: Papadakis, M., Papadokostakis, G., Kampanis, N., Sapkas, G., Papadakis, S. A., & Katonis, P. (2010). The association of spinal osteoarthritis with lumbar lordosis. BMC Musculoskeletal Disorders, 11, 1–6.
We carried out an a priori power analysis for one-way analysis of variance with 3 groups of participants, to calculate number of subjects needed to have sufficient power to detect effect sizes similar to those found in previous positive studies (ES 0.6 to 0.9 for amygdala and ES 0.9 for hippocampal volume decreases in pediatric or adolescent patients with BD) as statistically significant. Source: Hajek, T., Gunde, E., Slaney, C., Propper, L., MacQueen, G., Duffy, A., et al. (2009). Amygdala and hippocampal volumes in relatives of patients with bipolar disorder: A high-risk study. Canadian Journal of Psychiatry, 54(11), 726–733.
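The ingredients listed in Excerpt 11.17 can be plugged into the familiar normal-approximation formula for comparing two means, n per group = 2(z for 1 − α/2 + z for the desired power)²(σ/Δ)². This is a sketch under that assumption only: the excerpt does not say which formula or software the researchers used, and a power analysis for the overall ANOVA F (as in Excerpt 11.18) would instead use Cohen's f and the noncentral F distribution, which dedicated power software handles. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate n per group needed to detect a raw mean difference between
    two groups with a two-sided test (normal approximation, equal n and SD)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = .05
    z_power = z.inv_cdf(power)            # about 0.84 for power = .80
    return 2 * ((z_alpha + z_power) * sd / difference) ** 2


raw = n_per_group(difference=10, sd=15)
print(round(raw, 1), math.ceil(raw))
```

With Δ = 10 degrees, σ = 15 degrees, α = .05, and power = .80, the formula gives about 35.3, which rounds up to 36 per group; the small gap from the excerpt's 35 likely reflects a different formula or rounding rule.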
Cautions
Before concluding this chapter, I want to offer a few tips that will increase your skills at deciphering and critiquing research reports based on one-way ANOVAs.
Significant and Nonsignificant Results from One-Way ANOVAs
When researchers say that they have obtained a statistically significant result from a one-way ANOVA, this means that they have rejected the null hypothesis. Because you are unlikely to see the researchers articulate the study’s H0 (in words or symbols) or even see them use the term null hypothesis in discussing the results, it is especially important for you to remember (1) what the one-way ANOVA H0 stipulates, (2) why H0 is rejected, and (3) how to interpret correctly the decision to reject H0. Although a one-way ANOVA can be used to compare the means of two groups, this chapter focuses on the use of one-way ANOVAs to compare three or more means. If the data lead to a significant finding when more than two means have been contrasted, it means that the sample data are not likely to have come from populations having the same μ. This one-way ANOVA result does not provide any information as to how many of the μ values are likely to be dissimilar, nor does it provide any information as to whether any specific pair of populations is likely to have different μ values. The only thing that a significant result indicates is that the variability among the full set of sample means is larger than would be expected if all population means were identical. Usually, a researcher wants to know more about the likely state of the population means than is revealed by a one-way ANOVA. To be more specific, the typical researcher wants to be able to make comparative statements about pairs of population means, such as “μ1 is likely to be larger than μ2, but μ2 and μ3 cannot be looked on as different based on the sample data.” To address these concerns, the researcher must move past the significant ANOVA F and apply a subsequent analysis.
In Chapter 12 we consider such analyses, which are called, understandably, post hoc or follow-up tests.
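A minimal sketch with hypothetical scores shows why a significant F says nothing about which population means differ. The F-ratio is simply the between-groups mean square divided by the within-groups mean square. In the data below, groups 1 and 2 have identical means and only group 3 stands apart, yet F is huge. The function name is hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for the one-way ANOVA of lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores: groups 1 and 2 share the same mean (11);
# only group 3 (mean 21) differs.
g1, g2, g3 = [10, 11, 12], [10, 11, 12], [20, 21, 22]
f_ratio, df_b, df_w = one_way_anova([g1, g2, g3])
print(round(f_ratio, 1), df_b, df_w)   # 100.0 2 6
```

Rejecting H0 here is fully justified, but the F-ratio alone cannot tell a reader that μ1 and μ2 might well be equal; sorting that out is the job of the follow-up tests covered in Chapter 12.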
In any research report discussing the results of a one-way ANOVA, you are given information about the sample means. If a significant result has been obtained, you may be tempted to conclude that each population mean is different from every other population mean. Do not fall prey to this temptation! To gain an understanding of this very important point, consider once again Excerpt 11.7. In that excerpt, the results of four one-way ANOVAs are presented from a study in which three groups differing in engineering experience responded to a survey that measured four components of self-confidence in performing an engineering design task.
The three means compared on anxiety were 38.77, 49.46, and 62.16, and the F-test yielded a statistically significant result. This result indicates that the null hypothesis of equal population means was rejected. Hence, it is legitimate to think that H0: μ1 = μ2 = μ3 is probably not true. However, it is not legitimate to think that all three of the population means are different from each other. Perhaps two of the μs are the same, but different from the third.
You must also be on guard when it comes to one-way ANOVAs that yield nonsignificant Fs. As I have pointed out on several occasions, a fail-to-reject decision should not be interpreted to mean that H0 is true. Unfortunately, many researchers make this inferential mistake when different groups are compared in terms of mean scores on a pretest. The researchers’ goal is to see whether the comparison groups are equal to one another at the beginning of their studies, and they mistakenly interpret a nonsignificant F-test to mean that no group began with an advantage or a disadvantage. One-way ANOVAs, of course, can produce a nonsignificant F-value when groups are compared on things other than a pretest. Consider once again Excerpt 11.5. This ANOVA (comparing the mean reading comprehension scores of three multimedia groups) yielded a nonsignificant F of 2.054. This result should not be interpreted to mean that individuals like those involved in this study (computer-savvy college freshmen from Turkey taking a course in English from the foreign language department) have the same mean reading comprehension scores regardless of whether their multimedia learning aid is just definitions, definitions plus pictures, or definitions plus a brief movie when reading an online essay. In other words, neither you nor the researchers who conducted this study should draw the conclusion that μ1 = μ2 = μ3 simply because the p-value associated with the ANOVA F turned out to be larger than the selected level of significance.
A one-way ANOVA, if nonsignificant, cannot be used to justify such an inference.
Confidence Intervals
In Chapter 10, you saw how a confidence interval can be placed around the difference between two sample means. You also saw how such confidence intervals can be used to test a null hypothesis, with H0 rejected if the null’s pinpoint numerical value lies beyond the limits of the interval. As we now conclude our consideration of how researchers compare three or more means with a one-way ANOVA, you may be wondering why I have not said anything in this chapter about the techniques of estimation. When a study’s focus is on three or more means, researchers occasionally build a confidence interval around each of the separate sample means. This is done in situations where (1) there is no interest in comparing all the means together at one time or (2) there is a desire to probe the data in a more specific fashion after the null hypothesis of equal population means has been tested. Whereas researchers sometimes use interval estimation (on individual means) in lieu of or as a complement to a test of the hypothesis that all μs are equal, interval estimation is not used as an alternative strategy for testing the one-way ANOVA null hypothesis. Stated differently, you are not likely to come across any research studies where a confidence interval is put around the variance of the sample means in order to test H0: σ²μ = 0.
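For the situation just described, where a confidence interval is built around each sample mean separately, a minimal sketch follows. Because Python's standard library has no t distribution, the critical t value is passed in by the caller (assumed here to come from a t table for df = n − 1); the data and the function name are hypothetical. Each interval concerns a single μ and is not a test of the one-way ANOVA H0.

```python
import math
from statistics import mean, stdev


def mean_ci(scores, t_critical):
    """Confidence interval for one group's population mean.
    t_critical must be looked up for df = n - 1 (e.g., from a t table)."""
    m = mean(scores)
    se = stdev(scores) / math.sqrt(len(scores))   # standard error of the mean
    return m - t_critical * se, m + t_critical * se


# Hypothetical group of n = 10 scores; 2.262 is the two-tailed .05
# critical t for df = 9, giving a 95% interval.
scores = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
low, high = mean_ci(scores, t_critical=2.262)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

In a three-group study, the same function would simply be called once per group.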
Other Things to Keep in Mind
If we momentarily lump together this chapter with the ones that preceded it, it is clear that you have been given a slew of tips or warnings designed to help you become a more discerning recipient of research-based claims. Several of the points are important enough to repeat here.
1. The mean is focused on in research studies more than any other statistical concept. In many studies, however, a focus on means does not allow the research question to be answered because the question deals with something other than central tendency.
2. If the researcher’s interest resides exclusively in the group(s) from which data are collected, only descriptive statistics should be used in analyzing the data.
3. The reliability and validity of the researcher’s data are worth considering. To the extent that reliability is lacking, it is difficult to reject H0 even when H0 is false. To the extent that validity is lacking, the conclusions drawn will be unwarranted because of a mismatch between what is truly being measured and what the researcher thinks is being measured with a one-way ANOVA.
4. With a one-way ANOVA, nothing is proved regardless of what the researcher concludes after analyzing the data. Either a Type I error or a Type II error always will be possible, no matter what decision is made about H0.
5. The purpose of a one-way ANOVA is to gain an insight into the population means, not the sample means.
6. Those researchers who talk about (and possibly test) the assumptions underlying a one-way ANOVA deserve credit for being careful in their utilization of this inferential technique.
7. A decision not to reject the one-way ANOVA’s H0 does not mean that all population means should be considered equal.
8. Those researchers who perform an a priori power analysis or compute effect size indices (following the application of a one-way ANOVA) are doing a more conscientious job than are those researchers who fail to do anything to help distinguish between statistical significance and practical significance.
9. The Bonferroni procedure helps to control the risk of Type I errors in studies where one-way ANOVAs are conducted on two or more dependent variables.
10. The df values associated with a one-way ANOVA (whether presented in an ANOVA summary table or positioned next to the calculated F-value in the text of the research report) can be used to determine the number of groups and the total number of participants involved in the study.
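Point 10 rests on two identities: df_between = k − 1 and df_within = N − k, where k is the number of groups and N is the total number of participants. A small sketch (function name hypothetical) inverts them.

```python
def groups_and_participants(df_between, df_within):
    """Recover k (number of groups) and N (total participants) from a one-way
    ANOVA's df, using df_between = k - 1 and df_within = N - k."""
    k = df_between + 1
    n_total = df_within + k
    return k, n_total


print(groups_and_participants(2, 66))   # (3, 69)
```

A report of F(2, 66), for instance, implies three groups and 69 participants.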