
Learning Objectives

•Explain what the χ² goodness-of-fit test is and what it does.

•Calculate a χ² goodness-of-fit test.

•List the assumptions of the χ² goodness-of-fit test.

•Calculate the χ² test of independence.

•Interpret the χ² test of independence.

•Explain the assumptions of the χ² test of independence.

The Chi-Square (χ²) Goodness-of-Fit Test: What It Is and What It Does

The chi-square (χ²) goodness-of-fit test is used for comparing categorical information against what we would expect based on previous knowledge. As such, it tests what are called observed frequencies (the frequency with which participants fall into a category) against expected frequencies (the frequency expected in a category if the sample data represent the population). It is a nondirectional test, meaning that the alternative hypothesis is neither one-tailed nor two-tailed. The alternative hypothesis for a χ² goodness-of-fit test is that the observed data do not fit the expected frequencies for the population, and the null hypothesis is that they do fit the expected frequencies for the population. There is no conventional way to write these hypotheses in symbols, as we have done with the previous statistical tests. To illustrate the χ² goodness-of-fit test, let's look at a situation in which its use would be appropriate.

chi-square (χ²) goodness-of-fit test A nonparametric inferential procedure that determines how well an observed frequency distribution fits an expected distribution.

observed frequencies The frequency with which participants fall into a category.

expected frequencies The frequency expected in a category if the sample data represent the population.

Calculations for the χ² Goodness-of-Fit Test

Suppose that a researcher is interested in determining whether the teenage pregnancy rate at a particular high school is different from the rate statewide. Assume that the rate statewide is 17%. A random sample of 80 female students is selected from the target high school. Seven of the students are either pregnant now or have been pregnant previously. The χ² goodness-of-fit test measures the observed frequencies against the expected frequencies. The observed and expected frequencies are presented in Table 21.1.

TABLE 21.1 Observed and expected frequencies for χ² goodness-of-fit example

FREQUENCIES    PREGNANT    NOT PREGNANT
Observed       7           73
Expected       14          66

As can be seen in the table, the observed frequencies represent the number of high school females in the sample of 80 who were pregnant versus not pregnant. The expected frequencies represent what we would expect based on chance, given what is known about the population. In this case, we would expect 17% of the females to be pregnant because this is the rate statewide. If we take 17% of 80 (.17 × 80 = 13.6, which rounds to 14), we would expect 14 of the students to be pregnant. By the same token, we would expect 83% of the students (.83 × 80 = 66.4, which rounds to 66) to be not pregnant. If the calculated expected frequencies are correct, when summed they should equal the sample size (14 + 66 = 80).

Once the observed and expected frequencies have been determined, we can calculate χ² using the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency, E is the expected frequency, and Σ indicates that we must sum the indicated fraction for each category in the study (in this case, for the pregnant and not-pregnant groups). Using this formula with the present example, we have

$$\chi^2 = \frac{(7 - 14)^2}{14} + \frac{(73 - 66)^2}{66} = \frac{(-7)^2}{14} + \frac{(7)^2}{66} = \frac{49}{14} + \frac{49}{66} = 3.5 + 0.74 = 4.24$$

Interpreting the χ² Goodness-of-Fit Test

The null hypothesis is rejected if $\chi^2_{obt}$ is greater than $\chi^2_{CV}$. The $\chi^2_{CV}$ is found in the χ² table in Appendix A at the back of the book (Table A.7). In order to use the table, you need to know the degrees of freedom for the χ² test. This is the number of categories minus 1. In our example, we have two categories (pregnant and not pregnant); thus, we have 1 degree of freedom. At α = .05, then, $\chi^2_{CV}$ = 3.84, according to Table A.7. Our $\chi^2_{obt}$ of 4.24 is larger than the critical value, so we can reject the null hypothesis and conclude that the observed frequency of pregnancy is significantly lower than expected by chance. In other words, the female teens at the target high school have a significantly lower pregnancy rate than would be expected based on the statewide rate. In APA style, this would be reported as χ²(1, N = 80) = 4.24, p < .05. You can see how to use SPSS or the TI-84 calculator to conduct this statistical test in the Statistical Software Resources section at the end of this chapter.
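The calculation and decision rule for this example can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_gof` is hypothetical, the expected frequencies are the rounded values from Table 21.1, and the critical value is simply the 3.84 quoted from Table A.7 rather than something the code looks up.

```python
def chi_square_gof(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [7, 73]    # pregnant, not pregnant (Table 21.1)
expected = [14, 66]   # rounded expected frequencies used in the text

chi2_obt = chi_square_gof(observed, expected)
df = len(observed) - 1        # number of categories minus 1
chi2_cv = 3.84                # assumed: value quoted from Table A.7 (df = 1, alpha = .05)
reject_null = chi2_obt > chi2_cv

print(f"chi-square = {chi2_obt:.2f}, df = {df}, reject H0: {reject_null}")
```

Because the expected frequencies are passed in rather than computed, the same function could be reused for any number of categories.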

Assumptions and Appropriate Use of the χ² Goodness-of-Fit Test

Although the χ² goodness-of-fit test is a nonparametric test and therefore less restrictive than a parametric test, it does have its own assumptions. First, the test is appropriate for nominal (categorical) data. If data are measured on a higher scale of measurement, they can be transformed to a nominal scale. Second, the frequencies in each cell should not be too small. If the frequency in any cell is too small (<5), then the χ² test should not be conducted. Lastly, to be generalizable to the population, the sample should be randomly selected and the observations must be independent. In other words, each observation must be based on the score of a different subject.

Chi-Square (χ²) Test of Independence: What It Is and What It Does

The logic of the chi-square (χ²) test of independence is the same as for any χ² statistic: we are comparing how well an observed breakdown of people over various categories fits some expected breakdown (such as an equal breakdown). In other words, a χ² test compares an observed frequency distribution to an expected frequency distribution. If we observe a difference, we determine whether the difference is greater than what would be expected based on chance. The difference between the χ² test of independence and the χ² goodness-of-fit test is that the goodness-of-fit test compares how well an observed frequency distribution of one nominal variable fits some expected pattern of frequencies, whereas the test of independence compares how well an observed frequency distribution of two nominal variables fits some expected pattern of frequencies. The formula we use is the same as for the χ² goodness-of-fit test described previously:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

chi-square (χ²) test of independence A nonparametric inferential test used when frequency data have been collected to determine how well an observed breakdown of people over various categories fits some expected breakdown.

The null hypothesis and the alternative hypothesis are similar to those used with the t tests. The null hypothesis is that there are no observed differences in frequency between the groups we are comparing; the alternative hypothesis is that there are differences in frequency between the groups and that the differences are greater than we would expect based on chance.

Calculations for the χ² Test of Independence

As a means of illustrating the χ² test of independence, imagine that teenagers in a randomly chosen sample are categorized as having been employed as babysitters or never having been employed in this capacity. The teenagers are then asked whether they have ever taken a first aid course. In this case, we would like to determine whether babysitters are more likely to have taken first aid than those who have never worked as babysitters. Because we are examining the observed frequency distribution of two nominal variables (babysitting and taking a first aid class), the χ² test of independence is appropriate. We find that 65 of the 100 babysitters have had a first aid course and 35 of the babysitters have not. In the non-babysitter group, 43 out of 90 have had a first aid course, and the remaining 47 have not. Table 21.2 is a contingency table showing the observed and expected frequencies.

To determine the expected frequency for each cell, we use this formula:

$$E = \frac{(RT)(CT)}{N}$$

where RT is the row total, CT is the column total, and N is the total number of observations. Thus, the expected frequency for the upper left cell in  Table 21.2  would be

$$E = \frac{(100)(108)}{190} = \frac{10{,}800}{190} = 56.84$$
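The other three cells follow from the same formula. As a worked sketch using the row and column totals from Table 21.2 (the subscripts are labels introduced here for readability, not the text's notation):

$$E_{\text{babysitter, no}} = \frac{(100)(82)}{190} = \frac{8{,}200}{190} = 43.16$$

$$E_{\text{non-babysitter, yes}} = \frac{(90)(108)}{190} = \frac{9{,}720}{190} = 51.16$$

$$E_{\text{non-babysitter, no}} = \frac{(90)(82)}{190} = \frac{7{,}380}{190} = 38.84$$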

The expected frequencies appear in parentheses in Table 21.2. Notice that the expected frequencies when summed equal 190, the N in the study. Once we have the observed and expected frequencies, we can calculate χ²:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(65 - 56.84)^2}{56.84} + \frac{(35 - 43.16)^2}{43.16} + \frac{(43 - 51.16)^2}{51.16} + \frac{(47 - 38.84)^2}{38.84} = 1.17 + 1.54 + 1.30 + 1.71 = 5.72$$

TABLE 21.2 Observed and expected frequencies for babysitters and non-babysitters having taken a first aid course

                   TAKEN FIRST AID COURSE
                   Yes           No            Row Totals
Babysitters        65 (56.84)    35 (43.16)    100
Non-babysitters    43 (51.16)    47 (38.84)    90
Column Totals      108           82            190

Interpreting the χ² Test of Independence

The degrees of freedom for this χ² test are equal to (r − 1)(c − 1), where r stands for the number of rows and c stands for the number of columns. In our example, this would be (2 − 1)(2 − 1) = 1. We now refer to Table A.7 in Appendix A to identify $\chi^2_{CV}$ for df = 1. At the .05 level, $\chi^2_{CV}$ = 3.841. Our $\chi^2_{obt}$ of 5.72 exceeds the critical value, and we reject the null hypothesis. In other words, there is a significant difference between babysitters and non-babysitters in terms of their having taken a first aid class; significantly more babysitters have taken a first aid class. If you were to report this result in APA style, it would appear as χ²(1, N = 190) = 5.72, p < .025. You can see how to use SPSS or the TI-84 calculator to conduct this statistical test in the Statistical Software Resources section at the end of this chapter.
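The whole test of independence, from the observed counts to the decision, can be sketched as follows. The function names are hypothetical, and the critical value is the 3.841 quoted from Table A.7 rather than computed. Because the code keeps the expected frequencies unrounded, the result may differ from the hand calculation in the third decimal place.

```python
def expected_frequencies(table):
    """Expected count for each cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for a contingency table."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return chi2, df


# Rows: babysitters, non-babysitters; columns: first aid yes, no
observed = [[65, 35],
            [43, 47]]

expected = expected_frequencies(observed)
chi2_obt, df = chi_square_independence(observed)
chi2_cv = 3.841   # assumed: value quoted from Table A.7 (df = 1, alpha = .05)

print([[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {chi2_obt:.2f}, df = {df}, reject H0: {chi2_obt > chi2_cv}")
```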

Effect Size: Phi Coefficient

As with many of the statistics discussed in previous chapters, we can also compute the effect size for a χ² test of independence. For a 2 × 2 contingency table, we use the phi coefficient (ϕ), where

$$\phi = \sqrt{\frac{\chi^2}{N}}$$

In our example, this would be

$$\phi = \sqrt{\frac{5.72}{190}} = \sqrt{0.03} = 0.17$$

phi coefficient An inferential test used to determine effect size for a chi-square test.

Cohen's (1988) specifications for the phi coefficient indicate that a phi coefficient of .10 is a small effect, .30 is a medium effect, and .50 is a large effect. Our effect size is small. Hence, even though the χ² was significant, there was not a large effect size. In other words, the difference observed in whether a teenager had taken a first aid class was not strongly accounted for by being a babysitter. Why do you think the χ² was significant even though the effect size was small? If you attributed this to the large sample size, you were correct.
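As a small sketch of the effect-size step, the phi coefficient can be computed directly from the obtained χ² and N. The function names are hypothetical. Treating Cohen's benchmarks (.10, .30, .50) as lower cutoffs for each label, and calling anything below .10 "negligible", is an assumption made here for illustration, not a rule stated in the text.

```python
import math


def phi_coefficient(chi2, n):
    """Phi = sqrt(chi-square / N), for a 2 x 2 contingency table."""
    return math.sqrt(chi2 / n)


def effect_size_label(phi):
    """Label phi using Cohen's benchmarks as lower cutoffs (an assumption)."""
    if phi >= 0.50:
        return "large"
    if phi >= 0.30:
        return "medium"
    if phi >= 0.10:
        return "small"
    return "negligible"


phi = phi_coefficient(5.72, 190)   # values from the babysitter example
label = effect_size_label(phi)
print(f"phi = {phi:.2f} ({label})")
```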

Assumptions of the χ² Test of Independence

The assumptions underlying the χ² test of independence are the same as those noted previously for the χ² goodness-of-fit test:

•The sample must be random.

•The observations must be independent.

•The data are nominal.

χ² TESTS

χ² Goodness-of-Fit Test

What It Is: A nonparametric test comparing observed frequencies on one nominal variable to expected frequencies based on population data

What It Does: Will identify how well an observed frequency distribution of one nominal variable fits some expected pattern of frequencies

Assumptions:

•Random sample

•Independent observations

•Nominal data

χ² Test of Independence

What It Is: A nonparametric test comparing observed to expected frequencies for a two-group between-subjects design

What It Does: Will identify differences in frequency on two variables between groups

Assumptions:

•Random sample

•Independent observations

•Nominal data

1.How do the χ² tests differ in use from a t test?

2.Why are the χ² tests nonparametric tests, and what does this mean?

REVIEW OF KEY TERMS

chi-square (χ²) goodness-of-fit test

chi-square (χ²) test of independence

expected frequencies

observed frequencies

phi coefficient

MODULE EXERCISES

(Answers to odd-numbered questions appear in  Appendix B .)

1.When is it appropriate to use a χ² test?

2.What is the difference between the χ² goodness-of-fit test and the χ² test of independence?

3.A researcher believes that the percentage of people who exercise in California is greater than the national exercise rate. The national rate is 20%. The researcher gathers a random sample of 120 individuals who live in California and finds that the number who exercise regularly is 31 out of 120.

a.What is $\chi^2_{obt}$?

b.What is (are) the df for this test?

c.What is $\chi^2_{CV}$?

d.What conclusion should be drawn from these results?

4.A teacher believes that the percentage of students at her high school who go on to college is greater than the rate in the general population of high school students. The rate in the general population is 30%. In the most recent graduating class at her high school, the teacher found that 90 students graduated and that 40 of those went on to college.

a.What is $\chi^2_{obt}$?

b.What is (are) the df for this test?

c.What is $\chi^2_{CV}$?

d.What conclusion should be drawn from these results?

5.You notice in your introductory psychology class that more women tend to sit up front and more men in the back. In order to determine whether this difference is significant, you collect data on the seating preferences for the students in your class. The data follow:

                     Males    Females
Front of the Room    15       27
Back of the Room     32       19

a.What is $\chi^2_{obt}$?

b.What is (are) the df for this test?

c.What is $\chi^2_{CV}$?

d.What conclusion should be drawn from these results?

6.What is the phi coefficient? Calculate phi for the previous χ² test.

CRITICAL THINKING CHECK ANSWERS

Critical Thinking Check 21.1

1.The χ² test is a nonparametric test used with nominal (categorical) data. It examines how well an observed frequency distribution of one or two nominal variables fits some expected pattern of frequencies. The t test is a parametric test for use with interval and ratio data.

2.A nonparametric test is one that does not involve the use of any population parameters, such as the mean and standard deviation. In addition, a nonparametric test does not assume a bell-shaped distribution. The χ² tests are nonparametric because they fit this definition.

MODULE 22

Tests for Ordinal Data

Learning Objectives

•Calculate Wilcoxon's rank-sum test.

•Interpret Wilcoxon's rank-sum test.

•Explain the assumptions of the Wilcoxon's rank-sum test.

•Calculate Wilcoxon's matched-pairs signed-ranks T test.

•Interpret Wilcoxon's matched-pairs signed-ranks T test.

•Explain the assumptions of the Wilcoxon's matched-pairs signed-ranks T test.

•Calculate the Kruskal-Wallis test.

•Interpret the Kruskal-Wallis test.

•Explain the assumptions of the Kruskal-Wallis test.

•Calculate the Friedman test.

•Interpret the Friedman test.

•Explain the assumptions of the Friedman test.

In this module, we discuss two Wilcoxon tests. The  Wilcoxon rank-sum test  is similar to the independent-groups t test, and the  Wilcoxon matched-pairs signed-ranks  T  test  is similar to the correlated-groups t test. The Wilcoxon tests, however, are nonparametric tests. As such, they use ordinal data rather than interval-ratio data and allow us to compare the medians of two populations instead of the means. In addition, we will discuss extensions of each of the above tests for use with more than two groups —the  Kruskal-Wallis test  for between-subjects designs and the  Friedman test  for correlated-groups designs.

Wilcoxon rank-sum test A nonparametric inferential test for comparing sample medians of two independent groups of scores.

Wilcoxon matched-pairs signed-ranks T test A nonparametric inferential test for comparing sample medians of two dependent or related groups of scores.

Kruskal-Wallis test An inferential nonparametric test used to determine differences between three or more groups on a ranked variable for a between-subjects design.

Friedman test An inferential nonparametric test used to determine differences between three or more conditions on a ranked variable for a correlated-groups design.

Wilcoxon Rank-Sum Test: What It Is and What It Does

Imagine that a teacher of fifth-grade students wants to compare the number of books read per term by female versus male students in her class. Rather than reporting the data as the actual number of books read (interval-ratio data), she ranks the female and male students, giving the student who read the fewest books a rank of 1 and the student who read the most books the highest rank. She does this because the distribution representing number of books read is skewed (not normal). She predicts that the girls will read more books than boys. Thus, H0 is that the median number of books read by girls is less than or equal to the number of books read by boys (Mdgirls ≤ Mdboys), and Ha is that the median number of books read is greater for girls than for boys (Mdgirls > Mdboys). The number of books read by each group and the corresponding rankings are presented in  Table 22.1 .

TABLE 22.1 Number of books read and corresponding rank for female and male students

GIRLS             BOYS
X       Rank      X       Rank
20      4         10      1
24      8         17      2
29      9         23      7
33      10        19      3
57      12        22      6
35      11        21      5
                          Σ = 24

Calculations for the Wilcoxon Rank-Sum Test

As a check to confirm that the ranking has been done correctly, the highest rank should be equal to n1 + n2; in our example, n1 + n2 = 12 and the highest rank is also 12. In addition, the sum of the ranks should equal N(N + 1)/2, where N represents the total number of people in the study. In our example, this is 12(12 + 1)/2. This calculates to 78. If we sum the ranks (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12), they also equal 78. Thus, the ranking was done correctly.

The Wilcoxon test is completed by first summing the ranks for the group expected to have the smaller total. As the teacher expects the males to read less, she sums their ranks. This sum, as seen in the  Table 22.1 , is 24.
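The ranking, the two checks, and the rank sum can be sketched as below, using the scores from Table 22.1. The helper `rank_scores` is a hypothetical name. It gives tied scores the mean of the positions they occupy, which does not matter here because no two students read the same number of books.

```python
def rank_scores(scores):
    """Map each distinct score to its rank (1 = lowest); ties share the mean rank."""
    ordered = sorted(scores)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Sorted positions i+1 .. j (1-based) hold this score; average them.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


girls = [20, 24, 29, 33, 57, 35]
boys = [10, 17, 23, 19, 22, 21]

ranks = rank_scores(girls + boys)
n_total = len(girls) + len(boys)

rank_sum_girls = sum(ranks[x] for x in girls)
rank_sum_boys = sum(ranks[x] for x in boys)   # group expected to have the smaller total

# Checks from the text: highest rank equals n1 + n2, and all ranks sum to N(N + 1)/2.
assert max(ranks.values()) == n_total
assert rank_sum_girls + rank_sum_boys == n_total * (n_total + 1) / 2

print(f"Sum of ranks for boys = {rank_sum_boys:g}")
```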

Interpreting the Wilcoxon Rank-Sum Test

Using  Table A.8  in  Appendix A , we see that for a one-tailed test at the .05 level, if n1 = 6 and n2 = 6, the maximum sum of the ranks in the group expected to be lower is 28. If the sum of the ranks of the group expected to be lower (the males in this situation) exceeds 28, then the result is not significant. Please note that this is the only statistic that we have discussed so far where the obtained value needs to be equal to or less than the critical value in order to be statistically significant. When using this table, n1 is always the smaller of the two groups; if the n’s are equal, it does not matter which is n1 and which is n2. Moreover,  Table A.8  presents the critical values for one-tailed tests only. If a two-tailed test is used, the table can be adapted by dividing the alpha level in half. In other words, we would use the critical values for the .025 level from the table in order to determine the critical value at the .05 level for a two-tailed test. We find that the sum of ranks of the group predicted to have lower scores (24) is less than the cutoff for significance. Our conclusion is to reject the null hypothesis. In other words, we observed that the ranks in the two groups differed, there were not an equal number of high and low ranks in each group, or one group (the girls in this case) read significantly more books than the other. If we were to report this in APA style, it would appear as follows: Ws(n1 = 6, n2 = 6) = 24, p < .05. You can see how to use SPSS to conduct this statistical test in the Statistical Software Resources section at the end of this chapter.

Assumptions of the Wilcoxon Rank-Sum Test

The Wilcoxon rank-sum test is a nonparametric procedure that is analogous to the independent-groups t test. The assumptions of the test are as follows:

•The data are ratio, interval, or ordinal in scale, all of which must be converted to ranked (ordinal) data before the test is conducted.

•The underlying distribution is not normal.

•The observations are independent.

If the observations are not independent (a correlated-groups design), then the Wilcoxon matched-pairs signed-ranks T test should be used.

Wilcoxon Matched-Pairs Signed-Ranks  T  Test: What It Is and What It Does

Imagine that the same teacher in the previous problem wants to compare the number of books read by all students (female and male) over two terms. During the first term, the teacher keeps track of how many books each student reads. During the second term, the teacher institutes a reading reinforcement program through which students can earn prizes based on the number of books read. The number of books read by students is once again measured. As before, the distribution representing number of books read is skewed (not normal). Thus, a nonparametric statistic is necessary. However, in this case, the design is within-subjects —two measures are taken on each student, one before the reading reinforcement program is instituted and one after the program is instituted.

Table 22.2 shows the number of books read by the students across the two terms. Notice that the number of books read during the first term represents the data used in the previous Wilcoxon rank-sum test. The teacher uses a one-tailed test and predicts that students will read more books after the reinforcement program is instituted. Thus, H0 is that the median number of books read is the same or lower after the introduction of the reading reinforcement program (Mdbefore ≥ Mdafter), and Ha is that the median number of books read is greater after the reinforcement program is instituted (Mdbefore < Mdafter).

TABLE 22.2 Number of books read in each term

TERM 1 (NO          TERM 2 (REINFORCEMENT    DIFFERENCE SCORE (D)              SIGNED
REINFORCEMENT) X    IMPLEMENTED) X           (TERM 1 – TERM 2)       RANK      RANK
10                  15                       −5                      4.5       −4.5
17                  23                       −6                      6         −6
19                  20                       −1                      1.5       −1.5
20                  20                       0
21                  28                       −7                      8         −8
22                  26                       −4                      3         −3
23                  24                       −1                      1.5       −1.5
24                  29                       −5                      4.5       −4.5
29                  37                       −8                      10        −10
33                  40                       −7                      8         −8
57                  50                       7                       8         8
35                  55                       −20                     11        −11
                                                                               +Σ = 8
                                                                               −Σ = 58

Calculations for the Wilcoxon Matched-Pairs Signed-Ranks  T  Test

The first step in completing the Wilcoxon signed-ranks test is to compute a difference score for each individual. In this case, we have subtracted the number of books read in Term 2 from the number of books read in Term 1. Keep in mind the logic of a matched-pairs test. If the reinforcement program had no effect, we would expect all of the difference scores to be 0 or very close to 0. Columns 1–3 in  Table 22.2  represent the number of books read in each term and the difference scores. Next, we rank the absolute value of each difference score. This is shown in column 4 of  Table 22.2 . Notice that the difference score of zero is not ranked. Also note what happens when ranks are tied— for example, there are two difference scores of 1. These difference scores take positions 1 and 2 in the ranking; they are each given a rank of 1.5 (halfway between the ranks of 1 and 2), and the next rank assigned is 3. As a check, the highest rank should equal the number of ranked scores. In our problem, we ranked 11 difference scores; thus, the highest rank should be 11, and it is.

Once the ranks have been determined, we attach to each rank the sign of the previously calculated difference score. This is represented in the last column of  Table 22.2 . The final step necessary to complete the Wilcoxon signed-ranks test is to sum the positive ranks and then sum the negative ranks. Once again, if there is no difference in number of books read across the two terms, we would expect the sum of the positive ranks to equal or be very close to the sum of the negative ranks. The sums of the positive and negative ranks are shown at the bottom of the last column in  Table 22.2 .

For a two-tailed test, the Tobt is equal to the smaller of the summed ranks. Thus, if we were computing a two-tailed test, our Tobt would equal 8. However, our test is one-tailed; the teacher predicted that the number of books read would increase during the reinforcement program. For a one-tailed test, we predict whether we expect more positive or negative difference scores. Because we subtracted Term 2 (the term in which they were reinforced for reading) from Term 1, we would expect more negative differences. The Tobt for a one-tailed test is the sum of the signed ranks predicted to be smaller. In this case, we would predict the summed ranks for the positive differences to be smaller than that for negative differences. Thus, Tobt for a one-tailed test is also 8.
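The steps described above (difference scores, ranking the absolute differences, attaching signs, and summing) can be sketched with the data from Table 22.2. The helper `rank_scores` is a hypothetical name; it assigns tied values the mean of the positions they occupy, matching the 1.5, 4.5, and 8 ranks in the table.

```python
def rank_scores(scores):
    """Map each distinct score to its rank (1 = lowest); ties share the mean rank."""
    ordered = sorted(scores)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2   # mean of 1-based positions i+1 .. j
        i = j
    return ranks


term1 = [10, 17, 19, 20, 21, 22, 23, 24, 29, 33, 57, 35]
term2 = [15, 23, 20, 20, 28, 26, 24, 29, 37, 40, 50, 55]

# Difference scores (Term 1 minus Term 2); zero differences are not ranked.
diffs = [a - b for a, b in zip(term1, term2) if a != b]
abs_ranks = rank_scores([abs(d) for d in diffs])

positive_sum = sum(abs_ranks[abs(d)] for d in diffs if d > 0)
negative_sum = sum(abs_ranks[abs(d)] for d in diffs if d < 0)

t_two_tailed = min(positive_sum, negative_sum)   # smaller of the two sums
t_one_tailed = positive_sum                      # sum predicted to be smaller here

print(f"N ranked = {len(diffs)}, +sum = {positive_sum:g}, -sum = {negative_sum:g}")
```

In this example the one-tailed and two-tailed values coincide because the sum predicted to be smaller is also the smaller sum observed.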

Interpreting the Wilcoxon Matched-Pairs Signed-Ranks  T  Test

Using  Table A.9  in  Appendix A , we see that for a one-tailed test at the .05 level with N = 11 (we use N = 11 and not 12 because we ranked only 11 of the 12 difference scores), the maximum sum of the ranks in the group expected to be lower is 13. If the sum of the ranks for the group expected to be lower exceeds 13, then the result is not significant. Please note that, as with the previous Wilcoxon rank-sum test, the obtained value needs to be equal to or less than the critical value in order to be statistically significant. Our conclusion is to reject the null hypothesis. In other words, we observed that the sum of the positive versus the negative ranks differed, or the number of books read in the two conditions differed; significantly more books were read in the reinforcement condition than in the no-reinforcement condition. If we were to report this in APA style, it would appear as follows: T(N = 11) = 8, p < .05. You can see how to use SPSS to conduct this statistical test in the Statistical Software Resources section at the end of this chapter.

Assumptions of the Wilcoxon Matched-Pairs Signed-Ranks  T  Test

The Wilcoxon matched-pairs signed-ranks T test is a nonparametric procedure that is analogous to the correlated-groups t test. The assumptions of the test are as follows:

•The data are ratio, interval, or ordinal in scale, all of which must be converted to ranked (ordinal) data before the test is conducted.

•The underlying distribution is not normal.

•The observations are dependent or related (a correlated-groups design).

WILCOXON TESTS

Wilcoxon Rank-Sum Test

What It Is: A nonparametric test for a two-group between-subjects design

What It Does: Will identify differences in ranks on a variable between groups

Assumptions:

•Ordinal data

•Distribution is not normal

•Independent observations

Wilcoxon Matched-Pairs Signed-Ranks T Test

What It Is: A nonparametric test for a two-group correlated-groups (within- or matched-subjects) design

What It Does: Will identify differences in signed-ranks on a variable for correlated groups

Assumptions:

•Ordinal data

•Distribution is not normal

•Dependent or related observations

1.I have recently conducted a study in which I ranked my subjects (college students) on height and weight. I am interested in whether there are any differences in height and weight depending upon whether the subject is an athlete (defined by being a member of a sports team) or not an athlete. Which statistic would you recommend using to analyze these data? If the actual height (in inches) and weight (in pounds) data were available, what statistic would be appropriate?

2.Determine the difference scores and ranks for the following set of matched-pairs data. Finally, calculate T for these data, and determine whether the T score is significant for a two-tailed test.

Subject    Score 1    Score 2
1          12         15
2          10         9
3          15         14
4          17         23
5          17         16
6          22         19
7          20         30
8          22         25

Beyond the Wilcoxon Tests

Just as the ANOVA is an extension of a t test, the Wilcoxon tests have an analogous extension. For a between-subjects design with more than two groups, the Kruskal-Wallis test is appropriate. As with the Wilcoxon tests, the Kruskal-Wallis test is appropriate when the data are ordinal or if the involved populations are not normally distributed or homogeneity of variance is in question. The analogous test (more than two groups, ordinal data, and/or skewed distributions) for use with a correlated-groups design is the Friedman test.

The Kruskal-Wallis Test: What It Is and What It Does

As with a one-way between-subjects ANOVA, the Kruskal-Wallis test is used in situations where there is one independent variable with three or more levels and the design is between-subjects. However, unlike the ANOVA, the Kruskal-Wallis test is used either when the data are ordinal or when the assumptions of population normality or homogeneity of variance are questionable. The data used to calculate the Kruskal-Wallis test are always ordinal (ranked).

Let's use a study similar to that from  Module 14  in which a researcher is studying the difference in recall performance between three different types of rehearsal (rote, imagery, and story). Subjects are given a list of 15 words to learn utilizing one of three rehearsal types. Because the data are interval-ratio in scale, when we studied this problem in  Module 14 we used the one-way between-subjects ANOVA to analyze the data. In this situation, however, the researcher is worried that the data are somewhat skewed, thus affecting the normality and homogeneity of variances assumptions associated with the ANOVA test. The researcher therefore correctly decides to use the Kruskal-Wallis test. The number of words recalled by the subjects in each condition appears in  Table 22.3 .

Calculations for the Kruskal-Wallis Test  In order to use the Kruskal-Wallis test, the number of words recalled must be converted to an ordinal ranking scale. Thus, each of the subjects’ recall scores is converted to a rank, with the lowest score receiving a rank of 1, the second-lowest a rank of 2, etc. When two subjects recall the same number of words— for example, the two subjects who both recalled two words in the Rote condition— they take positions 1 and 2 in the ranking and are each given a rank of 1.5. This is illustrated in  Table 22.4 .

TABLE 22.3 Number of words recalled correctly in rote, imagery, and story rehearsal conditions

ROTE    IMAGERY    STORY
2       6          10
4       8          9
3       9          12
2       7          8
8       6          4
4       11         13

TABLE 22.4 Number of words recalled correctly and corresponding ranks in rote, imagery, and story rehearsal conditions

ROTE    RANK      IMAGERY    RANK        STORY    RANK
2       1.5       6          7.5         10       15
4       5         8          11          9        13.5
3       3         9          13.5        12       17
2       1.5       7          9           8        11
8       11        6          7.5         4        5
4       5         11         16          13       18
        T = 27               T = 64.5             T = 79.5

As a check to confirm that the ranking has been completed correctly, the highest rank should be equal to n1 + n2 + n3; in our example n1 + n2 + n3 = 18, and the highest rank is also 18. In addition, the sum of the ranks should equal N(N + 1)/2, where N is the total number of people in the study. In our example, 18(18 + 1)/2 = 171, and if we add the ranks (1.5 + 1.5 + 3 + 5 + 5 + 5 + 7.5 + 7.5 + 9 + 11 + 11 + 11 + 13.5 + 13.5 + 15 + 16 + 17 + 18), they sum to 171 also. Thus the ranking was completed correctly. The null hypothesis for the Kruskal-Wallis test states that there are no systematic differences in the ranks between conditions, whereas the alternative hypothesis states that we expect systematic differences in the ranks between conditions.

Once the ranks have been determined, the ranks within each condition are summed or totaled, as can be seen in  Table 22.4  in which the symbol T has been used to indicate the sum of the ranks in each condition. The T values will be used in the calculation of the Kruskal-Wallis test. The formula used for the Kruskal-Wallis test produces a chi-square statistic and is calculated as follows:

$$\chi^2 = \frac{12}{N(N + 1)} \left( \sum \frac{T^2}{n} \right) - 3(N + 1)$$

We can now use the data from  Table 22.4  to calculate the chi-square value.

$$\begin{aligned}
\chi^2 &= \frac{12}{18(18+1)}\left(\frac{27^2}{6} + \frac{64.5^2}{6} + \frac{79.5^2}{6}\right) - 3(18+1) \\
&= \frac{12}{(18)(19)}\left(\frac{729}{6} + \frac{4{,}160.25}{6} + \frac{6{,}320.25}{6}\right) - 3(19) \\
&= \frac{12}{342}(121.5 + 693.38 + 1{,}053.38) - 57 \\
&= .035(1{,}868.26) - 57 \\
&= 65.39 - 57 = 8.39
\end{aligned}$$
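The ranking and the formula can also be checked with a short script. The following is a minimal sketch in Python using only the standard library; the function names `average_ranks` and `kruskal_wallis` are illustrative and not part of any package. Note that it carries full precision, so the uncorrected statistic comes out at about 8.55 rather than 8.39 (the hand calculation rounds 12/342 to .035). The sketch also applies the standard correction for tied ranks, which yields about 8.65 and appears to be the value that SPSS reports for these data later in the chapter.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of the 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Return (rank sums T per group, uncorrected chi-square, tie-corrected chi-square)."""
    pooled = [score for scores in groups.values() for score in scores]
    ranks = average_ranks(pooled)
    total_n = len(pooled)

    # Split the pooled ranks back into their conditions and total them (the T values).
    rank_sums, offset = {}, 0
    for name, scores in groups.items():
        rank_sums[name] = sum(ranks[offset:offset + len(scores)])
        offset += len(scores)

    sum_t2_over_n = sum(rank_sums[name] ** 2 / len(scores) for name, scores in groups.items())
    chi_square = 12 / (total_n * (total_n + 1)) * sum_t2_over_n - 3 * (total_n + 1)

    # Standard tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N), t = size of each tie group.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (total_n ** 3 - total_n)
    return rank_sums, chi_square, chi_square / correction


# Data from Table 22.3
groups = {
    "rote": [2, 4, 3, 2, 8, 4],
    "imagery": [6, 8, 9, 7, 6, 11],
    "story": [10, 9, 12, 8, 4, 13],
}
rank_sums, chi_square, chi_square_tie_corrected = kruskal_wallis(groups)
print(rank_sums)
print(round(chi_square, 2), round(chi_square_tie_corrected, 2))
```

The rank sums printed by the sketch should match the T values in Table 22.4, and their total should equal N(N + 1)/2 = 171.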

Interpreting the Kruskal-Wallis Test  The degrees of freedom for the Kruskal-Wallis test are calculated in a similar manner to the degrees of freedom for a chi-square test; in other words, the number of conditions minus 1. In this example we have df = 3 − 1 = 2. In addition, the Kruskal-Wallis test is based on approximately the same distribution as the chi-square test, so we can use the chi-square probability table to determine whether the obtained statistic is significant. Thus, referring to Table A.7 in Appendix A, we find that the critical value (χ2cv) when df = 2 is 5.992. Because our obtained value (χ2obt = 8.39) exceeds the critical value, we can reject H0. Thus, there are significant differences in number of words recalled between the three rehearsal type conditions. In addition, because our χ2obt is so large, we should check the critical values at the .01 and .005 levels provided in Table A.7. When we check these, we find that 8.39 exceeds neither the critical value at the .01 level (9.210) nor the one at the .005 level (10.597), so the result is reported as χ2(2, N = 18) = 8.39, p < .05.
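As a cross-check on the table lookup, the probability can be computed directly. The sketch below assumes df = 2 only: in that special case the upper-tail probability of the chi-square distribution has the closed form exp(−χ2/2), so no statistics library is needed. The function names are hypothetical and chosen for illustration.

```python
import math


def chi_square_p_df2(chi_square):
    """Upper-tail probability of a chi-square value when df = 2 (closed form for df = 2 only)."""
    return math.exp(-chi_square / 2)


def chi_square_critical_df2(alpha):
    """Critical value for df = 2 at significance level alpha (inverse of the function above)."""
    return -2 * math.log(alpha)


p_value = chi_square_p_df2(8.39)          # obtained Kruskal-Wallis value from the text
critical_05 = chi_square_critical_df2(0.05)
critical_01 = chi_square_critical_df2(0.01)
print(round(p_value, 3), round(critical_05, 3), round(critical_01, 3))
```

For other degrees of freedom this shortcut does not apply, and a table or a statistics package would be needed instead.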

Assumptions of the Kruskal-Wallis Test  The assumptions of the Kruskal-Wallis test are that the data are ranked (ordinal), the distribution is not normal, there are at least five scores in each condition, and that the observations are independent (there are different subjects in each condition).

The Friedman Test: What It Is and What It Does

If the Kruskal-Wallis is the nonparametric version of the one-way between-subjects ANOVA, the Friedman test is the nonparametric version of the one-way repeated measures ANOVA and is used in situations where there is one independent variable with three or more levels and the design is correlated-groups. As with the Kruskal-Wallis, the Friedman test is used either when the data are ordinal or when the assumptions of population normality or homogeneity of variance have been violated. The data used to calculate the Friedman test are always ordinal (ranked).

Let's use the same study and data as we did earlier with the Kruskal-Wallis test, except in this situation imagine that we used the same subjects in each of the conditions—a within-subjects correlated-groups design. Thus, each subject serves in each of the three rehearsal conditions (rote, imagery, and story) and studies a list of equally difficult words in each condition. The null hypothesis for the Friedman test states that there are no systematic differences in the ranks between conditions, whereas the alternative hypothesis states that we expect systematic differences in the ranks between conditions. The number of words recalled by the subjects in each condition appears in  Table 22.5 .

Calculations for the Friedman Test  In order to use the Friedman test, the number of words recalled across the three conditions for each subject must be converted to an ordinal ranking scale. Thus, each subject's recall scores across the three conditions are converted to ranks, with the lowest score receiving a rank of 1, the second-lowest a rank of 2, and the highest score a rank of 3 for each subject. Therefore, in a study with three conditions, such as this one, the highest rank a single subject's score can receive is 3. If scores within a single subject's data are the same, they split the ranks. For example, if one subject recalled two words in both the Rote and Imagery conditions, that subject would receive a rank of 1.5 in each of those conditions. This situation has not occurred in our data. The ranks for each participant appear in Table 22.6.

TABLE 22.5 Number of words recalled correctly in rote, imagery, and story rehearsal conditions

ROTE   IMAGERY   STORY
2      6         10
4      8         9
3      9         12
2      7         8
6      4         8
4      11        13

TABLE 22.6 Number of words recalled correctly and corresponding ranks in rote, imagery, and story rehearsal conditions for each subject

SUBJECT   ROTE   RANK   IMAGERY   RANK   STORY   RANK
1         2      1      6         2      10      3
2         4      1      8         2      9       3
3         3      1      9         2      12      3
4         2      1      7         2      8       3
5         6      2      4         1      8       3
6         4      1      11        2      13      3
                 R = 7            R = 11         R = 18

Once the ranks have been determined, the ranks within each condition are summed or totaled, as can be seen in  Table 22.6  in which the symbol R (for Rank) has been used to indicate the sum of the ranks in each condition. The R values will be used in the calculation of the Friedman test. The formula used for the Friedman test produces a chi-square statistic and is calculated as follows:

$$\chi^2 = \frac{12}{nk(k+1)}\left(\sum R^2\right) - 3n(k+1),$$

where n refers to the number of subjects and k refers to the number of conditions. We can now use the data from  Table 22.6  to calculate the chi-square value.

$$\begin{aligned}
\chi^2 &= \frac{12}{6 \times 3(3+1)}\left(\sum R^2\right) - 3 \times 6(3+1) \\
&= \frac{12}{18(4)}(7^2 + 11^2 + 18^2) - 18(4) \\
&= \frac{12}{72}(49 + 121 + 324) - 72 \\
&= .167(494) - 72 \\
&= 82.5 - 72 = 10.5
\end{aligned}$$
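The within-subject ranking and the Friedman formula can be sketched in a few lines of Python. This is an illustrative sketch using only the standard library; `average_ranks` and `friedman` are hypothetical helper names. Because it does not round 12/72 to .167, it returns about 10.33 rather than 10.5, which agrees with the value SPSS reports for these data later in the chapter.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of the 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman(rows):
    """rows: one tuple of scores per subject, one score per condition.

    Returns (rank sums R per condition, chi-square statistic).
    """
    n = len(rows)      # number of subjects
    k = len(rows[0])   # number of conditions
    rank_sums = [0.0] * k
    for row in rows:
        # Rank each subject's scores across the conditions, then add to the column totals.
        for condition, rank in enumerate(average_ranks(list(row))):
            rank_sums[condition] += rank
    chi_square = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return rank_sums, chi_square


# Data from Table 22.5: (rote, imagery, story) for each of the six subjects
rows = [(2, 6, 10), (4, 8, 9), (3, 9, 12), (2, 7, 8), (6, 4, 8), (4, 11, 13)]
rank_sums, chi_square = friedman(rows)
print(rank_sums, round(chi_square, 2))
```

The rank sums should match the R values in Table 22.6.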

Interpreting the Friedman Test  The degrees of freedom for the Friedman test are calculated in a similar manner to the degrees of freedom for any chi-square test; in other words, the number of conditions minus 1. In this example, we have df = 3 − 1 = 2. In addition, the Friedman test is based on approximately the same distribution as the chi-square test, so we can use the chi-square probability table to determine whether the obtained statistic is significant. Thus, referring to Table A.7 in Appendix A, we find that the critical value (χ2cv) when df = 2 is 5.992. Because our obtained value (χ2obt = 10.5) exceeds the critical value, we can reject H0. Thus, there are significant differences in number of words recalled between the three rehearsal type conditions. In addition, because our χ2obt is so large, we should check the critical values at the .01 and .005 levels in Table A.7. When we check these, we find that 10.5 exceeds the critical value at the .01 level (9.210) but falls just short of the critical value at the .005 level (10.597), so the result is reported as χ2(2, N = 6) = 10.50, p < .01.

Assumptions of the Friedman Test  The assumptions of the Friedman test are that the data are ranked (ordinal), the distribution is not normal, there are at least five scores in each condition, and that the observations are not independent (a correlated-groups design was used).

NONPARAMETRIC TESTS FOR MORE THAN TWO GROUPS

Kruskal-Wallis Test

What It Is: A nonparametric test for a between-subjects design with more than two groups

What It Does: Will identify differences in ranks on a variable between groups

Assumptions:

•Ordinal data

•Distribution is not normal

•Independent observations

•More than five observations per condition

Friedman Test

What It Is: A nonparametric test for a correlated-groups (within- or matched-subjects) design with more than two groups

What It Does: Will identify differences in ranks on a variable for correlated groups

Assumptions:

•Ordinal data

•Distribution is not normal

•Dependent or related observations

•More than five observations per condition

1.Explain when you would use the Kruskal-Wallis test versus the Friedman test.

2.Determine the ranks for the data below, which represent subject preference for ketchups of differing colors. The preference scores are on a 1–5 scale with a score of 5 indicating the highest preference. In addition, there are five people in each condition for a total of 15 subjects in the experiment.

Ketchup Color

Red   Green   Blue
4     2       5
3     1       3
5     1       4
4     3       5
3     2       4

REVIEW OF KEY TERMS

Friedman test  (p.  362 )

Kruskal-Wallis test  (p.  362 )

Wilcoxon matched-pairs signed-ranks  T  test  (p.  362 )

Wilcoxon rank-sum test  (p.  362 )

MODULE EXERCISES

(Answers to odd-numbered questions appear in  Appendix B .)

1.Explain how the nonparametric tests in this module differ from the chi-square tests discussed in the previous module.

2.Explain when it would be appropriate to use the Wilcoxon rank-sum test versus the Wilcoxon matched-pairs signed-ranks T test.

3.What is the difference in interpretation when comparing the obtained Wilcoxon values to the Wilcoxon critical values compared to all of the other statistics covered in this text?

4.A researcher is interested in comparing the maturity level of students who volunteer for community service versus those who do not. The researcher assumes that those who complete community service will have higher maturity scores. Maturity scores tend to be skewed (not normally distributed). The maturity scores follow. Higher scores indicate higher maturity levels.

No Community Service   Community Service
33                     41
41                     48
54                     61
13                     72
22                     83
26                     55

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

5.Researchers at a food company are interested in how a new spaghetti sauce made from green tomatoes (and green in color) will compare to its traditional red spaghetti sauce. The company is worried that the green color will adversely affect the tastiness scores. It randomly assigns subjects to either the green or red sauce condition.

Subjects indicate the tastiness of the sauce on a 10-point scale. Tastiness scores tend to be skewed. The scores follow:

Red Sauce   Green Sauce
7           4
6           5
9           6
10          8
6           7
7           6
8           9

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

6.Imagine that the researchers in exercise 5 want to conduct the same study as a within-subjects design. Subjects rate both the green and red sauces by indicating the tastiness of the sauce on a 10-point scale. As in the previous problem, researchers are concerned that the color of the green sauce will adversely affect tastiness scores. Tastiness scores tend to be skewed. The scores follow:

Subject   Red Sauce   Green Sauce
1         7           4
2         6           3
3         9           6
4         10          8
5         6           7
6         7           5
7         8           9

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

7.Explain when you would use the Kruskal-Wallis test versus the Wilcoxon rank-sum test.

8.Explain when you would use the Friedman test versus the Wilcoxon matched-pairs signed-ranks T test.

9.Imagine that marketing researchers working for a food company want to determine whether children would prefer ketchup of a different color. They develop red, green, and blue ketchups that all taste the same and have children (N = 7) taste each of the ketchups (a correlated-groups design) and rank them on a 5-point scale with 5 indicating the highest preference. The data from the 7 subjects appear below.

Ketchup Color

Subject   Red   Green   Blue
1         4     2       5
2         3     1       3
3         5     1       4
4         4     3       5
5         3     2       4
6         5     1       3
7         3     1       5

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

CRITICAL THINKING CHECK ANSWERS

Critical Thinking Check 22.1

1.Because the subjects have been ranked (ordinal data) on height and weight, Wilcoxon's rank-sum test would be appropriate. If the actual height (in inches) and weight (in pounds) were reported, the data would be interval-ratio. In this case, the independent-groups t test would be appropriate.

2. 

Subject   Score 1   Score 2   Difference Score   Rank   Signed Rank
1         12        15        −3                 5      −5
2         10        9         1                  2      2
3         15        14        1                  2      2
4         17        23        −6                 7      −7
5         17        16        1                  2      2
6         22        19        3                  5      5
7         20        30        −10                8      −8
8         22        25        −3                 5      −5

+Σ = 11 −Σ = 25

T (N = 8) = 11, not significant
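The signed-rank bookkeeping in this answer can be reproduced with a short Python sketch. It uses only the standard library; the helper names are hypothetical, and it assumes that T is taken as the smaller of the two rank sums, which is consistent with the answer shown above.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of the 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def signed_rank_sums(pairs):
    """Return (sum of positive ranks, sum of negative ranks, T) for paired scores."""
    # Difference = Score 1 - Score 2; pairs with a zero difference are dropped.
    diffs = [first - second for first, second in pairs if first != second]
    ranks = average_ranks([abs(d) for d in diffs])  # rank the absolute differences
    positive = sum(rank for d, rank in zip(diffs, ranks) if d > 0)
    negative = sum(rank for d, rank in zip(diffs, ranks) if d < 0)
    return positive, negative, min(positive, negative)  # assumption: T is the smaller sum


# (Score 1, Score 2) for the eight subjects in the answer table
pairs = [(12, 15), (10, 9), (15, 14), (17, 23), (17, 16), (22, 19), (20, 30), (22, 25)]
positive_sum, negative_sum, t_value = signed_rank_sums(pairs)
print(positive_sum, negative_sum, t_value)
```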

Critical Thinking Check 22.2

1.Both tests are nonparametric tests that are used with ordinal data when the distribution is skewed. In addition, both tests are also used when there is one independent variable with more than two levels. The difference is that the Kruskal-Wallis test is used for between-subjects designs and the Friedman test is used for correlated-groups designs.

2. 

Ketchup Color

Red   Rank   Green   Rank   Blue   Rank
4     10.5   2       3.5    5      14
3     6.5    1       1.5    3      6.5
5     14     1       1.5    4      10.5
4     10.5   3       6.5    5      14
3     6.5    2       3.5    4      10.5

CHAPTER TEN SUMMARY AND REVIEW

Nonparametric Procedures

CHAPTER SUMMARY

In this chapter, we discussed nonparametric statistics. Nonparametric tests are those for which population parameters (μ and σ) are not known. In addition, the underlying distribution of scores is assumed to be not normal, and the data are most commonly nominal or ordinal.

We discussed six different nonparametric statistics. The x2 goodness-of-fit test examines how well an observed frequency distribution of one nominal variable fits some expected pattern of frequencies. The x2 test of independence once again compares observed frequencies to expected frequencies. The difference here is that it compares how well an observed frequency distribution of two nominal variables fits some expected pattern of frequencies.

The final four nonparametric statistics covered are used with ordinal data. The Wilcoxon rank-sum test compares ranked data for two groups of different subjects—a between-subjects design—in order to determine whether there are significant differences in the rankings in one group versus the other group. The Wilcoxon matched-pairs signed-ranks T test compares ranked data for a single group of subjects (or two groups of matched subjects) on two measures. In other words, the Wilcoxon matched-pairs signed-ranks T test is used with correlated-groups designs to determine whether there are differences in subjects’ scores across the two conditions in which they served. The Kruskal-Wallis test is used with between-subjects designs when there are more than two groups being compared, whereas the Friedman test is used with correlated-groups designs when there are more than two conditions being compared.

CHAPTER 10 REVIEW EXERCISES

(Answers to exercises appear in  Appendix B .)

Fill-in Self-Test

Answer the following questions. If you have trouble answering any of the questions, restudy the relevant material before going on to the multiple-choice self-test.

1.__________and__________frequencies are used in the calculation of the x2 statistic.

2.The nonparametric inferential statistic for comparing two groups of different people when ordinal data are collected is the__________.

3.When frequency data are collected, we use the__________to determine how well an observed frequency distribution of two nominal variables fits some expected breakdown.

4.Effect size for a chi-square test is determined by using the__________.

5.The Wilcoxon__________test is used with within-subjects designs.

6.The Wilcoxon rank-sum test is used with__________designs.

7.Chi-square tests use__________data, whereas Wilcoxon tests use __________data.

8.The extension of the Wilcoxon rank-sum test for use with more than two groups is the__________.

Multiple-Choice Self-Test

Select the single best answer for each of the following questions. If you have trouble answering any of the questions, restudy the relevant material.

1.Parametric is to nonparametric as__________is to__________.

a.z test; t test

b.t test; z test

c.X2 test; z test

d.t test; x2 test

2.Which of the following is an assumption of x2 tests?

a.It is a parametric test.

b.It is appropriate only for ordinal data.

c.The frequency in each cell should be less than 5.

d.The sample should be randomly selected.

3.The calculation of the df for the_____is (r − 1)(c − 1).

a.independent-groups t test

b.correlated-groups t test

c.x2 test of independence

d.Wilcoxon rank-sum test

4.The____is a measure of effect size for the______.

a.phi coefficient; x2 goodness-of-fit test

b.eta-squared; x2 goodness-of-fit test

c.phi coefficient; x2 test of independence

d.eta-squared; Wilcoxon rank-sum test

5.The Wilcoxon rank-sum test is used with_____data.

a.interval

b.ordinal

c.nominal

d.ratio

6.Wilcoxon rank-sum test is to_____design as Wilcoxon matched-pairs signed-ranks T test is to_____design.

a.between-subjects; within-subjects

b.correlated-groups; within-subjects

c.correlated-groups; between-subjects

d.within-subjects; matched-subjects

7.When using a between-subjects design, and comparing three or more groups on an ordinal variable we would use the:

a.Kruskal-Wallis test

b.Friedman test

c.Wilcoxon rank-sum test

d.Wilcoxon matched-pairs signed-ranks T test.

Self-Test Problems

1.A researcher believes that the percentage of people who smoke in the South is greater than in the nation as a whole. The national rate is 15%. The researcher gathers a random sample of 110 individuals who live in the South and finds that the number who smoke is 21 out of 110.

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

2.You notice at the gym that it appears more women tend to work out together, whereas more men tend to work out alone. In order to determine whether this difference is significant, you collect data on the workout preferences for a sample of men and women at your gym. The data follow:

           Males   Females
Together   12      24
Alone      22      10

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

3.Researchers at a food company are interested in how a new ketchup made from green tomatoes (and green in color) will compare to their traditional red ketchup. They are worried that the green color will adversely affect the tastiness scores. They randomly assign subjects to either the green or red ketchup condition. Subjects indicate the tastiness of the sauce on a 20-point scale. Tastiness scores tend to be skewed. The scores follow:

Green Ketchup   Red Ketchup
14              16
15              16
16              19
18              20
16              17
16              17
19              18

a.What statistical test should be used to analyze these data?

b.Identify H0 and Ha for this study.

c.Conduct the appropriate analysis.

d.Should H0 be rejected? What should the researcher conclude?

CHAPTER TEN

Statistical Software Resources

If you need help getting started with Excel or SPSS, please see  Appendix C : Getting Started with Excel and SPSS.

MODULE 21 Chi-Square Tests

CHI-SQUARE GOODNESS-OF-FIT-TEST

Using SPSS

To begin using SPSS to calculate this chi-square test, we must enter the data into the Data Editor. We'll use the problem from  Module 21  to illustrate this test. The data appear in  Table 21.1  in  Module 21 . We have 80 individuals in the sample, thus there will be 80 scores entered. A score of 1 indicates that the individual was pregnant (there are 7 scores of 1 entered), and a score of 2 indicates that the individual was not pregnant (there are 73 scores of 2 entered). A portion of the Data Editor with the data entered appears next. I’ve named the variable Observedpregnancy to indicate that we are interested in the pregnancy rate observed in the sample.

Next, we click on  Analyze, then  Nonparametric Tests, Legacy Dialogs, and finally  Chi-square. This is illustrated in the following screen capture:

This will produce the following dialog box:

Our  Test Variable List is the data in the Observedpregnancy variable. Thus, move the Observedpregnancy variable over to the  Test Variable List using the arrow key. We also have to specify the expected values in the Expected Values box. Click on  Values and then type the first value (14) into the box and click on  Add. Next enter the second expected value (66) and enter it by clicking on  Add. The dialog box should now appear as follows:

Click OK and you should receive the output that appears next.

We can reject the null hypothesis and conclude that the observed frequency of pregnancy is significantly lower than expected by chance. In other words, the female teens at the target high school have a significantly lower pregnancy rate than would be expected based on the statewide rate. In APA style, this would be reported as x2(1, N = 80) = 4.24, p = .039.

Using the TI-84

Let's use the data from  Table 21.1  to conduct the calculation using the TI-84 calculator.

1.With the calculator on, press the STAT key.

2.Highlight EDIT and press ENTER.

3.Enter the Observed Scores in L1 and the Expected Scores in L2.

4.Press the STAT key and highlight TESTS.

5.Scroll down to D: x2 GOF Test and press ENTER.

6.The calculator should show Observed: L1, Expected: L2, and df = 1.

7.Scroll down and highlight Calculate and press ENTER.

The x2 value of 4.24 should appear on the screen, along with the df = 1 and p = .039.
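The same goodness-of-fit result can be checked outside SPSS or the calculator. The following is a minimal Python sketch using only the standard library; the function name is hypothetical. It uses the observed counts (7 and 73) and the expected counts (14 and 66) given above, and it obtains the probability from the identity that holds for df = 1 only, namely p = erfc(√(χ2/2)).

```python
import math


def chi_square_goodness_of_fit(observed, expected):
    """Return the chi-square statistic: the sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [7, 73]    # pregnant, not pregnant (sample of 80)
expected = [14, 66]   # expected frequencies entered in the SPSS dialog box

chi_square = chi_square_goodness_of_fit(observed, expected)
df = len(observed) - 1

# Upper-tail probability; this shortcut is valid only when df = 1.
p_value = math.erfc(math.sqrt(chi_square / 2))
print(round(chi_square, 2), df, round(p_value, 3))
```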

CHI-SQUARE TEST OF INDEPENDENCE

Using SPSS

We'll use the problem from  Module 21  to illustrate this test. The data can be found in  Table 21.2  in  Module 21 . We begin by entering the data into SPSS. We enter the data by indicating whether the individual is a babysitter (1) or not a babysitter (2) and whether they have taken a first aid class (1) or not (2). Thus, a 1, 1 is a babysitter who has taken a first aid class whereas a 2, 1 is a non-babysitter who has taken a first aid class. This coding system is illustrated in the following screen capture of the Data Editor:

Please note that not all of the data are visible in the preceding screen capture, so enter the data from  Table 21.2  in which there is data for 190 individuals.

Next, click on  Analyze, Descriptive Statistics, and then  Crosstabs as indicated in the following screen capture:

This will produce the dialog box seen next.

Next move one of the variables to the Rows box using the arrow keys and move the second variable to the  Columns box (it does not matter which variable is entered into rows versus columns). The dialog box should now look as follows:

Click on  Statistics and check the Chi-square box and click Continue.

Next click on Cells and check the  Observed and  Expected boxes as seen next.

Click Continue and then OK. The output should appear as follows:

We are most interested in the data reported in the Chi-Square Tests box above. The chi-square test of independence is reported as Pearson Chi-Square by SPSS. Thus, based on this, we reject the null hypothesis. In other words, there is a significant difference between babysitters and non-babysitters in terms of their having taken a first aid class—significantly more babysitters have taken a first aid class. If you were to report this result in APA style, it would appear as x2(1, N = 190) = 5.727, p = .019.

Using the TI-84

Let's use the data from  Table 21.2  to conduct the calculation using the TI-84 calculator.

1.With the calculator on, press the 2nd key followed by the MATRIX [X−1] key.

2.Highlight EDIT and 1:[A] and press ENTER.

3.Enter the dimensions for the matrix. Our matrix is 2 × 2. Press ENTER.

4.Enter each observed frequency from  Table 21.2  followed by ENTER.

5.Press the STAT key and highlight TESTS.

6.Scroll down to C: x2-Test and press ENTER.

7.The calculator should show Observed: [A] and Expected: [B].

8.Scroll down and highlight Calculate and press ENTER.

The χ2 value of 5.73 should appear on the screen, along with df = 1 and p = .017. (The TI-84 does not round the expected frequencies to whole numbers, so the chi-square it calculates is slightly larger than that calculated in Module 21 and by SPSS.)

MODULE 22 Tests for Ordinal Data

WILCOXON RANK-SUM TEST

Let's use the example from  Module 22  in which a teacher of fifth-grade students wants to compare the number of books read per term by female versus male students in her class. The distribution representing number of books read is skewed (not normal), thus a nonparametric test is used ( Table 22.1 ).

Using SPSS

To begin using SPSS to calculate this Wilcoxon rank-sum test, we must enter the data into the Data Editor. This is illustrated in the following screen capture of the Data Editor in which you can see that the first column is labeled Gender, with females as 1’s and males as 2’s. The number of books read by each student is indicated in the second column. These data can be found in  Table 22.1  in  Module 22 .

To conduct the statistical test, begin by clicking on Analyze, then Nonparametric Tests, Legacy Dialogs, and finally 2 Independent Samples, as indicated in the following screen capture:

This will produce the following dialog box:

Gender is the Grouping Variable, so use the arrow key to enter it into the appropriate box. The Test Variable is BooksRead, so enter it into that box. With the Grouping Variable box highlighted, click on the Define Groups… box to receive the dialog box presented next.

Let SPSS know how the groups are defined, in other words that you have 1’s and 2’s defining your groups, and then click Continue. Your completed dialog box should appear as follows:

Please note that the Test Type selected should be Mann-Whitney; however, the output you receive will have both the Mann-Whitney test and the Wilcoxon test statistics. Now click on OK to receive the following analysis output:

Notice that in the Test Statistics box, both the Mann-Whitney U test and the Wilcoxon W test statistics are provided. We can see that the mean rank for girls is 9, whereas the mean rank for boys is 4. The easiest way to interpret both the Mann-Whitney and Wilcoxon tests is based on the z score provided. Based on this score we can conclude that girls read significantly more books than boys, z = −2.402, p = .0075 (the two-tailed significance level is provided, but because this was a one-tailed test, we divide that in half).

WILCOXON MATCHED-PAIRS SIGNED-RANKS T TEST

Let's use the example from  Module 22  in which the same teacher in the previous problem wants to compare the number of books read by all students (female and male) over two terms to illustrate this test.

Using SPSS

To begin using SPSS to calculate this test statistic, we must enter the data from  Table 22.2  in  Module 22  into the Data Editor, as illustrated in the following screen capture:

Notice that the variables are simply labeled TermI and TermII. Next, click on Analyze, Nonparametric Tests, Legacy Dialogs, and finally 2 Related Samples…, as indicated in the following screen capture:

This will produce the following dialog box:

Highlight TermI and use the arrow key to move it to the Variable 1 box. Do the same for the TermII variable. In addition, make sure that Wilcoxon is selected in the Test Type box. The dialog box should appear as follows:

Next, click on the  Options… box to receive the dialog box illustrated next.

Click  Descriptive and then Continue. Finally, click OK to receive the following analysis output:

We can see that the mean number of books read in Term I is 25.83, whereas the mean number of books read in Term II is 30.58. In order to conduct the Wilcoxon test, the number of books read in Term I is subtracted from the number of books read in Term II for each student. These difference scores are then ranked (an ordinal variable). We can see that there was 1 negative rank, 10 positive ranks, and 1 tie. Based on these results, we can conclude that students read significantly more books in Term II in comparison to Term I, z = −2.229, p = .013 (the two-tailed significance level is provided, but because this was a one-tailed test, we divide that in half).

KRUSKAL-WALLIS TEST

Let's use the example from  Module 22  in which we measured number of words recalled by subjects who used one of three different types of rehearsal.

Using SPSS

To begin using SPSS to calculate this test statistic, we must enter the data from  Table 22.3  in  Module 22  into the Data Editor as illustrated in the following screen capture. In the screen capture, the first variable, RehearsalType, indicates which type of rehearsal subjects used (1 for rote, 2 for imagery, and 3 for story). The variable WordsRecalled shows how many words each subject recalled.

After the data have been entered, click on  Analyze, Nonparametric Tests, Legacy Dialogs, K Independent Samples, as in the following screen capture:

This will produce the following dialog box:

Move the WordsRecalled variable to the  Test Variable List box by using the arrow button and then move the RehearsalType variable to the  Grouping Variable box by using the arrow button.  Kruskal-Wallis H should be checked in the Test Type box. The dialog box should appear as follows:

Now click on the  Define Range box to define the range of the grouping variable. You should receive the following pop-up window:

Enter 1 into the Minimum box and 3 in the Maximum box as appears below.

Click Continue and then OK to produce the following output, in which we can see the reported chi-square test statistic, χ2(2, N = 18) = 8.65, p = .013, indicating that there were significant differences in numbers of words recalled across the three conditions.

FRIEDMAN TEST

Let's use the example from Module 22 in which we measured number of words recalled by subjects who each served in all three rehearsal conditions (a correlated-groups design).

Using SPSS

To begin using SPSS to calculate this test statistic we must enter the data from  Table 22.5  in  Module 22  into the Data Editor, as illustrated in the following screen capture:

In the preceding screen capture the data from each condition appear in a separate column in the Data Editor. After the data have been entered, click on  Analyze, Nonparametric Tests, Legacy Dialogs, K Related Samples, as in the following screen capture:

The following dialog box will be produced:

Using the arrow button in the center of the dialog box, move the three levels of the independent variable over to the Test Variables box. In addition, ensure that the Friedman box is checked under Test Type. The dialog box should appear as below.

Click OK to receive the following output, in which we see the chi-square test statistic reported, χ2(2, N = 6) = 10.33, p = .006, indicating that there were significant differences in numbers of words recalled across the three conditions.