chapter 11
Nominal Data and the Chi-Square Tests: What Occurs Versus What Is Expected
Learning Objectives
After reading this chapter, you will be able to. . .
1. describe nominal data.
2. complete and explain the chi-square goodness-of-fit test.
3. complete and explain the chi-square test of independence.
4. present and interpret the results of the two types of chi-square test in proper APA format.
suk85842_11_c11.indd 407 10/23/13 1:45 PM
CHAPTER 11Section 11.1 Nominal Data
When there was an important development in statistical analysis in the early part of the 20th century, more often than not Karl Pearson was associated with it. Many of those who made important contributions were members of the department that Pearson founded at University College London. William Sealy Gosset, who developed the t-tests; Ronald A. Fisher, who developed analysis of variance; and Charles Spearman, who developed factor analysis, all gravitated to Pearson’s department at some point. Although social relations among these men were not always harmonious, they were enormously productive scholars, and this was particularly true of Pearson. Besides the correlation coefficient named for him, Pearson also developed an analytical approach related to Spearman’s factor analysis called principal components analysis, and he developed the procedures that are the subjects of this chapter, the chi-square tests. The Greek letter chi (χ) is pronounced “kie” like “pie.” Chi is the equivalent of the letter c, rather than the letter x, which it resembles.
11.1 Nominal Data
With the exception of Spearman’s rho in Chapter 9, the attention in Chapters 1 through 10 has been directed at procedures designed for interval or ratio data. Sometimes the data is not interval scale, nor is it the ordinal-scale data that Spearman’s rho accommodates. When the data is nominal scale, often one of the chi-square (χ²) tests is used.
It will be helpful to review what makes data nominal scale. Nominal data either fits into a category or it does not, which is why nominal data is sometimes called categorical or classification data. Because the analysis is based on counting the data, it is also called count or frequency data. Compared to ratio, interval, and even ordinal data, nominal data provides relatively little information. It reveals only the presence or absence of a characteristic, not how much of some characteristic there is, and none of the detail that ordinal and interval data provides. When people are classified according to whether they are
• left-handed or right-handed or
• Buddhist, Jewish, or Muslim or
• African American, Hispanic, or Native American or
• blue-eyed or brown-eyed or
• introverted or extroverted
they are placed into nominal data categories. In other words, no ranking is involved.
Parameters and Tests for Nominal Data
Any statistical procedure is developed to accommodate a particular kind of data. Because nominal data can only be counted, any tests for nominal data—chi-square tests in this instance—must be based on how many or what proportion of individuals there are in a particular category. Nominal data does not reflect the degree of what is counted, only whether that characteristic is present. The measurement procedure is counting.
Counting the number that occurs in a particular category makes none of the traditional statistics relevant. Means, standard deviations, and sums-of-squares values all require that measurements reflect the relative amount of the characteristic measured. Data for those statistics must be continuous, with some individuals manifesting more of the characteristic than others. There is no issue made of normality, for example, because there are neither means nor medians to compare with the mode to determine skew, and neither the standard deviation nor the range to help determine kurtosis. The tests for nominal data are nonparametric tests; they are tests that do not need to meet the traditional parameters for normality, for example, or for linearity. In this regard, they are like Spearman’s rho (Chapter 9), which is also a nonparametric test. Such tests give the analyst great flexibility because the requirements that must be met are so few, although there is a cost for the flexibility, as shall be discussed.
11.2 The Chi-Square Tests
There are two chi-square tests explained in this chapter. Both of them compare the frequency (count) that is observed to the frequency that is expected to occur.
The first test is called the 1 × k (“one by kay”), or the goodness-of-fit chi-square test. Like the independent variable in the one-way ANOVA, this test accommodates just one variable, but it can be divided into two or more categories or groups.
The second chi-square test is called the r × k (“are by kay”), or the chi-square test of independence. This test accommodates two variables. Each of the two variables can be further divided into any number of categories.
The Goodness-of-Fit, or 1 × k Chi-Square Test
The question with this test is whether what occurs is different enough from some initial hypothesis that the difference is not likely to have occurred by chance. In that regard, it sounds like most of the other statistical tests of significant difference and, except for the scale of the data involved, there are some important similarities. For the sake of illustration, consider the following situations:
People in the public relations office responsible for advertising university events wonder whether people attend men’s basketball and women’s basketball games in equal proportions.
• The variable is attendance at games, with two categories: men’s games and women’s games.
• The null hypothesis is that attendance is equally distributed between the two groups.
• The test question is whether any differences in attendance between men’s and women’s games are great enough that they probably did not occur by chance.
The two independent groups make this problem sound almost like an independent t-test. In this test, however, there is no comparison of means of the two groups because nominal data does not yield means. Instead, the comparison is of proportions (or frequencies), for which the chi-square goodness-of-fit test is appropriate.
Those analyzing voting patterns wonder whether voter turnout among people of different ethnicities is similar.
• The one variable is ethnicity, which can have any number of categories.
• The hypothesis is that voting percentages are equal for the various ethnic groups.
• If the groups analyzed happen to be Asian Americans, Native Americans, and Caucasians, the issue will be whether the difference between the number voting in each group and the number expected if turnout is equal is large enough that it is not likely to have occurred by chance.
With several ethnic groups represented, this might look like a one-way ANOVA problem, but the data is nominal, so the chi-square goodness-of-fit test is the appropriate procedure.
A fast-food chain develops a promotion to sell a particular sandwich at half price with the expectation that customers will buy not twice as many but three times as many of the sandwiches as when they were sold at the regular price last week.
• Here the variable is sandwich sales, which has only two categories—last week at regular price and this week at half price.
• The hypothesis in this case is that sales this week will be three times what they were last week.
• Here too, the test will compare actual sandwich sales to what is expected if the initial hypothesis is accurate.
This, too, looks like some variation on the t-test, although the expectation that one week’s results will be three times the other week’s results is a twist.
Besides the scale of the data involved, the other factor that distinguishes the preceding problems from t-tests and ANOVA problems is that each involves just one variable rather than an independent variable and a dependent variable. With just one variable, although that variable can have multiple categories, each of these problems lends itself to chi-square analysis and to the goodness-of-fit test in particular. The measurement involved in each case is a matter of sorting those in the sample groups into their groups and then comparing the frequencies or counts in the various groups.
Frequencies Observed and Expected
Because the measurement involved in chi-square analysis is counting, the unit of analysis is the frequency with which something occurs. Rather than comparing sample means to population values, or sample means to each other, the comparison is between the frequency with which an outcome can be expected to occur and the frequency with which it actually occurs.
• The frequency expected is symbolized fe.
• The frequency observed is symbolized fo.
Try It! A: How many variables will the 1 × k chi-square accommodate?
The fe and fo values are simply the number of observations in each category; they are frequency counts. When what is expected varies sufficiently from what is observed, the result is statistically significant.
The Chi-Square Test Statistic
Studying the test statistic for the chi-square test is quite revealing. It is as follows:
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]    Formula 11.1
Where
χ² = the value of the chi-square statistic
fo = the frequency observed in the particular category
fe = the frequency expected in the particular category
In terms of calculating the value of this statistic, start with these two steps:
1. Count the number in each category (fo).
2. Determine the number expected in each category (fe). When the assumption is that all categories are equal, this will be the total number of subjects divided by the number of categories.
And then perform these mathematical operations in the following order:
1. Subtract fe from fo.
2. Square the difference.
3. Divide the squared difference by fe.
4. Sum the squared differences divided by fe across the categories.
5. Compare the sum to the critical value of chi-square for degrees of freedom equal to the number of categories minus 1. (The critical values of chi-square appear in Table 11.2.)
6. As a quick check before continuing, make sure that the sum of the fe categories equals the sum of the fo categories.
A Goodness-of-Fit Problem
Using voting participation as an example, perhaps a researcher in an ethnically diverse part of the city wishes to test whether the voters who identify with different ethnic groups tend to vote in similar proportions. The researcher stands at a polling station in the most ethnically diverse part of town and administers a brief survey. One of the questions is about the respondents’ ethnic group membership. For the 18 people who completed the survey, the data follows:
Ethnic Group A: 5
Ethnic Group B: 3
Ethnic Group C: 2
Ethnic Group D: 8
Although the math involved is not difficult, there are several calculations. An easy way to keep track of them is to arrange the data into a table like Table 11.1. The rows are numbered to be consistent with the numbered steps listed here for calculating the statistic. The results from the survey are the frequency-observed values in the first line of the table. The frequency-expected values are n divided by the number of categories, which is 18 ÷ 4 = 4.50. That value indicates that if the ethnic group membership of the voters in this group is exactly equivalent, 4½ of the respondents will declare for each group. Do not be distracted by the ½ value in each fe. Although there is no chance of any such value in the fo numbers, that fe value is the same for all groups and the issue is whether the fo − fe differences are significantly different from category to category.
Table 11.1: A goodness-of-fit chi-square problem for voting patterns
Value             Ethnic Group A   Ethnic Group B   Ethnic Group C   Ethnic Group D
fo                5                3                2                8
fe                4.5              4.5              4.5              4.5
fo − fe           0.5              −1.5             −2.5             3.5
(fo − fe)²        0.25             2.25             6.25             12.25
(fo − fe)²/fe     0.06             0.50             1.39             2.72
Σ = 0.06 + 0.50 + 1.39 + 2.72 = 4.67
Notes: Ethnic Group A: 5 voted; Ethnic Group B: 3 voted; Ethnic Group C: 2 voted; Ethnic Group D: 8 voted. The rows follow the calculation steps outlined above.
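The arithmetic in Table 11.1 can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original problem; the variable names are illustrative, and it assumes, as the example does, that the expected frequency is the same for every group.

```python
# Observed counts (fo) from the voting survey in Table 11.1.
observed = {"A": 5, "B": 3, "C": 2, "D": 8}

n = sum(observed.values())               # 18 respondents in all
k = len(observed)                        # 4 categories
expected = {g: n / k for g in observed}  # fe = 4.5 for every group

# (fo - fe)^2 / fe for each category (the last row of the table).
contributions = {
    g: (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
}

# Chi-square is the sum of those values across the categories.
chi_square = sum(contributions.values())

for g in observed:
    print(f"Group {g}: fo = {observed[g]}, fe = {expected[g]:.2f}, "
          f"(fo - fe)^2 / fe = {contributions[g]:.2f}")
print(f"chi-square = {chi_square:.2f}")
```

Rounded to two decimal places, the result agrees with the 4.67 shown in the table.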
Determining Significance
For this problem, χ² = 4.67. So now the researcher needs something with which to compare it, a critical value, and as with several of the other tests, the critical value is indexed to degrees of freedom for the problem.
• The degrees of freedom (df) for a goodness-of-fit problem are determined by the number of categories in the problem minus 1.
• With subjects divided into four different ethnic groups, there are 4 − 1 = 3 df.
The critical values for chi-square in Table 11.2 (also Table G in the Appendix) are arranged by degrees of freedom down the left side, and the level at which the test is conducted across the top.
Try It! B: Why are chi-square values never negative?
Table 11.2: The critical values of chi-square
df p = 0.05 p = 0.01 p = 0.001
1 3.84 6.64 10.83
2 5.99 9.21 13.82
3 7.82 11.35 16.27
4 9.49 13.28 18.47
5 11.07 15.09 20.52
6 12.59 16.81 22.46
7 14.07 18.48 24.32
8 15.51 20.09 26.13
9 16.92 21.67 27.88
10 18.31 23.21 29.59
11 19.68 24.73 31.26
12 21.03 26.22 32.91
13 22.36 27.69 34.53
14 23.69 29.14 36.12
15 25.00 30.58 37.70
16 26.30 32.00 39.25
17 27.59 33.41 40.79
18 28.87 34.81 42.31
19 30.14 36.19 43.82
20 31.41 37.57 45.32
21 32.67 38.93 46.80
22 33.92 40.29 48.27
23 35.17 41.64 49.73
24 36.42 42.98 51.18
25 37.65 44.31 52.62
26 38.89 45.64 54.05
27 40.11 46.96 55.48
28 41.34 48.28 56.89
29 42.56 49.59 58.30
30 43.77 50.89 59.70
Used by permission of Alexei A. Sharov, Ph.D. Virginia Tech. www.home.comcast.net/~sharov/PopEcol/
In order to keep the size of the table manageable, the values are carried to two decimals, which is congruent with APA guidelines. Consistent with that practice, final answers to chi-square problems will also be rounded to two decimal places.
Using the table, and starting with the leftmost column, move down to find df = 3, then move right to the first column, “p = 0.05”; the critical value for chi-square = 7.82. To distinguish the calculated value of chi-square from the critical value, follow the same pattern adopted for the other tests. First, calculate the value from the test results:
χ² = 4.67 for the calculated value
Then find the critical value in the table, indicated by the subscripts for the level of probability at which the test is conducted and for the degrees of freedom for the particular problem:
χ².05(3) = 7.82
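The table lookup and comparison can be sketched in code as well. This is only an illustration: the dictionary copies the first five p = 0.05 entries from Table 11.2, and the function name is hypothetical. The comparison treats a calculated value that meets or exceeds the critical value as significant.

```python
# Critical values of chi-square at p = 0.05, copied from Table 11.2 (df 1-5).
CRITICAL_VALUES_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}

def goodness_of_fit_decision(chi_square, n_categories):
    """Return (df, critical value, reject H0?) for a 1 x k problem."""
    df = n_categories - 1                 # categories minus 1
    critical = CRITICAL_VALUES_05[df]     # table lookup
    reject_null = chi_square >= critical  # significant if it meets or exceeds
    return df, critical, reject_null

# The voting problem: calculated chi-square of 4.67 with four categories.
df, critical, reject_null = goodness_of_fit_decision(4.67, 4)
print(f"df = {df}, critical value = {critical}, reject H0: {reject_null}")
```

For the voting problem, the calculated value of 4.67 falls short of 7.82, so the sketch reports that the null hypothesis is not rejected.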
With a calculated value less than the critical value from the table, any differences in the ethnicity of the voters in these four groups are not statistically significant. That seems like a strange conclusion because there are substantial differences in the fo values. Why are those differences not great enough to be significant? The answer goes back to the heart of what a goodness-of-fit test is designed to analyze. Pearson focused not on the differences from group to group but on the differences within each group between what was observed and what could be expected to occur if the initial hypothesis is valid in each category. If the unit of analysis compared the two members of ethnic group C to the eight members of ethnic group D, it might have been different, but the issue is the difference between the fo and fe values within each category. Clearly the difference in ethnic group C between 2 (fo) and 4.5 (fe) is a different matter than the difference between 2 (ethnic group C) and 8 (ethnic group D). The result indicates that across the four groups there is not enough variation between fo and fe for the result to be significant. This much difference could have occurred by chance.
Another caveat to this example, and an issue with the chi-square test generally, is the small size of the fo and fe values. When a result is tested against the chi-square distribution to determine significance, the approximation becomes less accurate with small samples in which fo and fe are less than 5. In such cases the chi-square test should not be used; substituting the Fisher exact test is more appropriate.
The Hypotheses in a Goodness-of-Fit Test
Consistent with the other tests of significant differences (z, t, F), the null hypothesis in the chi-square tests is the hypothesis of no difference. For chi-square tests, the no-difference assumption is analyzed in terms of the approximate equivalence of the fo and fe for a particular category. The alternate hypothesis is that what is observed is significantly different from what is expected:
H0: fo = fe (There is no significant difference between frequency observed and frequency expected.)
Ha: fo ≠ fe (There is a significant difference between frequency observed and frequency expected.)
In the case of ethnic group voting behavior, the statistical decision is to fail to reject H0. The differences between what was observed and what was expected across the four groups were not great enough to be statistically significant.
A Goodness-of-Fit Problem with Nonequivalent Frequencies Expected
In the problem just worked, the fe values were the same for all four ethnic groups. The researcher tested the assumption that the voters were made up of about equal numbers of people from the four ethnic groups. Researchers do not always assume equivalent values. When the categories are not expected to be equal, the fe values must be calculated differently. In addition, fo and fe should be at least 5 per category. In programs such as SPSS, a message will be given when frequencies are less than 5 per cell.
The last of the example problems from the beginning of the chapter speculated about sandwich sales and price. It suggested testing whether weekly sales are three times as great when the sandwich is reduced to half its usual price.
The market analyst for the fast-food chain gathered data for sandwich sales the week before the price reduction and the week of the price reduction. The data follows:
Sales for the particular sandwich the week before the price reduction: 154
Sales for the particular sandwich the week of the price reduction: 275
This data makes the total sales for the two-week period 154 + 275 = 429.
With a hypothesis that reduced-price sandwiches will outsell the regular-price sandwiches 3 to 1, the problem cannot be set up with equal fe values. They must reflect the hypothesis and still sum to n, which is 429, but the fe values must reflect the 3 to 1 ratio.
The easy way to determine the fe values for such problems is to
• take the ratio, 3 to 1 in this example,
• add those values together,
• and then divide n by their sum:
3 + 1 = 4
429 ÷ 4 = 107.25
The fe value for the regular-price week will be 1 times that value, or 107.25.
The fe value for the sale-price week will be 3 times that value, or 321.75.
The balance of the problem involves the same procedure used in Table 11.1 except that there are only two categories. The problem is completed in Table 11.3.
Table 11.3: A goodness-of-fit test with unequal frequencies expected
Value             Normal price   Half price
fo                154            275
fe                107.25         321.75
fo − fe           46.75          −46.75
(fo − fe)²        2,185.56       2,185.56
(fo − fe)²/fe     20.38          6.79
Σ = 20.38 + 6.79 = 27.17
χ² = 27.17
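The same problem can be reproduced in code. The following Python sketch is illustrative only; the helper name expected_from_ratio is hypothetical, and it simply divides n among the categories in proportion to the hypothesized ratio before applying Formula 11.1.

```python
def expected_from_ratio(ratio, n):
    """Split n into expected frequencies proportional to the given ratio."""
    total_parts = sum(ratio)              # 1 + 3 = 4 in the sandwich problem
    return [n * part / total_parts for part in ratio]

# Observed sales: regular-price week, then half-price week.
observed = [154, 275]
n = sum(observed)                         # 429 sandwiches in two weeks

# Hypothesis: half-price sales are three times regular-price sales (1:3).
expected = expected_from_ratio([1, 3], n)

chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

print(f"fe values: {expected}")
print(f"chi-square = {chi_square:.2f}")
```

The expected frequencies come out to 107.25 and 321.75, and the chi-square value rounds to 27.17, matching Table 11.3.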
Because we assumed that sandwich sales during the reduced-price week would be 3 times sales for the regular-price week, any data that is consistent with that expectation is consistent with the null hypothesis (H0). With a calculated value of χ² = 27.17, and a critical value from the table (χ².05(1)) = 3.84, the result is clearly statistically significant, and the null hypothesis must be rejected. How should such a result be interpreted?
The market researcher already knew that the categories themselves (sales before and during the price reduction) were unequal. In fact, the null hypothesis was that they were unequal by a ratio of 1:3, so it is not an inequality between the two categories that makes the result significant. Instead, the important inequality is the one between the sets of observed (fo) and expected (fe) values. In other words, the market researcher is testing whether the observed (fo) values in the two categories differ enough from their expected (fe) values that the difference is not likely to be a chance difference. In fact, the fo values for the regular-price week (fo = 154) and the sale week (fo = 275) reflect less than a 1:2 ratio, which is much less than the expected 1:3 ratio. Because those numbers are substantially short of the expected ratio, the chi-square value is significant at p < .05. Therefore, the market researcher decides that the modest increase in sales is not enough to justify the sale price of half off because actual sales fell short of the expected threefold increase.
The Chi-Square and Statistical Power
The differences between what is expected and what actually occurs must be substantial before a chi-square result will be significant. Nominal data cannot match the sophistication of ratio, interval, or even ordinal data because count data does not contain enough information to indicate the subtle differences in measured qualities that data of the other scales can reflect. The analytical price paid for relying exclusively on nominal data is power. Recall that in statistical terms, power refers to the probability of detecting a significant difference when it is present.
Users of these tests gain great flexibility. The chi-square tests require no judgments about normality or linearity, but there are no statistical free lunches. The cost of that flexibility is that the probability of a type II error, or the failure to detect a significant difference when it is present, is higher with these tests than with the procedures in the earlier chapters. The departures from the fo = fe assumption must be quite extreme before they can be chalked up to anything except sampling variability. That was the situation in the first problem on voter turnout, when it appeared that there were substantial differences in the voting behavior of people of different ethnic backgrounds, but they were not large enough to be significant. That result was probably also compromised by the small frequencies, which were less than 5 in several categories. Expected frequencies should be at least 5 in order to properly approximate a chi-square distribution.
Remember that type I and II errors are related, however. As conditions make one type of error more likely, the other error is less common. Although the chi-square tests have a higher incidence of type II error than some other tests manifest, at least the probability of type I error is lower than with many of the parametric alternatives, thereby making it a conservative option.
That said, the problem of the power loss that comes from using a chi-square test is often a nonissue. If the data is nominal scale to begin with, there is no decision to be made about what kind of test to use. The only choice can be to use one that accommodates nominal data. Power becomes an issue when the data is not nominal scale but fails to meet some requirement, such as normality, so that the intended test cannot be used unless some sort of bootstrapping or transformation is performed. The other alternative is the use of nonparametric tests.
For example, suppose that the voting-survey researcher also asked respondents how many times in the last 15 years they had participated in elections. The researcher may have intended to use analysis of variance to determine whether there are ethnic group differences in the level of participation in elections. However, 18 respondents divided among four ethnic groups is a very small sample size. The 2 people in ethnic group C provide little basis for completing an ANOVA. That sample is simply too small. Furthermore, just one or two extremely low or extremely high scores will skew a sample this small, making normality a problem. In such a case, a shift to a nonparametric test like the goodness-of-fit test, where neither the normality of the data nor the sample size is central, is likely to be more appropriate.
11.3 The Chi-Square Test of Independence
Both of the chi-square problems in this chapter have been goodness-of-fit (1 × k) tests. The first problem had four categories, but there was just one variable involved: the ethnicity of the voter. The second problem had two categories, the number from the regular-price week and the number from the sale-price week, but again a single variable: weekly sales. The tests work well for any number of nominal data categories that use a single variable.
Sometimes the question is more complex. Maybe the issue is the ethnicity of the respondent and whether the individual voted in the last election. Or perhaps the question about sandwiches is the number sold per week and whether the purchasers were men or women.
Try It! C: Which type of decision error is most characteristic of chi-square problems?
Both of those examples involve two variables. In any statistical analysis, the reason for adding variables is that one variable fails to provide an adequate explanation for what is studied. Although z, t, one-way ANOVA, and simple regression procedures are extremely important, they, like the goodness-of-fit test, are all restricted to a single independent variable, and there are relatively few outcomes, particularly related to human subjects, that a single variable can adequately explain.
In the chi-square test of independence, the additional variable provides a more detailed picture of the phenomenon studied, but the procedure also tests whether the two variables operate independently of each other. This is what gives rise to the name of the procedure, which is also called the r × k chi-square.
The Hypotheses in the Chi-Square Test of Independence
The null and alternate hypotheses look the same as they do in the 1 × k:
H0: fo = fe
Ha: fo ≠ fe
The hypotheses are reminders that the issue is how the frequencies observed compare to the frequencies expected. H0 is rejected for calculated values of chi-square that are larger than the table value. However, in an r × k chi-square problem, the null hypothesis also indicates that the two variables are unrelated, that one variable does not affect the other, or that they are independent of one another (hence the test of independence). So the hypotheses should read:
H0: The two variables have no significant relationship to each other (i.e., the variables are independent).
Ha: The two variables have a significant relationship to each other (i.e., the variables are dependent).
If the null hypothesis is rejected (indicating that the two variables are related or dependent), there is another step to the analysis: determining the strength of the relationship between the variables, as demonstrated in the following example.
A Chi-Square Test-of-Independence Problem
Refer to the ethnicity and voting behavior problem. The researcher decides to expand the study to gain a more comprehensive view of how ethnicity and the tendency to vote might be related. With a list of registered voters in hand, the researcher sends out several questionnaires asking, among other things, the individual’s ethnicity and whether the person voted in the last national election. With 36 responses, the researcher has the following data:
Ethnic Group A: Of the 12 respondents, 8 voted
Ethnic Group B: Of the 8 respondents, 2 voted
Ethnic Group C: Of the 8 respondents, 3 voted
Ethnic Group D: Of the 8 respondents, 7 voted
The Contingency Table
The two variables involved, ethnicity and whether the individual voted, make up the rows and columns of a table into which the fo data will be placed. Keep in mind that the placement of the variables as either rows or columns is arbitrary. The table, called a contingency table, is organized with all the categories of one variable reflected in the rows of the table, and the categories of the other variable listed in the columns. The r and k in the alternate name for this test refer to the rows for one variable and the categories of the other, which are reflected in the columns of the contingency table. A contingency (or cross-tabulation) table for the data looks like this:
Ethnic Group   Voted in Last Election
               Yes      No       Total
A              8 (a)    4 (b)    12
B              2 (c)    6 (d)    8
C              3 (e)    5 (f)    8
D              7 (g)    1 (h)    8
Totals         20       16       36
The ethnicity of the individual is indicated in the rows. The columns indicate whether the individual voted. Each of the eight cells is identified with a letter, which the researcher will use when calculating the chi-square value. Note that there are both row and column totals. The sum of the row totals must have the same value as the sum of the column totals.
The hypotheses of this scenario will be:
H0: There is no significant relationship between ethnic group and voting in the last election.
Ha: There is a significant relationship between ethnic group and voting in the last election.
Calculating the Frequency-Expected Values, fe
Similar to the 1 × k chi-square test, the frequency-observed (fo) values for the r × k test are simply the values in the preceding table cells. However, including a second variable changes the way the frequencies expected (fe) are calculated. Because each cell value reflects the influence of two variables, the researcher cannot just divide the number of subjects by the number of cells and use the same fe value for each cell. Notice that each cell is at the intersection of a row and a column: the fe value must reflect the impact that both variables have on the outcome. The fe value for cell a in the r × k chi-square test is completed this way:
• The total of the entire row for cell a = 12
• The total of the entire column in which cell a is found = 20
• The total number of subjects in the study = 36
• The fe value for cell a is the row total (12) times the column total (20), with the product of those two values divided by the total number of subjects (36), or (12 × 20) ÷ 36 = 6.67
These calculations are made for each cell in the contingency table. Continuing the fe calculations for cells b through h, the researcher finds the following expected frequencies:
fe for cell b is (12 × 16) ÷ 36 = 5.33
fe for cell c is (8 × 20) ÷ 36 = 4.44
fe for cell d is (8 × 16) ÷ 36 = 3.56
fe for cell e is (8 × 20) ÷ 36 = 4.44
fe for cell f is (8 × 16) ÷ 36 = 3.56
fe for cell g is (8 × 20) ÷ 36 = 4.44
fe for cell h is (8 × 16) ÷ 36 = 3.56
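Because the same row-total times column-total calculation is repeated for every cell, it lends itself to a short loop. The following Python sketch is one way to do it, with illustrative variable names; the nested list holds the fo values from the contingency table, one row per ethnic group, with the Yes column first.

```python
# Observed frequencies (fo): rows are ethnic groups A-D, columns are Yes / No.
observed = [
    [8, 4],   # cells a, b
    [2, 6],   # cells c, d
    [3, 5],   # cells e, f
    [7, 1],   # cells g, h
]

row_totals = [sum(row) for row in observed]        # 12, 8, 8, 8
col_totals = [sum(col) for col in zip(*observed)]  # 20, 16
n = sum(row_totals)                                # 36 respondents

# fe for each cell = (row total x column total) / total number of subjects.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip("ABCD", expected):
    print(f"Group {label}: " + ", ".join(f"{fe:.2f}" for fe in row))
```

Rounded to two decimal places, the values match the expected frequencies listed for cells a through h.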
With the fo values indicated in the cells of the contingency table and the fe values calculated, the researcher can create a table similar to that used in the goodness-of-fit problems at the beginning of the chapter and follow the same steps that were followed for the 1 × k problems:
For each of the eight cells,
1. subtract fe from fo,
2. square the difference,
3. divide the squared difference by fe, which allows equivalent comparison across all eight cells using both variables by having a standardized value,
4. sum the results from each of the 8 cells, which is the value of chi-square.
The ethnicity and voting behavior problem is completed in Table 11.4. As stated earlier in the chapter, frequencies should be at least 5. The Fisher exact test should be employed if frequencies are less than 5.
Table 11.4: The chi-square test of independence: Ethnicity and voting behavior

Value              a      b      c      d      e      f      g      h
fo                 8      4      2      6      3      5      7      1
fe              6.67   5.33   4.44   3.56   4.44   3.56   4.44   3.56
fo − fe         1.33  −1.33  −2.44   2.44  −1.44   1.44   2.56  −2.56
(fo − fe)²      1.77   1.77   5.95   5.95   2.07   2.07   6.55   6.55
(fo − fe)²/fe   0.27   0.33   1.34   1.67   0.47   0.58   1.48   1.84

Σ = 7.98
χ² = 7.98
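The four steps behind Table 11.4 can be traced in a short Python sketch. It is an illustration rather than a prescribed method: the function name `chi_square` is an assumption, and the fe values are the rounded figures printed in the table.

```python
def chi_square(fo, fe):
    """Sum (fo - fe)^2 / fe across all cells."""
    total = 0.0
    for observed, expected in zip(fo, fe):
        difference = observed - expected   # step 1: subtract fe from fo
        squared = difference ** 2          # step 2: square the difference
        total += squared / expected        # steps 3 and 4: divide by fe, then sum
    return total

# Cells a through h from Table 11.4 (fe rounded to two decimals, as printed)
fo = [8, 4, 2, 6, 3, 5, 7, 1]
fe = [6.67, 5.33, 4.44, 3.56, 4.44, 3.56, 4.44, 3.56]

chi2 = chi_square(fo, fe)
print(round(chi2, 2))
```

Carrying unrounded fe values through the same steps gives a sum closer to 7.99; the small gap from the tabled 7.98 is rounding, and either figure exceeds the critical value used in this problem.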
Degrees of Freedom in the Chi-Square Test of Independence
In this test, the degrees of freedom are based on the contingency table. They are determined by the number of rows minus 1 times the number of columns minus 1. For this problem, where there are four rows and two columns in the contingency table, that is

(4 − 1) × (2 − 1) = 3

Using the value for 3 df to determine the critical value from Table 11.2, the researcher can summarize results as follows:

$$\chi^2 = 7.98$$
$$\chi^2_{.05(3)} = 7.82$$
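The degrees-of-freedom rule and the comparison against the critical value can be sketched as follows. The names `CRITICAL_05`, `degrees_of_freedom`, and `reject_null` are assumptions made for this illustration, and the lookup holds only the three p = .05 critical values that this chapter quotes from Table 11.2.

```python
# Critical values of chi-square at p = .05, limited to the degrees of
# freedom quoted in this chapter (a partial, illustrative lookup).
CRITICAL_05 = {3: 7.82, 4: 9.49, 8: 15.51}

def degrees_of_freedom(rows, columns):
    """df for the test of independence: (rows - 1) x (columns - 1)."""
    return (rows - 1) * (columns - 1)

def reject_null(chi2, rows, columns):
    """True when the calculated value meets or exceeds the critical value."""
    df = degrees_of_freedom(rows, columns)
    return chi2 >= CRITICAL_05[df]

df = degrees_of_freedom(4, 2)
print(df, CRITICAL_05[df], reject_null(7.98, 4, 2))
```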
Interpreting the r × k Result
Recall that the r × k test is designed to indicate whether the two variables in the analysis are independent of each other. By conducting the test of independence, the researcher is asking, Does ethnicity affect whether someone votes in a national election? Karl Pearson’s approach for this test was to compare what actually occurs in a particular situation (fo) to what is most likely to occur if the variables involved are not correlated (fe). In other words, he compared what occurred to what would occur if the null hypothesis holds. In order to do that, the fe values are calculated to indicate what to expect when the variables are independent of each other. It is the substantial variations of fo from fe that prompt larger values of chi-square. If the variations between fo and fe are great enough that the calculated value meets or exceeds the critical value, the response is to reject the null hypothesis.
For the ethnicity and voting behavior problem, the calculated value of chi-square exceeds the critical value from Table 11.2 for p = .05 and 3 degrees of freedom. The result is statistically significant, p < .05. In this case, this means that voting behavior for some ethnic groups is different from that of others. Ethnicity and the tendency to vote in elections are not variables that function independently of each other. Returning to the earlier point that substantial variations of fo from fe prompt larger values of chi-square, Table 11.4 shows that Ethnic group D (cells g and h) had the largest fo − fe differences, with Ethnic group B (cells c and d) having the second largest. This indicates that these two groups contribute most to the significant result, with voting/not voting frequencies that depart further from expectation than those of Ethnic groups A and C.
Classifying the r × k Test
In earlier chapters, statistical tests were organized according to whether they addressed the hypothesis of difference or the hypothesis of association. Tests like z, t, and ANOVA (F) are analyses of significant differences between samples and populations or between samples. The Pearson and Spearman correlation procedures address the hypothesis of association. They are procedures for determining the strength of the relationship between variables. The chi-square test of independence is not an either-or test. The initial question for the researcher was whether there are significant differences in voting behavior among the different ethnic groups. That makes r × k sound a lot like an ANOVA. However, another way to describe the test is to ask whether ethnicity and voting behavior are related, a question that makes the test more of a correlation analysis. The r × k test addresses both of those main hypotheses. It straddles the ground between the hypotheses of difference and association.
Try It!
D. How many categories in either variable can the r × k chi-square accommodate?
Phi Coefficient and Cramér’s V
The researcher’s results indicate that ethnicity and voting behavior are not independent, which means that the variables are related. How related are they? The next step in the problem is for the researcher to determine the strength of that relationship.
There are several correlation procedures for nominal data. We will rely on either the phi coefficient (symbolized by the Greek letter phi, φ, or rφ), or a variation of the phi coefficient called Cramér’s V. The phi coefficient is used when at least one of the variables has only two levels. That is the case with the ethnicity and voting behavior problem, in which one of the variables is whether the individual voted (yes/no). If both variables have three or more levels, Cramér’s V is used. The following problem calls for Cramér’s V.
Part of the appeal of these correlation procedures is that they rely directly on the value of chi-square, so most of the calculations have been done. Furthermore, there are no separate significance tests for the correlation values. Correlations are calculated only when chi-square is statistically significant, which means that the r × k procedure is always completed first. If the χ² value is significant, then the φ or V coefficient will be significant as well.
The formula for the phi coefficient is

$$\phi = \sqrt{\frac{\chi^2}{n}}$$
Formula 11.2

Where
χ² = the value already available from the r × k problem
n = the total number of observations (the sum of the fo values)

The formula for Cramér’s V is

$$V = \sqrt{\frac{\chi^2}{n \times (k - 1)}}$$
Formula 11.3

Where
χ² = the value already available from the r × k problem
n = the total number of observations
k − 1 = the number of rows or columns, whichever is less, minus 1

If there are only two levels of one of the variables, then V = φ because the denominator in V would be n × 1.
For the ethnicity and voting behavior problem, χ² = 7.98, and it is statistically significant.

$$\phi = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{7.98}{36}} = .47$$
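Formulas 11.2 and 11.3 translate directly into code. The sketch below is illustrative only; the function names `phi_coefficient` and `cramers_v` are assumptions, and the chi-square value is passed in exactly as it came out of the r × k problem.

```python
import math

def phi_coefficient(chi2, n):
    """Formula 11.2: square root of chi-square divided by n."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, columns):
    """Formula 11.3: k is the smaller of the number of rows and columns."""
    k = min(rows, columns)
    return math.sqrt(chi2 / (n * (k - 1)))

# Ethnicity and voting behavior: chi-square = 7.98, n = 36, 4 rows x 2 columns
phi = phi_coefficient(7.98, 36)
v = cramers_v(7.98, 36, rows=4, columns=2)
print(round(phi, 2), round(v, 2))
```

With only two levels in one variable, k − 1 = 1, so the two functions return the same value for this problem.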
When the chi-square value is inserted into the formula, because the symbol has the exponent (χ²), sometimes there is an inclination to square the value once entered. Remember that the exponent indicates that it was squared when the chi-square value was initially calculated, before it became a part of the phi or Cramér’s V formula.
The phi value is interpreted like any other correlation value, except for two things: First, there is no need to check for significance because that was already established for the chi-square value. Second, because the calculation involves a square root, phi can never be negative. Correlations closer to 1.0 are stronger; values closer to 0 are weaker. The correlation here between ethnicity and the tendency to vote might be described as a moderate correlation.
Apply It!
Chi-Square Test in a City Program
A large metropolitan city is reviewing its recycling program. The city managers want to increase the rate of recycling in the city. In order to design incentives for households to recycle more, they need to know more about people’s recycling habits. They want to know if people in different categories of housing recycle at different rates. The city sends out 3,000 questionnaires to people in different types of housing asking if they normally recycle and receives 1,434 completed questionnaires.
The null hypothesis is that people in all types of housing recycle at equal rates, and the alternate hypothesis is that they recycle at different rates. The null and alternate hypotheses are
H0: There is no significant relationship between housing category and regularly recycling.
Ha: There is a significant relationship between housing category and regularly recycling.
The null hypothesis will be rejected if the calculated value of chi-square is larger than the table value. If the null hypothesis is rejected, then the two variables are related or dependent.
The survey results follow:

                                 Regularly Recycling?
Housing Category                 Yes      No      Totals
Detached single-family homes     248      128        376
Townhomes                        212      143        355
Condominiums                     198       95        293
Single-room-occupancy hotel       33       67        100
Apartments                       133      177        310
Totals                           824      610      1,434
The calculated chi-squared value is 75.47. The table value for p = .05 with (5 − 1) × (2 − 1) = 4 degrees of freedom is 9.49.

$$\chi^2 = 75.47 > \chi^2_{.05(4)} = 9.49$$

Therefore, the null hypothesis is rejected, and the result is statistically significant. In other words, recycling rates are significantly related to housing category. Because the variables are related, the next step is to determine the strength of the relationship. The phi coefficient will be used because one of the variables has only two levels. The value for the phi coefficient is

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

Where
χ² = 75.47
n = the total number of subjects = 1,434

$$\phi = \sqrt{\frac{75.47}{1{,}434}} = .23$$

Based on the results, the city managers decide to target their recycling information campaign first on people who live in apartments and single-room-occupancy hotels because in these two groups more respondents report not recycling than recycling. This is the opposite of those in detached single-family homes, townhomes, and condominiums, where there are more occurrences of recycling than nonrecycling. Therefore, city managers are likely to achieve the greatest amount of change by focusing on apartments and single-room-occupancy hotels.
Apply It! boxes written by Shawn Murphy.
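The recycling example can be checked end to end with a short script. This is a sketch under stated assumptions: the dictionary name `survey` and the variable names are invented for the illustration, and the counts are copied from the survey table in the box above.

```python
import math

# Survey counts: (regularly recycle, do not) for each housing category
survey = {
    "Detached single-family homes": (248, 128),
    "Townhomes": (212, 143),
    "Condominiums": (198, 95),
    "Single-room-occupancy hotel": (33, 67),
    "Apartments": (133, 177),
}

rows = list(survey.values())
col_totals = [sum(col) for col in zip(*rows)]
n = sum(col_totals)

chi2 = 0.0
for row in rows:
    row_total = sum(row)
    for fo, col_total in zip(row, col_totals):
        fe = row_total * col_total / n      # expected frequency for the cell
        chi2 += (fo - fe) ** 2 / fe

phi = math.sqrt(chi2 / n)
print(round(chi2, 2), round(phi, 2))

# Share of each housing category that reports recycling regularly
for name, (yes, no) in survey.items():
    print(f"{name}: {yes / (yes + no):.0%} recycle")
```

The per-category percentages printed at the end make the managers' reasoning visible: single-room-occupancy hotels and apartments are the only categories in which fewer than half of the respondents report recycling.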
A 3 × 3 Problem
An earthquake and an accompanying tsunami have wreaked havoc in a particular country. A psychologist is analyzing the appeal of three different outreach programs to people who describe their losses as primarily material, primarily emotional, or a combination. The data follow:
• Among 8 people describing their losses as primarily material, 6 opt for A, 2 for B, and 0 for C.
• Among 6 people describing their losses as primarily emotional, 0 opt for A, 2 for B, and 4 for C.
• Among 6 people describing their losses as a combination, 0 opt for A, 5 opt for B, and 1 for C.
The question is whether people who have different kinds of losses choose to be involved in different kinds of counseling programs. The contingency table for this data looks like this (cell labels in parentheses):

                                  Program
Type of Loss               A        B        C        Totals
Material                   6 (a)    2 (b)    0 (c)       8
Emotional                  0 (d)    2 (e)    4 (f)       6
Material and Emotional     0 (g)    5 (h)    1 (i)       6
Totals                     6        9        5          20
The solution for this problem is in Table 11.5.
Table 11.5: Another r × k problem: Outreach programs and the type of loss suffered

Value               a      b      c      d      e      f      g      h      i
fo                  6      2      0      0      2      4      0      5      1
fe                2.4    3.6      2    1.8    2.7    1.5    1.8    2.7    1.5
fo − fe           3.6   −1.6     −2   −1.8   −0.7    2.5   −1.8    2.3   −0.5
(fo − fe)²      12.96   2.56      4   3.24   0.49   6.25   3.24   5.29   0.25
(fo − fe)²/fe     5.4   0.71      2    1.8   0.18   4.17    1.8   1.96   0.17

Σ = 18.19
χ² = 18.19
For the problem about analyzing the program people opt for and the type of loss they have experienced, χ² = 18.19. With three rows and three columns, degrees of freedom are (3 − 1) × (3 − 1) = 4. The critical value for df = 4 when testing at p = .05 is 9.49.

$$\chi^2 = 18.19$$
$$\chi^2_{.05(4)} = 9.49$$

The result is statistically significant, so the null hypothesis is rejected. The fo values are significantly different from the fe values, indicating that the type of loss people describe and the type of counseling they seek are not variables that operate independently of each other. Because there are more than two levels of both variables, Cramér’s V will allow the researcher to determine the strength of the relationship between the program people choose and the type of loss they have experienced.

$$V = \sqrt{\frac{\chi^2}{n \times (k - 1)}}$$

With n = 20 and the number of either the rows or columns = 3, k − 1 = 2, so Cramér’s V is calculated as follows:

$$V = \sqrt{\frac{18.19}{20 \times 2}} = .67$$
The r × k result indicated that there are significant differences among the nine categories between what was observed and what would have been expected if there was no relationship between the type of program and the kind of loss victims suffered. Specifically, in Table 11.5 the standardized values for cell a (5.40; Program A, material loss) and cell f (4.17; Program C, emotional loss) were much higher than the values for the other cells.
Cramér’s V quantified the strength of the relationship between program and type of loss. Besides being statistically significant (which the significant χ² value indicates), the strength of the relationship appears to be substantial (V = .67).
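The whole 3 × 3 analysis (chi-square, degrees of freedom, and Cramér's V) can be gathered into one function. The sketch below is illustrative; the name `independence_summary` and the nested-list layout are assumptions, not part of the original procedure.

```python
import math

def independence_summary(table):
    """Return (chi-square, df, Cramér's V) for an r x k table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (fo - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for fo, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals))
    return chi2, df, math.sqrt(chi2 / (n * (k - 1)))

# Rows: material, emotional, combined loss; columns: programs A, B, C
outreach = [[6, 2, 0], [0, 2, 4], [0, 5, 1]]
chi2, df, v = independence_summary(outreach)
print(round(chi2, 2), df, round(v, 2))
```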
Apply It!
Public Policy Research
An environmental advocacy group with a broad base of supporters is developing a new fund-raising campaign. This group strives to make their campaigns appeal to people across the political spectrum. Before launching the campaign, they want to test acceptance of their message with existing donors who have described themselves as being conservative or liberal to various degrees. From each donor, they collect survey information gauging their response to the campaign in one of three categories: disagree, agree, or strongly agree. The respondents classify themselves as very conservative, conservative, moderate, liberal, or very liberal.
This is a 5 × 3 chi-squared test of independence. The environmental advocacy group wants to know if there is a statistically significant difference in the ways the different classifications of people respond to the campaign. The null hypothesis is that there is no difference and that the campaign does not appeal more strongly to any of the classifications.

H0: fo = fe

Campaign             Disagree   Agree   Strongly Agree   Totals
Very conservative        1         4          13            18
Conservative             2         5           9            16
Moderate                10         4           2            16
Liberal                 11         5           1            17
Very liberal            13         3           0            16
Totals                  37        21          25            83

Given this data, the environmental group calculates a chi-squared value of 42.04. The degrees of freedom for this 5 × 3 test are df = (5 − 1) × (3 − 1) = 8. From Table 11.2, with p = .05 and df = 8, we find

$$\chi^2_{.05(8)} = 15.51$$
$$\chi^2 = 42.04 > \chi^2_{.05(8)} = 15.51$$
Because the calculated chi-squared value is greater than the table value, the null hypothesis is rejected. There were significant differences among the 15 categories between what was observed and what would have been expected if there was no relationship. Inspection of the data shows that people who described themselves as more conservative were more likely to strongly agree with the campaign. Conversely, people who described themselves as more liberal were more likely to disagree with the campaign.
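The "inspection of the data" described above is easier to follow when each row of counts is converted to percentages of its own total. The sketch below is only an illustration; the names `responses` and `row_percentages` are assumptions, and the counts come from the campaign table.

```python
# Campaign responses: (disagree, agree, strongly agree) per self-described group
responses = {
    "Very conservative": (1, 4, 13),
    "Conservative": (2, 5, 9),
    "Moderate": (10, 4, 2),
    "Liberal": (11, 5, 1),
    "Very liberal": (13, 3, 0),
}

def row_percentages(counts):
    """Convert one row of counts to percentages of that row's total."""
    total = sum(counts)
    return [100 * count / total for count in counts]

for group, counts in responses.items():
    disagree, agree, strongly = row_percentages(counts)
    print(f"{group:18s} disagree {disagree:5.1f}%  agree {agree:5.1f}%  "
          f"strongly agree {strongly:5.1f}%")
```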
The advocacy group next used Cramér’s V to determine the strength of the relationship between political views and response to the campaign. Cramér’s V is used because there are more than two levels in each of the variables.

$$V = \sqrt{\frac{\chi^2}{n \times (k - 1)}}$$

In this case, k represents the variable with the fewest number of levels, which is 3, so k − 1 = 2. n = the total number of responses, which is 83.

$$V = \sqrt{\frac{42.04}{83 \times 2}} = .50$$

This can be described as a moderate correlation. Clearly, the advocacy group did not devise a campaign that would appeal to all types of contributors. If they want to reach a broad group of supporters, they will have to revise the campaign to appeal more to their liberal constituency.
Apply It! boxes written by Shawn Murphy.
11.4 Completing the r × k With Excel
The Excel statistical package focuses on analyses for interval and ratio data, but the chi-square calculations can be completed with worksheet formulas. To illustrate, here is how to complete the second r × k problem (the 3 × 3 outreach problem) in Excel. To match Figure 11.1, take the following steps:
Figure 11.1: Completing the chi-square test of independence on Excel
1. Leave cell A1 blank, and then label cells B1–F1 as follows:
   • fo
   • fe
   • fo − fe
   • (fo − fe)²
   • (fo − fe)²/fe
2. Beginning in cell A2 and continuing to cell A10, enter the labels, as shown in Figure 11.1.
3. In column B under “fo,” enter the fo values for cells a through i from the contingency table.
4. Leave cell A11 blank, and then beginning in cell A12, type the label “row 1 =”. The label “row 2 =” will go in cell A13.
5. Starting in cell B12, enter the formula for determining the sum of row 1 (as if you were adding cells a, b, and c in the contingency table): =SUM(B2:B4).
   • For the row 2 total in cell B13, the command will be =SUM(B5:B7). Press Enter.
   • For the row 3 total in cell B14, the command will be =SUM(B8:B10). Press Enter.
   • For the column 1 total in cell B15, the command will be =SUM(B2,B5,B8). Press Enter.
   • For the column 2 total in cell B16, the command will be =SUM(B3,B6,B9). Press Enter.
   • For the column 3 total in cell B17, the command will be =SUM(B4,B7,B10). Press Enter.
6. In cell C2, enter the formula for determining the fe value for what is cell a in the contingency table, with the formula =(B12*B15)/20. For cells b–i in the contingency table, the commands are the following:
   • In cell C3, enter the command =(B12*B16)/20.
   • In cell C4, enter the command =(B12*B17)/20.
   • In cell C5, enter the command =(B13*B15)/20.
   • In cell C6, enter the command =(B13*B16)/20.
   • In cell C7, enter the command =(B13*B17)/20.
   • In cell C8, enter the command =(B14*B15)/20.
   • In cell C9, enter the command =(B14*B16)/20.
   • In cell C10, enter the command =(B14*B17)/20.
7. In cell D2, enter the command =B2-C2.
   • Repeat this command for cells D3–D10 by placing the cursor on the result in cell D2 and dragging the cursor down to cell D10 so that all the cells from D2 to D10 are highlighted.
   • From the Home tab, select the small down arrow to the right of the larger down arrow at the top of the page on the extreme right just under the summation (Σ) sign, and then select Down.
8. In cell E2, enter the command =D2^2, which will enter the square of the cell D2 contents in cell E2.
   • Repeat this command for cells E3–E10 by placing the cursor on the result in cell E2 and dragging it down to cell E10 so that all the cells from E2 to E10 are highlighted. From the Home tab, select the small down arrow to the right of the larger down arrow at the top of the page on the extreme right just under the summation (Σ) sign, and then select Down.
9. In cell F2, enter the command =E2/C2.
   • Repeat this command for cells F3–F10 by placing the cursor on the result in cell F2 and dragging it down to cell F10 so that all the cells from F2 to F10 are highlighted. From the Home tab, select the small down arrow to the right of the larger down arrow at the top of the page on the extreme right just below the summation (Σ) sign, and then select Down.
10. Sum the values from cells F2 to F10 by entering the command =SUM(F2:F10) in cell F11.
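For readers who want to check the worksheet against an independent calculation, the same column-by-column layout can be sketched in Python. This is an illustration only: the variable names are assumptions, and the comments map each quantity to the worksheet cells named in the steps above.

```python
# Observed counts for cells a through i (column B in the worksheet)
fo = [6, 2, 0, 0, 2, 4, 0, 5, 1]
cells = "abcdefghi"

# Row totals (B12:B14) and column totals (B15:B17) for the 3 x 3 layout
row_totals = [sum(fo[i:i + 3]) for i in (0, 3, 6)]
col_totals = [sum(fo[j::3]) for j in range(3)]
n = sum(fo)

# Column C: fe = row total x column total / n, in cell order a..i
fe = [r * c / n for r in row_totals for c in col_totals]

print("cell  fo     fe  fo-fe  (fo-fe)^2  (fo-fe)^2/fe")
chi2 = 0.0
for cell, o, e in zip(cells, fo, fe):
    diff = o - e             # column D
    squared = diff ** 2      # column E
    ratio = squared / e      # column F
    chi2 += ratio            # running total, like =SUM(F2:F10)
    print(f"{cell:>4}  {o:>2}  {e:5.2f}  {diff:5.2f}  {squared:9.2f}  {ratio:12.2f}")
print("chi-square =", round(chi2, 2))
```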
11.5 Presenting Results
Using the data set from Pew Research (2010), Millenials.sav file, both types of chi-square tests—the goodness-of-fit and the test of independence—will be performed in SPSS. Interpretation of the results will be presented in Section 11.6.
SPSS Example 1: Goodness-of-Fit Chi-Square Test
A student has acquired public archival data and wishes to practice the goodness-of-fit chi-square test to determine significant differences in observed versus expected values of three independent variables: race-ethnicity (racethn), political party affiliation (party), and gender groups (sex).
Since there are three independent variables, there will be three sets of hypotheses as follows:
H01: There is no significant difference between frequency observed and frequency expected for race-ethnicity.
Ha1: There is a significant difference between frequency observed and frequency expected for race-ethnicity.
H02: There is no significant difference between frequency observed and frequency expected for political party affiliation.
Ha2: There is a significant difference between frequency observed and frequency expected for political party affiliation.
H03: There is no significant difference between frequency observed and frequency expected for gender.
Ha3: There is a significant difference between frequency observed and frequency expected for gender.
To perform the goodness-of-fit chi-square test, go to Analyze → Nonparametric Tests → Legacy Dialogs → Chi-Square. Input racethn, party, and sex into the Test Variable List box (your screen should look like that in Figure 11.2). Then click OK. Output tables are presented in Figure 11.3.
Figure 11.2: SPSS steps for goodness-of-fit chi-square test
Source: Data from Pew Research Social & Demographic Trends. (2010). The Millennials. Retrieved from http://www.pewsocialtrends.org/category/datasets/
Figure 11.3: SPSS output for goodness-of-fit chi-square test
Source: Data from Pew Research Social & Demographic Trends. (2010). The Millennials. Retrieved from http://www.pewsocialtrends.org/category/datasets/

Race-Ethnicity
                               Observed N   Expected N   Residual
White non-Hispanic                  1369        404.0       965.0
Black non-Hispanic                   228        404.0      −176.0
Hispanic                             237        404.0      −167.0
Other                                156        404.0      −248.0
Don’t know/Refused (VOL.)             30        404.0      −374.0
Total                               2020

PARTY. In politics TODAY, do you consider yourself a Republican, Democrat, or Independent?
                               Observed N   Expected N   Residual
Republican                           465        336.7       128.3
Democrat                             677        336.7       340.3
Independent                          649        336.7       312.3
(VOL) No preference                  125        336.7      −211.7
(VOL) Other party                     11        336.7      −325.7
(VOL) Don’t know/Refused              93        336.7      −243.7
Total                               2020

SEX. (ENTER RESPONDENT’S SEX:)
                               Observed N   Expected N   Residual
Male                                 985       1010.0       −25.0
Female                              1035       1010.0        25.0
Total                               2020

Test Statistics
                 Race-Ethnicity    PARTY        SEX
Chi-Square          2949.183a      1307.178b    1.238c
df                      4              5           1
Asymp. Sig.         0.000          0.000        0.266

a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 404.0.
b. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 336.7.
c. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1010.0.
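The chi-square values in the Test Statistics table can be reproduced from the Observed N columns. The sketch below assumes, as the Expected N columns indicate, that every category within a variable has the same expected frequency; the function name `goodness_of_fit` is an assumption for the illustration.

```python
def goodness_of_fit(observed):
    """Chi-square goodness-of-fit with equal expected frequencies."""
    expected = sum(observed) / len(observed)
    chi2 = sum((fo - expected) ** 2 / expected for fo in observed)
    return chi2, len(observed) - 1   # (chi-square, degrees of freedom)

# Observed N values from Figure 11.3
race_ethnicity = [1369, 228, 237, 156, 30]
party = [465, 677, 649, 125, 11, 93]
sex = [985, 1035]

for name, counts in [("race-ethnicity", race_ethnicity),
                     ("party", party), ("sex", sex)]:
    chi2, df = goodness_of_fit(counts)
    print(f"{name}: chi-square = {chi2:.3f}, df = {df}")
```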
SPSS Example 2: Test-of-Independence Chi-Square Test
Using the same data set from Example 1, the student wishes to explore the relationship of the independent variables race-ethnicity (racethn) and gender (sex) to political party affiliation (party) using the test-of-independence chi-square test. The specific hypotheses are:
H01: There is no significant relationship between gender and political party affiliation.
Ha1: There is a significant relationship between gender and political party affiliation.
H02: There is no significant relationship between race-ethnicity and political party affiliation.
Ha2: There is a significant relationship between race-ethnicity and political party affiliation.
To perform the test-of-independence chi-square test, go to Analyze → Descriptive Statistics → Crosstabs. Input party into Row(s) and sex and racethn into Column(s), and check Display clustered bar charts (your screen should look like that in Figure 11.4). Click on Statistics and check Chi-square, and Phi and Cramer’s V (your screen should look like that in Figure 11.5). Then click Continue and OK. Output tables and graphs are presented in Figures 11.6, 11.7, 11.8, and 11.9.
Figure 11.4: SPSS step 1 for test-of-independence chi-square test
Source: Data from Pew Research Social & Demographic Trends. (2010). The Millennials. Retrieved from http://www.pewsocialtrends.org/category/datasets/
Figure 11.5: SPSS step 2 for test-of-independence chi-square test
Source: Data from Pew Research Social & Demographic Trends. (2010). The Millennials. Retrieved from http://www.pewsocialtrends.org/category/datasets/
Figure 11.6: SPSS output 1 for test-of-independence chi-square test
Source: Data from Pew Research Social & Demographic Trends. (2010). The Millennials. Retrieved from http://www.pewsocialtrends.org/category/datasets/

Crosstab (Count)
Rows: PARTY. In politics TODAY, do you consider yourself a Republican, Democrat, or Independent?
Columns: SEX. (ENTER RESPONDENT’S SEX:)
                               Male    Female    Total
Republican                      248       217      465
Democrat                        274       403      677
Independent                     361       288      649
(VOL) No preference              64        61      125
(VOL) Other party                 7         4       11
(VOL) Don’t know/Refused         31        62       93
Total                           985      1035     2020

Symmetric Measures
                                      Value    Approx. Sig.
Nominal by Nominal   Phi              0.149       0.000
                     Cramer’s V       0.149       0.000
N of Valid Cases                       2020

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Chi-Square Tests
                                   Value     df    Asymp. Sig. (2-sided)
Pearson Chi-Square                44.872a     5          0.000
Likelihood Ratio                  45.225      5          0.000
Linear-by-Linear Association       3.099      1          0.078
N of Valid Cases                    2020

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 5.36.
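As a cross-check on the SPSS output, the Pearson chi-square and phi values can be recomputed from the crosstab counts. This is a sketch under stated assumptions: the variable names are invented, and the male and female columns are assigned using the sex totals reported in Figure 11.3 (985 males, 1035 females).

```python
import math

# Counts from the Figure 11.6 crosstab: (male, female) for each party response
crosstab = {
    "Republican": (248, 217),
    "Democrat": (274, 403),
    "Independent": (361, 288),
    "No preference": (64, 61),
    "Other party": (7, 4),
    "Don't know/Refused": (31, 62),
}

rows = list(crosstab.values())
col_totals = [sum(col) for col in zip(*rows)]
n = sum(col_totals)

# Pearson chi-square: sum of (fo - fe)^2 / fe with fe = row x column / n
chi2 = sum(
    (fo - sum(row) * c / n) ** 2 / (sum(row) * c / n)
    for row in rows
    for fo, c in zip(row, col_totals)
)
phi = math.sqrt(chi2 / n)
print(f"chi-square = {chi2:.3f}, phi = {phi:.3f}")
```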
Figure 11.7: Bar graph for the proportions of gender (sex) and political party affiliation (party)
Source: Data from Pew Research Social & Demographic Trends. (2010). The Millennials. Retrieved from http://www.pewsocialtrends.org/category/datasets/
Figure 11.8: SPSS output 2 for test-of-independence chi-square test
Source: Data from Pew Research Social & Demographic Trends. (2010). The Millennials. Retrieved from http://www.pewsocialtrends.org/category/datasets/

Crosstab (Count)
Rows: PARTY. In politics TODAY, do you consider yourself a Republican, Democrat, or Independent?
Columns: Race-Ethnicity
                             White non-   Black non-                        Don’t know/
                             Hispanic     Hispanic     Hispanic    Other    Refused       Total
Republican                      396            7           33        24         5          465
Democrat                        381          160           81        53         2          677
Independent                     472           36           81        56         4          649
(VOL) No preference              62           13           26        18         6          125
(VOL) Other party                 7            0            2         1         1           11
(VOL) Don’t know/Refused         51           12           14         4        12           93
Total                          1369          228          237       156        30         2020

Symmetric Measures
                                      Value    Approx. Sig.
Nominal by Nominal   Phi              0.406       0.000
                     Cramer’s V       0.203       0.000
N of Valid Cases                       2020

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Chi-Square Tests
                                   Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square               333.684a     20          0.000
Likelihood Ratio                 286.869      20          0.000
Linear-by-Linear Association      73.007       1          0.000
N of Valid Cases                    2020

a. 6 cells (20.0%) have expected count less than 5. The minimum expected count is 0.16.
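Footnote a and the Cramér's V value in Figure 11.8 can both be recovered from the crosstab counts. The sketch below is illustrative only; the variable names are assumptions, and the columns follow the order White non-Hispanic, Black non-Hispanic, Hispanic, Other, Don't know/Refused.

```python
import math

# Figure 11.8 counts, one row per party response
crosstab = [
    [396, 7, 33, 24, 5],     # Republican
    [381, 160, 81, 53, 2],   # Democrat
    [472, 36, 81, 56, 4],    # Independent
    [62, 13, 26, 18, 6],     # No preference
    [7, 0, 2, 1, 1],         # Other party
    [51, 12, 14, 4, 12],     # Don't know/Refused
]

row_totals = [sum(row) for row in crosstab]
col_totals = [sum(col) for col in zip(*crosstab)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Count the cells whose expected frequency falls below 5
flat = [e for row in expected for e in row]
small = sum(1 for e in flat if e < 5)
print(f"{small} of {len(flat)} cells below 5; minimum = {min(flat):.2f}")

chi2 = sum((fo - e) ** 2 / e
           for fo_row, e_row in zip(crosstab, expected)
           for fo, e in zip(fo_row, e_row))
k = min(len(row_totals), len(col_totals))   # smaller dimension of the table
v = math.sqrt(chi2 / (n * (k - 1)))
print(f"chi-square = {chi2:.3f}, Cramér's V = {v:.3f}")
```

Here k − 1 = 4, which is why Cramér's V (.203) is half the size of phi (.406) for the same chi-square value.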
Figure 11.9: Bar graph for the proportions of race-ethnicity (race) and political party affiliation (party)
Source: Data from Pew Research Social & Demographic Trends. (2010). The Millennials. Retrieved from http://www.pewsocialtrends.org/category/datasets/
11.6 Interpreting Results
Refer to the most recent edition of the APA manual for specific detail on formatting statistics; Table 11.6 may be used as a quick guide in presenting the statistics covered in this chapter.
Table 11.6: Guide to APA formatting of statistics results

Abbreviation or Term    Description
χ²                      Chi-square
fe                      Frequency of expected values
fo                      Frequency of observed values
φ or rφ                 Phi coefficient
V                       Cramér’s coefficient

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.
Using the results from SPSS Example 1: Goodness-of-Fit Chi-Square Test, the results from Figure 11.3 show the following:
• Three χ² goodness-of-fit tests were executed simultaneously. The three variables were race-ethnicity (racethn), political party affiliation (party), and gender (sex). Using the Test Statistics table, race-ethnicity (racethn) was statistically significant in regard to group frequencies, χ²(4) = 2,949.18, p < .05; party affiliation (party) was also significant in proportions, χ²(5) = 1,307.18, p < .05. Therefore, both H01 and H02 were rejected, meaning that there were significant differences in race-ethnicity group proportions and political party affiliation proportions. Gender (sex) was not statistically significant in proportions of males to females, χ²(1) = 1.238, p = .266, and so H03 was retained. This result indicates that there was no significant difference in proportions between males and females.
Using the results from SPSS Example 2: Test-of-Independence Chi-Square Test, the results from Figures 11.6 and 11.8 show the following:
• Two χ² tests of independence were executed simultaneously, relating gender (sex) and race-ethnicity (racethn) to political party affiliation (party). Based on the Chi-Square Tests table in SPSS Output 1 for gender (sex) and political party affiliation (party), there is a significant relationship between these two variables based on Pearson’s values, χ²(5) = 44.872, p < .001, V = .149. This indicates that these two variables are dependent on one another. H01 can be rejected with support for Ha1. Judging from Figure 11.7, the graph shows proportional differences of gender and political party groups.
• Based on the Chi-Square Tests table in SPSS Output 2 for race-ethnicity (racethn) and political party affiliation (party), there is a significant association between these two variables: χ²(20) = 333.68, p < .05, V = .203. This indicates that these two variables are dependent on one another. H02 can be rejected with support for Ha2. Judging from Figure 11.9, the graph shows proportional differences of race-ethnicity and political party groups.
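Result strings like those above can be assembled by a small helper, which keeps the formatting consistent from test to test. This is a sketch rather than an official APA tool: the function names `format_p` and `apa_chi_square` are assumptions, and the helper follows this chapter's χ²(df) = value pattern.

```python
def format_p(p):
    """Drop the leading zero; report very small values as p < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

def apa_chi_square(chi2, df, p, effect_label=None, effect=None):
    """Build a results string such as 'χ²(5) = 44.87, p < .001, V = .15'."""
    parts = [f"χ²({df}) = {chi2:,.2f}", format_p(p)]
    if effect_label is not None and effect is not None:
        parts.append(f"{effect_label} = " + f"{effect:.2f}".lstrip("0"))
    return ", ".join(parts)

print(apa_chi_square(1.238, 1, 0.266))
# SPSS prints .000 for the next test, meaning p is below .0005;
# 0.0004 is a stand-in value used only for this illustration.
print(apa_chi_square(44.872, 5, 0.0004, "V", 0.149))
```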
Summary
Benjamin Disraeli, a 19th-century British prime minister and apparently a skeptic, observed that what we expect usually does not occur and what we do not anticipate typically happens. Karl Pearson, who happened to be one of Disraeli’s contemporaries, would have disagreed. In fact Pearson’s chi-square tests are based on the assumption that under normal circumstances, it is the expected thing that happens, and when the outcome differs significantly from what is expected, it is noteworthy.
The chi-square tests are the only procedures in this book that are designed for nominal scale data (Objective 1). This makes them stand apart from the others. But there are some important similarities as well. Both the goodness-of-fit (1 × k) test and the test of independence (r × k) are tests of significant differences like the z-, t-, and ANOVA tests. In addition, a significant finding with the r × k test leads to calculating either the phi coefficient or, when both variables have more than two categories, Cramér’s V. The phi coefficient and Cramér’s V are both correlation procedures, which places them in the same category as the Pearson and Spearman correlations. Although this chapter may seem very different from the first ten chapters, the differences are confined to the scale of the data involved and the kinds of values that are calculated as a result, not to the kinds of decisions that are made (Objectives 2 and 3).
Data of any scale can be reduced to nominal scale data and subjected to chi-square analysis. It is possible to turn ratio scale data into nominal scale data, for example. This is what occurs if a psychologist who has collected data on the number of times clients manifest compulsive behaviors elects to ignore the number of behaviors (ratio data) and focus only on whether or not people manifest compulsive behavior (yes/no, which is nominal data). It provides for a simpler analysis, but any opportunity to examine the degree of compulsivity is forfeited. The difference between the client who is minimally compulsive and the one who is extremely compulsive is lost. Reductions in data scale result in lost data, and when they have a choice, most researchers avoid reducing data scale, in spite of the opportunity to execute the chi-square analysis with all its flexibility.
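The loss of information that comes with reducing ratio data to nominal data is easy to see in a small sketch. The client counts below are hypothetical, invented purely for illustration, as are the variable names.

```python
# Hypothetical counts of compulsive behaviors for ten clients (ratio data)
behavior_counts = [0, 3, 0, 12, 1, 0, 7, 0, 25, 2]

# Reduce to nominal data: does the client manifest the behavior at all?
manifests = ["yes" if count > 0 else "no" for count in behavior_counts]

# Frequency counts are all that a chi-square test would see
frequencies = {label: manifests.count(label) for label in ("yes", "no")}
print(frequencies)
```

After the reduction, the client with 1 behavior and the client with 25 behaviors fall into the same "yes" category, which is exactly the detail the paragraph above describes as lost.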
On the other hand, the chi-square tests do not require that the data meet any of the requirements parametric tests must meet related to normality. Very small data sets are more acceptable in the chi-square tests than they are with the t-test, for example. The t-tests require that the data be normally distributed in a population, something that is very difficult to determine with small samples. The chi-square tests involve no assumptions about how the data is distributed; there are no requirements for particular parameters. These nonparametric tests should have a place in every analyst's repertoire whether using Excel or SPSS (Objective 4).
The coverage of statistical procedures in this book certainly has not been exhaustive, but it is representative. The eleven chapters introduced some of the most important concepts in data analysis and some of the statistical procedures that provide the foundation for nearly all quantitative analysis. Someone with a grasp of the material in these chapters has a solid footing from which to conduct research, complete the data analysis that many reports require, and read the scholarly research. It might be helpful to think of the understanding gained of these topics as if it were a cognitive muscle. Ignore it for any length of time and it will atrophy. Occasional practice and frequent review will keep the understanding active. The final chapter (Chapter 12) will bring all of the aforementioned analyses together in what is known as a decision tree. This will help you determine the correct path to the proper analysis based on your variables and scales of measurement.
Key Terms
chi-square test of independence Also known as the r × k chi-square, a test of whether two nominal variables operate independently of each other.
contingency table The arrangement of data from a chi-square test of independence where the categories of one nominal variable are in rows and the categories of the other variable are in columns.
Cramér’s V A variation of the phi coefficient that is used if both variables have three or more levels.
goodness-of-fit chi-square test Also known as the 1 × k chi-square, a test for significant differences in the frequency with which nominal data occurs in distinct categories.
phi coefficient Value used to determine the strength of the relationship when there are two categories of each variable in the chi-square test of independence. When there are more than two categories of both variables, the measure is Cramér’s V.
Chapter Exercises
Answers to Try It! Questions
The answers to all Try It! questions introduced in this chapter are provided below.
A. The 1 × k chi-square will accommodate just one nominal scale variable, but it may have any number of categories.
B. The fact that chi-square values are squared as they are calculated does away with any possibility of a negative value.
C. Type II errors are typically more of a problem with chi-square than type I errors are. If groups or categories have an fe < 5, this is another issue.
D. The r × k can accommodate just two variables, but theoretically, those two variables can have any number of categories.
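The expected-frequency concern raised in answer C can be checked before running a test of independence. The following is a minimal Python sketch under stated assumptions: the function names and the table are hypothetical, and the threshold of 5 is the fe value named in answer C.

```python
def expected_frequencies(table):
    """Return the expected frequency (fe) for every cell of an r x k table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # fe for a cell = (row total x column total) / grand total.
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def small_cells(table, minimum=5):
    """List (row, column, fe) for each cell whose fe falls below the minimum."""
    return [
        (i, j, fe)
        for i, row in enumerate(expected_frequencies(table))
        for j, fe in enumerate(row)
        if fe < minimum
    ]


# Hypothetical 2 x 2 table of observed frequencies.
observed = [[8, 2], [6, 4]]
print(expected_frequencies(observed))  # [[7.0, 3.0], [7.0, 3.0]]
print(small_cells(observed))           # [(0, 1, 3.0), (1, 1, 3.0)]
```

Any cells the check reports signal that the sample may be too small for the chi-square result to be trusted as it stands.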
Review Questions
The answers to the odd-numbered items are in the answers appendix.
1. A market researcher for an advertising agency wants to determine whether people listen to three local radio stations in similar proportions. The researcher conducts a telephone survey to find out which station people prefer. Over the course of 2 hours, 52 of the respondents indicated preference for one of the three. The data follows:
Station A: 22
Station B: 18
Station C: 12
Are the differences among their preferences statistically significant?
2. What does it mean when an r × k problem is statistically significant?
3. Why are significant r × k problems followed by either a phi coefficient or Cramér’s V?
a. What are the circumstances under which each procedure is used?
4. A counselor working with people who are developmentally disabled has read research relating accurate responses to the type of reinforcement they receive. For 3 weeks, the counselor provides verbal praise every time one of the clients completes a simple task correctly and then totals the number of simple problem-solving tasks that are completed successfully. For the next 3 weeks, the counselor provides a small piece of candy for each successfully completed task. The data follows:
After 3 weeks of verbal praise—17 tasks completed successfully in 1 hour.
After 3 weeks of tangible rewards—27 tasks completed successfully in 1 hour.
Are the differences statistically significant?
5. An armed services psychologist believes that among those who have been in a combat area, ground forces experience posttraumatic stress more frequently than Navy or Air Force personnel. Data from a number of armed services personnel who have been in combat areas reveals that
Of 40 Army personnel, 22 have experienced some posttraumatic stress.
Of 30 Air Force personnel, 3 have experienced some posttraumatic stress.
Of 30 Navy personnel, 12 have experienced some posttraumatic stress.
a. Are the branch of service and posttraumatic stress related?
b. If so, what is the strength of the relationship?
6. A university counselor has a theory that there may be a relationship between employment among students and their grades. Among 20 employed students, 8 have a grade point average above 3.0. Among 15 not-employed students, 12 have a grade point average better than 3.0. Is the counselor correct? If so, what is the strength of the relationship?
7. A university administrator believes that undergraduate students of different majors attend the writing lab in different proportions. Test this with the following data:
Of 20 English majors, 2 attend the writing lab.
Of 18 engineering majors, 10 attend the lab.
Of 15 history majors, 6 attend the lab.
Do major and lab attendance operate independently?
8. Juvenile offenders in court-ordered treatment can choose between community service activities and a series of group counseling sessions. The therapist believes that they will choose therapy 2 to 1 over community service. Among 35 offenders, 20 opt for therapy. Is the therapist correct?
9. In the effort to properly direct assistance of students, a high school counselor wonders whether students who opt for college choose one institution with significantly greater frequency than the others. Data is as follows for 70 graduating seniors who are going to college:
37 apply to the local community college.
23 apply to the state university.
10 apply to the private liberal arts college.
Are the differences in institutional preference statistically significant?
10. A car salesperson attempts to determine whether there is a relationship between age and the type of car purchased.
Of 15 people in their 20s, 3 opt for sports cars, 8 for economy cars, and 4 for sedans.
Of 18 people in their 30s, 7 opt for sports cars, 4 for economy cars, and 7 for sedans.
Of 12 people in their 40s, 2 opt for sports cars, 4 for economy cars, and 6 for sedans.
Are customers’ ages and the type of car they select related?
Analyzing the Research
Review the article abstract provided below. You can then access the full article via your university’s online library portal to answer the critical thinking questions. Answers can be found in the answers appendix.
Using Chi-square in a Stress Disorder Study
Armour, C., Elklit, A., & Shevlin, M. (2013). The latent structure of acute stress disorder: A posttraumatic stress disorder approach. Psychological Trauma: Theory, Research, Practice, and Policy, 5(1), 18–25.
Article Abstract
Acute stress disorder (ASD) was first included in the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM–IV; American Psychiatric Association, 1994) to account for the psychological symptoms present during the one-month period between trauma exposure and a posttraumatic stress disorder (PTSD) diagnosis. The diagnostic criteria sets of both ASD and PTSD are similar; however, ASD includes additional dissociative items. Factor analytic research into ASD is rare, whereas there is a plethora of research on the factor structure of PTSD symptoms. This study tested whether the latent structure of ASD is similar to the latent structure of PTSD. Five models were tested by using data from Danish rape victims (N = 380); a unidimensional model, the DSM–IV 4-factor ASD model, a King, Leskin, King, and Weathers (1998) replication model, a Simms, Watson, and Doebbeling (2002) replication model, and a 3-factor model. Model fit was assessed by using a number of fit indices, including the root-mean-square error of approximation, comparative fit index, Tucker-Lewis index, and standardized root-mean-square residual. However, based on the fit indices, 3 models were deemed indistinguishable. Chi-square difference tests concluded that a 3-factor model and two 4-factor models did not differ in fit. Overall, the current 4-factor ASD latent structure proposed by the DSM–IV was not supported. A 3-factor structure was deemed preferential on the basis of parsimony. Furthermore, of all models, the unidimensional model provided the poorest fit to the data. These findings are pertinent given that the DSM-5 ASD task force is considering implementing either a 4-factor conceptualization or a unidimensional approach to the ASD diagnosis.
Critical Thinking Questions
1. What is the chi-square value of the King replication four-factor model?
2. What does the article mean when it states the differences in fit between models 2, 4, and 5 were not significant?
3. If the study wanted to figure out the critical value of Model 5 (χ² = 2.98, df = 3, p = .39), what piece of information do we need from this?