statistics Project
Chapter 15 Nonparametric Statistics

Nonparametric statistics, also called distribution-free statistics, are used when the populations from which the samples are selected are not normally distributed. Nonparametric statistics can also be used to test hypotheses that do not involve specific population parameters.

Advantages:
a. They can be used to test population parameters when the variable is not normally distributed.
b. They can be used when the data are nominal or ordinal.
c. They can be used to test hypotheses that do not involve population parameters.
d. The computations are easier in most cases than those for the parametric counterparts.
e. They are easier to understand.

Disadvantages:
a. They are less sensitive than their parametric counterparts. Therefore, larger differences are needed before the null hypothesis can be rejected.
b. They tend to use less information than their parametric counterparts.
c. They are less efficient than their parametric counterparts. Larger samples are needed to overcome the loss of information.

Some examples of ranking data:

Ex)
DATA  21  31  34  41   41   61  65  72
RANK  1   2   3   4.5  4.5  6   7   8

Enter all the data into a list, say L1: [STAT] [1] for 1:Edit
Sort the data: [STAT], choose [2] for 2:SortA(, then [2nd] [L1] [ENTER]
View the data: [STAT] [1] for 1:Edit
Now do the ranks: ranks 4 & 5 must be shared. The two values of 41 occupy positions 4 and 5, so each receives the average rank (4 + 5)/2 = 4.5.
Ex)
DATA  3  5    5    6  7  8    8    9  12  14  15  17
RANK  1  2.5  2.5  4  5  6.5  6.5  8  9   10  11  12

Enter all the data into a list, say L1: [STAT] [1] for 1:Edit
Sort the data: [STAT], choose [2] for 2:SortA(, then [2nd] [L1] [ENTER]
View the data: [STAT] [1] for 1:Edit
Now do the ranks: ranks 2 & 3 and ranks 6 & 7 must be shared.
Ex)
DATA  187  190  190  236  321  532  673
RANK  1    2.5  2.5  4    5    6    7

Enter all the data into a list, say L1: [STAT] [1] for 1:Edit
Sort the data: [STAT], choose [2] for 2:SortA(, then [2nd] [L1] [ENTER]
View the data: [STAT] [1] for 1:Edit
Now do the ranks: ranks 2 & 3 must be shared.
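The ranking rule used in these examples (tied values share the average of the positions they occupy) can also be sketched in Python. This is only an illustration alongside the calculator steps; the function name `average_ranks` is a hypothetical helper, not something from the textbook or the calculator.

```python
def average_ranks(values):
    """Rank the values from smallest to largest; tied values share the mean of their positions."""
    ordered = sorted(values)
    rank_of = {}  # maps each distinct value to its (possibly shared) rank
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every copy of the current value.
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The copies occupy 1-based positions i+1 .. j; their average is (i + 1 + j) / 2.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    # Return the ranks in the same order as the input data.
    return [rank_of[v] for v in values]


data = [21, 31, 34, 41, 41, 61, 65, 72]
print(average_ranks(data))  # the two 41s each get rank 4.5
```

The same helper reproduces the ranks in the other two examples, for instance `average_ranks([187, 190, 190, 236, 321, 532, 673])`.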
15.1 Sign Test

The sign test is the simplest nonparametric test because it uses only + or - signs. The test statistic is the smaller of the number of + signs and the number of - signs.

Sign Test for a single sample: test a claim about a median using a single sample.

Ex) An oceanographer claims that the median height of the waves at Ocean City is 2.8 feet. The wave heights (in feet) are measured for a random sample of 20 days. The data are shown below. At alpha = .05, test the claim.

3.6 2.1 2.3 2.1 2.7 3.2 3.9 3.4 3.0 2.9
2.0 1.9 3.2 3.5 2.8 1.8 2.3 3.7 3.9 4.2

Enter the data in L1: [STAT] [1] for 1:Edit (the data are entered across the rows as they appear in the text). Generate L2 shown below as an easy way to determine the signs.
+ - - - - + + + + + - - + + 0 - - + + +

Claim = H0: Median = 2.8
H1: Median ≠ 2.8

Use Table 8, Critical Values of the Sign Test (see link on Timeline for Formulas and Tables).
alpha = 0.05, 2 tails, n = 19 (the one value equal to 2.8 gets a 0 and is not counted), C.V. = 4
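The sign counting for the wave-height data can be sketched in Python as a cross-check on the calculator list. The function name `sign_test_counts` is a hypothetical helper, and the critical value 4 is simply the table value quoted above, not something the code computes.

```python
def sign_test_counts(data, hypothesized_median):
    """Count + and - signs relative to the hypothesized median; values equal to it are dropped."""
    plus = sum(1 for x in data if x > hypothesized_median)
    minus = sum(1 for x in data if x < hypothesized_median)
    n = plus + minus                  # sample size after dropping zeros
    return plus, minus, n, min(plus, minus)


waves = [3.6, 2.1, 2.3, 2.1, 2.7, 3.2, 3.9, 3.4, 3.0, 2.9,
         2.0, 1.9, 3.2, 3.5, 2.8, 1.8, 2.3, 3.7, 3.9, 4.2]
plus, minus, n, statistic = sign_test_counts(waves, 2.8)

critical_value = 4  # taken from the sign-test table for n = 19, alpha = .05, two tails
print(plus, minus, n, statistic)
# Decision rule from the notes: reject H0 only if the statistic is <= the table value.
print("reject H0" if statistic <= critical_value else "do not reject H0")
```

For a paired sample the same function can be applied to the list of differences with a hypothesized median of 0.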
Number of – signs: 8
Number of + signs: 11
Test statistic = 8
Note: The smaller of the number of + signs and the number of – signs is the test statistic.
Note: Reject the null hypothesis if the test statistic is less than or equal to the value in the table.
Since 8 > 4, do not reject the null hypothesis. There is not enough evidence to reject the claim. Ho might be true.

Ex) A real estate agent claims that the median rent for a one-bedroom apartment in Blue View is $325 per month. A sample of 12 one-bedroom apartments shows the monthly rents (in dollars) below. At alpha = .05, test the claim.

420 460 514 405 320 435 531 450 560 309 312 350

Enter the data in L1: [STAT] [1] for 1:Edit (the data are entered across the rows as they appear in the text). Generate L2 shown below as an easy way to determine the signs.
+ + + + - + + + + - - +

Claim = H0: Median = $325
H1: Median ≠ $325
α = 0.05, 2 tails, n = 12, C.V. = 2
Number of – signs: 3
Number of + signs: 9
Test statistic = 3
Since 3 > 2, do not reject the null hypothesis. There is not enough evidence to reject the claim. Ho might be true.

Paired Sample Sign Test (compares 2 dependent samples, such as before/after)

Ex) A study was conducted to test the claim that a certain diet medication had an effect (increase or decrease) on the weights (in pounds) of eight women. Their weights were taken before and six weeks after daily administration of the medication. The data are shown below. At alpha = .05, test the claim.

Subject  A    B    C    D    E    F    G    H
Before   187  163  201  158  139  143  198  154
After    178  162  188  156  133  150  175  150

Enter the before data in L1 and the after data in L2. Generate L3 = L1 – L2 as shown below to determine the signs.
Subject  A  B  C  D  E  F  G  H
Sign     +  +  +  +  +  -  +  +

H0: The medication has no effect on weight loss.
Claim = H1: The medication affects weight loss.
α = 0.05, 2 tails, n = 8, C.V. = 0
Number of – signs: 1
Number of + signs: 7
Test statistic = 1
Since 1 > 0, do not reject the null hypothesis. There is not enough evidence to support the claim. Ho might be true.

Ex) An educator designed a reasoning skills course. Nine students were selected and given a pretest to determine their reasoning abilities. After completing the course, the same students were given an equivalent form of the test to see whether their reasoning skills had improved. The data are shown below. At alpha = .05, test the claim that the course improved their reasoning skills.

Student   1   2   3   4   5   6   7   8   9
Pretest   80  76  74  83  92  78  91  74  88
Posttest  82  78  73  85  95  79  93  78  90

Enter the Pretest data in L1 and the Posttest data in L2. Generate L3 = L1 – L2 as shown below to determine the signs.
Student  1  2  3  4  5  6  7  8  9
Sign     -  -  +  -  -  -  -  -  -

H0: Reasoning ability will not be affected by the course.
Claim = H1: Reasoning ability will be increased after the course.
α = 0.05, one-tailed, n = 9, C.V. = 1
Number of – signs: 8
Number of + signs: 1
Test statistic = 1
Since 1 ≤ 1, reject the null hypothesis. There is enough evidence to support the claim. H1 is true.

Ex) To test a theory that alcohol consumption can have an effect on test scores, a researcher conducts a study on 10 adults. Each is given a test. Then for one week, each subject is required to consume a certain amount of alcohol; then he or she is retested. The results are shown below. At alpha = .10, test the claim that alcohol does not affect a person’s test score.

Subject  Score before  Score after
1        105           106
2        109           105
3        98            94
4        112           109
5        109           105
6        117           115
7        123           125
8        114           114
9        95            98
10       101           100

Enter the Score before in L1 and the Score after in L2. Generate L3 = L1 – L2 as shown below to determine the signs.
Subject  1  2  3  4  5  6  7  8  9  10
Sign     -  +  +  +  +  +  -  0  -  +
Claim = H0: Alcohol has no effect on a person's I.Q. test score.
H1: Alcohol does affect a person's I.Q. test score.
α = 0.10, 2 tails, n = 9 (subject 8 has a difference of 0 and is not counted), C.V. = 1
Number of – signs: 3
Number of + signs: 6
Test statistic = 3
Since 3 > 1, do not reject the null hypothesis. There is not enough evidence to reject the claim. Ho might be true.

15.2 & 15.3 The Wilcoxon Signed-Rank & Wilcoxon Rank-Sum Tests

Wilcoxon Rank-Sum Test (2 independent samples)

Ex) A researcher surveyed married women and single women to ascertain whether there was a difference in the number of books each had read during the past year. The data are shown below. At alpha = .10, test the claim that each group read the same number of books.

Married  6  8  7  4   9  12  13  7   10  18  15
Single   2  3  5  11  3  5   11  12  16  4   0   1

Claim = H0: There is no difference in the number of books read by each group.
H1: There is a difference in the number of books read by each group.

C.V. Z = ± 1.64
invNorm(.05) = -1.64
invNorm(1-.05) = 1.64

Note: To make doing the ranks easier, one may enter all the data into one list, say L1: [STAT] [1] for 1:Edit
Then sort the data: [STAT] [2] for 2:SortA(L1) [ENTER]
To view the data: [STAT] [1] for 1:Edit
(M = married, S = single)

Data   0   1     2     3    3    4    4     5     5
Rank   1   2     3     4.5  4.5  6.5  6.5   8.5   8.5
Group  S   S     S     S    S    M    S     S     S

Data   6   7     7     8    9    10   11    11    12
Rank   10  11.5  11.5  13   14   15   16.5  16.5  18.5
Group  M   M     M     M    M    M    S     S     M

Data   12    13  15  16  18
Rank   18.5  20  21  22  23
Group  S     M   M   S   M
R married = 164
Note: R = sum of the ranks for the smaller sample; n1 = the smaller of the sample sizes. If the sample sizes are the same, then either sample may be used for n1.

With n1 = 11 (married) and n2 = 12 (single):

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{11(11 + 12 + 1)}{2} = 132$$

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(11)(12)(24)}{12}} = 16.25$$

$$Z = \frac{R - \mu_R}{\sigma_R} = \frac{164 - 132}{16.25} = 1.97$$

Since 1.97 > 1.64, reject the null hypothesis. There is enough evidence to reject the claim. H1 is true.

Ex) Over the past 12 years, a statistician kept track of the total number of academic scholarships awarded to Valley View High School seniors and seniors at their rival school, Ocean View High School. The data are shown below. At alpha = .05, test the claim that there is a difference in the number of academic scholarships awarded to seniors at the schools.

Valley View  4  4  1  8  7  9  3  7  4  6  11  10
Ocean View   4  5  2  7  8  6  3  9  3  5  6   12

H0: There is no difference in the number of scholarships for the two high schools.
Claim = H1: There is a difference between the numbers of scholarships for the two schools.
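C.V. Z = ± 1.96
invNorm(.025) = -1.96
invNorm(1-.025) = 1.96

(V = Valley View, O = Ocean View)

Data    1   2   3   3   3   4     4     4     4     5     5     6
Rank    1   2   4   4   4   7.5   7.5   7.5   7.5   10.5  10.5  13
School  V   O   V   O   O   V     V     V     O     O     O     V

Data    6   6   7   7   7   8     8     9     9     10    11    12
Rank    13  13  16  16  16  18.5  18.5  20.5  20.5  22    23    24
School  O   O   V   V   O   V     O     V     O     V     V     O

R = 156.5 for Valley View High School
R = 143.5 for Ocean View High School

The sample sizes are equal (n1 = n2 = 12), so either sample may be used; using R = 156.5:

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{12(12 + 12 + 1)}{2} = 150$$

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(12)(12)(25)}{12}} = 17.32$$

$$Z = \frac{R - \mu_R}{\sigma_R} = \frac{156.5 - 150}{17.32} = .38$$

Since .38 < 1.96, do not reject the null hypothesis. There is not enough evidence to support the claim. Ho might be true.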
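The rank-sum calculation can be sketched in Python as a check on the hand computation. This is a minimal illustration under stated assumptions: `average_ranks` and `rank_sum_z` are hypothetical helper names, and `statistics.NormalDist().inv_cdf` from the Python standard library plays the role of the calculator's invNorm.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their positions."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # positions i+1 .. j share this rank
        i = j
    return [rank_of[v] for v in values]


def rank_sum_z(sample1, sample2):
    """Return R, mu_R, sigma_R and Z; sample1 should be the smaller sample (either one if equal)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank the pooled data
    r = sum(ranks[:n1])                                   # rank sum of sample1
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r, mu, sigma, (r - mu) / sigma


married = [6, 8, 7, 4, 9, 12, 13, 7, 10, 18, 15]
single = [2, 3, 5, 11, 3, 5, 11, 12, 16, 4, 0, 1]
r, mu, sigma, z = rank_sum_z(married, single)

alpha = 0.10
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.645
print(r, mu, round(sigma, 2), round(z, 2))
print("reject H0" if abs(z) > critical else "do not reject H0")
```

Calling `rank_sum_z` with the Valley View and Ocean View lists gives the rank sum and Z value for the scholarship example in the same way.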
The Wilcoxon Signed-Rank Test (2 dependent samples)

Ex) Eight students were given a pretest to measure their public speaking anxiety. They completed a workshop to reduce their anxiety and were then given a posttest. At alpha = .05, test the claim that the workshop reduced anxiety. The pretest and posttest scores are below.

Pretest   23  26  30  31  39  23  28  27
Posttest  22  29  27  29  33  21  25  28

H0: The workshop did not reduce anxiety.
Claim = H1: The workshop reduced anxiety.

Note: To help find the signed ranks, one may do the following:
Enter the Before data (B) in L1 and the After data (A) in L2.
Generate L3 = L1 – L2
Generate L4 = abs(L3)
To get the abs function: [MATH] > NUM, choose [1] for 1:abs
B   A   B – A  |B – A|  Rank  Signed Rank
23  22  1      1        1.5   1.5
26  29  -3     3        6     -6
30  27  3      3        6     6
31  29  2      2        3.5   3.5
39  33  6      6        8     8
23  21  2      2        3.5   3.5
28  25  3      3        6     6
27  28  -1     1        1.5   -1.5

Sum of the – ranks: (-6) + (-1.5) = -7.5
Sum of the + ranks: 1.5 + 6 + 3.5 + 8 + 3.5 + 6 = 28.5
TS ws = 7.5
n = 8, α = .05, one tailed, C.V. = 6
Note: n = number of pairs where the difference is not 0. The test statistic ws is the smaller of the two sums of signed ranks in absolute value.
Note: Reject Ho if the test statistic ws is less than or equal to the value in the table.
Since 7.5 > 6, do not reject the null hypothesis. There is not enough evidence to support the claim. Ho might be true.

Ex) Using the data about police forces, test the claim that police forces have become larger for the 10-year period in the sample of cities shown in the data below. Use alpha = .05.
1983  23,339  6,886  12,353  3,716  7,218  1,376  3,808  2,084  1,635  1,159
1993  29,327  7,637  12,093  4,734  6,225  1,861  3,860  2,807  1,978  1,662
H0: The sizes of police forces have decreased or remained the same.
Claim = H1: The sizes of the police forces have increased.

Note: To help find the signed ranks, one may do the following:
Enter the Before data (B) in L1 and the After data (A) in L2.
Generate L3 = L1 – L2
Generate L4 = abs(L3)
To get the abs function: [MATH] > NUM, choose [1] for 1:abs
B       A       B – A  |B – A|  Rank  Signed Rank
23,339  29,327  -5988  5988     10    -10
6886    7637    -751   751      7     -7
12,353  12,093  260    260      2     2
3716    4734    -1018  1018     9     -9
7218    6225    993    993      8     8
1376    1861    -485   485      4     -4
3808    3860    -52    52       1     -1
2084    2807    -723   723      6     -6
1635    1978    -343   343      3     -3
1159    1662    -503   503      5     -5

Sum of the – ranks: (-10) + (-7) + (-9) + (-4) + (-1) + (-6) + (-3) + (-5) = -45
Sum of the + ranks: 2 + 8 = 10
TS ws = 10
n = 10, α = 0.05, one tailed, C.V. = 11
Since 10 ≤ 11, reject the null hypothesis. There is enough evidence to support the claim. H1 is true.

Ex) Test the claim at alpha = .10 that there is no change in the numbers for the 1995 lacrosse rosters compared to the 1994 rosters. The data are shown below.

1995  33  37  46  34  45  58  46  23  35  22  45  21  42
1994  33  29  42  32  28  55  46  26  17  30  45  21  42

Claim = H0: There is no change in the size of the lacrosse rosters.
H1: The size of the lacrosse rosters has changed.
*1994 data (before) and 1995 data (after)

Note: To help find the signed ranks, one may do the following:
Enter the Before data (B) in L1 and the After data (A) in L2.
Generate L3 = L1 – L2
Generate L4 = abs(L3)
To get the abs function: [MATH] > NUM, choose [1] for 1:abs
B   A   B – A  |B – A|  Rank  Signed Rank
33  33  0      0
29  37  -8     8        5.5   -5.5
42  46  -4     4        4     -4
32  34  -2     2        1     -1
28  45  -17    17       7     -7
55  58  -3     3        2.5   -2.5
46  46  0      0
26  23  3      3        2.5   2.5
17  35  -18    18       8     -8
30  22  8      8        5.5   5.5
45  45  0      0
21  21  0      0
42  42  0      0
Sum of the + ranks: 2.5 + 5.5 = 8 Sum of the – ranks: (-5.5) + (-4) + (-1) +(-7) + (-2.5) + (-8) = -28 TS ws = 8
n = 8 α = 0.10 two tailed C.V. = 6
Since 8 > 6, do not reject the null hypothesis. There is not enough evidence to reject the claim. Ho might be true.
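The signed-rank bookkeeping from the examples above (differences, absolute values, shared ranks, then the two sums) can be sketched in Python. The helper names `average_ranks` and `signed_rank_statistic` are hypothetical, and the critical value still has to come from the table used in the notes.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their positions."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # positions i+1 .. j share this rank
        i = j
    return [rank_of[v] for v in values]


def signed_rank_statistic(before, after):
    """Return n, the sum of + ranks, the sum of - ranks (as a positive number), and ws."""
    # B - A for each pair, dropping pairs whose difference is 0.
    diffs = [b - a for b, a in zip(before, after) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    plus_sum = sum(r for d, r in zip(diffs, ranks) if d > 0)
    minus_sum = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), plus_sum, minus_sum, min(plus_sum, minus_sum)


# Lacrosse rosters: 1994 is "before" (B), 1995 is "after" (A).
rosters_1994 = [33, 29, 42, 32, 28, 55, 46, 26, 17, 30, 45, 21, 42]
rosters_1995 = [33, 37, 46, 34, 45, 58, 46, 23, 35, 22, 45, 21, 42]
n, plus_sum, minus_sum, ws = signed_rank_statistic(rosters_1994, rosters_1995)

critical_value = 6  # table value quoted in the notes for n = 8, alpha = .10, two tailed
print(n, plus_sum, minus_sum, ws)
print("reject H0" if ws <= critical_value else "do not reject H0")
```

The same function applied to the pretest and posttest anxiety scores returns the sums 28.5 and 7.5 found earlier.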
The Kruskal-Wallis Test (not in the Navidi textbook) Ex) Samples of four different cereals show the following number of calories for the suggested serving of each brand. At alpha = .05, test the claim that there is a difference in the number of calories for the different brands.
A    B    C    D
112  110  109  106
120  118  116  122
135  123  125  130
125  128  130  117
108  102  128  116
121  101  132  114
--------------------------------------------------------
H0: There is no difference in the number of calories each brand contains.
Claim = H1: There is a difference in the number of calories each brand contains.
C.V. = 7.815, α = 0.05, d.f. = 3
Please see Table 6, Chi-Square Distribution, from the Tables and Formulas link on the WebStudy Timeline.
Note: df = the number of categories – 1
The Chi-Square Distribution is used only as a right tailed test.

Note: To help with doing the ranks, enter all data in L1: [STAT] [1] for 1:Edit
[STAT] [2] for 2:SortA(L1) [ENTER]
To view data: [STAT] [1] for 1:Edit
A    Rank  B    Rank  C    Rank  D    Rank
112  7     110  6     109  5     106  3
120  13    118  12    116  9.5   122  15
135  24    123  16    125  17.5  130  21.5
125  17.5  128  19.5  130  21.5  117  11
108  4     102  2     128  19.5  116  9.5
121  14    101  1     132  23    114  8
R1 = 79.5 R2 = 56.5 R3 = 96 R4 = 68
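$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

where N is the total number of values, k is the number of samples, R_i is the sum of the ranks in sample i, and n_i is the size of sample i. With N = 24 and n = 6 for each brand:

$$H = \frac{12}{24(25)}\left(\frac{79.5^2}{6} + \frac{56.5^2}{6} + \frac{96^2}{6} + \frac{68^2}{6}\right) - 3(25) = 2.842$$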
TS H = 2.842
Since 2.842 < 7.815, do not reject the null hypothesis. There is not enough evidence to support the claim. Ho might be true.

Ex) A large grocery chain decides to advertise a product by three different methods (one method in each area): radio, television, and newspaper. One week’s sales (in dollars) from randomly selected stores in each area are recorded below. At alpha = .10, test the claim that there is a difference in sales for the different types of advertising.

Radio  TV    Paper
832    1024  329
648    996   437
562    1011  561
786    853   329
452    471   382
975    495   262
------------------------------
H0: There is no difference in the sales of the stores.
H1: There is a difference in sales of the stores. (claim)
C.V. = 4.605, d.f. = 2, α = 0.10

Radio  Rank  TV    Rank  Paper  Rank
832    13    1024  18    329    2.5
648    11    996   16    437    5
562    10    1011  17    561    9
786    12    853   14    329    2.5
452    6     471   7     382    4
975    15    495   8     262    1
R1 = 67   R2 = 80   R3 = 24

$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3}\right) - 3(N+1) = \frac{12}{18(19)}\left(\frac{67^2}{6} + \frac{80^2}{6} + \frac{24^2}{6}\right) - 3(19) = 10.047$$

TS H = 10.047
Since 10.047 > 4.605, reject the null hypothesis. There is enough evidence to support the claim. H1 is true.

Ex) In a large city, the number of crimes per week in five precincts is recorded for five weeks. The data are shown below. At alpha = .01, test the claim that there is a difference in the number of crimes.

P1   P2  P3  P4  P5
105  87  74  56  103
108  86  83  43  98
99   91  78  52  94
97   93  74  58  89
92   82  60  62  88
------------------------------------------------------
H0: There is no difference in the number of crimes in the five precincts.
Claim = H1: There is a difference in the number of crimes in the five precincts.
C.V. = 13.277, d.f. = 4, α = 0.01

P1   Rank  P2  Rank  P3  Rank
105  24    87  13    74  7.5
108  25    86  12    83  11
99   22    91  16    78  9
97   20    93  18    74  7.5
92   17    82  10    60  5
R1 = 108   R2 = 69   R3 = 40
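P4  Rank  P5   Rank
56  3     103  23
43  1     98   21
52  2     94   19
58  4     89   15
62  6     88   14
R4 = 16   R5 = 92

With N = 25 and n = 5 for each precinct:

$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_5^2}{n_5}\right) - 3(N+1) = \frac{12}{25(26)}\left(\frac{108^2}{5} + \frac{69^2}{5} + \frac{40^2}{5} + \frac{16^2}{5} + \frac{92^2}{5}\right) - 3(26) = 20.751$$

TS H = 20.751
Since 20.751 > 13.277, reject the null hypothesis. There is enough evidence to support the claim. H1 is true.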
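The Kruskal-Wallis computation can be sketched in Python, shown here with the cereal data from the first example. The helper names `average_ranks` and `kruskal_wallis_h` are hypothetical, and the sketch follows the formula as used in these notes, without any correction for ties.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their positions."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # positions i+1 .. j share this rank
        i = j
    return [rank_of[v] for v in values]


def kruskal_wallis_h(*groups):
    """Return the rank sum of each group and H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)          # rank all N values together
    n_total = len(pooled)
    rank_sums = []
    start = 0
    for group in groups:
        # The pooled list keeps group order, so each group's ranks form one slice.
        rank_sums.append(sum(ranks[start:start + len(group)]))
        start += len(group)
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return rank_sums, h


brand_a = [112, 120, 135, 125, 108, 121]
brand_b = [110, 118, 123, 128, 102, 101]
brand_c = [109, 116, 125, 130, 128, 132]
brand_d = [106, 122, 130, 117, 116, 114]
rank_sums, h = kruskal_wallis_h(brand_a, brand_b, brand_c, brand_d)

critical_value = 7.815  # chi-square table value from the notes for d.f. = 3, alpha = .05
print(rank_sums, round(h, 3))
print("reject H0" if h > critical_value else "do not reject H0")
```

Passing a different number of lists (three for the advertising data, five for the precinct data) works the same way, since the function accepts any number of groups.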