LAB 6 Exploring Independence and Probabilities
ChiSq Independence Test
| Observed | Promoted | Not Promoted | Totals | Theoretical | Promoted | Not Promoted |
| Male | 21 | 3 | 24 | Male | 17.5 | 6.5 |
| Female | 14 | 10 | 24 | Female | 17.5 | 6.5 |
| Totals | 35 | 13 | 48 | |||
| 5.1692307692 | The formula is entered for you: the sum of the ratios of squared differences to expected values. The number sense discussion is below. | |||||
| 3.8414588207 | Compare to the critical value, which has a 5% chance of being equaled or exceeded if the null hypothesis is true: =CHISQ.INV.RT(0.05,1). The first argument is 5% in decimal form, and the 1 is the degrees of freedom. Because the test statistic is greater than this critical value, the evidence is strong enough to reject independence of gender and promotion: a value this large would occur less than 5% of the time if the null hypothesis is true. Promotion and gender are related variables; the probability of promotion depends on gender, so the conditional probability rules discussed in Chapter 2, pages 90-95 (especially Section 2.2.5, Independence considerations), apply. | |||||
| This tests whether the row and column variables are independent. It is equivalent to a test of equal proportions, which is how the authors frame their logical approach, although the requirements for that test are stricter: no expected count can be less than 5, whereas for the independence test no expected (theoretical) count can be less than 1. | ||||||
| Number sense: The formula used is $\chi^2 = \sum \frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}$, where the sum runs over every cell of the table. The question is, how different do the observed and expected frequencies need to be for us to say that the row and column variables (in this case, gender and promotion) are not independent (reject independence)? For this example we use the 5% rule for identifying rare events. If our observed frequencies would be a rare event when compared to the expected frequencies computed under the null hypothesis (independent variables), then we conclude that the null hypothesis is not likely true and we reject it. For this type of test, that means rejecting independence and stating that the two variables are dependent (not independent). | ||||||
| Degrees of freedom: Several distributions that you will encounter in statistics have degrees of freedom associated with them. Without getting into details now, the degrees of freedom for the chi-square test for independence are $(\text{number of row levels} - 1) \times (\text{number of column levels} - 1)$, so for our 2 by 2 case this is simply $1 \times 1 = 1$. | ||||||
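The number sense formula and the degrees-of-freedom rule above can be checked outside Excel. The sketch below is a minimal Python version, assuming the Observed and Theoretical tables are typed in as nested lists; the names `chi_square_statistic` and `degrees_of_freedom` are illustrative and are not part of the workbook. It should reproduce the worksheet value 5.1692307692 with 1 degree of freedom.

```python
# Chi-square statistic for the gender and promotion table, computed the same
# way as the worksheet formula: sum of (observed - expected)^2 / expected.

observed = [
    [21, 3],   # Male: Promoted, Not Promoted
    [14, 10],  # Female: Promoted, Not Promoted
]
expected = [
    [17.5, 6.5],  # Male (theoretical)
    [17.5, 6.5],  # Female (theoretical)
]


def chi_square_statistic(obs, exp):
    """Sum the ratio of squared differences to expected values over all cells."""
    total = 0.0
    for obs_row, exp_row in zip(obs, exp):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


def degrees_of_freedom(obs):
    """(number of row levels - 1) * (number of column levels - 1)."""
    return (len(obs) - 1) * (len(obs[0]) - 1)


stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
print(round(stat, 10), df)  # 5.1692307692 1
```

Each of the four cells contributes separately: the two "Not Promoted" cells (about 1.88 each) contribute more than the two "Promoted" cells (0.7 each), because the same squared difference of 12.25 is divided by a smaller expected count.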
The textbook simulates a set of values that adhere to the theoretical model as a test comparison. Hypothesis testing is explained in detail later; refer to page 181 for a look at defining the hypotheses. In Chapter 1, this example introduces the general logic of inference and hypothesis testing (perhaps a little early), using a simulation to show the reasoning; in Chapter 6 it is covered formally as a hypothesis test. Here is the formal test in Excel, set up to be used as a template. You type the 4 values of the count summary into the table labeled "Observed", and the formulas set up the theoretical values for comparison, just as the authors did with the simulated values. The notes in the function menu box tell you what to do for the theoretical values, but the formulas have already been entered for you here. This can be done for any number of rows and columns if the categorical variables have more levels. For the chi-square (χ²) test for independence (for the Greek letter chi, type c in the Symbol font or use Insert Symbol), the hypotheses are set up as on page 51, with independence as the null hypothesis, the condition assumed true and then tested.
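The theoretical values in the tables match the standard rule for independence: row total × column total ÷ grand total. A minimal Python sketch of that rule, written so it works for any number of rows and columns (the function name `expected_counts` and the 2 × 3 table are hypothetical, for illustration only):

```python
def expected_counts(observed):
    """Theoretical (expected) count for each cell under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Gender (rows) by promotion (columns) from the Observed table.
promotion = [[21, 3], [14, 10]]
print(expected_counts(promotion))  # [[17.5, 6.5], [17.5, 6.5]]

# Hypothetical 2 x 3 table: the same rule applies with more column levels.
three_levels = [[10, 20, 30], [20, 20, 20]]
print(expected_counts(three_levels))  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
```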
Gender and Promotion
| Observed | Promoted | Not Promoted | Totals | Theoretical | Promoted | Not Promoted |
| Male | 21 | 3 | 24 | Male | 17.5 | 6.5 |
| Female | 14 | 10 | 24 | Female | 17.5 | 6.5 |
| Totals | 35 | 13 | 48 | |||
| Using the built-in function for the chi-squared test, the process looks like this. | ||||||
| 0.0229903941 | Returns the p value associated with the Chi squared test statistic: =CHISQ.TEST(B2:C3,F2:G3) | |||||
| 5.1692307692 | Interpreting the p value: it is the probability of getting a result at least this extreme by chance alone if the row and column variables are independent. Results this extreme are expected only 2.3% of the time by chance. This is considered a rare event; instead of assuming that a rare event occurred, we reject independence and state that our data show evidence that promotion and gender are dependent. That is, the proportion of males promoted (21/24 = 87.5%) is significantly greater than the proportion of females promoted (14/24 ≈ 58.3%). |
| Test statistic: =CHISQ.INV.RT(A8,1) turns the p value in A8 back into the test statistic. This matches the value on our previous sheet. |
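For 1 degree of freedom, the values returned by CHISQ.TEST and CHISQ.INV.RT can be reproduced with the Python standard library, because a chi-square variable with 1 df is the square of a standard normal variable. This is a sketch that only covers the 1 df case (the 2 × 2 tables in this lab); other degrees of freedom would need a general chi-square distribution function, for example from a statistics library. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# If Z is standard normal, Z**2 has a chi-square distribution with 1 df, so
# P(Z**2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).


def chi_square_right_tail_df1(x):
    """P(chi-square with 1 df >= x); the p value for a 2 x 2 table."""
    return math.erfc(math.sqrt(x / 2))


def chi_square_inv_right_tail_df1(p):
    """Value x with P(chi-square with 1 df >= x) = p, like CHISQ.INV.RT(p, 1)."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z


p_value = chi_square_right_tail_df1(5.1692307692)
print(round(p_value, 4))  # 0.023
print(round(chi_square_inv_right_tail_df1(p_value), 6))  # 5.169231
```

Going from the statistic to the p value and back recovers the original statistic, which is the same round trip the worksheet makes with CHISQ.TEST followed by CHISQ.INV.RT.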
Second Example
| Observed | Left Handed | Right Handed | Totals | Theoretical | Left Handed | Right Handed |
| Female | 12 | 108 | 120 | Female | 14.4 | 105.6 |
| Male | 24 | 156 | 180 | Male | 21.6 | 158.4 |
| Totals | 36 | 264 | 300 | |||
| 0.7575757576 | Sum of the ratios of squared differences to expected values | |||||
| 3.8414588207 | Compare to the critical value, which has a 5% chance of being equaled or exceeded if the null hypothesis is true: =CHISQ.INV.RT(0.05,1). The first argument is 5% in decimal form, and the 1 is the degrees of freedom. Because the test statistic is not greater than this critical value, the evidence is not strong enough to reject independence of gender and handedness. |
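The decision rule used in both examples (reject independence when the test statistic is greater than the critical value) can be sketched as one Python function. This assumes a 2 × 2 table and α = 0.05, and it computes the 1 df critical value from the normal distribution rather than calling Excel; `independence_decision_2x2` is an illustrative name.

```python
from statistics import NormalDist

# Counterpart of =CHISQ.INV.RT(0.05,1): for 1 df the critical value is the
# squared two-sided z value for alpha = 0.05.
ALPHA = 0.05
CRITICAL_DF1 = NormalDist().inv_cdf(1 - ALPHA / 2) ** 2


def independence_decision_2x2(observed, critical=CRITICAL_DF1):
    """Return (test statistic, decision) for a 2 x 2 table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # theoretical count
            stat += (o - e) ** 2 / e
    if stat > critical:
        return stat, "reject independence"
    return stat, "do not reject independence"


# Gender (rows: Female, Male) by handedness (columns: Left, Right).
handedness = [[12, 108], [24, 156]]
stat, decision = independence_decision_2x2(handedness)
print(round(CRITICAL_DF1, 4), round(stat, 4), decision)
# 3.8415 0.7576 do not reject independence
```

Passing the gender and promotion counts, `[[21, 3], [14, 10]]`, to the same function gives a statistic of about 5.17 and the decision to reject independence, matching the first sheet.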
Template to Copy
| Observed | C1 | C2 | Totals | Theoretical | C1 | C2 |
| R1 | | | 0 | R1 | #DIV/0! | #DIV/0! |
| R2 | | | 0 | R2 | #DIV/0! | #DIV/0! |
| Totals | 0 | 0 | 0 | |||
| #DIV/0! | Sum of the ratios of squared differences to expected values (shows #DIV/0! until the counts are entered) | |||||
| 3.8414588207 | Compare to the critical value, which has a 5% chance of being equaled or exceeded if the null hypothesis is true: =CHISQ.INV.RT(0.05,1). The first argument is 5% in decimal form, and the 1 is the degrees of freedom. |
What to do: Right-click the name tab at the bottom of this sheet and select the option to copy the sheet. Create a copy for each problem, and be sure to type a name for the new tab. Enter the four frequency counts into the blue cells and edit the row and column labels.
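A blank copy of the template shows #DIV/0! because every total is zero. The sketch below is a hypothetical Python stand-in for the template that refuses such input and also reports the two expected-count conditions mentioned on the first sheet (all expected counts at least 1 for the independence test, at least 5 for a statement about proportions). The function name and the small example table are made up for illustration.

```python
def check_template(counts):
    """Mimic the 2 x 2 template: counts is [[r1c1, r1c2], [r2c1, r2c2]].

    Returns the expected (theoretical) table plus two condition flags:
    every expected count at least 1 (independence test) and every
    expected count at least 5 (statement about proportions).
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    # A blank template has a grand total of zero, which is why Excel shows
    # #DIV/0!; a zero row or column total would make an expected count zero.
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("enter counts so every row and column total is nonzero")
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    cells = [e for row in expected for e in row]
    return expected, min(cells) >= 1, min(cells) >= 5


# Hypothetical small table: expected counts are 2.5 and 7.5 in each row.
expected, at_least_1, at_least_5 = check_template([[2, 8], [3, 7]])
print(expected, at_least_1, at_least_5)  # [[2.5, 7.5], [2.5, 7.5]] True False

# A blank template (all zeros) raises an error instead of showing #DIV/0!.
try:
    check_template([[0, 0], [0, 0]])
except ValueError as err:
    print("template not filled in:", err)
```

In this made-up example the independence test may be run (all expected counts are at least 1), but the stricter condition for a statement about proportions is not met.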
Smallpox Example
| Observed | Inoculated Yes | Inoculated No | Totals | Theoretical | Inoculated Yes | Inoculated No |
| Lived | 238 | 5136 | 5374 | Lived | 210.677377892 | 5163.322622108 |
| Died | 6 | 844 | 850 | Died | 33.322622108 | 816.677377892 |
| Totals | 244 | 5980 | 6224 | |||
| 27.0051071566 | Sum of ratio of squared differences to expected values | |||||
| 3.8414588207 | Compare to the critical value, which has a 5% chance of being equaled or exceeded if the null hypothesis is true: =CHISQ.INV.RT(0.05,1). The first argument is 5% in decimal form, and the 1 is the degrees of freedom. |
Refer to the smallpox data discussion on pages 92-96. The four values from Table 2.15 have been entered in the blue shaded cells, the labels were updated, and the formulas were left unchanged. Because the test statistic is greater than the critical value (27.01 > 3.84), the evidence is strong enough to reject independence of inoculation and result (surviving smallpox exposure). Inoculation and survival outcome are dependent, so conditional probabilities apply and a tree diagram is useful for organizing the data. It would also be acceptable to say that the survival proportions differ between the two groups, inoculated and not inoculated.
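Because inoculation and survival are dependent, the branches of a tree diagram carry conditional probabilities. The sketch below computes them from the four smallpox counts, with inoculation as the primary branch; the dictionary layout and the function name `tree_branches` are assumptions made for illustration.

```python
# Smallpox counts from the Observed table, keyed by (inoculated, outcome).
SMALLPOX = {
    ("yes", "lived"): 238, ("yes", "died"): 6,
    ("no", "lived"): 5136, ("no", "died"): 844,
}


def tree_branches(counts, groups=("yes", "no"), outcomes=("lived", "died")):
    """Probabilities for a two-level tree diagram with the group as the primary branch."""
    total = sum(counts.values())
    branches = {}
    for g in groups:
        group_total = sum(counts[(g, o)] for o in outcomes)
        p_group = group_total / total  # first branch: P(group)
        for o in outcomes:
            p_cond = counts[(g, o)] / group_total  # second branch: P(outcome | group)
            branches[(g, o)] = (p_group, p_cond, p_group * p_cond)  # joint at the end
    return branches


for (g, o), (p_g, p_c, p_j) in tree_branches(SMALLPOX).items():
    print(f"inoculated={g:3} {o:5}  P(group)={p_g:.4f}  "
          f"P(outcome|group)={p_c:.4f}  joint={p_j:.4f}")
```

With these counts, the probability of dying is about 0.0246 in the inoculated group and about 0.1411 in the group that was not inoculated, which is the difference in survival proportions described above.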
On Own
[Tree diagram template for #2: placeholder text boxes to be filled in]
On Your Own
| In the previous few guided Excel activities, you explored independence. Your assignment involves further analysis of whether two variables represent independent processes. You will refer back to the data from Lab 1 (if you want to start fresh, download the data and open the CSV (comma-separated values) file in Excel: http://www.openintro.org/stat/data/?data=cdc). |
| Step one should always be to save your lab, with your name and lab number in the file name, in the folder that you have created on your computer, memory device, cloud storage, etc. Choose the "Save As" option and save it as an Excel workbook. |
| 1. Use the gender and smoke100 pivot table (contingency table) to conduct the test for independence. Are gender and having smoked at least 100 cigarettes independent? This test requires all expected values to be at least 1; are there any cells in the theoretical table that are zero? Are all the theoretical values at least 5? If so, you can make a statement about the proportions, as your authors do with the gender and promotion example at the end of Chapter 1. Answer these questions in complete sentences, and include a statement about the decision to reject or not reject the null hypothesis of independence. |
| 2. Use the gender and smoke100 pivot table (contingency table) to create a tree diagram with gender as the primary branch. Click into each segment and enter brief titles and values; refer to the examples in the book. Tip: this template uses the Word SmartArt graphic Horizontal Hierarchy, found under the Insert menu. |
| 3. What other variables could be reviewed for independence in this manner (the case of 2 rows and 2 columns)? Conduct one such analysis, state the null and alternative hypotheses, and include the Excel tab copied from the template provided. State the decision and explain what it means: whether you reject or do not reject the null hypothesis, and what this says about the two variables. Several sentences are expected for this answer, along with the completed Excel sheet. |
| 4. Sometimes categories are combined, a process of data reduction, so that a specific, meaningful analysis can be applied. Can you think of any ways to collapse the BMI data into two levels for analysis? Select one other possible reduction that would lead to a meaningful analysis of independence or equal proportions. This will require several thoughtful sentences for your answer. |
| This Excel lab accompanies OpenIntro Statistics, 3rd edition, and was developed by Linda Currie for Excel 2016. |
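Questions 1 and 2 start from a gender by smoke100 pivot table. To cross-check the pivot table counts outside Excel, a minimal Python sketch follows. It assumes the CSV file has a header row with columns named `gender` and `smoke100`; the sample rows, their value codes, and the file name `cdc.csv` are hypothetical.

```python
import csv
import io
from collections import Counter


def contingency_table(csv_file, row_var, col_var):
    """Count CSV records by two categorical columns, like an Excel pivot table."""
    counts = Counter()
    for record in csv.DictReader(csv_file):
        counts[(record[row_var], record[col_var])] += 1
    row_levels = sorted({r for r, _ in counts})
    col_levels = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Hypothetical rows for illustration only; with the real file, pass
# open("cdc.csv", newline="") instead (the file name is an assumption).
sample = io.StringIO(
    "gender,smoke100\n"
    "m,1\nf,0\nf,1\nm,0\nm,1\nf,0\n"
)
rows, cols, table = contingency_table(sample, "gender", "smoke100")
print(rows, cols, table)  # ['f', 'm'] ['0', '1'] [[2, 1], [1, 2]]
```

The resulting 2 × 2 list of counts has the same shape as the four blue cells in the template, so it can be compared directly with the pivot table before the counts are typed in.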