LAB 6 Exploring Independence and Probabilities

Lab6ExploringIndependenceandProbabilities.xlsx

ChiSq Independence Test

Observed     Promoted  Not Promoted  Totals
Male               21             3      24
Female             14            10      24
Totals             35            13      48

Theoretical  Promoted  Not Promoted
Male             17.5           6.5
Female           17.5           6.5
5.1692307692  The formula is entered for you: it is the sum of the ratios of squared differences to expected values. The number sense discussion is below.
3.8414588207  Compare to the critical value, which has a 5% chance of being reached or exceeded if the null hypothesis is true: =CHISQ.INV.RT(0.05,1). The first argument is 5% in decimal form and the 1 is the degrees of freedom. Because the test value is greater than this critical value, the evidence is strong enough to reject independence of gender and promotion: a value this large would occur less than 5% of the time if the null hypothesis were true. Promotion and gender are related variables; the probability of promotion depends on gender, so the conditional probability rules apply, as discussed in Chapter 2, pages 90-95, especially section 2.2.5 on independence considerations.
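As a cross-check outside Excel, here is a minimal Python sketch (standard library only) of the same calculation for the gender and promotion counts. It builds the theoretical table from the row and column totals, sums the squared-difference ratios, and compares the result to the 5% critical value. The critical value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable; this shortcut is specific to df = 1 and is an assumption of the sketch, not how Excel computes CHISQ.INV.RT.

```python
from statistics import NormalDist

# Observed counts from the sheet: rows = Male, Female; columns = Promoted, Not Promoted
observed = [[21, 3],
            [14, 10]]

row_totals = [sum(row) for row in observed]            # [24, 24]
col_totals = [sum(col) for col in zip(*observed)]      # [35, 13]
grand_total = sum(row_totals)                          # 48

# Theoretical (expected) count for each cell: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))

# 5% critical value, valid for df = 1 only: square of the 97.5th normal percentile
critical = NormalDist().inv_cdf(0.975) ** 2

print(expected)             # [[17.5, 6.5], [17.5, 6.5]]
print(round(chi_sq, 4))     # 5.1692
print(round(critical, 4))   # 3.8415
print("reject independence" if chi_sq > critical else "do not reject independence")
```

The printed values match the theoretical table, the test statistic, and the critical value on this sheet.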
This tests whether the row and column variables are independent. As the authors' logical approach shows, it is equivalent to a test of equal proportions, although the requirements for that interpretation are stricter: no expected count can be less than 5, whereas for the independence test no expected (theoretical) count can be less than 1.
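The equivalence with a test of equal proportions can be checked numerically. In this Python sketch (an illustration, not the textbook's code), the pooled two-proportion z statistic compares the promotion rates of men (21 of 24) and women (14 of 24); for a 2 by 2 table without a continuity correction, its square equals the chi-square statistic. The sketch also checks the smallest expected count against the two thresholds mentioned above.

```python
import math

# Promotion counts and group sizes from the observed table
promoted_m, n_m = 21, 24
promoted_f, n_f = 14, 24

p_m = promoted_m / n_m                             # proportion promoted, men
p_f = promoted_f / n_f                             # proportion promoted, women
total = n_m + n_f
promoted_total = promoted_m + promoted_f
p_pool = promoted_total / total                    # pooled proportion under the null

# Pooled standard error and z statistic for H0: p_m = p_f
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_m + 1 / n_f))
z = (p_m - p_f) / se

print(round(z, 4))       # 2.2736
print(round(z ** 2, 4))  # 5.1692, the same value as the chi-square statistic

# Expected counts (row total * column total / grand total) and the two count rules
not_promoted_total = total - promoted_total
expected = [n * c / total for n in (n_m, n_f) for c in (promoted_total, not_promoted_total)]
print(min(expected) >= 1, min(expected) >= 5)  # True True
```

Because every expected count here is at least 5, the result can also be stated in terms of proportions: the promotion rate for men differs from the rate for women.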
Number sense: The formula used is $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E the expected frequency in a cell, summed over all cells. The question is how different the observed and expected frequencies need to be before we say that the row and column variables (in this case, gender and promotion) are not independent (reject independence). For this example we use the 5% rule for identifying rare events. The expected frequencies are computed assuming the null hypothesis is true (independent variables). If our observed frequencies would be a rare event compared with these expected frequencies, we instead conclude that the null hypothesis is not likely true and reject it; for this type of test we reject independence and generally state that the two variables are dependent (not independent).
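Applied to the gender and promotion table, with O the observed count and E the theoretical count in each cell, the formula works out as follows. Every cell differs from its expected count by 3.5, but the cells with the smaller expected count (6.5) contribute most to the total.

```latex
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}
       = \frac{(21 - 17.5)^2}{17.5} + \frac{(3 - 6.5)^2}{6.5}
       + \frac{(14 - 17.5)^2}{17.5} + \frac{(10 - 6.5)^2}{6.5}
       \approx 0.7 + 1.8846 + 0.7 + 1.8846 \approx 5.1692
```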
Degrees of freedom: Several distributions that you will encounter in statistics have degrees of freedom associated with them. Without getting into details now, the degrees of freedom for the chi-square test for independence are (number of row levels - 1) times (number of column levels - 1), so for our 2 by 2 case it is simply 1 times 1, which equals 1.

The textbook simulates a set of values that follow the theoretical model as a comparison for the test. Hypothesis testing is explained in detail later; refer to page 181 for a look at defining the hypotheses. In Chapter 1 the general logic of inference and hypothesis testing is introduced (perhaps a little early), using a simulation to show the reasoning; in Chapter 6 it is covered formally as a hypothesis test. Here is the formal test in Excel. This sheet is set up to be used as a template: you type the 4 values of the count summary into the table labeled "Observed", and the formulas set up the theoretical values for comparison, just as the authors do with the simulated values. The notes in the function menu box describe what to do for the theoretical values, but the formulas have already been entered for you here. The same approach works for any number of rows and columns if the categorical variables have more levels. For the chi-square ($\chi^2$) test for independence (to type the Greek letter chi, use the Symbol font or Insert > Symbol), the hypotheses are set up as on page 51, with independence as the null hypothesis, the condition that is assumed true and tested.
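The theoretical table follows one rule for every cell: row total times column total divided by the grand total. A minimal Python sketch of that rule (the function names are illustrative, not part of the workbook) works for any number of rows and columns and also returns the degrees of freedom, (r - 1)(c - 1). It is shown here on the smallpox counts used later in this lab.

```python
def expected_counts(observed):
    """Theoretical counts for a test of independence on any r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each cell: (row total * column total) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """(number of row levels - 1) * (number of column levels - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Smallpox table (rows: Lived, Died; columns: Inoculated Yes, Inoculated No)
smallpox = [[238, 5136],
            [6, 844]]
for row in expected_counts(smallpox):
    print([round(e, 3) for e in row])   # [210.677, 5163.323], then [33.323, 816.677]
print(degrees_of_freedom(smallpox))      # 1
```

A table with 3 row levels and 3 column levels would pass through the same functions unchanged and give 4 degrees of freedom.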

Gender and Promotion

Observed     Promoted  Not Promoted  Totals
Male               21             3      24
Female             14            10      24
Totals             35            13      48

Theoretical  Promoted  Not Promoted
Male             17.5           6.5
Female           17.5           6.5
Using the built-in Excel function for the chi-square test, the process looks like this.
0.0229903941  The p-value associated with the chi-square test statistic: =CHISQ.TEST(B2:C3,F2:G3)
Interpret the p-value: it is the probability of getting a result at least this extreme by chance alone if the row and column variables are independent. Results this extreme are expected only 2.3% of the time by chance. This is considered a rare event; instead of assuming that a rare event occurred, we reject independence and state that our data show evidence that promotion and gender are dependent: the proportion of men promoted is significantly greater than the proportion of women promoted.
5.1692307692  Test statistic: =CHISQ.INV.RT(A8,1). This matches the value on the previous sheet.
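CHISQ.TEST returns the right-tail probability of the statistic, and CHISQ.INV.RT reverses that step, which is why feeding the p-value back in recovers 5.169. For 1 degree of freedom both directions can be sketched with the Python standard library, because a chi-square variable with 1 df is a squared standard normal: the tail area beyond x is erfc(sqrt(x / 2)). This shortcut holds only for df = 1 and is an assumption of the sketch; larger tables would need a full chi-square distribution function (for example `scipy.stats.chi2`).

```python
import math
from statistics import NormalDist


def chisq_right_tail_df1(x):
    """P(chi-square with 1 df >= x): the p-value CHISQ.TEST reports for a 2 x 2 table."""
    return math.erfc(math.sqrt(x / 2))


def chisq_inv_rt_df1(p):
    """Value with right-tail probability p for 1 df, like CHISQ.INV.RT(p, 1)."""
    return NormalDist().inv_cdf(1 - p / 2) ** 2


p_value = chisq_right_tail_df1(5.1692307692)
print(round(p_value, 4))                      # 0.023
print(round(chisq_inv_rt_df1(p_value), 4))    # 5.1692, matching the previous sheet
print(round(chisq_inv_rt_df1(0.05), 4))       # 3.8415, the 5% critical value
```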

Second Example

Observed     Left Handed  Right Handed  Totals
Female                12           108     120
Male                  24           156     180
Totals                36           264     300

Theoretical  Left Handed  Right Handed
Female              14.4         105.6
Male                21.6         158.4
0.7575757576  Sum of the ratios of squared differences to expected values
3.8414588207  Compare to the critical value, which has a 5% chance of being reached or exceeded if the null hypothesis is true: =CHISQ.INV.RT(0.05,1). The first argument is 5% in decimal form and the 1 is the degrees of freedom. Because the test value is not greater than this critical value (0.76 < 3.84), the evidence is not strong enough to reject independence of gender and handedness.
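The handedness example can be checked the same way in a short, self-contained Python sketch. It lists each cell's contribution to the statistic and the right-tail p-value for 1 degree of freedom, computed with the identity P(chi-square >= x) = erfc(sqrt(x / 2)), which holds only for df = 1 (an assumption of this sketch). Every cell is just 2.4 away from its expected count, so the contributions are small and the total stays far below 3.84.

```python
import math

observed = [[12, 108],      # Female: Left Handed, Right Handed
            [24, 156]]      # Male
expected = [[14.4, 105.6],  # theoretical table from the sheet
            [21.6, 158.4]]

# Contribution of each cell: (observed - expected)^2 / expected
contributions = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
chi_sq = sum(map(sum, contributions))

# Right-tail probability for 1 degree of freedom
p_value = math.erfc(math.sqrt(chi_sq / 2))

print([[round(c, 4) for c in row] for row in contributions])
# [[0.4, 0.0545], [0.2667, 0.0364]]
print(round(chi_sq, 4), round(p_value, 3))   # 0.7576 0.384
```

A p-value this large means differences like these would be common if gender and handedness were independent, which agrees with the decision not to reject.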

Template to Copy

Observed     C1       C2       Totals
R1                             0
R2                             0
Totals       0        0        0

Theoretical  C1       C2
R1           #DIV/0!  #DIV/0!
R2           #DIV/0!  #DIV/0!
#DIV/0!  Sum of the ratios of squared differences to expected values (shows #DIV/0! until the counts are entered)
3.8414588207  Compare to the critical value, which has a 5% chance of being reached or exceeded if the null hypothesis is true: =CHISQ.INV.RT(0.05,1). The first argument is 5% in decimal form and the 1 is the degrees of freedom.

What to do: Right-click the name tab at the bottom of this sheet and select the option to copy the sheet. Create a copy for each problem, and be sure to type a name for the new tab. Enter the four frequency counts into the blue cells and edit the row and column labels.
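The copy-the-template workflow can be mirrored as one reusable function: pass in the four counts and the labels, and it returns the theoretical table, the test statistic, and the decision at the 5% level. This is a hypothetical helper (the name `independence_test_2x2` and its arguments are illustrative, not part of the workbook). It stops with an error when a row or column total is zero, since that leads to division by zero, the situation that makes the blank Excel template show #DIV/0!.

```python
from statistics import NormalDist


def independence_test_2x2(a, b, c, d, rows=("R1", "R2"), cols=("C1", "C2"), alpha=0.05):
    """Chi-square test of independence for the 2 x 2 table [[a, b], [c, d]] (hypothetical helper)."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    if 0 in row_totals or 0 in col_totals:
        # A zero total means dividing by zero, which Excel displays as #DIV/0!
        raise ValueError("every row and column total must be positive")
    expected = [[r * k / n for k in col_totals] for r in row_totals]
    chi_sq = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(2) for j in range(2))
    # A 2 x 2 table has df = 1, so the critical value is a squared normal quantile
    critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    return {
        "expected": {(rows[i], cols[j]): expected[i][j] for i in range(2) for j in range(2)},
        "chi_sq": chi_sq,
        "critical": critical,
        "reject_independence": chi_sq > critical,
    }


result = independence_test_2x2(238, 5136, 6, 844,
                               rows=("Lived", "Died"),
                               cols=("Inoculated Yes", "Inoculated No"))
print(round(result["chi_sq"], 2), result["reject_independence"])   # 27.01 True
```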

Smallpox Example

Observed     Inoculated Yes  Inoculated No  Totals
Lived                   238           5136    5374
Died                      6            844     850
Totals                  244           5980    6224

Theoretical  Inoculated Yes   Inoculated No
Lived         210.677377892  5163.322622108
Died           33.322622108   816.677377892
27.0051071566  Sum of the ratios of squared differences to expected values
3.8414588207  Compare to the critical value, which has a 5% chance of being reached or exceeded if the null hypothesis is true: =CHISQ.INV.RT(0.05,1). The first argument is 5% in decimal form and the 1 is the degrees of freedom.

Refer to the smallpox data discussion on pages 92-96. The four values from Table 2.15 have been entered in the blue shaded cells, the labels were updated, and the formulas were left the same. Because the test value is greater than the critical value (27.01 > 3.84), the evidence is strong enough to reject independence of inoculation and outcome (surviving smallpox exposure). Inoculation and survival outcome are dependent, so conditional probabilities apply and a tree diagram is useful for organizing the data. It would also be acceptable to say that the survival proportions differ between the two groups, inoculated and not inoculated.
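Because the variables are dependent, conditional proportions are the natural summary. This short Python sketch computes the branches of a tree diagram with inoculation as the primary branch: the probability of each inoculation group, the conditional probability of each outcome within the group, and the joint probability at the end of each path. The dictionary layout is just one convenient choice for illustration.

```python
# Counts from Table 2.15, arranged with inoculation as the primary branch
counts = {
    "Inoculated Yes": {"Lived": 238, "Died": 6},
    "Inoculated No": {"Lived": 5136, "Died": 844},
}
total = sum(sum(outcomes.values()) for outcomes in counts.values())   # 6224

tree = {}  # (group, outcome) -> (conditional probability, joint probability)
for group, outcomes in counts.items():
    group_total = sum(outcomes.values())
    p_group = group_total / total                  # primary branch: P(group)
    print(f"{group}: {p_group:.4f}")
    for outcome, n in outcomes.items():
        p_cond = n / group_total                   # secondary branch: P(outcome | group)
        p_joint = p_group * p_cond                 # end of path: P(group and outcome)
        tree[(group, outcome)] = (p_cond, p_joint)
        print(f"    {outcome} given {group}: {p_cond:.4f}   joint: {p_joint:.4f}")
```

The secondary branches show the difference the test detected: about 2.5% of inoculated patients died (6/244), compared with about 14.1% of those not inoculated (844/5980).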

On Own

Lab 6: Exploring Independence and Probabilities

[Tree diagram template for question #2 (SmartArt Horizontal Hierarchy placeholder boxes)]

On Your Own

In the previous few guided Excel activities, you explored independence. Your assignment involves further analysis of whether two variables represent independent processes or not. You will refer back to the data from Lab 1. If you want to start fresh, download the data from http://www.openintro.org/stat/data/?data=cdc and open the CSV (comma-separated values) file in Excel.
Step one should always be to save your lab with your name and lab number in the folder you have created on your computer, memory device, cloud storage, etc. Choose the "Save As" option and save it as an Excel workbook.
1. Use the gender and smoke100 pivot table (contingency table) to conduct the test for independence. Are gender and having smoked at least 100 cigarettes independent? This test requires all expected values to be at least 1; are there any cells in the theoretical table that are zero? Are all the theoretical values at least 5? If so, you can make a statement about the proportions, as your authors do with the gender and promotion example at the end of Chapter 1. Answer these questions in complete sentences and include a statement about the decision to reject or not reject the null hypothesis of independence.
2. Use the gender and smoke100 pivot table (contingency table) to create a tree diagram with gender as the primary branch. Click into each segment and enter brief titles and values; refer to the examples in the book. Tip: this template uses the Word SmartArt graphic Horizontal Hierarchy, found under the Insert menu.
3. What other variables could be reviewed for independence in this manner, that is, in the case of 2 rows and 2 columns? Conduct one such analysis, state the null and alternative hypotheses, and include the Excel tab copied from the template provided. State the decision and explain what it means: whether you reject or do not reject, and what this says about the two variables. Several sentences are expected for this answer, along with the completed Excel sheet.
4. Sometimes categories are combined, in a process of data reduction, so that a specific meaningful analysis can be applied. Can you think of any ways to collapse the BMI data into two levels for analysis? Select one other possible reduction that would lead to a meaningful analysis of independence or equal proportions. This will require several thoughtful sentences.
This Excel lab accompanies OpenIntro Statistics, 3rd edition, and was developed by Linda Currie for Excel 2016.
