WEEK 7
· One way to see possible relationships between two categorical variables is to look at the Conditional Distribution of each variable. A conditional distribution looks at the distribution of one categorical variable within each category of the other variable.
Example: Create a conditional distribution for the variable Had Depression.
Step 1: First, look at the column Had Cancer? = Yes. Calculate the proportion of those with cancer who also had depression. Then calculate the proportion of those with cancer who did not have depression. Notice that these proportions add up to 1 because together they account for all of the people who had cancer.
· Conditional Distribution for “Had Depression”
|                     | Had Cancer? Yes | Had Cancer? No |
|---------------------|-----------------|----------------|
| Had Depression? Yes | 170/220 = 0.773 |                |
| Had Depression? No  | 50/220 = 0.227  |                |
| TOTAL               | 1               |                |
· From this, we see that if a patient had cancer, he/she had a 77% probability of also having depression.
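The Step 1 arithmetic can be checked with a short Python sketch. The variable names are made up for illustration; the counts (170 and 50 out of 220) come from the table above.

```python
# Counts for patients in the column "Had Cancer? = Yes"
cancer_yes_depressed = 170
cancer_yes_not_depressed = 50
cancer_yes_total = cancer_yes_depressed + cancer_yes_not_depressed  # 220

# Conditional proportions of "Had Depression?" given "Had Cancer? = Yes"
p_dep_given_cancer = cancer_yes_depressed / cancer_yes_total         # 170/220
p_no_dep_given_cancer = cancer_yes_not_depressed / cancer_yes_total  # 50/220

print(round(p_dep_given_cancer, 3))     # 0.773
print(round(p_no_dep_given_cancer, 3))  # 0.227

# The two proportions cover everyone with cancer, so they sum to 1
print(round(p_dep_given_cancer + p_no_dep_given_cancer, 3))  # 1.0
```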
Step 2: Look at the column Had Cancer? = No. Calculate the proportion of those who did not have cancer who had depression. Then calculate the proportion of those who did not have cancer who did not have depression.
Conditional Distribution for “Had Depression”
|                     | Had Cancer? Yes | Had Cancer? No    |
|---------------------|-----------------|-------------------|
| Had Depression? Yes | 170/220 = 0.773 | 1000/1850 = 0.541 |
| Had Depression? No  | 50/220 = 0.227  | 850/1850 = 0.459  |
| TOTAL               | 1               | 1                 |
From this, we see that if a patient did not have cancer, he/she had a 54% probability of having depression. Depression is split more evenly for patients without cancer (54% Yes, 46% No) than for patients with cancer (77% Yes, 23% No).
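Both steps can be done at once with a minimal Python sketch. It assumes the counts are stored in a nested dictionary keyed first by Had Depression? and then by Had Cancer?; that layout and the function name `conditional_by_column` are chosen here for illustration only. Each cell is divided by its own column total, which is why every column sums to 1.

```python
# Two-way table of counts: counts[had_depression][had_cancer]
counts = {
    "Yes": {"Yes": 170, "No": 1000},  # Had Depression? = Yes
    "No":  {"Yes": 50,  "No": 850},   # Had Depression? = No
}

def conditional_by_column(table):
    """Distribution of the row variable within each column category.

    Each cell count is divided by its column total, so each column sums to 1.
    """
    rows = list(table)
    cols = list(table[rows[0]])
    result = {}
    for c in cols:
        col_total = sum(table[r][c] for r in rows)
        result[c] = {r: table[r][c] / col_total for r in rows}
    return result

dist = conditional_by_column(counts)
for cancer, props in dist.items():
    rounded = {dep: round(p, 3) for dep, p in props.items()}
    print(f"Had Cancer? {cancer}: {rounded}")
# Had Cancer? Yes: {'Yes': 0.773, 'No': 0.227}
# Had Cancer? No: {'Yes': 0.541, 'No': 0.459}
```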
· Exercise B1: Complete the Conditional Distribution Table for the variable Had Cancer. The first entry has been done for you.
|                     | Had Cancer? Yes  | Had Cancer? No | TOTAL |
|---------------------|------------------|----------------|-------|
| Had Depression? Yes | 170/1170 = 0.145 |                | 1     |
| Had Depression? No  |                  |                | 1     |
· Exercise B2: Create a graph of the Conditional Distribution for the variable Had Cancer.
Exercise B3: Interpret the conditional distributions for Had Depression and Had Cancer. Make at least two summary statements based on each distribution.
· This may help: https://www.youtube.com/watch?v=j4_SuB4qzWE
OR: http://www.excelfunctions.net/Excel-Chitest-Function.html
Exercise D3: Test for Independence for DoW #7:
· The p-value (the probability of getting a chi-square statistic at least as large as 43.13 if the two variables were independent) is less than 0.0001. This is well below the level of significance we set (0.5%, or 0.005). Do we reject the null hypothesis in favor of the alternative hypothesis? That is, is there a relationship between Depression and Cancer at a significance level of 0.5%?
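To see where the 43.13 comes from, here is a minimal Python sketch of a Pearson chi-square test of independence on the 2×2 table of counts from Steps 1 and 2. It is an illustration, not necessarily the tool used in the course: expected counts are computed as row total × column total ÷ grand total, no continuity correction is applied, and the p-value uses the fact that for 1 degree of freedom the upper-tail probability of a chi-square value x is erfc(√(x/2)).

```python
import math

# Observed counts: rows = Had Depression? (Yes, No), columns = Had Cancer? (Yes, No)
observed = [
    [170, 1000],
    [50, 850],
]

row_totals = [sum(row) for row in observed]        # [1170, 900]
col_totals = [sum(col) for col in zip(*observed)]  # [220, 1850]
grand_total = sum(row_totals)                      # 2070

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
assert df == 1, "the erfc shortcut below is valid only for df = 1"

# For df = 1, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 2))  # 43.13
print(p_value < 0.0001)      # True
```

Comparing `p_value` with 0.005 (the 0.5% level set in the exercise) is the decision step the question asks you to make.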
Exercise D4: The Chi Square Test allows you to determine whether there is an association between the variables, but it does not specify what that relationship is.
· What do you think is the association that the chi square test supports in DoW #7?
Explain why this association does NOT support the claim that Depression causes Cancer, even though we have strong statistical evidence supporting the association. (You may want to revisit the YouTube video.)