statistics with R programming
Week 13 Outline (12-2-20):
Assignment 7 (on blackboard) due date is _____________
Presentation due date is ______________
Assignments 5 and 6 are graded, and answer keys emailed to you.
Exam 2 is Thurs Dec 3 at 5:00, to Sat Dec 5 at 5:00
· Chapters 6, 7, 8 and 9 (SLR only, not MLR), so only assignments 5 and 6 are relevant
Questions
· Two way ANOVA (analysis of variance)
· We still require samples to be taken randomly
· We just finished one way ANOVA
· response variable is quantitative
· there is one explanatory variable, and it is categorical, which describes which of the independent populations each subject came from
· goal is to compare population means
· Sometimes when researchers design experiments, where they randomly assign subjects to treatment groups, there can be two explanatory variables, both categorical
· There may be the primary explanatory variable like treatment group, and also a secondary explanatory variable that researchers want to control for, such as gender, species, geographic location, presence of other condition, etc.
· Controlling for the secondary explanatory variable helps account for more unexplained variation in the response variable, which makes it easier to detect differences in the response variable across categories of the primary explanatory variable
· Or it could just be that there are exactly two explanatory variables of primary interest, and they are both categorical
· Goal is still to compare population means, but we have to be careful about how we do that for two categorical explanatory variables
· We can first do an overall F test to see if x1 and x2 generally explain anything about the mean level of y, though some researcher don’t do this
· Assuming the initial test finds some contribution by x1 and x2 , then the first question to always answer in this setting is if there is an interaction between the two categorical explanatory variables
· Remember that interaction between x1 and x2 means the relationship between y and x1 depends on the level (or category) of x2
· Terminology: we usually refer to the separate (ignoring the other x) effects of x1 and x2 on y as the “main effects”, as opposed to the “interaction effect”
· Interaction plot
· A very simple way to explore if there is an interaction in the sample is to plot the sample means of y for each combination of x1 and x2
· Example: treatment (yes or no) and gender are the two explanatory variables (the values in the table are the sample means of the response variable for all individuals in that combination of treatment and gender)
· So there may have been 10 females who received the treatment, and their mean of y was 9.9, and same for the other 3 combinations
|
|
Yes |
No |
|
Female |
9.9 |
15.1 |
|
Male |
15.4 |
10.2 |
· The relationship between the response variable and treatment depends on which gender the person is, so that says there is an interaction between treatment and gender in the sample
· If the genders showed relatively “parallel” graphs, that would indicate no significant interaction between treatment and gender in the sample, example:
· We always have to worry about if there is significant interaction first, because we’d like to compare means of y across treatment categories (and separately across genders), but that is not meaningful if there is a significant interaction. Why?
· Suppose there were in fact 10 subjects in each of the 4 combinations of dose and gender (there won’t always be the same number in general)
· Because there are the same number in each combination, we can find the mean levels of y for each treatment from the individual combination means (again, this is only possible because there are the same number of subjects in each combination):
· Treatment=yes: (9.9+15.4)/2 = 12.65
· Treatment=no: (15.1+10.2)/2 = 12.65
· So if we first just looked at the mean of y for the two treatment groups, we would see no difference at all, which completely hides the fact that the treatment does have an effect on the response variable
· Same for gender:
· Female: (9.9+15.1)/2 = 12.5
· Male: (15.4+10.2)/2 = 12.8
· These are not exactly the same, but it’s likely they would not be seen as significantly different, meaning gender was not important to the response variable
· When there is an interaction, we cannot just compare means of y for the levels of the two explanatory variables separately
· The best we can do is compare the means of the individual combinations, here 9.9, 15.1, 15.4 and 10.2
· The same ideas translate to doing statistical inference, meaning we look for evidence of significant interaction in the population using hypothesis testing prior to looking for significant main effects
· Very important note: for the purposes of our class, we will assume that all combinations of x1 and x2 have the same number of subjects in them (balanced design). If this is not true, the analysis is more complicated
· Model equation for ANOVA
· Review for one way anova (from last class), assuming 3 levels of factor A
· Model equation for one way ANOVA:
· is the overall population mean (ignoring group)
· is the true (population) effect of the level of factor A
· has nothing to do with the level of significance
· Put another way, plays the role of the mean of the population,
· Conceptually, is how far the population mean for group is from the grand (overall) population mean is, so
· is the residual for
· Then we can write the hypotheses in terms of the ’s:
at least one
· Note that writing the hypotheses this way is completely equivalent to writing the hypotheses as
at least one population mean is different
· Model equation for two way anova (a = number of levels of factor A and b = number of levels of factor B)
·
· is the observation of the response variable in the combination of the level of factor A with the level of factor B
· is the true (population) effect of the level of factor A
· Equivalently,
· is the mean of y in the level of factor A
· is the true (population) effect of the level of factor B
· Equivalently,
· is the mean of y in the level of factor B
· is the true (population) interaction effect for the combination of the level of factor A with the level of factor B
· Equivalently,
· is the mean of y in the combination of the level of factor A with the level of factor B
· is the residual for
· Writing hypotheses using model equation (there are three sets, but we focus on the interaction hypotheses first, for reasons stated above)
· all (no interaction in the population) vs not all (there is a significant interaction in the population)
· (no main effect of factor A in pop) vs
at least one is not zero (Factor A is important in pop)
· Equivalent to vs at least one (from factor A) is different from the others
· (no main effect of factor B in pop) vs
at least one is not zero (Factor B is important in pop)
· Equivalent to vs at least one (from factor B) is different from the others
· Test for interaction
· : no significant interaction between x1 and x2 in the population
· : there is significant interaction between x1 and x2 in the population
· You can look for the p-value for this test to determine whether or not to reject as before
· If you do reject : no sig interaction, the natural follow up question is which of the combinations of x1 and x2 have significantly different means levels of y
· This can be addressed the same way as we did for the followup to one way anova
· If you do not reject : no sig interaction, the natural follow up question is to test a different set of hypotheses, specifically
· : population means of y are same across levels of x1, and also across levels of x2 (in other words, there is no main effect of x1 and x2 on the mean level of y)
· If you reject this , then you would want to do follow up testing to see which population means are different, just as for the follow up to one way anova
General process for two way anova
Perform overall F-test:
Is the model useful, i.e., does at least one term (either main effect or interaction effect) affect the mean of y?
Done; neither factor x1 or x2 affects the response (either as main effects or interaction) so there is nothing more to do.
No
Yes
Determine which combinations of x1 and x2 have different mean levels of y
(We do not need to test main effects.)
Test interaction:
Is there interaction between x1 and x2?
Yes
No
If you find neither main effect is significant, then neither x1 or x2 explained much variation in y
If x1 is significant, then determine which categories of x1 have different mean levels of y
If x2 is significant, then determine which categories of x2 have different mean levels of y
If both A and B are significant, then the model is
Test only main effects of x1 and x2:
Perform the following two tests:
Test main effect of x1:
Does mean outcome of y differ for different levels of x1?
Test main effect of x2:
Does mean outcome of y differ for different levels of x2?
· Assumptions of two way anova (same as one way anova, but applied to the individual combinations of the two explanatory variables), including samples are taken randomly
· For each combination of x1 and x2, sample sizes are reasonably large OR corresponding populations normal
· If sample sizes are not reasonably large, then we should check boxplots or histograms for each combination of x1 and x2 to see if they are reasonably normal
· All combinations of x1 and x2 should have the same (or similar) variability
· We can compare the sample standard deviations or IQR’s of all combinations to see if this is reasonably met
· Example with R code (note same sample size in each, so balanced)
|
|
Level 1 of factor B |
Level 2 of factor B |
Level 3 of factor B |
|
Level 1 of factor A |
2, 4, 6, 5, 4 |
3, 2, 6, 7, 5 |
4, 5, 7, 3, 8 |
|
Level 2 of factor A |
4, 6, 9, 6, 7 |
5, 4, 7, 9, 8 |
3, 7, 8, 6, 9 |
> response<-c(2,4,6,5,4,3,2,6,7,5,4,5,7,3,8,4,6,9,6,7,5,4,7,9,8,3,7,8,6,9)
> factorA<-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
> factorB<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
> factorA<-as.factor(factorA)> factorB<-as.factor(factorB)> example<-data.frame(response,factorA,factorB)> exampleresponse factorA factorB1 2 1 12 4 1 13 6 1 14 5 1 15 4 1 16 3 1 27 2 1 28 6 1 29 7 1 210 5 1 211 4 1 312 5 1 313 7 1 314 3 1 315 8 1 316 4 2 117 6 2 118 9 2 119 6 2 120 7 2 121 5 2 222 4 2 223 7 2 224 9 2 225 8 2 226 3 2 327 7 2 328 8 2 329 6 2 330 9 2 3Problem statement: Determine if there is evidence in the samples that mean levels of response are different across levels of factorA and/or factorB in the populations.
Step 0 : From each individual (subject), we observe response (quantitative), factorA (categorical), and factorB (categorical). The goal is to compare mean levels of y across both factorA and factorB, so this is two way anova (assuming conditions are met).
Step 1 : Check conditions
(1) Are sample sizes sufficiently large in each combination of factorA and factorB?
> table(example$factorA,example$factorB)1 2 31 5 5 52 5 5 5
(2) Because sample sizes are not sufficiently large, does y have a normal distribution in each combination of factorA and factorB?
> ggboxplot(example, x = "factorA", y = "response", color = "factorB")
![]()
(3) Is there the same variability in each combination of factorA and factorB?
> example%>%group_by(factorA,factorB)%>%summarise(sdresponse=sd(response))# A tibble: 6 x 3# Groups: factorA [2]factorA factorB sdresponse<dbl> <dbl> <dbl>1 1 1 1.482 1 2 2.073 1 3 2.074 2 1 1.825 2 2 2.076 2 3 2.30
You can also explore if there is an interaction in the sample with
interaction.plot(factorA,factorB,response, fun = mean)
![]()
Step 2 : For simplicity, we’ll write hypotheses in words, so we don’t need to define parameters in words
Step 3 : First we focus on testing if there is an interaction in the population so,
Ho: no interaction between factorA and factorB in the population
H1: there is an interaction between factorA and factorB in the population
= 0.05
Step 4 : F = 0.117
> exampleanova<-aov(data=example,response~factorA+factorB+factorA:factorB)> summary(exampleanova)Df Sum Sq Mean Sq F value Pr(>F)factorA 1 24.30 24.300 6.152 0.0205 *factorB 2 2.47 1.233 0.312 0.7347factorA:factorB 2 1.40 0.700 0.177 0.8387Residuals 24 94.80 3.950
Step 5 : p-value = 0.8387
Step 6 : p-value<0.05, so do not reject Ho
Note: if we had found a significant interaction (which we did not) , then the next step would be to do multiple comparisons on all the combinations of level of factorA and factorB.
> exampleanova<-aov(data=example,response~factorA+factorB+factorA:factorB)> TukeyHSD(exampleanova)Tukey multiple comparisons of means95% family-wise confidence levelFit: aov(formula = response ~ factorA + factorB + factorA:factorB, data = example)$factorAdiff lwr upr p adj2-1 1.8 0.3021917 3.297808 0.0205337$factorBdiff lwr upr p adj2-1 0.3 -1.919637 2.519637 0.93929133-1 0.7 -1.519637 2.919637 0.71410413-2 0.4 -1.819637 2.619637 0.8948541$`factorA:factorB`diff lwr upr p adj2:1-1:1 2.200000e+00 -1.686497 6.086497 0.51420091:2-1:1 4.000000e-01 -3.486497 4.286497 0.99950472:2-1:1 2.400000e+00 -1.486497 6.286497 0.42085001:3-1:1 1.200000e+00 -2.686497 5.086497 0.92762682:3-1:1 2.400000e+00 -1.486497 6.286497 0.42085001:2-2:1 -1.800000e+00 -5.686497 2.086497 0.70803352:2-2:1 2.000000e-01 -3.686497 4.086497 0.99998371:3-2:1 -1.000000e+00 -4.886497 2.886497 0.96556542:3-2:1 2.000000e-01 -3.686497 4.086497 0.99998372:2-1:2 2.000000e+00 -1.886497 5.886497 0.61183521:3-1:2 8.000000e-01 -3.086497 4.686497 0.98698532:3-1:2 2.000000e+00 -1.886497 5.886497 0.61183521:3-2:2 -1.200000e+00 -5.086497 2.686497 0.92762682:3-2:2 8.881784e-16 -3.886497 3.886497 1.00000002:3-1:3 1.200000e+00 -2.686497 5.086497 0.9276268
However, we did not find an interaction, so the next thing to do in this problem is test for significant main effect of factorA and factorB (so rerun two way ANOVA without the interaction term). We could formally repeat the 7 steps, but it isn’t necessary. We’ll focus on the test statistic and p-value.
> exampleanova<-aov(data=example,response~factorA+factorB)> summary(exampleanova)Df Sum Sq Mean Sq F value Pr(>F)factorA 1 24.30 24.300 6.568 0.0165 *factorB 2 2.47 1.233 0.333 0.7195Residuals 26 96.20 3.700---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Because the p-value for factorA (0.0165) is less than 0.05, we are finding a significant difference in mean levels of response across levels of factorA. Because there are only two levels of factorA, we don’t have to do followup post-hoc multiple comparisons. However, if there were three or more levels of factorA, you would proceed just as we did with one way anova post-hoc testing
> exampleanova<-aov(data=example,response~factorA+factorB)> TukeyHSD(exampleanova)Tukey multiple comparisons of means95% family-wise confidence levelFit: aov(formula = response ~ factorA + factorB, data = example)$factorAdiff lwr upr p adj2-1 1.8 0.3562436 3.243756 0.0165231$factorBdiff lwr upr p adj2-1 0.3 -1.837586 2.437586 0.93532353-1 0.7 -1.437586 2.837586 0.69801033-2 0.4 -1.737586 2.537586 0.8881580
The p-value for factorB (0.7195) is greater than 0.05, so we are not finding evidence that the mean levels of response across factorB are different, so there is nothing more to do here.
· What to do when assumptions are not met?
· This is a good question, but I’ve decided it is beyond the scope of our class, meaning it would take a significant amount of time to go into, and we don’t have a significant amount of time left
· You could try to use transformations of the response variable for non-normality or if variabilities are different
· The one exception is if the situation is such that subjects have the response variable measured repeatedly over time, often called repeated measures data, so that one of the explanatory variables is which measurement for each subject
· In this case, researchers will use Friedman’s test, but we won’t go into any details here, because we haven’t talked about how to analyze repeated measures data generally
· Start Aho Chapter 11 (Categorical data)
· Everything we do in this class, as always, requires the sample is taken randomly
· One categorical variable
· One sample proportion HT and CI
· We have one categorical variable observed on one sample,
· For this part, we will only have two categories, loosely referred to as yes’s and no’s (or successes and failures as in binomial context)
· Summarized by sample proportion of yes’s, denoted by
· Goal is to perform statistical inference (HT or CI) for one population proportion, denoted by
· Built on the sampling distribution of
· takes on a different value for each sample, just like sample mean for quantitative data, and we need to know the overall distribution of these, just like for the sample mean
· = mean of all possible value of
· =
· = sd of all possible value of
· =
· Shape of all possible values of will be approximately normal if both and
· Applet: http://www.rossmanchance.com/applets/OneProp/OneProp.htm?candy=1
· Example: suppose we know that 40% of all GVSU students (the population) are in favor of reinstating spring break (assuming the semester ends a week later). If we take a random sample of 80 students, what is the probability of getting a majority of the sample that favors reinstating spring break?
· HT for one population proportion
· Step 1 (checking conditions) requires that is normal (so need to check and
)
· Steps 2 and 3 are about
· Step 4 is based on z-score = (value-mean)/sd, but we have to plug in what value, mean and sd are here
· For step 5, if conditions are met in step 1, then the test statistic will follow a normal distribution with mean=0 and sd=1, so normalcdf can be used to find p-value
· CI for one population proportion
· Chi-square Goodness of fit test
· Two categorical variables
· Two sample proportion HT and CI
· Chi-square test of independence
· Fisher’s exact test
Chapter 11 reading: Aho Chapter 11: Pages 503-511, 513 (section 11.4.1), 519-524 (skip section 11.6.2 and example for likelihood ratio test)