NA

sep21-activity_complete.html

Home >Mathematics homework help >Statistics homework help >NA

Day 08 Activity

Fisher & Hughes

September 21, 2018

Study

A study was conducted to determine the effects of alcohol on human reaction times. Fifty-seven adult individuals within two-age groups were recruited for this study and were randomly allocated into one of three alcohol treatment groups – a control where the subjects remain sober during the entire study, a moderate group were the subject is supplied alcohol but is limited in such a way that their blood alcohol content (BAC) remains under the legal limit to drive (BAC of 0.08) and a group that received a high amount of alcohol to which their BAC may exceed the legal limit for driving. Each subject was trained on a video game system and their reaction time (in milliseconds) to a visual stimulus was recorded at 7 time points 30 minutes apart (labeled T0=0, T1=30, T2=60 and so on). At time point T0, all subjects were sober and those in one of the alcohol consumption groups began drinking after the first measured reaction time (controlled within the specifications outlined). The researcher is interested in determining the influence alcohol and age (namely, is reaction time different for those in the 20s versus 30s) has on reaction times.

The task for today is to do a complete analysis for this study and dig into the effects of alcohol, age and time have on reaction times.

Data input and wrangling

First read in the data:

alcohol <- read.csv("alcoholReaction.csv")
head(alcohol)

##   Subject Age Alcohol    T0    T1    T2    T3    T4    T5    T6
## 1       1  24 Control 255.3 254.8 256.4 255.1 257.0 256.1 257.0
## 2       2  34 Control 250.1 249.2 249.0 248.0 248.0 248.9 248.1
## 3       3  31 Control 248.2 247.1 246.9 246.7 246.0 246.0 247.0
## 4       4  24 Control 253.9 253.8 254.9 254.1 253.2 254.1 255.0
## 5       5  38 Control 250.0 251.0 250.0 249.9 248.8 249.1 249.9
## 6       6  38 Control 246.0 248.0 247.0 248.1 248.1 246.9 244.0

Note, the Age variable is recorded as an actual age in years, not the category of 20s or 30s like we want – we need to dichotomize this variable. Also note the data is in wide format – the reaction times (the response variables) are spread over multiple columns. We need a way to gather these columns into a single column. So we need to do some data processing.

First consider the below code:

head(alcohol %>% 
       mutate(Age = case_when(Age<31 ~ "20s",
                              Age %in% 31:40 ~ "30s")))

##   Subject Age Alcohol    T0    T1    T2    T3    T4    T5    T6
## 1       1 20s Control 255.3 254.8 256.4 255.1 257.0 256.1 257.0
## 2       2 30s Control 250.1 249.2 249.0 248.0 248.0 248.9 248.1
## 3       3 30s Control 248.2 247.1 246.9 246.7 246.0 246.0 247.0
## 4       4 20s Control 253.9 253.8 254.9 254.1 253.2 254.1 255.0
## 5       5 30s Control 250.0 251.0 250.0 249.9 248.8 249.1 249.9
## 6       6 30s Control 246.0 248.0 247.0 248.1 248.1 246.9 244.0

case_when is essentially a piece-wise comparison. When Age is less than 31, you overwrite Age variable with “20s”. If the Age is greater than 30, you replace it with “30s”. In this example we used both a < comparison and the %in% statement we’ve seen before just to show multiple functionality. Also note we include 30 in the 20s group and 40 in the 30s group so they are each of size 10.

alcohol <- alcohol %>%
  mutate(Age = case_when(Age<31 ~ "20s",
                         Age %in% 31:40 ~ "30s") )

So the Age variable has been categorized. Now we need to convert the data from wide to tall format. We do this with the gather() function included in tidyverse.

alcohol.tall <- alcohol %>%
  gather(key=Time, value=Reaction, c(T0, T1, T2, T3, T4, T5, T6))

A blurb about gather There are essentially three inputs into the gather() functions. First

key - Essentially provides the name of the new variable we are going to create that consist of the column names
value - Is the name for the new variable that will house the values originally stored in the columns of interest
The final part is a list of all the columns we want to gather, in this case, T0, T1, T2, T3, T4, T5 and T6.

head(alcohol.tall, n=10)

##    Subject Age Alcohol Time Reaction
## 1        1 20s Control   T0    255.3
## 2        2 30s Control   T0    250.1
## 3        3 30s Control   T0    248.2
## 4        4 20s Control   T0    253.9
## 5        5 30s Control   T0    250.0
## 6        6 30s Control   T0    246.0
## 7        7 20s Control   T0    248.8
## 8        8 30s Control   T0    245.9
## 9        9 20s Control   T0    246.9
## 10      10 30s Control   T0    249.1

You will now note the data is a in a tall format, which is good for analysis.

Lastly, so R doesn’t try and treat it as a number, we tell it that the Subject variable is a factor or categorical variable. I also put the Alcohol variables in the order we think…

alcohol.tall <- alcohol.tall %>%
  mutate(Subject = as.factor(Subject),
         Alcohol = factor(Alcohol, levels=c("Control", "Moderate", "High")))

Exploratory Data Analysis

There are 2 categories for age, 3 categories for alcohol use and then 7 time points to consider. Essentially \(2\times 3\times 7 = 42\) combinations to consider. Rather than look numerically we will consider things graphically.

First we consider a plot of the Reaction times in Time based on Alcohol treatment with Age determining the linetype.

ggplot(alcohol.tall) + 
  geom_line(aes(x=Time, y=Reaction, group=Subject, color=Alcohol, linetype=Age))

Not only is this plot noisy, it is hard to determine anything. Let’s facet based on Age

ggplot(alcohol.tall) + 
  geom_line(aes(x=Time, y=Reaction, group=Subject, color=Alcohol)) + 
  facet_wrap(~Age)

This second plot is improved but still quite noisy. Let’s plot average profiles rather than the raw data.

ggplot(alcohol.tall, aes(x=Time, y=Reaction, group=Alcohol, color=Alcohol)) + 
  stat_summary(fun.y=mean, geom="line") + 
  facet_wrap(~Age)

These average profiles are fairly telling and maybe even a little surprising. Overall you see the High aclohol group (blue line) shows an increase in reaction time over the time of the study. The Control group shows a near decrease in the 30s group but also note the spead is only about a half a unit decrease.

Model fitting and analysis

We fit a 2 factor repeated measure model and look at the output.

fit <- aov(Reaction ~ Age*Alcohol*Time + Error(Subject/Time), data=alcohol.tall)
summary(fit)

## 
## Error: Subject
##             Df Sum Sq Mean Sq F value Pr(>F)
## Age          1     18   17.72   0.254  0.616
## Alcohol      2    143   71.47   1.026  0.366
## Age:Alcohol  2     93   46.31   0.665  0.519
## Residuals   51   3553   69.66               
## 
## Error: Subject:Time
##                   Df Sum Sq Mean Sq F value   Pr(>F)    
## Time               6   50.3   8.386   6.929 6.45e-07 ***
## Age:Time           6   10.3   1.714   1.416  0.20786    
## Alcohol:Time      12   40.0   3.330   2.752  0.00145 ** 
## Age:Alcohol:Time  12   13.8   1.150   0.950  0.49702    
## Residuals        306  370.4   1.210                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

First we look at the most complicated interaction term, in this case Age:Alcohol:Time and it is NOT significant. So we follow up by considering the two-way interaction terms. We see Age:Alcohol and Age:Time are not significant but Alcohol:Time is. There is an interaction between Alcohol group and Time. Given the interactions involving Age are not significnat, we can also consider the Age main effect, but see it is also insignificant (F-stat 0.252 on 1 and 51 degrees of freedom, \(p\)-value=0.616). Age appears to have no influence on the reaction times. We follow up with conditional multiple comparisons.

Multiple Comparison Follow ups

Note: We have two levels of control in this study, there is an explicit Control group and at time point T0 no subjects had been given a treatment, so it also operates as a control. Dunnett’s method for multiple comparison is most appropriate (see chapter 2.7 of the text).

We see that Alcohol and Time both matter, but perhaps in different ways. We consider both conditional comparisons. First we run the emmeans() code

mc.alc <- emmeans(fit,  ~ Alcohol | Time)

## Warning in emm_basis.aovlist(object, ...): Some predictors are correlated with the intercept - results are biased.
## May help to re-fit with different contrasts, e.g. 'contr.sum'

## NOTE: Results may be misleading due to involvement in interactions

mc.time <- emmeans(fit,  ~ Time | Alcohol)

## Warning in emm_basis.aovlist(object, ...): Some predictors are correlated with the intercept - results are biased.
## May help to re-fit with different contrasts, e.g. 'contr.sum'

## NOTE: Results may be misleading due to involvement in interactions

First we consider the effects of alcohol conditioning at different time points.

contrast(mc.alc, "trt.vs.ctrl", ref=1)

## Time = T0:
##  contrast           estimate        SE    df t.ratio p.value
##  Moderate - Control 1.077143 1.0774800 51.00   1.000  0.5097
##  High - Control     1.400260 0.9861516 51.00   1.420  0.2799
## 
## Time = T1:
##  contrast           estimate        SE    df t.ratio p.value
##  Moderate - Control 1.753810 1.2014014 78.06   1.460  0.2590
##  High - Control     1.816169 1.0995693 78.06   1.652  0.1841
## 
## Time = T2:
##  contrast           estimate        SE    df t.ratio p.value
##  Moderate - Control 1.947143 1.2014014 78.06   1.621  0.1950
##  High - Control     2.023896 1.0995693 78.06   1.841  0.1274
## 
## Time = T3:
##  contrast           estimate        SE    df t.ratio p.value
##  Moderate - Control 2.133810 1.2014014 78.06   1.776  0.1450
##  High - Control     2.613442 1.0995693 78.06   2.377  0.0380
## 
## Time = T4:
##  contrast           estimate        SE    df t.ratio p.value
##  Moderate - Control 2.405476 1.2014014 78.06   2.002  0.0907
##  High - Control     2.814351 1.0995693 78.06   2.560  0.0239
## 
## Time = T5:
##  contrast           estimate        SE    df t.ratio p.value
##  Moderate - Control 2.365476 1.2014014 78.06   1.969  0.0975
##  High - Control     3.206623 1.0995693 78.06   2.916  0.0090
## 
## Time = T6:
##  contrast           estimate        SE    df t.ratio p.value
##  Moderate - Control 2.487143 1.2014014 78.06   2.070  0.0781
##  High - Control     3.517532 1.0995693 78.06   3.199  0.0039
## 
## Results are averaged over the levels of: Age 
## P value adjustment: dunnettx method for 2 tests

plot(contrast(mc.alc, "trt.vs.ctrl", ref=1))

First note, that in all seven comparisons, the Moderate group is never different than the Control group (this is true for all time, smallest adjusted \(p\)-value is 0.0781). Thus, the profiles of the Moderate group and the Control group are statistically the same.

We can see that in the early time points, there was no difference between the treatment groups receiving alcohol and those not but as time progressed the “High” alcohol group had higher reaction times than the control (starting at T3, it always significant with adjusted \(p\)-value of 0.0380).

Next we compare the effects of time conditioning on the alcohol group.

contrast(mc.time, "trt.vs.ctrl", ref=1)

## Alcohol = Control:
##  contrast   estimate        SE  df t.ratio p.value
##  T1 - T0   0.1700000 0.3478929 306   0.489  0.9675
##  T2 - T0   0.1750000 0.3478929 306   0.503  0.9647
##  T3 - T0   0.2600000 0.3478929 306   0.747  0.8938
##  T4 - T0   0.0700000 0.3478929 306   0.201  0.9976
##  T5 - T0  -0.1750000 0.3478929 306  -0.503  0.9647
##  T6 - T0  -0.1600000 0.3478929 306  -0.460  0.9727
## 
## Alcohol = Moderate:
##  contrast   estimate        SE  df t.ratio p.value
##  T1 - T0   0.8466667 0.4017122 306   2.108  0.1603
##  T2 - T0   1.0450000 0.4017122 306   2.601  0.0492
##  T3 - T0   1.3166667 0.4017122 306   3.278  0.0065
##  T4 - T0   1.3983333 0.4017122 306   3.481  0.0032
##  T5 - T0   1.1133333 0.4017122 306   2.771  0.0309
##  T6 - T0   1.2500000 0.4017122 306   3.112  0.0111
## 
## Alcohol = High:
##  contrast   estimate        SE  df t.ratio p.value
##  T1 - T0   0.5859091 0.3398943 306   1.724  0.3302
##  T2 - T0   0.7986364 0.3398943 306   2.350  0.0929
##  T3 - T0   1.4731818 0.3398943 306   4.334  0.0001
##  T4 - T0   1.4840909 0.3398943 306   4.366  0.0001
##  T5 - T0   1.6313636 0.3398943 306   4.800  <.0001
##  T6 - T0   1.9572727 0.3398943 306   5.758  <.0001
## 
## Results are averaged over the levels of: Age 
## P value adjustment: dunnettx method for 6 tests

plot(contrast(mc.time, "trt.vs.ctrl", ref=1))

We see that the Control group never deviates from the control time point (T0). This should not be surprising given they remained sober for the entire study. In both of the other treatments we see the influence of Time (and thus alcohol consumption) on reaction times.

Even though the profile of the Moderate group was not significantly different than the Control group, they did experience an increase in reaction times with the consumption of alcohol (just not enough to deviate overall from the Control group). We see that the High consumption did deviate from the Control group sometime around time point T3 (90 minutes).

Conclusions

We established above that the key finding is that those with a high dosage of alcohol had a longer reaction time compared to the the control group as time progressed. We also find that those receiving a moderate amount of alcohol performed similarly to the control group. We close by building a profile plot to summarize the findings (remember, Age was not important).

First we plot the profiles of the three alcohol treatments summarizing over all ages.

alcohol.summary <- alcohol.tall %>%
  group_by(Alcohol, Time) %>%
  summarize(Mean=mean(Reaction),
            SE= sd(Reaction)/sqrt(n()))
ggplot(alcohol.summary, aes(x=Time, y=Mean, color=Alcohol)) +
  geom_errorbar(aes(ymin=Mean-SE, ymax=Mean+SE), width=0.1, position=position_dodge(0.3)) +
  geom_line(aes(group=Alcohol), position=position_dodge(0.3)) +
  geom_point(position=position_dodge(0.3))

Note this plot is a bit misleading since we have plotted the moderate group even though it is statistically similar to the control group (note the SE bars overlap for all time points for the moderate and contrl groups). To link the control and moderate groups, we have to do a bit more data processing. In the below code we recast the Alcohol variable to only two groups.

alcohol.summary2 <- alcohol.tall %>%
  mutate(Alcohol = case_when(Alcohol=="High" ~ "Legally Drunk",
                             TRUE ~ "Legally Sober")) %>%    # `TRUE ~` is everything else
  group_by(Alcohol, Time) %>%
  summarize(Mean=mean(Reaction),
            SE= sd(Reaction)/sqrt(n()))

The TRUE ~ "Legally Sober" line essentially tells R that in any other case (TRUE is always True) to mark it as Legally Sober. In the first line of the case_when statement we use the == notation to compare for equality.

Now we make an overall plot summarizing the findings of our study. To demonstrate the level of sophistication we can include in a plot, I do quite a bit with axes, labeling and color choices. Note this is sort of thing covered in detail in STA404. Here we demonstrate the functionality.

ggplot(alcohol.summary2, aes(x=Time, y=Mean, color=Alcohol)) +
  geom_errorbar(aes(ymin=Mean-SE, ymax=Mean+SE), width=0.1, position=position_dodge(0.3)) +
  geom_line(aes(group=Alcohol), position=position_dodge(0.3)) +
  geom_point(position=position_dodge(0.3)) +
  scale_x_discrete(name="Minutes since start of study", labels=c("0","30","60","90", "120", "150", "180")) +
  scale_color_manual(name="Alcohol level", values=c("darkgreen", "cyan")) + 
  labs(y="Mean Reaction Time (ms)") + 
  theme_bw() + 
  ggtitle("Alcohol effects on Reaction time to Visual Stimulus") + 
  theme(legend.position=c(0.125,0.85))  # 0.125 (ie 12.5%) from the left edge and 0.85 from the bottom edge