statistics with R programming
Assignment 5 KEY, STA 610, Fall 2020
1. Consider #5 on pages 241-242 in the Aho textbook. Do not follow their parts (a) through (g), and do not assume the population variances are equal. The data are on BB so you don’t have to extract from R, but they are also found in R if you can load the asbio package.
(a) Determine if there is evidence in the sample that the O2 level below the town is lower than above the town, on average, in the populations. Regardless of your assessment of the conditions, assume both populations are normal in step 1 (after checking sample sizes and sample shapes).
Step 0 : From each individual (random location), we observe the O2 level (quantitative) and whether the location is above or below the town (categorical). The goal is to compare the mean O2 level above the town to the mean O2 level below it (equivalently, O2 is the response variable and above/below is the explanatory variable), so we have two quantitative populations. They are independent because the locations above are not connected (paired) in any way with the locations below.
Step 1 : population sd’s known? no
sample sizes over 30? No, 15 for both groups
The above group looks fairly normal; the below group, however, has slight left skewness and a moderate outlier on the high end. The problem says to assume both populations are normal.
Step 2 : mu1 = pop mean O2 for above, mu2 = pop mean O2 for below
Step 3 : Ho: mu1=mu2 H1: mu1 > mu2 alpha=.05
Step 4 : t = 1.96 (Show work by hand too because requested)
> t.test(O2~location,data=dO2)
Welch Two Sample t-test
data: O2 by location
t = 1.9551, df = 20.343, p-value = 0.06445
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.01183912 0.37183912
sample estimates:
mean in group Above mean in group Below
4.92 4.74
Step 5 : p-value = 0.06445/2 = 0.032 (I asked R for a two-tailed test so I could get the CI from R; halving is appropriate because t is positive, which is the direction of H1)
Step 6 : p-value<0.05, so reject Ho
Step 7 : There is sufficient evidence to support that the population mean O2 level is lower below the town than above it.
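As a cross-check on Step 5, the one-sided p-value can be recomputed directly from the t statistic and Welch degrees of freedom that R printed, rather than by halving. This is only a sketch built from the rounded values in the output above; the variable names are illustrative.

```r
# Values copied from the t.test() output above (rounded as printed).
t_stat   <- 1.9551
df_welch <- 20.343

# H1: mu1 > mu2 (Above minus Below), so the p-value is the upper-tail area.
p_one_sided <- pt(t_stat, df = df_welch, lower.tail = FALSE)

# Halving the two-sided p-value gives the same number because t > 0,
# i.e. the sample difference points in the direction of H1.
p_halved <- 0.06445 / 2

print(round(c(upper_tail = p_one_sided, halved = p_halved), 4))
```

With the data loaded, t.test(O2~location, data=dO2, alternative="greater") should report this one-sided p-value directly, assuming Above is the first factor level (as the order of the sample estimates suggests).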
(b) Calculate and interpret a 95% confidence interval for the difference in population means (assuming both populations are normal).
From above, the interval is (-0.012, 0.372). We are 95% confident that the difference in the population means (mu1-mu2) is between -0.012 and 0.372.
(c) Explain how this confidence interval could be used to decide whether or not to reject the null hypothesis in 1a.
0 is inside the interval, so we would not reject Ho: mu1-mu2=0. However, this is a two-sided interval, which does not line up with a one-sided test, so it is OK that we get a different decision in this instance.
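To see why the two-sided interval and the one-sided test can disagree, the sketch below rebuilds the interval from the printed output and then computes the one-sided 95% lower bound, which is the interval that pairs with H1: mu1 > mu2. The standard error is backed out from the rounded t statistic, so it is an approximation, and the variable names are illustrative.

```r
# Values copied from the t.test() output above (rounded as printed).
diff_means <- 4.92 - 4.74      # Above minus Below
t_stat     <- 1.9551
df_welch   <- 20.343

# Standard error backed out from t = diff / SE (approximate,
# because the printed values are rounded).
se <- diff_means / t_stat

# Two-sided 95% interval: should be close to (-0.012, 0.372).
ci_two_sided <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se

# One-sided 95% lower bound, the interval that matches H1: mu1 > mu2.
lower_bound <- diff_means - qt(0.95, df_welch) * se

print(round(ci_two_sided, 3))
print(round(lower_bound, 3))
```

The one-sided lower bound comes out above 0, which agrees with rejecting Ho in 1a at alpha=0.05, while the two-sided interval still contains 0.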
(d) Redo only steps 2, 3, 4, and 5 from 1a without assuming the populations are normal, but instead assuming the populations have the same shape and variability.
Step 2 : eta1= pop median for above eta2= pop median for below
Step 3 : Ho: eta1=eta2 H1: eta1>eta2 alpha=0.05
Step 4 : W=165 (do hand calculation too because it was requested)
> wilcox.test(dO2$O2~dO2$location, alternative = c("two.sided"), mu = 0, exact = TRUE)
Wilcoxon rank sum test with continuity correction
data: dO2$O2 by dO2$location
W = 165, p-value = 0.02926
alternative hypothesis: true location shift is not equal to 0
Step 5 : p-value = .02926/2 = 0.015 (cut in half because I asked R for two tail test)
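For the hand calculation, the following sketch shows how the reported W relates to the rank sum and to a normal-approximation p-value. The "continuity correction" line in the output suggests R used the normal approximation here, which it does when ties are present. This sketch assumes no tie correction, so it will land near R's value but not exactly on it.

```r
# Values from the problem and the wilcox.test() output above.
n1 <- 15          # Above
n2 <- 15          # Below
W  <- 165         # statistic reported by R

# R reports W = (rank sum of the first group) - n1*(n1+1)/2,
# so the rank sum of the Above group can be recovered.
rank_sum_above <- W + n1 * (n1 + 1) / 2

# Null mean and variance of W. ASSUMPTION: no tie correction is applied
# here, whereas R adjusts the variance for ties, so the result is
# close to, but not identical to, R's 0.02926 / 2.
mu_W  <- n1 * n2 / 2
var_W <- n1 * n2 * (n1 + n2 + 1) / 12

# Continuity-corrected z statistic and upper-tail p-value for H1: eta1 > eta2.
z <- (W - mu_W - 0.5) / sqrt(var_W)
p_one_sided <- pnorm(z, lower.tail = FALSE)

print(rank_sum_above)
print(round(c(z = z, p_one_sided = p_one_sided), 4))
```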
2. Consider #7 on page 242 in the Aho textbook. Do not follow their parts (a) through (e). Note that the same variable was observed on each subject twice (which is why they are listing X – Y), and this must be taken into account.
(a) Determine if there is evidence in the sample that program X is superior to program Y, on average, in the population. Regardless of your assessment of the condition, assume the population is normal in step 1 (after checking sample size and sample shape).
Step 0 : From each individual (twin pair) we observed the difference in weight gain, which is one quantitative variable (when you have paired data, you should always focus on the differences)
Step 1 : pop sd known? No
n=12<30
boxplot is mildly right skewed, but problem says to assume pop is normal
Step 2 : mu=pop mean difference in weight (X-Y)
Step 3: Ho: mu=0 H1: mu>0 alpha=0.05
Step 4 : t= -5.2237 (hand calculate too because it was requested)
> t.test(diff,alternative="greater",mu=0)
One Sample t-test
data: diff
t = -5.2237, df = 11, p-value = 0.9999
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
-1.799565 Inf
sample estimates:
mean of x
-1.339167
Step 5 : p-value= .9999
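Step 6 : p-value>0.05, so do not reject Ho
Step 7 : There is not sufficient evidence that program X is superior to program Y, on average, in the population.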
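As a check on Steps 4 and 5, the p-value and the one-sided confidence bound can be reproduced from the numbers R printed. This is a sketch from the rounded output only; the standard error is backed out from t rather than computed from the raw differences.

```r
# Values copied from the t.test() output above (rounded as printed).
t_stat    <- -5.2237
n         <- 12
df        <- n - 1
mean_diff <- -1.339167

# H1: mu > 0, so the p-value is the area to the right of t.
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

# Standard error backed out from t = mean / SE, then the one-sided
# 95% lower confidence bound; it should be close to R's -1.799565.
se <- mean_diff / t_stat
lower_bound <- mean_diff - qt(0.95, df) * se

print(round(p_value, 4))
print(round(lower_bound, 4))
```

Because t is negative while H1 is mu>0, nearly all of the t distribution lies to the right of the statistic, which is why the p-value is close to 1.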
(b) Redo only steps 2, 3, 4 and 5 from 2a, but now assuming the population is symmetric.
Step 2 : eta = pop median difference in weight loss (X-Y)
Step 3 : Ho: eta=0 H1: eta>0 alpha=0.05
Step 4 : V=1 (hand calculate too because it was requested)
> wilcox.test(diff,alternative="greater",mu=0)
Wilcoxon signed rank test
data: diff
V = 1, p-value = 0.9998
alternative hypothesis: true location is greater than 0
Step 5 : p-value = 0.9998
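The p-value for V=1 can also be obtained by hand from the exact null distribution of the signed rank statistic. The sketch below assumes that none of the 12 differences are zero or tied, which is consistent with R reporting an exact test but is not verified here against the raw data.

```r
n <- 12   # number of differences; ASSUMPTION: none are zero or tied
V <- 1    # statistic reported by R (sum of ranks of the positive differences)

# H1: eta > 0, so p-value = P(V >= 1) = 1 - P(V <= 0) under Ho.
p_value <- psignrank(V - 1, n, lower.tail = FALSE)

# Only one of the 2^12 equally likely sign patterns gives V = 0.
p_by_counting <- 1 - 1 / 2^n

print(round(c(psignrank = p_value, counting = p_by_counting), 4))
```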
(c) Redo only steps 2, 3, 4 and 5 from 2a, but now assuming the population is not symmetric.
Step 2 : eta = pop median difference in weight loss (X-Y)
Step 3 : Ho: eta=0 H1: eta>0 alpha=0.05
Step 4 : S=1 (hand calculate too)
> SIGN.test(diff,md=0,alternative="greater")
One-sample Sign-Test
data: diff
s = 1, p-value = 0.9998
Step 5 : p-value = 0.9998
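The sign test p-value can be hand-checked with the binomial distribution, using base R only. SIGN.test is not part of base R (it is commonly loaded from the BSDA package). This sketch assumes that none of the 12 differences equal 0, so that all 12 count toward n.

```r
n <- 12   # ASSUMPTION: no differences equal to 0, so all 12 count
S <- 1    # number of positive differences (the statistic reported above)

# Under Ho the number of positive differences is Binomial(n, 0.5).
# H1: eta > 0, so p-value = P(S >= 1).
p_value <- pbinom(S - 1, size = n, prob = 0.5, lower.tail = FALSE)

# The same number from base R's exact binomial test.
p_binom_test <- binom.test(S, n, p = 0.5, alternative = "greater")$p.value

print(round(c(pbinom = p_value, binom.test = p_binom_test), 4))
```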