Statistics Technology Lab
Math 2150L – Statistics Technology Lab
Diagnostic Testing – Comparing Two Means – Independent Samples
Steps for Comparing Two Means – Independent Samples
1. Use a QQ plot of each sample to determine whether the data sets are approximately normally distributed.
2. If it is safe to assume that both data sets are approximately normally distributed, then proceed with the Independent Samples t-Test.
3. If it is not safe to assume that both data sets are approximately normally distributed (that is, if either one appears non-normal), then the Wilcoxon Non-Parametric Test for Two Independent Samples must be used.
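The three steps above can be sketched in R as a small helper. This is a minimal sketch, not part of the lab handout: the function name `compare_two_groups` and its `both_normal` argument are hypothetical. The normality decision is still yours to make from the QQ plots; you pass it in as TRUE or FALSE.

```r
# Hypothetical helper (not from the handout): pick the test from the
# analyst's QQ-plot judgment, then apply the p-value <= 0.05 rule.
compare_two_groups <- function(x, y, both_normal,
                               alternative = c("two.sided", "less", "greater"),
                               alpha = 0.05) {
  alternative <- match.arg(alternative)
  if (both_normal) {
    # Both samples look approximately normal: t-test of means
    res <- t.test(x, y, alternative = alternative, paired = FALSE,
                  conf.level = 1 - alpha)
  } else {
    # At least one sample looks non-normal: Wilcoxon test of medians
    res <- wilcox.test(x, y, alternative = alternative, paired = FALSE)
  }
  list(test = res$method,
       p_value = res$p.value,
       evidence_for_HA = res$p.value <= alpha)
}
```

For example, `compare_two_groups(sampleA, sampleB, both_normal = TRUE)` runs the t-test, and `both_normal = FALSE` switches to the Wilcoxon test.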
t-Test for Two Independent Samples where both samples are approximately normally distributed
Step 1: Hypotheses to be tested
Ho: μ1 = μ2
HA: μ1 ≠ μ2, μ1 < μ2, or μ1 > μ2 (choose one)
Step 2: Use the “t.test” R command to generate a p-value.
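> t.test(variable1, variable2, alternative="two.sided", paired=FALSE, conf.level=0.95)
Set alternative to "two.sided" for HA: μ1 ≠ μ2, "less" for HA: μ1 < μ2, or "greater" for HA: μ1 > μ2.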
Step 3: If the p-value ≤ 0.05, the data provide significant evidence that HA is true; otherwise, we fail to reject Ho.
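By default `t.test` does not assume equal variances (`var.equal = FALSE`), so it runs Welch's version of the two-sample t-test. The sketch below is an illustration rather than part of the handout: it computes the Welch t statistic, its approximate degrees of freedom, and the two-sided p-value by hand for this lab's example samples and prints them next to the `t.test` output. The helper name `welch_t` is hypothetical.

```r
# Hypothetical helper: Welch two-sample t statistic, df, and two-sided p-value
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation for the degrees of freedom
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_val <- 2 * pt(-abs(t_stat), dof)
  c(t = t_stat, df = dof, p = p_val)
}

x <- c(1, 3, 4, 6, 8, 12, 15, 16, 17, 22, 26)
y <- c(2, 5, 7, 9, 11, 13, 14, 18, 19, 20, 25, 27)
by_hand <- welch_t(x, y)
builtin <- t.test(x, y, alternative = "two.sided", paired = FALSE)
print(by_hand)
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter),
        p = builtin$p.value))
```

The two printed rows should agree, which shows that the p-value from `t.test` comes from this Welch statistic rather than the pooled-variance version.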
Wilcoxon Non-Parametric Test for Two Independent Samples (sample data not normal)
Step 1: Hypotheses to be tested
Ho: Median1 = Median2
HA: Median1 ≠ Median2, Median1 < Median2, or Median1 > Median2 (choose one)
Step 2: Use the “wilcox.test” R command to generate a p-value.
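> wilcox.test(variable1, variable2, alternative="two.sided", paired=FALSE)
Set alternative to "two.sided" for HA: Median1 ≠ Median2, "less" for HA: Median1 < Median2, or "greater" for HA: Median1 > Median2.
Step 3: If the p-value ≤ 0.05, the data provide significant evidence that HA is true; otherwise, we fail to reject Ho.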
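The Wilcoxon test compares the two groups through ranks rather than raw values. As a sketch (not from the handout; the helper name `rank_sum_W` is hypothetical), the statistic W that `wilcox.test` reports equals the sum of the first sample's ranks in the combined data minus n1(n1 + 1)/2. The example samples from this lab are used below.

```r
# Hypothetical helper: Wilcoxon rank-sum statistic W as reported by wilcox.test
rank_sum_W <- function(x, y) {
  r <- rank(c(x, y))   # ranks in the combined sample (ties get average ranks)
  n1 <- length(x)
  sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
}

x <- c(1, 3, 4, 6, 8, 12, 15, 16, 17, 22, 26)
y <- c(2, 5, 7, 9, 11, 13, 14, 18, 19, 20, 25, 27)
W_hand <- rank_sum_W(x, y)
W_builtin <- unname(wilcox.test(x, y, alternative = "two.sided",
                                paired = FALSE)$statistic)
print(c(by_hand = W_hand, wilcox.test = W_builtin))  # both are 54 for these samples
```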
Example – Using the two data sets below, run the QQ plot diagnostics to determine whether it is safe to assume that both data sets are approximately normally distributed. Based on the diagnostic results, perform the appropriate test to compare either the means or the medians of the two groups.
Sample A – 1, 3, 4, 6, 8, 12, 15, 16, 17, 22, 26
Sample B – 2, 5, 7, 9, 11, 13, 14, 18, 19, 20, 25, 27
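One way to enter the two samples in R is shown below. The vector names `sampleA` and `sampleB` are chosen to match the commands used in the solution.

```r
# Example data from the lab, entered as numeric vectors
sampleA <- c(1, 3, 4, 6, 8, 12, 15, 16, 17, 22, 26)
sampleB <- c(2, 5, 7, 9, 11, 13, 14, 18, 19, 20, 25, 27)

# Sample sizes and basic summaries
length(sampleA)   # 11
length(sampleB)   # 12
summary(sampleA)
summary(sampleB)
```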
Solution
Since both samples are small (nA = 11 < 30 and nB = 12 < 30), we cannot simply assume the data are approximately normal and must examine the QQ plots of the samples shown below.
The graph above on the left is for sampleA and the graph above on the right is for sampleB. If a sample is approximately normal, its points should generally follow the reference line with no severe breaks at the ends. In my opinion, the points for both sampleA and sampleB generally follow the reference line with no severe breaks at the ends, so we can proceed to test the hypothesis of equal means using the t-test.
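QQ plots like the ones described above can be drawn with base R's `qqnorm` and `qqline`. This is a minimal sketch assuming the data are stored in `sampleA` and `sampleB`, as in the t.test command below; the side-by-side layout via `par(mfrow = c(1, 2))` and the plot titles are illustrative choices.

```r
sampleA <- c(1, 3, 4, 6, 8, 12, 15, 16, 17, 22, 26)
sampleB <- c(2, 5, 7, 9, 11, 13, 14, 18, 19, 20, 25, 27)

old_par <- par(mfrow = c(1, 2))   # two plots side by side
qqA <- qqnorm(sampleA, main = "Normal QQ Plot: sampleA")
qqline(sampleA)                   # reference line through the 1st and 3rd quartiles
qqB <- qqnorm(sampleB, main = "Normal QQ Plot: sampleB")
qqline(sampleB)
par(old_par)                      # restore the previous plot layout
```

`qqnorm` also returns the plotted points (theoretical quantiles in `x`, data in `y`), which can be inspected if a break at either end is hard to judge by eye.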
Step 1: Hypotheses to be tested
Ho: μA = μB
HA: μA ≠ μB
Step 2: The “t.test” R command below generates a p-value = 0.4897
> t.test(sampleA, sampleB, alternative="two.sided", paired=FALSE, conf.level=0.95)
Step 3: Since the p-value 0.4897 > 0.05, the data provide no significant evidence that HA is true, so we fail to reject Ho: the data are consistent with μA = μB.
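Rather than reading the p-value off the printed output, you can store the result of `t.test` and apply the Step 3 rule directly. A minimal sketch using the same call as in Step 2; the object name `res_t` is arbitrary.

```r
sampleA <- c(1, 3, 4, 6, 8, 12, 15, 16, 17, 22, 26)
sampleB <- c(2, 5, 7, 9, 11, 13, 14, 18, 19, 20, 25, 27)

res_t <- t.test(sampleA, sampleB, alternative = "two.sided",
                paired = FALSE, conf.level = 0.95)
res_t$p.value          # the p-value reported in Step 2
res_t$conf.int         # 95% confidence interval for muA - muB
res_t$p.value <= 0.05  # FALSE for these data: no significant evidence for HA
```

Because the test is two-sided at the 0.05 level, the 95% interval for the difference in means contains 0 exactly when the p-value exceeds 0.05.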
As an illustration, suppose we had judged that the QQ plot of sampleA (above on the left) showed a severe break at the lower-left end, indicating that sampleA is not approximately normal. We would then use the Wilcoxon test of medians instead of the t-test of means, as follows:
Step 1: Hypotheses to be tested
Ho: MedianA = MedianB
HA: MedianA ≠ MedianB
Step 2: The “wilcox.test” R command below generates a p-value = 0.4865
> wilcox.test(sampleA, sampleB, alternative="two.sided", paired=FALSE)
Step 3: Since the p-value 0.4865 > 0.05, the data provide no significant evidence that HA is true, so we fail to reject Ho: the data are consistent with MedianA = MedianB.
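As with the t-test, the Wilcoxon result can be stored and the Step 3 rule applied in code. A minimal sketch using the same call as in Step 2; the object name `res_w` is arbitrary.

```r
sampleA <- c(1, 3, 4, 6, 8, 12, 15, 16, 17, 22, 26)
sampleB <- c(2, 5, 7, 9, 11, 13, 14, 18, 19, 20, 25, 27)

res_w <- wilcox.test(sampleA, sampleB, alternative = "two.sided", paired = FALSE)
res_w$statistic        # W = 54 for these samples
res_w$p.value          # the p-value reported in Step 2
res_w$p.value <= 0.05  # FALSE for these data: no significant evidence for HA
```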