Statistics correction to be done: SPSS ANOVA
T1: Conduct one-way ANOVA analysis on the height using the "Weight and Height.xlsx" sample data under the Data Sets (See Attachment) folder (Note that you will need to change the format of the data set to conduct the analysis). Based on your results, what conclusion would you draw regarding the average height of students in the three states?
Solution:
Null Hypothesis: All the states have the same mean height of the students.
Alternate Hypothesis: Not all the states have the same mean height of the students.
The One-Way Analysis of Variance was performed in SPSS. The SPSS output is given as follows:
Descriptives
Height

| State | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean, Lower Bound | 95% CI for Mean, Upper Bound | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KS | 28 | 67.5714 | 3.30184 | .62399 | 66.2911 | 68.8517 | 62.00 | 75.00 |
| MO | 41 | 69.9756 | 3.24220 | .50635 | 68.9522 | 70.9990 | 61.00 | 75.00 |
| NE | 23 | 67.8696 | 4.18593 | .87283 | 66.0594 | 69.6797 | 61.75 | 74.00 |
| Total | 92 | 68.7174 | 3.65929 | .38151 | 67.9596 | 69.4752 | 61.00 | 75.00 |

Test of Homogeneity of Variances
Height

| Levene Statistic | df1 | df2 | Sig. |
| --- | --- | --- | --- |
| 1.697 | 2 | 89 | .189 |

Levene Statistic (2, 89) = 1.697, p-value = 0.189 Comment by Bo Yan: I don’t have SPSS installed on my computer. But your results are slightly different from the numbers I got from using Excel and another statistical package.
The p-value is significant at α=0.05. From this test we conclude that we do not have the homogeneity of variance. Comment by Bo Yan: Does the evidence support your conclusion here when p > 0.05?
ANOVA
Height

| Source | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- |
| Between Groups | 118.211 | 2 | 59.105 | 4.781 | .011 |
| Within Groups | 1100.316 | 89 | 12.363 | | |
| Total | 1218.527 | 91 | | | |

F (2, 89) = 4.781, p-value = 0.011
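As a cross-check, the F statistic can be recomputed from the group sizes, means, and standard deviations in the Descriptives table alone. The following is a minimal Python sketch and is not part of the SPSS output. The variable names are illustrative, and the sketch reproduces only F, not the p-value.

```python
# Group summaries copied from the Descriptives table: state -> (N, mean, SD).
groups = {
    "KS": (28, 67.5714, 3.30184),
    "MO": (41, 69.9756, 3.24220),
    "NE": (23, 67.8696, 4.18593),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups SS: size-weighted squared deviations of the group means.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-groups SS: each group's sample variance times its degrees of freedom.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

Because the inputs are the rounded values printed by SPSS, the result should agree with the ANOVA table only up to rounding.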
The p-value is significant at α=0.05. We reject the null hypothesis. From our analysis there is sufficient evidence to conclude that not all the states have the same mean height of the students, at a significance level of 0.05. This is further supported by the results of the post-hoc tests. The following table shows the results of the Tukey’s HSD post hoc tests: Comment by Bo Yan: Good.
Multiple Comparisons
Dependent Variable: Height, Tukey HSD

| (I) State | (J) State | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
| --- | --- | --- | --- | --- | --- | --- |
| KS | MO | -2.40418* | .86202 | .018 | -4.4588 | -.3495 |
| KS | NE | -.29814 | .98948 | .951 | -2.6566 | 2.0603 |
| MO | KS | 2.40418* | .86202 | .018 | .3495 | 4.4588 |
| MO | NE | 2.10604 | .91601 | .061 | -.0773 | 4.2894 |
| NE | KS | .29814 | .98948 | .951 | -2.0603 | 2.6566 |
| NE | MO | -2.10604 | .91601 | .061 | -4.2894 | .0773 |

*. The mean difference is significant at the 0.05 level.

KS & MO:
The 95% confidence interval for the difference of means is given as (-4.4588, -.3495).
As the confidence interval does not contain 0, we can conclude that there is a statistically significant difference between the mean height of students belonging to the states KS and MO. Comment by Bo Yan: Now you should understand the difference between statistically significant and practically significant difference. So I expect you to be more careful when using the word “significant”, which could be easily misunderstood by people without statistical training if not used carefully.
KS & NE:
The 95% confidence interval for the difference of means is given as (-2.6566, 2.0603).
As the confidence interval contains 0, we can conclude that there is no significant difference between the mean height of students belonging to the states KS and NE. Comment by Bo Yan: As I explained in last week’s summary, we don’t accept the null hypothesis.
MO & NE:
The 95% confidence interval for the difference of means is given as (-.0773, 4.2894).
As the confidence interval contains 0, we can conclude that there is no significant difference between the mean height of students belonging to the states MO and NE. Comment by Bo Yan: Same as above
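The mean differences and standard errors in the Multiple Comparisons table can be cross-checked from the group means, group sizes, and the Within Groups mean square. The sketch below is illustrative only and assumes the usual pooled-variance standard error for a difference of two group means. It does not compute the confidence bounds or Sig. values, which also need the studentized range distribution.

```python
from itertools import combinations
from math import sqrt

# Group sizes and means from the Descriptives table.
n = {"KS": 28, "MO": 41, "NE": 23}
mean = {"KS": 67.5714, "MO": 69.9756, "NE": 67.8696}
ms_within = 12.363  # Within Groups mean square from the ANOVA table

pairs = {}
for a, b in combinations(n, 2):
    diff = mean[a] - mean[b]
    # Standard error of a difference of two group means under a pooled variance.
    se = sqrt(ms_within * (1 / n[a] + 1 / n[b]))
    pairs[(a, b)] = (diff, se)
    print(f"{a} - {b}: difference = {diff:.4f}, SE = {se:.4f}")
```

The inputs are rounded SPSS values, so agreement with the table is expected only to about three decimal places.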
The Mean Plot is given as follows:
The means plot graph shows that there is clearly significant difference in the groups. Comment by Bo Yan: Your description of the chart is problematic in two ways. First, the means were obtained from a sample, which can only suggest difference. Second, if you change the starting point of the y axis to 0, the chart might suggest otherwise.
Results:
A one-way ANOVA revealed significant differences in the mean heights of students for the three states, F (2, 89) = 4.781, p-value = 0.011. Further post-hoc analysis revealed that the states KS (M = 67.5714, SD = 3.30184) and MO (M = 69.9756, SD = 3.24220) have significantly different mean heights of students.
T2: Conduct some research on the Internet on the T-test and ANOVA. Based on your research, explain why we do not conduct three T-tests to answer the question in T1.
We conduct an ANOVA rather than three T-tests because:
· Comparing three groups with T-tests would require three separate tests, which increases the chance of making a Type I error.
· Each T-test uses only two of the three groups, so it does not make use of all the available information in the samples. Comment by Bo Yan: What do you think is the impact of not using all available information?
· It is easier to perform a single ANOVA than three separate T-tests.
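The Type I error point can be illustrated with a short calculation. The sketch below assumes the three pairwise tests are independent, which is only an approximation here because the tests share groups. It shows how the chance of at least one false rejection grows beyond the per-test level of 0.05.

```python
alpha = 0.05   # per-test significance level
n_tests = 3    # KS vs MO, KS vs NE, MO vs NE

# Probability of at least one false rejection if the tests were independent.
familywise_error = 1 - (1 - alpha) ** n_tests
print(f"Familywise Type I error rate: {familywise_error:.4f}")
```

Under that independence assumption the familywise rate is roughly 0.14, nearly three times the nominal 0.05.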
T3: A grocery store conducted a study to find out which brand is preferred among its customers by asking 300 customers whether they like or dislike Brand A and another 300 customers whether they like or dislike Brand B. The results are presented in the table below. According to this sample (without hypothesis testing), which brand is preferred and explain your finding.
| | Brand A | Brand B | Total |
| --- | --- | --- | --- |
| Like | 215 | 241 | 456 |
| Dislike | 85 | 59 | 144 |
| Total | 300 | 300 | 600 |

Brand B is preferred in the overall sample. Without any hypothesis testing, 241 of the 300 customers asked about Brand B (80.3%) liked it, compared with 215 of the 300 customers asked about Brand A (71.7%). The pooled table on its own says nothing about preference by gender.
· The results are further disaggregated by gender and presented in the following two tables for males and females respectively. According to the male and female samples (without hypothesis testing), which brand is preferred by male customers and female customers respectively and explain your findings.
| Male | Brand A | Brand B | Total |
| --- | --- | --- | --- |
| Like | 120 | 20 | 140 |
| Dislike | 80 | 40 | 100 |
| Total | 200 | 60 | 240 |

| Female | Brand A | Brand B | Total |
| --- | --- | --- | --- |
| Like | 95 | 221 | 316 |
| Dislike | 5 | 39 | 44 |
| Total | 100 | 260 | 360 |

· Based on the two sets of findings, what is your overall conclusion? What is the implication of this study? Comment by Bo Yan: You did not answer this question.
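The like rates in the three tables can be tabulated with a short script. This is a sketch of the arithmetic only, using the like counts and column totals exactly as printed. The male table as printed does not add up (80 + 40 is not 100, and the Brand B totals 60 + 260 do not equal the 300 in the overall table), so the male Brand B rate in particular should be read with caution.

```python
# (like, column total) as printed in the three tables.
counts = {
    "Overall": {"Brand A": (215, 300), "Brand B": (241, 300)},
    "Male":    {"Brand A": (120, 200), "Brand B": (20, 60)},
    "Female":  {"Brand A": (95, 100),  "Brand B": (221, 260)},
}

# Share of customers who liked each brand, per table.
rates = {
    group: {brand: like / total for brand, (like, total) in brands.items()}
    for group, brands in counts.items()
}

for group, brands in rates.items():
    preferred = max(brands, key=brands.get)
    summary = ", ".join(f"{b}: {r:.1%}" for b, r in brands.items())
    print(f"{group}: {summary} -> higher like rate: {preferred}")
```

With these counts the brand with the higher like rate within each gender differs from the brand with the higher pooled rate, which is the pattern the final question asks about.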