statistic assignment
Executive Summary
The purpose of this study is to understand and analyse the performance of 60 students
undertaking the subject ‘Statistics for Business’ at an Australian Business School. The statistical
methods used are …………………………. From the analysis performed, we find that:
There is a wide difference in the performance of students from different classes.
There is a disparity in the performance of female and male students.
Although there is very little overall difference in the performance of domestic and
international students, there is a certain degree of variability when their performances
are compared within each class.
Therefore, based on all our findings, we present the following recommendations to the
Business School:
The quality of resources and teaching faculty provided to students of all the three
classes undertaking the ‘Statistics for Business’ subject must be standardised.
Since there is a certain degree of variability in the performance of domestic and
international students in the classes, it is important to focus on the academic progress
of both groups to improve overall performance.
Since there lies a disparity in the performance of female and male students, it is essential
that all lecturers and tutors pay equal attention to students of both genders.
Table of Contents
Business Problem
Statistical Problem
Analysis: Answer 1
Analysis: Answer 2
Analysis: Answer 3
Analysis: Answer 4
Analysis: Answer 5
Analysis: Answer 6
Analysis: Answer 7
Conclusion
Recommendations
Business Problem
The overarching business problem in this case is centred around an Australian Business School
that is seeking to improve the performance of students in the ‘Statistics for Business’ subject.
Furthermore, the Business School aims to build an understanding of the key differences in
the performance of the students. To meet this objective, the population of students is segmented
by gender, international student status and class. After examining the data, an analysis of the
performance of students is performed to identify the segments that are underperforming and
report them to the board of the Business School.
Statistical Problem
The underlying statistical problem in this case can be dissected into seven independent sections.
The first three sections include calculation of ………. In this report, we focus on the measures
of …………………... This forms the foundation for a holistic understanding of the population.
The fourth, fifth, sixth and seventh parts of this report demand the use of inferential statistical
techniques such as………………………..
Analysis: Answer 1
For the purpose of these analyses, we make two assumptions, which can be outlined as follows:
1.) We use a 5% level of significance in all our hypothesis tests.
2.) We assume that the variances of the populations are unequal.
Table 1.2 Descriptive Statistics of marks of Class 2

Statistic                    Class 1         Class 2
Mean                         70.8715         53.6605
Standard Error               3.442810226     2.281671995
Median                       66.845          51.515
Standard Deviation           15.3967154      10.20394737
Sample Variance              237.058845      104.1205418
Kurtosis                     -0.939632186    -0.764355747
Skewness                     0.484511604     0.496344245
Range                        47.82           34.67
Minimum                      50.84           38.63
Maximum                      98.66           73.3
Sum                          1417.43         1073.21
Count                        20              20
Coefficient of Variation     21.72483353     19.01575156
25th Percentile (Q1)         57.475          46.6025
75th Percentile (Q3)         83.1625         62.3225
Interquartile Range (IQR)    25.6875         15.72

Fig 1.3 Visual representation of marks of Class 3
[Figure: Class 3; x-axis: Students (1 to 20), y-axis: Marks (0.00 to 70.00)]
Table 1.3 Descriptive Statistics of marks of Class 3
Location: The measures of location describe where the marks fall once the data are ordered in
ascending order. According to our analysis, students of Class 1 have the highest average mark
(70.87), followed by students of Class 2 with an average of 53.66 and Class 3 with the lowest
average at 43.80. However, the mean alone does not give a clear picture of the population, and
hence we find the quartiles to gain a better understanding of location. As per our analysis, 25%
of the marks obtained by students in Class 1 are equal to or less than 57.48, 50% of marks
(median) are equal to or less than 66.85 and 75% of marks are equal to or less than 83.16. As for
Class 2, 25% of the marks obtained by students are equal to or less than 46.60, 50% of marks
(median) are equal to or less than 51.52 and 75% of marks are equal to or less than 62.32. In
Class 3, 25% of the marks obtained by students are equal to or less than 39.99, 50% of marks
(median) are equal to or less than 44.18 and 75% of marks are equal to or less than 48.09.
Statistic                    Class 2         Class 3
Mean                         53.6605         43.8045
Standard Error               2.281671995     1.87260956
Median                       51.515          44.18
Standard Deviation           10.20394737     8.374564545
Sample Variance              104.1205418     70.13333132
Kurtosis                     -0.764355747    0.294746556
Skewness                     0.496344245     -0.399527654
Range                        34.67           32.55
Minimum                      38.63           26.18
Maximum                      73.3            58.73
Sum                          1073.21         876.09
Count                        20              20
Coefficient of Variation     19.01575156     19.11804619
25th Percentile (Q1)         46.6025         39.985
75th Percentile (Q3)         62.3225         48.085
Interquartile Range (IQR)    15.72           8.1
Shape: The measures of shape help us understand the form of a distribution. As per our analysis,
the skewness of the marks obtained by students of Class 1 (0.48) and Class 2 (0.50) is positive,
and hence both distributions are positively skewed (right skewed). The skewness of the marks
obtained by students in Class 3 (-0.40) is negative, and hence its distribution is negatively
skewed (left skewed). Consistent with this, the average mark is greater than the median mark for
Classes 1 and 2, while the average mark in Class 3 is lower than its median mark. In addition,
since the distributions for Classes 1 and 2 have negative excess kurtosis (-0.94 and -0.76), they
are platykurtic, while the distribution for Class 3 has positive excess kurtosis (0.29) and is
therefore leptokurtic.
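The shape measures discussed above can be sketched in code. The following is a minimal illustration, assuming the reported figures come from Excel's Descriptive Statistics tool, which uses the bias-adjusted sample formulas for skewness and excess kurtosis. The individual marks are not reproduced in this report, so the `marks` list is purely hypothetical, and the function names are our own.

```python
import math


def sample_skewness(xs):
    """Bias-adjusted sample skewness (the formula behind Excel's SKEW)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


def sample_excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (the formula behind Excel's KURT)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    fourth = sum(((x - mean) / s) ** 4 for x in xs)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def describe_shape(xs):
    """Label a sample as right/left skewed and platykurtic/leptokurtic."""
    skew = sample_skewness(xs)
    kurt = sample_excess_kurtosis(xs)
    skew_label = "right skewed" if skew > 0 else "left skewed" if skew < 0 else "symmetric"
    kurt_label = "leptokurtic" if kurt > 0 else "platykurtic" if kurt < 0 else "mesokurtic"
    return skew, kurt, skew_label, kurt_label


# Hypothetical marks, NOT taken from the study data.
marks = [40.0, 45.0, 50.0, 55.0, 90.0]
print(describe_shape(marks))
```

A positive skewness value corresponds to the "right skewed" label used in the text, and a negative excess kurtosis to "platykurtic".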
Variability: The measures of variability describe the spread and dispersion of a data set. The
marks obtained by students of Class 1 have the highest range (47.82), followed by those of
Class 2 (34.67). The lowest range is that of Class 3 (32.55). Since the range is only a crude
measure of variability, we calculate the coefficients of variation to compare the marks obtained
by the students. We find that the coefficient of variation is highest for Class 1 (21.72%),
followed by Class 3 (19.12%) and lastly Class 2 (19.02%). Thus, Class 1 has the highest relative
variability, followed by Class 3 and Class 2.
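As a quick consistency check on the variability figures, the derived measures can be recomputed from the summary values in the descriptive statistics tables. The sketch below assumes the usual definitions (standard error = s / sqrt(n), coefficient of variation = 100 * s / mean, IQR = Q3 - Q1); the dictionary layout and function name are our own.

```python
import math

# Summary values copied from the descriptive statistics tables (n = 20 per class).
N = 20
classes = {
    "Class 1": {"mean": 70.8715, "sd": 15.3967154, "q1": 57.475, "q3": 83.1625},
    "Class 2": {"mean": 53.6605, "sd": 10.20394737, "q1": 46.6025, "q3": 62.3225},
    "Class 3": {"mean": 43.8045, "sd": 8.374564545, "q1": 39.985, "q3": 48.085},
}


def derived_measures(stats, n):
    """Recompute standard error, coefficient of variation (%) and IQR."""
    return {
        "standard_error": stats["sd"] / math.sqrt(n),
        "cv_percent": 100 * stats["sd"] / stats["mean"],
        "iqr": stats["q3"] - stats["q1"],
    }


results = {name: derived_measures(stats, N) for name, stats in classes.items()}

# Rank the classes from most to least relative variability.
ranking = sorted(results, key=lambda name: results[name]["cv_percent"], reverse=True)

for name in ranking:
    r = results[name]
    print(f"{name}: SE={r['standard_error']:.4f}, CV={r['cv_percent']:.2f}%, IQR={r['iqr']:.4f}")
```

The ranking by coefficient of variation reproduces the order given in the text: Class 1, then Class 3, then Class 2.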
Analysis: Answer 2
Table 2.1 Descriptive statistics calculation for domestic and int’l students

Statistic                    Marks for Domestic students    Marks for International students
Mean                         55.57666667                    56.64766667
Standard Error               2.759676865                    3.158371604
Median                       54.37                          51.515
Standard Deviation           15.1153727                     17.29911373
Sample Variance              228.474492                     299.2593357
Kurtosis                     1.065394116                    0.274411144
Skewness                     0.782385824                    0.86508514
Range                        70.33                          72.48
Minimum                      27.9                           26.18
Maximum                      98.23                          98.66
Sum                          1667.3                         1699.43
Count                        30                             30
Coefficient of Variation     27.19733588                    30.53808699
25th Percentile (Q1)         44.28                          45.06
75th Percentile (Q3)         65.215                         66.5825
Interquartile Range          20.935                         21.5225

Fig. 2.1 Visual representation of marks obtained by domestic students
[Figure: Marks of Domestic Students; x-axis: Students (1 to 30), y-axis: Marks (0.00 to 120.00)]
Fig. 2.2 Visual representation of marks obtained by international students
Location: According to our analysis, international students have a slightly higher average mark
(56.65) than domestic students (55.58). In addition, 25% of the marks obtained by domestic
students are equal to or less than 44.28, 50% of marks (median) are less than or equal to 54.37
and 75% of marks are less than or equal to 65.22. On the other hand, 25% of the marks obtained by
international students are equal to or less than 45.06, 50% of marks (median) are less than or
equal to 51.52 and 75% of marks are less than or equal to 66.58. Therefore, the quartiles of the
domestic and international student populations give a very similar picture.
Shape: The descriptive analysis performed above shows that the distributions of marks
obtained by both domestic and international students have positive skewness (0.78 and 0.87) and
hence are positively skewed (right skewed). Consistent with this, the average mark is greater
than the median mark for both domestic and international students. Furthermore, since both
distributions have positive excess kurtosis (1.07 and 0.27), they are leptokurtic.
Variability: As per our analysis, we find that the ranges of the marks obtained by domestic
students (70.33) and international students (72.48) are similar. Therefore, to compare
variability more reliably, we calculate the coefficients of variation. The coefficient of
variation for marks obtained by international students (30.54%) is higher than that of the marks
obtained by domestic students (27.20%), thus showing greater relative variability.
[Figure 2.2: Marks of International Students; x-axis: Students (1 to 30), y-axis: Marks (0.00 to 120.00)]
Analysis: Answer 3
Table 3.1 Descriptive statistics calculation for domestic and international students in Class 1

Class 1
Statistic                    Domestic Student - Class 1    International Student - Class 1
Mean                         68.755                        72.2825
Standard Error               5.811018782                   4.398162344
Median                       66.2                          71.145
Standard Deviation           16.43604315                   15.23568128
Sample Variance              270.1435143                   232.1259841
Kurtosis                     -0.227515917                  -1.008029314
Skewness                     0.808624416                   0.38963064
Range                        45.86                         47.82
Minimum                      52.37                         50.84
Maximum                      98.23                         98.66
Sum                          550.04                        867.39
Count                        8                             12
Coefficient of Variation     23.90523329                   21.0779667
25th Percentile (Q1)         53.6125                       59.8275
75th Percentile (Q3)         82.2575                       87.115
Interquartile Range          28.645                        27.2875

Fig. 3.1 Visual representation of marks obtained by domestic students in Class 1
[Figure: Marks of Domestic Students - Class 1; x-axis: Students (1 to 8), y-axis: Marks (0.00 to 120.00)]
Fig. 3.2 Visual representation of marks obtained by international students in Class 1
Location: According to the analysis performed in the table above, we find that the average
mark obtained by international students of Class 1 (72.28) is greater than that obtained by
domestic students of Class 1 (68.76). Furthermore, 25% of the marks obtained by domestic
students are equal to or less than 53.61, 50% of marks (median) are less than or equal to 66.20
and 75% of marks are less than or equal to 82.26. For international students, 25% of the marks
obtained are equal to or less than 59.83, 50% of marks (median) are equal to or less than 71.15
and 75% of marks are less than or equal to 87.12.
Shape: As per the descriptive analysis performed earlier, we find that the distributions of marks
obtained by both the domestic and international students are positively skewed. The
distribution of marks obtained by domestic students (skewness 0.81) is more positively skewed
than that of international students (skewness 0.39). Consistent with this, the average mark
obtained is greater than the median for both distributions.
Variability: The ranges of marks obtained by domestic (45.86) and international students
(47.82) are very similar. Thus, we compare their coefficients of variation to get a better picture
of their variability. The coefficient of variation of marks obtained by domestic students (23.91%)
is higher than that of international students (21.08%). Hence, the marks obtained by domestic
students have a higher degree of relative variability than those of international students.
[Figure 3.2: Marks of International Students - Class 1; x-axis: Students (1 to 12), y-axis: Marks (0.00 to 120.00)]
Analysis: Answer 4
Hypothesis: In this case, to analyse the difference in marks between the male and female
populations of students, we set up hypotheses to test the claim that the average mark obtained
by female students is significantly different from that obtained by male students, at a 5%
level of significance.
Since the variances of the two populations are unknown, we use the separate-variance
(two-tailed) t-test, assuming that the population variances are unequal. Let μ1 be the mean mark
of the population of female students and μ2 be the mean mark of the population of male students.
Thus, the null and alternative hypotheses can be framed as:
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
Decision Rule: If the computed t-value falls below the lower critical value of -2.02 or exceeds
the upper critical value of +2.02, the null hypothesis should be rejected. (Reject H0 if
t < -2.02 or if t > 2.02)
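For reference, the test statistic and degrees of freedom behind this decision rule can be written out as follows. This is the standard separate-variance (Welch) form; the symbols \bar{x}_i, s_i, n_i and \nu (sample mean, sample standard deviation, sample size and degrees of freedom) are our own notation and are not used elsewhere in this report.

```latex
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
           {\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1}
          + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
```

Under H0 the hypothesised difference μ1 - μ2 is 0, and ν is rounded down to a whole number before the critical value is looked up.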
Fig. 4.1 Excel output for the Separate-Variances t-test for the difference between two means

Separate-Variances t Test for the Difference Between Two Means
(assumes unequal population variances)

Data
Hypothesized Difference               0
Level of Significance                 0.05
Population 1 Sample
  Sample Size                         30
  Sample Mean                         62.04033333
  Sample Standard Deviation           19.7523
Population 2 Sample
  Sample Size                         30
  Sample Mean                         50.184
  Sample Standard Deviation           8.0910

Intermediate Calculations
Numerator of Degrees of Freedom       230.6522
Denominator of Degrees of Freedom     5.9964
Total Degrees of Freedom              38.4653
Degrees of Freedom                    38
Standard Error                        3.8971
Difference in Sample Means            11.85633333
Separate-Variance t Test Statistic    3.0424

Two-Tail Test
Lower Critical Value                  -2.0244
Upper Critical Value                  2.0244
p-Value                               0.0042
Reject the null hypothesis

Calculations Area
Pop. 1 Sample Variance                390.1535
Pop. 2 Sample Variance                65.4636
Pop. 1 Sample Var./Sample Size        13.0051
Pop. 2 Sample Var./Sample Size        2.1821
For one-tailed tests:
TDIST value                           0.0021
1-TDIST value                         0.9979
Conclusion: Since the value calculated from the t-test (3.04) is higher than the upper critical
value (2.02), and the p-value (0.0042) is below the 5% level of significance, we reject the null
hypothesis. Therefore, the data support the claim that the average mark obtained by female
students is significantly different from the average mark obtained by male students.
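The figures in the Excel output can be reproduced from the summary statistics alone. The following is a minimal sketch, not the spreadsheet's own implementation: the function name is ours, the inputs are the sample sizes, means and standard deviations reported in Fig. 4.1, and the critical value is copied from the Excel output rather than computed.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Separate-variance t statistic, unrounded degrees of freedom and standard error."""
    v1 = sd1 ** 2 / n1  # sample variance / sample size, population 1
    v2 = sd2 ** 2 / n2  # sample variance / sample size, population 2
    standard_error = math.sqrt(v1 + v2)
    t_stat = (mean1 - mean2 - hypothesized_diff) / standard_error
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df, standard_error


# Population 1 = female students, population 2 = male students (values from Fig. 4.1).
t_stat, df, standard_error = welch_t_from_summary(62.04033333, 19.7523, 30, 50.184, 8.0910, 30)

T_CRITICAL = 2.0244  # two-tailed critical value taken from the Excel output (38 df, 5% level)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.4f}, df = {df:.4f}, SE = {standard_error:.4f}, reject H0: {reject_h0}")
```

The same function can be applied to the domestic and international samples analysed in the later answers by substituting their summary statistics.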
Analysis: Answer 5
Hypothesis: In this case, to analyse the difference in marks obtained by students of Class 1,
Class 2 and Class 3, we set up hypotheses to test the claim that the average marks obtained by
students of Class 1, Class 2 and Class 3 are significantly different from each other, at a 5%
level of significance.
Since we attempt to compare the averages of more than two populations, we use the F-test
calculated from one-way (or single factor) ANOVA. Let μ1 be the mean mark of the
population of Class 1, μ2 be the mean mark of the population of Class 2 and μ3 be the mean
mark of the population of Class 3. Thus, the null and alternative hypotheses can
be framed as:
H0: μ1 = μ2 = μ3
Ha: At least one of the mean marks of a Class is different from the others.
Decision rule: The decision rule is to reject the null hypothesis if the calculated value of F is
found to be greater than the critical value of 3.16. (Reject H0 if F > 3.16)
Fig. 5.1 Excel output for Single Factor (One way) ANOVA for the difference between 3 means

Anova: Single Factor

SUMMARY
Groups     Count    Sum        Average    Variance
CLASS1     20       1417.43    70.8715    237.0588
CLASS2     20       1073.21    53.6605    104.1205
CLASS3     20       876.09     43.8045    70.13333

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         7506.545    2     3753.272    27.37532    4.65E-09    3.158843
Within Groups          7814.942    57    137.1042
Total                  15321.49    59
Conclusion: Since the value calculated from the F test (27.38) is greater than the critical value
of 3.16 and lies in the rejection area, the decision is to reject the null hypothesis. Therefore, we
reach the conclusion that the mean mark obtained by students of at least one of Class 1, Class 2
and Class 3 is different from the others.
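The ANOVA table can likewise be rebuilt from the group counts, averages and variances in the SUMMARY block of Fig. 5.1. The sketch below is illustrative only; the function name and tuple layout are our own, and the critical value is copied from the Excel output rather than computed.

```python
# (count, average, variance) for each class, copied from the SUMMARY block of Fig. 5.1.
groups = [
    (20, 70.8715, 237.0588),   # CLASS1
    (20, 53.6605, 104.1205),   # CLASS2
    (20, 43.8045, 70.13333),   # CLASS3
]


def one_way_anova_from_summary(groups):
    """Return (SS between, SS within, df between, df within, F) from group summaries."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova_from_summary(groups)

F_CRITICAL = 3.158843  # taken from the "F crit" column of the Excel output
reject_h0 = f_stat > F_CRITICAL

print(f"SSB = {ss_between:.3f}, SSW = {ss_within:.3f}, F = {f_stat:.4f}, reject H0: {reject_h0}")
```

Note that a significant F only indicates that at least one class mean differs; it does not identify which pairs of classes differ.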
Analysis: Answer 6
Hypothesis: In this case, to analyse the difference in marks between the populations of
domestic and international students, we set up hypotheses to test the claim that the average
mark obtained by domestic students is significantly different from that obtained by
international students, at a 5% level of significance.
Since the variances of the two populations are unknown, we use the separate-variance
(two-tailed) t-test with the assumption that the population variances are unequal. Let μ1 be the
mean mark of the population of domestic students and μ2 be the mean mark of the population of
international students. Thus, the null and alternative hypotheses can be framed as:
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
Decision Rule: If the computed t-value falls below the lower critical value of -2.003 or exceeds
the upper critical value of 2.003, the null hypothesis should be rejected. (Reject H0 if
t < -2.003 or if t > 2.003)
Fig. 6.1 Excel output for the Separate-Variances t-test for the difference between two means

Separate-Variances t Test for the Difference Between Two Means
(assumes unequal population variances)

Data
Hypothesized Difference               0
Level of Significance                 0.05
Population 1 Sample
  Sample Size                         30
  Sample Mean                         55.57666667
  Sample Standard Deviation           15.1154
Population 2 Sample
  Sample Size                         30
  Sample Mean                         56.64766667
  Sample Standard Deviation           17.2991

Intermediate Calculations
Numerator of Degrees of Freedom       309.4478
Denominator of Degrees of Freedom     5.4313
Total Degrees of Freedom              56.9750
Degrees of Freedom                    56
Standard Error                        4.1942
Difference in Sample Means            -1.071
Separate-Variance t Test Statistic    -0.2554

Two-Tail Test
Lower Critical Value                  -2.0032
Upper Critical Value                  2.0032
p-Value                               0.7994
Do not reject the null hypothesis

Calculations Area
Pop. 1 Sample Variance                228.4745
Pop. 2 Sample Variance                299.2593
Pop. 1 Sample Var./Sample Size        7.6158
Pop. 2 Sample Var./Sample Size        9.9753
For one-tailed tests:
TDIST value                           0.3997
1-TDIST value                         0.6003
Conclusion: Since the value obtained from the t-test (-0.255) is between the upper (+2.003)
and lower (-2.003) critical values, and the p-value (0.7994) is well above the 5% level of
significance, we do not reject the null hypothesis. Therefore, we do not support the alternative
hypothesis (Ha) and the claim is not supported. Thus, we conclude that there is insufficient
evidence in the data to declare a significant difference in the mean marks obtained by domestic
and international students.
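An equivalent way to read this result is through a 95% confidence interval for μ1 - μ2: when the interval contains 0, the two-tailed test at the 5% level does not reject H0. The interval is not part of the Excel output; the sketch below derives it from the difference in sample means, standard error and critical value reported in Fig. 6.1, and the variable names are our own.

```python
# Values copied from the Excel output in Fig. 6.1.
DIFFERENCE_IN_MEANS = -1.071   # domestic mean minus international mean
STANDARD_ERROR = 4.1942
T_CRITICAL = 2.0032            # two-tailed critical value (56 df, 5% level)


def confidence_interval(difference, standard_error, t_critical):
    """Confidence interval for a difference in means: difference +/- t * SE."""
    margin = t_critical * standard_error
    return difference - margin, difference + margin


lower, upper = confidence_interval(DIFFERENCE_IN_MEANS, STANDARD_ERROR, T_CRITICAL)
contains_zero = lower <= 0.0 <= upper

print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f}); contains 0: {contains_zero}")
```

Because the interval spans zero, it agrees with the decision not to reject the null hypothesis.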
Analysis: Answer 7
Hypothesis: In this case, to analyse the difference in marks between the populations of
domestic and international students from Class 1, we set up hypotheses to test the claim that
the average mark obtained by domestic students is significantly different from that obtained
by international students, at a 5% level of significance.
The variances of the two populations are unknown, and hence we use the separate-variance
(two-tailed) t-test with the assumption that the population variances are unequal. Let μ1 be the
mean mark of the population of domestic students of Class 1 and μ2 be the mean mark of the
population of international students of Class 1. Thus, the null and alternative hypotheses can
be framed as:
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
Decision Rule: If the computed t-value falls below the lower critical value of -2.145 or exceeds
the upper critical value of 2.145, the null hypothesis should be rejected. (Reject H0 if
t < -2.145 or if t > 2.145)
Fig. 7.1 Excel output for the Separate-Variances t-test for the difference between two means

Separate-Variances t Test for the Difference Between Two Means
(assumes unequal population variances)

Data
Hypothesized Difference               0
Level of Significance                 0.05
Population 1 Sample
  Sample Size                         8
  Sample Mean                         68.755
  Sample Standard Deviation           16.4360
Population 2 Sample
  Sample Size                         12
  Sample Mean                         72.2825
  Sample Standard Deviation           15.2357

Intermediate Calculations
Numerator of Degrees of Freedom       2820.8602
Denominator of Degrees of Freedom     196.9130
Total Degrees of Freedom              14.3254
Degrees of Freedom                    14
Standard Error                        7.2878
Difference in Sample Means            -3.5275
Separate-Variance t Test Statistic    -0.4840

Two-Tail Test
Lower Critical Value                  -2.1448
Upper Critical Value                  2.1448
p-Value                               0.6358
Do not reject the null hypothesis

Calculations Area
Pop. 1 Sample Variance                270.1435
Pop. 2 Sample Variance                232.1260
Pop. 1 Sample Var./Sample Size        33.7679
Pop. 2 Sample Var./Sample Size        19.3438
For one-tailed tests:
TDIST value                           0.3179
1-TDIST value                         0.6821
Conclusion: Since the value obtained from the t-test (-0.4840) is between the upper (+2.145)
and lower (-2.145) critical values, and the p-value (0.6358) is well above the 5% level of
significance, we do not reject the null hypothesis. Therefore, we do not support the alternative
hypothesis (Ha) and the claim is not supported. Thus, we conclude that there is not enough
evidence in the data to declare a significant difference between the average marks obtained by
domestic and international students of Class 1.
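As a worked check, the test statistic for Class 1 follows directly from the values in Fig. 7.1. The notation \bar{x}_i, s_i and n_i (sample mean, sample standard deviation and sample size) is our own.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
  = \frac{68.755 - 72.2825}{\sqrt{33.7679 + 19.3438}}
  = \frac{-3.5275}{7.2878}
  \approx -0.4840
```

Since |t| ≈ 0.48 is far below the critical value of 2.1448, the statistic falls well inside the non-rejection region.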
Conclusion
Having performed the descriptive and inferential statistical calculations in the previous
sections of our study, we discuss our findings in this section of the report. Through
our descriptive analyses, we find that although Class 1 has the highest average mark in
comparison to Class 2 and Class 3, it also has the highest range of individual marks and the
highest coefficient of variation in comparison to the other classes. Furthermore, although at
first glance international students seem to perform slightly better than domestic
students, they have a higher coefficient of variation. When we focus on Class 1 of the data set
to compare the domestic and international students, the story is different. Here, domestic
students have a higher measure of variability (coefficient of variation) than international
students.
According to the inferential statistical calculations performed in our study, we conclude that the
average mark of female students is significantly different from that of their male counterparts.
In addition, we conclude that the average mark obtained by students of at least one of the
classes is different from the others. Furthermore, we study the difference in average marks of
domestic and international students across the entire population as well as for a single class
(Class 1) and reach the conclusion that there is not enough evidence to declare a significant
difference in the average marks obtained by them, which is a limitation of this study.
Recommendations
From the conclusions drawn in this study, we attempt to establish the implications for the
Australian Business School and its efforts to improve the performance of students in the
‘Statistics for Business’ subject. The study implies that there is a wide difference in the
performance of students from different classes. In addition, there is a disparity in
the performance of female students and male students. However, when the overall performance
of the population of domestic students is compared with that of international students, hardly
any difference is found. When a single class of students is studied, however, we find differing
degrees of variability within the populations of domestic and international students. Therefore,
based on all our findings, we present the following recommendations to the Business School:
The quality of resources and teaching faculty provided to students of all the three
classes undertaking the ‘Statistics for Business’ subject must be standardised.
Since there is a certain degree of variability in the performance of domestic and
international students in the classes, it is important to focus on the academic progress
of both groups to improve overall performance.
Since there lies a disparity in the performance of female and male students, it is essential
that all lecturers and tutors pay equal attention to students of both genders to improve
the overall performance.