statistic assignment

Bilov
Executive Summary

The purpose of this study is to understand and analyse the performance of 60 students

undertaking a subject, ‘Statistics for Business’ in an Australian Business School. The statistical

methods is …………………………. From the analysis performed, we find that:

• There is a wide difference in the performance of students from different classes.

• There is a disparity in the performance of female and male students.

• Although there is very little overall difference in the performance of domestic and international students, there is a certain degree of variability when their performances are compared within each class.

Therefore, based on all our findings, we present the following recommendations to the

Business School:

• The quality of resources and teaching faculty provided to students of all three classes undertaking the ‘Statistics for Business’ subject must be standardised.

• Since there is a certain degree of variability in the performance of domestic and international students within the classes, it is important to monitor the academic progress of both groups in order to improve overall performance.

• Since there is a disparity in the performance of female and male students, it is essential that all lecturers and tutors pay equal attention to students of both genders.

Table of Contents

Business Problem
Statistical Problem
Analysis: Answer 1
Analysis: Answer 2
Analysis: Answer 3
Analysis: Answer 4
Analysis: Answer 5
Analysis: Answer 6
Analysis: Answer 7
Conclusion
Recommendations

Business Problem

The overarching business problem in this case centres on an Australian Business School that is seeking to improve the performance of students in the ‘Statistics for Business’ subject. Furthermore, the Business School aims to understand the key differences in the performance of its students. To meet this objective, the population of students is segmented by gender, international student status and class. After examining the data, we analyse the performance of students to identify the segments that are underperforming and report them to the board of the Business School.

Statistical Problem

The underlying statistical problem in this case can be dissected into seven independent sections.

The first three sections include calculation of ………. In this report, we focus on the measures

of …………………... This forms the foundation for a holistic understanding of the population.

The fourth, fifth, sixth and seventh parts of this report demand the use of inferential statistical

techniques such as………………………..

Analysis: Answer 1

For the purposes of this analysis, we make two assumptions:

1.) We use a 5% level of significance in our hypothesis tests.

2.) We assume that the variances of the populations are unequal.

Table 1.2 Descriptive Statistics of marks of Class 2

Statistic                      Class 1          Class 2
Mean                           70.8715          53.6605
Standard Error                 3.442810226      2.281671995
Median                         66.845           51.515
Standard Deviation             15.3967154       10.20394737
Sample Variance                237.058845       104.1205418
Kurtosis                       -0.939632186     -0.764355747
Skewness                       0.484511604      0.496344245
Range                          47.82            34.67
Minimum                        50.84            38.63
Maximum                        98.66            73.3
Sum                            1417.43          1073.21
Count                          20               20
Coefficient of Variation       21.72483353      19.01575156
25th Percentile (Q1)           57.475           46.6025
75th Percentile (Q3)           83.1625          62.3225
Interquartile Range (IQR)      25.6875          15.72

Fig 1.3 Visual representation of marks of Class 3
[Chart "Class 3": x-axis Students (1 to 20), y-axis Marks (0.00 to 70.00)]

Table 1.3 Descriptive Statistics of marks of Class 3

Statistic                      Class 2          Class 3
Mean                           53.6605          43.8045
Standard Error                 2.281671995      1.87260956
Median                         51.515           44.18
Standard Deviation             10.20394737      8.374564545
Sample Variance                104.1205418      70.13333132
Kurtosis                       -0.764355747     0.294746556
Skewness                       0.496344245      -0.399527654
Range                          34.67            32.55
Minimum                        38.63            26.18
Maximum                        73.3             58.73
Sum                            1073.21          876.09
Count                          20               20
Coefficient of Variation       19.01575156      19.11804619
25th Percentile (Q1)           46.6025          39.985
75th Percentile (Q3)           62.3225          48.085
Interquartile Range (IQR)      15.72            8.1

Location: The measures of location describe where the marks sit when the data are ordered in an ascending array. According to our analysis, students of Class 1 have the highest average mark (70.87), followed by students of Class 2 (53.66) and Class 3, which has the lowest average (43.80). However, the mean alone does not give a clear picture of the population, so we also examine the quartiles. In Class 1, 25% of marks are equal to or less than 57.48 (Q1), 50% are equal to or less than 66.85 (median) and 75% are equal to or less than 83.16 (Q3). In Class 2, 25% of marks are equal to or less than 46.60, 50% are equal to or less than 51.52 (median) and 75% are equal to or less than 62.32. In Class 3, 25% of marks are equal to or less than 39.99, 50% are equal to or less than 44.18 (median) and 75% are equal to or less than 48.09.

Shape: The measures of shape describe the form that a distribution of data takes. The skewness of the marks obtained by students of Class 1 (0.48) and Class 2 (0.50) is positive, so both distributions are positively skewed (right skewed). The skewness of the marks obtained by students in Class 3 is negative (-0.40), so its distribution is negatively skewed (left skewed). Consistent with this, the mean is greater than the median in Classes 1 and 2, while the mean in Class 3 is less than its median. In addition, the distributions for Classes 1 and 2 have negative excess kurtosis (-0.94 and -0.76) and are therefore platykurtic, while the distribution for Class 3 has positive excess kurtosis (0.29) and is therefore leptokurtic.

Variability: The measures of variability describe the spread and dispersion of a data set. The marks obtained by students of Class 1 have the largest range (47.82), followed by Class 2 (34.67) and Class 3 (32.55). Since the range is only a crude measure of variability, we also compare the coefficients of variation, which express the standard deviation as a percentage of the mean. The coefficient of variation is highest for Class 1 (21.72%), followed by Class 3 (19.12%) and lastly Class 2 (19.02%). Thus, relative to its mean, Class 1 has the highest variability, followed by Class 3 and Class 2.
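The derived measures quoted for the three classes (standard error, coefficient of variation and interquartile range) can be cross-checked from the reported means, standard deviations and quartiles. The following is a minimal sketch using only the summary values from the descriptive statistics tables; the dictionary layout and the function name `derived_stats` are illustrative assumptions, not part of the original Excel workbook.

```python
import math

# Summary values copied from the descriptive statistics tables for each class.
classes = {
    "Class 1": {"mean": 70.8715, "sd": 15.3967154, "n": 20, "q1": 57.475, "q3": 83.1625},
    "Class 2": {"mean": 53.6605, "sd": 10.20394737, "n": 20, "q1": 46.6025, "q3": 62.3225},
    "Class 3": {"mean": 43.8045, "sd": 8.374564545, "n": 20, "q1": 39.985, "q3": 48.085},
}


def derived_stats(s):
    """Return (standard error, coefficient of variation in %, interquartile range)."""
    standard_error = s["sd"] / math.sqrt(s["n"])  # SD divided by sqrt(n)
    coeff_variation = 100 * s["sd"] / s["mean"]   # SD as a percentage of the mean
    iqr = s["q3"] - s["q1"]                       # Q3 minus Q1
    return standard_error, coeff_variation, iqr


results = {name: derived_stats(s) for name, s in classes.items()}
for name, (se, cv, iqr) in results.items():
    print(f"{name}: SE={se:.4f}  CV={cv:.2f}%  IQR={iqr:.4f}")
```

Because the coefficient of variation scales the standard deviation by the mean, it allows classes with very different average marks to be compared on the same footing, which the raw range does not.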

Analysis: Answer 2

Table 2.1 Descriptive statistics calculation for domestic and int’l students

Statistic                      Domestic students    International students
Mean                           55.57666667          56.64766667
Standard Error                 2.759676865          3.158371604
Median                         54.37                51.515
Standard Deviation             15.1153727           17.29911373
Sample Variance                228.474492           299.2593357
Kurtosis                       1.065394116          0.274411144
Skewness                       0.782385824          0.86508514
Range                          70.33                72.48
Minimum                        27.9                 26.18
Maximum                        98.23                98.66
Sum                            1667.3               1699.43
Count                          30                   30
Coefficient of Variation       27.19733588          30.53808699
25th Percentile (Q1)           44.28                45.06
75th Percentile (Q3)           65.215               66.5825
Interquartile Range            20.935               21.5225

Fig. 2.1 Visual representation of marks obtained by domestic students
[Chart "Marks of Domestic Students": x-axis Students (1 to 30), y-axis Marks (0.00 to 120.00)]

Fig. 2.2 Visual representation of marks obtained by international students

Location: According to our analysis, international students have a slightly higher average mark (56.65) than domestic students (55.58). For domestic students, 25% of marks are equal to or less than 44.28, 50% are equal to or less than 54.37 (median) and 75% are equal to or less than 65.22. For international students, 25% of marks are equal to or less than 45.06, 50% are equal to or less than 51.52 (median) and 75% are equal to or less than 66.58. Therefore, the quartiles of the domestic and international student populations give a very similar picture.

Shape: The descriptive analysis performed above shows that the distributions of marks obtained by both domestic and international students have positive skewness (0.78 and 0.87) and hence are positively skewed (right skewed). Consistent with this, the mean is greater than the median for both domestic and international students. Furthermore, since both distributions have positive excess kurtosis (1.07 and 0.27), they are leptokurtic.

Variability: The ranges of the marks obtained by domestic students (70.33) and international students (72.48) are very similar. Therefore, to compare variability more reliably, we calculate the coefficients of variation. The coefficient of variation for marks obtained by international students (30.54%) is higher than that for domestic students (27.20%), showing greater variability among international students.

[Chart "Marks of International Students": x-axis Students (1 to 30), y-axis Marks (0.00 to 120.00)]

Analysis: Answer 3

Table 3.1 Descriptive statistics calculation for domestic and international students in Class 1

Statistic                      Domestic Student - Class 1    International Student - Class 1
Mean                           68.755                        72.2825
Standard Error                 5.811018782                   4.398162344
Median                         66.2                          71.145
Standard Deviation             16.43604315                   15.23568128
Sample Variance                270.1435143                   232.1259841
Kurtosis                       -0.227515917                  -1.008029314
Skewness                       0.808624416                   0.38963064
Range                          45.86                         47.82
Minimum                        52.37                         50.84
Maximum                        98.23                         98.66
Sum                            550.04                        867.39
Count                          8                             12
Coefficient of Variation       23.90523329                   21.0779667
25th Percentile (Q1)           53.6125                       59.8275
75th Percentile (Q3)           82.2575                       87.115
Interquartile Range            28.645                        27.2875

Fig. 3.1 Visual representation of marks obtained by domestic students in Class 1
[Chart "Marks of Domestic Students - Class 1": x-axis Students (1 to 8), y-axis Marks (0.00 to 120.00)]

Fig. 3.2 Visual representation of marks obtained by international students in Class 1

Location: According to Table 3.1, the average mark obtained by international students of Class 1 (72.28) is greater than that obtained by domestic students of Class 1 (68.76). Furthermore, for domestic students, 25% of marks are equal to or less than 53.61, 50% are equal to or less than 66.20 (median) and 75% are equal to or less than 82.26. For international students, 25% of marks are equal to or less than 59.83, 50% are equal to or less than 71.15 (median) and 75% are equal to or less than 87.12.

Shape: As per the descriptive analysis performed earlier, the distributions of marks obtained by both the domestic and international students are positively skewed. The distribution of marks obtained by domestic students (skewness 0.81) is more positively skewed than that of international students (skewness 0.39). Consistent with this, the mean is greater than the median for both distributions.

Variability: The ranges of marks obtained by domestic (45.86) and international students (47.82) are very similar. Thus, we compare their coefficients of variation to get a better picture of their variability. The coefficient of variation of marks obtained by domestic students (23.91%) is higher than that of international students (21.08%). Hence, the marks obtained by domestic students have a higher degree of variability than those of international students.

[Chart "Marks of International Students - Class 1": x-axis Students (1 to 12), y-axis Marks (0.00 to 120.00)]

Analysis: Answer 4

Hypothesis: To analyse the difference in marks between the male and female student populations, we test the claim that the average mark obtained by female students is significantly different from that obtained by male students, at a 5% level of significance.

Since the variances of the two populations are unknown, we use the two-tailed separate-variance t-test, which assumes that the population variances are unequal. Let μ1 be the mean mark of the population of female students and μ2 be the mean mark of the population of male students. The null and alternative hypotheses can be framed as:

H0: μ1 - μ2 = 0

Ha: μ1 - μ2 ≠ 0

Decision Rule: If the computed t statistic is below the lower critical value of -2.02 or above the upper critical value of +2.02, the null hypothesis should be rejected. (Reject H0 if t < -2.02 or if t > 2.02)

Fig. 4.1 Excel output for the Separate-Variances t-test for the difference between two means

Separate-Variances t Test for the Difference Between Two Means
(assumes unequal population variances)

Data
Hypothesized Difference                0
Level of Significance                  0.05
Population 1 Sample
  Sample Size                          30
  Sample Mean                          62.04033333
  Sample Standard Deviation            19.7523
Population 2 Sample
  Sample Size                          30
  Sample Mean                          50.184
  Sample Standard Deviation            8.0910

Intermediate Calculations
Numerator of Degrees of Freedom        230.6522
Denominator of Degrees of Freedom      5.9964
Total Degrees of Freedom               38.4653
Degrees of Freedom                     38
Standard Error                         3.8971
Difference in Sample Means             11.85633333
Separate-Variance t Test Statistic     3.0424

Two-Tail Test
Lower Critical Value                   -2.0244
Upper Critical Value                   2.0244
p-Value                                0.0042
Reject the null hypothesis

Calculations Area
Pop. 1 Sample Variance                 390.1535
Pop. 2 Sample Variance                 65.4636
Pop. 1 Sample Var./Sample Size         13.0051
Pop. 2 Sample Var./Sample Size         2.1821
For one-tailed tests:
  TDIST value                          0.0021
  1-TDIST value                        0.9979

Conclusion: Since the t statistic (3.04) is higher than the upper critical value (2.02), and the p-value (0.0042) is below the 0.05 level of significance, we reject the null hypothesis. The data therefore support the alternative hypothesis: the average mark obtained by female students is significantly different from the average mark obtained by male students.

Analysis: Answer 5

Hypothesis: To analyse the difference in marks obtained by students of Class 1, Class 2 and Class 3, we test the claim that the average marks obtained by students of the three classes are significantly different from each other, at a 5% level of significance.

Since we compare the averages of more than two populations, we use the F-test from a one-way (single factor) ANOVA. Let μ1, μ2 and μ3 be the mean marks of the populations of Class 1, Class 2 and Class 3 respectively. The null and alternative hypotheses can be framed as:

H0: μ1 = μ2 = μ3

Ha: At least one of the class mean marks is different from the others.

Decision rule: Reject the null hypothesis if the calculated value of F is greater than the critical value of 3.16. (Reject H0 if F > 3.16)

Fig. 5.1 Excel output for Single Factor (One way) ANOVA for the difference between 3 means

Anova: Single Factor

SUMMARY
Groups     Count    Sum        Average    Variance
CLASS1     20       1417.43    70.8715    237.0588
CLASS2     20       1073.21    53.6605    104.1205
CLASS3     20       876.09     43.8045    70.13333

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         7506.545    2     3753.272    27.37532    4.65E-09    3.158843
Within Groups          7814.942    57    137.1042
Total                  15321.49    59

Conclusion: Since the F statistic (27.38) is greater than the critical value of 3.16 and lies in the rejection region, we reject the null hypothesis. Therefore, we conclude that the mean mark of at least one of Class 1, Class 2 and Class 3 is different from the others.
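The F statistic can be rebuilt from the SUMMARY block of the ANOVA output alone, because the between-group and within-group sums of squares depend only on each group's count, average and variance. The following is a minimal sketch under that assumption; the function name `one_way_anova` and the tuple layout are illustrative, and the critical value is copied from the Excel output.

```python
# (count, average, variance) for each class, from the ANOVA SUMMARY block.
groups = [
    (20, 70.8715, 237.0588),   # CLASS1
    (20, 53.6605, 104.1205),   # CLASS2
    (20, 43.8045, 70.13333),   # CLASS3
]


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F) from group summaries."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)

f_crit = 3.158843  # F crit taken from the Excel output
reject_h0 = f_stat > f_crit

print(f"SSB = {ss_between:.3f}, SSW = {ss_within:.3f}, "
      f"df = ({df_between}, {df_within}), F = {f_stat:.4f}, reject H0: {reject_h0}")
```

A significant F only indicates that at least one class mean differs; it does not by itself identify which pairs of classes differ.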

Analysis: Answer 6

Hypothesis: To analyse the difference in marks between the populations of domestic and international students, we test the claim that the average mark obtained by domestic students is significantly different from that obtained by international students, at a 5% level of significance.

Since the variances of the two populations are unknown, we use the two-tailed separate-variance t-test, which assumes that the population variances are unequal. Let μ1 be the mean mark of the population of domestic students and μ2 be the mean mark of the population of international students. The null and alternative hypotheses can be framed as:

H0: μ1 - μ2 = 0

Ha: μ1 - μ2 ≠ 0

Decision Rule: If the computed t statistic is below the lower critical value of -2.003 or above the upper critical value of +2.003, the null hypothesis should be rejected. (Reject H0 if t < -2.003 or if t > 2.003)

Fig. 6.1 Excel output for the Separate-Variances t-test for the difference between two means

Separate-Variances t Test for the Difference Between Two Means
(assumes unequal population variances)

Data
Hypothesized Difference                0
Level of Significance                  0.05
Population 1 Sample
  Sample Size                          30
  Sample Mean                          55.57666667
  Sample Standard Deviation            15.1154
Population 2 Sample
  Sample Size                          30
  Sample Mean                          56.64766667
  Sample Standard Deviation            17.2991

Intermediate Calculations
Numerator of Degrees of Freedom        309.4478
Denominator of Degrees of Freedom      5.4313
Total Degrees of Freedom               56.9750
Degrees of Freedom                     56
Standard Error                         4.1942
Difference in Sample Means             -1.071
Separate-Variance t Test Statistic     -0.2554

Two-Tail Test
Lower Critical Value                   -2.0032
Upper Critical Value                   2.0032
p-Value                                0.7994
Do not reject the null hypothesis

Calculations Area
Pop. 1 Sample Variance                 228.4745
Pop. 2 Sample Variance                 299.2593
Pop. 1 Sample Var./Sample Size         7.6158
Pop. 2 Sample Var./Sample Size         9.9753
For one-tailed tests:
  TDIST value                          0.3997
  1-TDIST value                        0.6003

Conclusion: Since the t statistic (-0.255) lies between the lower (-2.003) and upper (+2.003) critical values, we do not reject the null hypothesis. The data do not support the alternative hypothesis (Ha), so the claim is not supported. Thus, we conclude that there is insufficient evidence in the data to declare a significant difference in the average marks obtained by domestic and international students.

Analysis: Answer 7

Hypothesis: To analyse the difference in marks between the populations of domestic and international students from Class 1, we test the claim that the average mark obtained by domestic students is significantly different from that obtained by international students, at a 5% level of significance.

The variances of the two populations are unknown, so we use the two-tailed separate-variance t-test, which assumes that the population variances are unequal. Let μ1 be the mean mark of the population of domestic students of Class 1 and μ2 be the mean mark of the population of international students of Class 1. The null and alternative hypotheses can be framed as:

H0: μ1 - μ2 = 0

Ha: μ1 - μ2 ≠ 0

Decision Rule: If the computed t statistic is below the lower critical value of -2.145 or above the upper critical value of +2.145, the null hypothesis should be rejected. (Reject H0 if t < -2.145 or if t > 2.145)

Fig. 7.1 Excel output for the Separate-Variances t-test for the difference between two means

Separate-Variances t Test for the Difference Between Two Means
(assumes unequal population variances)

Data
Hypothesized Difference                0
Level of Significance                  0.05
Population 1 Sample
  Sample Size                          8
  Sample Mean                          68.755
  Sample Standard Deviation            16.4360
Population 2 Sample
  Sample Size                          12
  Sample Mean                          72.2825
  Sample Standard Deviation            15.2357

Intermediate Calculations
Numerator of Degrees of Freedom        2820.8602
Denominator of Degrees of Freedom      196.9130
Total Degrees of Freedom               14.3254
Degrees of Freedom                     14
Standard Error                         7.2878
Difference in Sample Means             -3.5275
Separate-Variance t Test Statistic     -0.4840

Two-Tail Test
Lower Critical Value                   -2.1448
Upper Critical Value                   2.1448
p-Value                                0.6358
Do not reject the null hypothesis

Calculations Area
Pop. 1 Sample Variance                 270.1435
Pop. 2 Sample Variance                 232.1260
Pop. 1 Sample Var./Sample Size         33.7679
Pop. 2 Sample Var./Sample Size         19.3438
For one-tailed tests:
  TDIST value                          0.3179
  1-TDIST value                        0.6821

Conclusion: Since the t statistic (-0.484) lies between the lower (-2.145) and upper (+2.145) critical values, we do not reject the null hypothesis. The data do not support the alternative hypothesis (Ha), so the claim is not supported. Thus, we conclude that there is not enough evidence in the data to declare a significant difference between the average marks obtained by domestic and international students of Class 1.
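An equivalent way to read the two domestic versus international tests is through a 95% confidence interval for the difference in means: when the interval contains zero, the two-tailed test at the 5% level does not reject the null hypothesis. The sketch below uses only the differences, standard errors and critical values reported in the Excel outputs for Answers 6 and 7; the function name `ci_for_difference` and the dictionary labels are illustrative assumptions.

```python
def ci_for_difference(diff, standard_error, t_crit):
    """Confidence interval for a difference in means: diff +/- t_crit * SE."""
    margin = t_crit * standard_error
    return diff - margin, diff + margin


# (difference in sample means, standard error, upper critical value) from the Excel outputs.
tests = {
    "Answer 6 (all students)": (-1.071, 4.1942, 2.0032),
    "Answer 7 (Class 1)": (-3.5275, 7.2878, 2.1448),
}

intervals = {name: ci_for_difference(*values) for name, values in tests.items()}
for name, (lower, upper) in intervals.items():
    contains_zero = lower < 0 < upper
    print(f"{name}: ({lower:.3f}, {upper:.3f}), contains zero: {contains_zero}")
```

The interval for Class 1 is much wider than the one for all students, which reflects the small samples (8 and 12 students) behind that comparison.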

Conclusion

After the descriptive and inferential statistical calculations performed in the previous sections, we discuss our findings in this section of the report. Our descriptive analyses show that although Class 1 has the highest average mark of the three classes, it also has the largest range of individual marks and the highest coefficient of variation. Furthermore, although at first glance international students seem to perform slightly better than domestic students, they also have a higher coefficient of variation. When the comparison is restricted to Class 1, the story is different: here, domestic students have a higher coefficient of variation than international students.

According to the inferential statistical calculations performed in our study, we conclude that the average mark of female students is different from that of their male counterparts. In addition, we conclude that the average mark obtained by students of at least one of the classes is different from the others. Furthermore, we study the difference in average marks of domestic and international students across the entire population as well as within a single class (Class 1) and conclude that there is not enough evidence to declare a significant difference in the average marks obtained by them, which is a drawback of this study.

Recommendations

From the conclusions drawn in this study, we establish the implications for the Australian Business School and its efforts to improve the performance of students in the ‘Statistics for Business’ subject. The study implies that there is a wide difference in the performance of students from different classes. In addition, there is a disparity in the performance of female and male students. However, when the overall performance of domestic students is compared with that of international students, hardly any difference is found. When a single class is studied, however, we find different degrees of variability within the populations of domestic and international students. Therefore, based on all our findings, we present the following recommendations to the Business School:

• The quality of resources and teaching faculty provided to students of all three classes undertaking the ‘Statistics for Business’ subject must be standardised.

• Since there is a certain degree of variability in the performance of domestic and international students within the classes, it is important to monitor the academic progress of both groups in order to improve overall performance.

• Since there is a disparity in the performance of female and male students, it is essential that all lecturers and tutors pay equal attention to students of both genders to improve the overall performance.