Statistical Analysis Subject
SOUTHERN CROSS UNIVERSITY
School of Business and Tourism
MAT10251 Statistical Analysis
PROJECT COVER SHEET
Please complete all of the following details and then make these sheets the first pages of your project – do not send it as a separate document.
Your project must be submitted as a Word document.
PART B
| Student Name: | Umair Elahi |
| Student ID No.: | 23039692 |
| Tutor’s name: | Badri Bhattarai |
| Due date: | 13th January 2019 |
| Date submitted: | 16th January 2019 |
Declaration:
I have read and understand the Rules Relating to Awards ( Rule 3 Section 18 – Academic Integrity ) as contained in the SCU Policy Library. I understand the penalties that apply for academic misconduct and agree to be bound by these rules.
The work I am submitting electronically is entirely my own work.
| Signed: (please type your name) | Umair |
| Date: | 16/01/19 |
STUDENT NAME: Umair Elahi
STUDENT ID NUMBER: 23039692
MAT10251 – Statistical Analysis
Project Part B
Complete the summary table below.
| Sample Number (last digit of your student ID number) | 2 |
| Fuel (first letter of family name A to M: Unleaded 91; N to Z: Diesel) | E |
| Confidence Level | 95% |
| Level of Significance | 5% |
Value: 15%
PLEASE ENSURE YOU KEEP A COPY OF YOUR PROJECT
Self-Marking Sheet for Part A
Reflection/feedback ( approximately 200 words )
From the work done in part A, the representation of data in a graph was well understood and implemented. As showcased, two graphs were constructed using the same data set but different class intervals resulting in two different shapes. In addition, calculation of the descriptive statistics was well executed. The interpretation of the aforementioned statistical values was also done appropriately with deep understanding of what each statistic meant or represented.
However, there were some challenges and mistakes encountered during the tasks. First, the task of introducing data was a challenge. To avoid this in future, taking time to read and fully understand the population from which the sample is derived and also to understand the sample is a step to be taken. By doing so, I will be able to introduce the data before commencing on the calculations. Another challenge was in the choice of the measure of central tendency as the median and mean were close to each other. To avoid this, more background research regarding the same will be done.
From the submission and self-marking of part A, I was able to discover the mistakes and challenges I faced when doing the tasks and think of the ways with which I can avoid or rectify such mistakes in the future.
Marking and Feedback Sheet Part B
Comments: Please follow the provided instruction. If you need any help, please see me next time.
Figure 1 (Histogram), similar to video
| Bin (upper limit, cents per litre) | Midpoint (cents per litre) | Frequency |
| 134.99 | 132.50 | 6 |
| 139.99 | 137.50 | 21 |
| 144.99 | 142.50 | 12 |
| 149.99 | 147.50 | 16 |
| 154.99 | 152.50 | 15 |
| 159.99 | 157.50 | 10 |
| 164.99 | 162.50 | 0 |
Figure 2 (Histogram) with new classes, bins and midpoints
| Bin (upper limit, cents per litre) | Midpoint (cents per litre) | Frequency |
| 131.99 | 131.00 | 0 |
| 133.99 | 133.00 | 4 |
| 135.99 | 135.00 | 9 |
| 137.99 | 137.00 | 7 |
| 139.99 | 139.00 | 7 |
| 141.99 | 141.00 | 4 |
| 143.99 | 143.00 | 6 |
| 145.99 | 145.00 | 4 |
| 147.99 | 147.00 | 6 |
| 149.99 | 149.00 | 8 |
| 151.99 | 151.00 | 6 |
| 153.99 | 153.00 | 7 |
| 155.99 | 155.00 | 10 |
| 157.99 | 157.00 | 1 |
| 159.99 | 159.00 | 1 |
| 161.99 | 161.00 | 0 |
The first and second graphs are constructed from the same data, but because different classes were chosen their shapes differ. The first graph, built with 5-cent classes, is skewed to the right, while the second, built with 2-cent classes, looks roughly symmetric or uniform.
The second graph is described in more detail here. It represents the NSW Unleaded 91 fuel prices in 80 towns/suburbs in cents per litre, ranging from a minimum of 132.9 cents/litre to a maximum of 158.9 cents/litre.
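The effect of class width on the shape of a histogram can be illustrated with a minimal sketch. It assumes classes that start at 130 cents and include their lower limit, and it uses only the first ten regional prices from the appendix table, so the counts are illustrative and are not the frequencies in Figures 1 and 2. The function name `class_counts` is hypothetical.

```python
from collections import Counter

def class_counts(prices, start, width):
    """Count prices per class [lower, lower + width), keyed by the lower class limit."""
    counts = Counter()
    for p in prices:
        k = int((p - start) // width)      # index of the class the price falls in
        counts[start + k * width] += 1     # key is the lower limit of that class
    return dict(sorted(counts.items()))

# First ten regional prices from the appendix table (cents per litre).
prices = [143.8, 150.9, 151.9, 155.9, 147.9, 155.9, 153.9, 153.9, 148.9, 139.9]

wide = class_counts(prices, start=130, width=5)    # 5-cent classes, as in Figure 1
narrow = class_counts(prices, start=130, width=2)  # 2-cent classes, as in Figure 2

print("5-cent classes:", wide)
print("2-cent classes:", narrow)
```

With the wider classes the same ten prices pile into fewer bars, which is why the two figures can suggest different shapes for one data set.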
Descriptive Summary
| | Cents Per Litre |
| Mean | 145.3375 |
| Median | 145.85 |
| Mode | 155.9 |
| Minimum | 132.9 |
| Maximum | 158.9 |
| Range | 26 |
| Variance | 55.5586 |
| Standard Deviation | 7.4538 |
| Coeff. of Variation | 5.13% |
| Skewness | 0.0083 |
| Kurtosis | -1.3280 |
| Count | 80 |
| Standard Error | 0.8334 |
From the second graph we can see that four suburbs have fuel prices from 132 cents/litre to 134 cents/litre, four from 140 cents/litre to 142 cents/litre and four from 144 cents/litre to 146 cents/litre. The most common class is 154 cents/litre to 156 cents/litre with ten suburbs, followed by 134 cents/litre to 136 cents/litre with nine, while the classes 136 cents/litre to 138 cents/litre and 138 cents/litre to 140 cents/litre contain seven suburbs each.
Descriptive Statistics
More useful information can be found in the descriptive statistics in the table given above. In particular, the lowest fuel price among the sampled suburbs in NSW is 132.9 cents/litre, while the highest is 158.9 cents/litre. The median, the middle value of the 80 suburbs' fuel prices, is 145.85 cents/litre, i.e. 50 percent of the suburbs have a price at or below this value. The mean, the average and the usual measure of central tendency, is 145.3375 cents/litre. As the mean and median are nearly the same, we can conclude that the average fuel price among the 80 suburbs is about 145.34 cents/litre. The standard deviation of 7.4538 cents/litre is small relative to the mean (a coefficient of variation of 5.13%), so most fuel prices lie fairly close to the mean.
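Several entries in the descriptive summary follow directly from the others. The sketch below recomputes the range, coefficient of variation and standard error from the reported mean, standard deviation, count, minimum and maximum. The variable names are illustrative, and the inputs are the rounded values from the table rather than the raw data.

```python
import math

# Values taken from the Descriptive Summary table (cents per litre).
mean = 145.3375
sd = 7.4538
n = 80
minimum = 132.9
maximum = 158.9

value_range = maximum - minimum        # spread between highest and lowest price
cv_percent = sd / mean * 100           # coefficient of variation, in percent
standard_error = sd / math.sqrt(n)     # standard error of the sample mean

print(f"Range: {value_range:.1f}")
print(f"Coefficient of variation: {cv_percent:.2f}%")
print(f"Standard error: {standard_error:.4f}")
```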
Five-Number Summary
| Minimum | 132.90 |
| First quartile | 137.90 |
| Median | 145.85 |
| Third quartile | 151.90 |
| Maximum | 158.90 |
Finally, the five-number summary given above divides the sample into quarters: 25% of the values lie below the first quartile of 137.90 cents/litre and another 25% lie above the third quartile of 151.90 cents/litre.
Figure 3 Boxplot
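A boxplot is drawn from the five-number summary. The sketch below derives the interquartile range and, assuming the common 1.5 × IQR rule for flagging outliers (the report does not state which rule was used), checks whether the minimum and maximum fall inside the fences.

```python
# Five-number summary from the report (cents per litre).
minimum, q1, median, q3, maximum = 132.90, 137.90, 145.85, 151.90, 158.90

iqr = q3 - q1                      # width of the box
lower_fence = q1 - 1.5 * iqr       # assumed 1.5 x IQR outlier rule
upper_fence = q3 + 1.5 * iqr

# If both extremes are inside the fences, the whiskers reach the minimum and maximum.
has_outliers = minimum < lower_fence or maximum > upper_fence

print(f"IQR: {iqr:.2f}")
print(f"Fences: {lower_fence:.2f} to {upper_fence:.2f}")
print("Outliers present:", has_outliers)
```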
Written Answer Part B Components of a longer report
The questions in part B both deal with the question of whether or not motorists view the price of the fuel as expensive though from different perspectives.
Question 1 in particular answers the question of whether the price of the fuel is expensive from the perspective of the population mean. The sample mean was estimated to be 145.3375 cents.
The results are as follows:
The 95% confidence interval was found to be [143.7074, 146.9709] cents.
Since the interval does not include the value $1.50 (150 cents), the null hypothesis is rejected.
Comment by Badri Bhattarai: ????? please display your excel output.
Question 2 on the other hand answers the question of whether or not the fuel price is expensive from the perspective of a subset (more than 25% of petrol stations) of the sample having the fuel price at least $1.50 per litre.
Calculations were done and the results are as follows:
It was found that the price of fuel in 24 out of 80 petrol stations in the state was higher than $1.50. This translates to 30%.
B.1 Average Price Unleaded 91/Diesel Price
No, the average price of fuel on that day in the state specified was not expensive. The interval test that was carried out (see Appendix B.1) rejected the null hypothesis, which states that the fuel was expensive.
B.2 Unleaded 91/Diesel Price Expensive
Yes, the price of fuel was at least $1.50 per litre in more than 25% of petrol stations in the state specified by the sample.
From the foregoing, we conclude that using the criteria where motorists perceive fuel price to be expensive when the price of fuel is at least $1.50 at more than 25% petrol stations in a state, the price of the fuel was expensive on the day in the state specified.
Comment by Badri Bhattarai: Support your answers with your excel outputs
Appendices Part B
Appendix B.1 – Statistical answer for Question 1
The random variables were defined as follows:
· X_ is a random variable representing the sample mean.
· Sigma represents the standard deviation of the data from the mean.
· N represents the number of entries or petrol stations in the sample.
The following assumptions were made in the calculation and inference of the data:
X ~ N(X_, Sigma^2), i.e. X follows a normal distribution with mean X_ and variance Sigma^2.
The interval test was chosen in this case because, with the descriptive statistics already calculated, the interval method was easier and faster to use. The interval method also requires little extra calculation if the average at which the price of fuel is considered expensive changes from $1.50. In fact, all that is needed is to check whether the new average falls in the interval or not and make a decision.
Hypothesis testing
Null hypothesis: The price of fuel is expensive. In other words, the average price is at least $1.50.
Alternative hypothesis: The price of fuel is not expensive. In other words, the average price is less than $1.50.
To test the above hypothesis, a confidence interval was constructed as shown below:
X_ ± z × sigma/√N, where X_ = 145.3375 cents, sigma = 7.4538 cents, N = 80 and z ≈ 1.96 is the standard normal critical value for 95% confidence.
The 95% confidence interval was found to be [143.7074, 146.9709] cents.
When comparing the value 150 cents to the interval, it can be seen that the value falls outside the interval on the upper limit. Therefore, the null hypothesis is rejected.
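The interval calculation can be sketched as follows. This is an illustration under the assumption that a standard normal critical value (about 1.96) was used, which is consistent with the reported upper limit; it works from the rounded summary statistics rather than the Excel output, and it gives limits of roughly 143.70 and 146.97 cents.

```python
import math
from statistics import NormalDist

# Summary statistics from the report (cents per litre).
mean, sd, n = 145.3375, 7.4538, 80
confidence = 0.95
threshold = 150.0                      # $1.50 expressed in cents

# Two-sided critical value of the standard normal distribution.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sd / math.sqrt(n)         # margin of error
lower, upper = mean - margin, mean + margin

# The interval test compares the threshold with the interval limits.
threshold_above_interval = threshold > upper

print(f"z = {z:.4f}, margin = {margin:.4f}")
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
print("150 cents lies above the interval:", threshold_above_interval)
```

Because the standard deviation is estimated from the sample, a t critical value with 79 degrees of freedom is a common alternative; with N = 80 it would widen the interval only slightly.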
For question 1, the excel output used was that of the descriptive statistics that are needed in the calculation of the interval.
Descriptive Summary
| | Cents Per Litre |
| Mean | 145.3375 |
| Median | 145.85 |
| Mode | 155.9 |
| Minimum | 132.9 |
| Maximum | 158.9 |
| Standard Deviation | 7.4538 |
Interpretation of results: since we have rejected the null hypothesis, we conclude that the price of fuel is not expensive as per the criterion used in question 1.
Appendix B.2 – Statistical answer for Question 2
For question two, only one random variable was defined. X represents the individual price of fuel at each petrol station in the state.
The following logical function was used in Excel: =IF(C2:C81>150,1,0), where column C contained the price of fuel at each petrol station in cents. The column created by this logical function was then summed to find the total number of stations with a fuel price above 150 cents.
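The same indicator logic can be sketched outside Excel. The example below uses six prices taken from the appendix table (Albury, Bathurst, Bermagui, Bourke, Broken Hill and Five Dock), so the small-sample count is illustrative only. The final line uses the totals reported for the full sample, 24 of 80 stations.

```python
# Six prices from the appendix table (cents per litre); an illustrative subset only.
prices = [143.8, 150.9, 151.9, 155.9, 147.9, 150.0]

# Equivalent of =IF(C2>150,1,0) applied to each row: strict "greater than".
flags = [1 if p > 150 else 0 for p in prices]
count_above = sum(flags)

# A price of exactly 150.0 is not counted by the strict comparison.
print("Flags:", flags)
print("Stations above 150 cents in this subset:", count_above)

# Totals reported for the full sample in the report.
reported_count, reported_n = 24, 80
reported_share = reported_count / reported_n * 100
print(f"Reported share: {reported_share:.0f}%")
```

Note that the comparison is strict, so a station priced at exactly 150.0 cents is flagged 0.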
Hypothesis testing
Null hypothesis: the percentage of petrol stations with a fuel price higher than 150 cents is greater than 25%, hence the fuel price is expensive.
Alternative hypothesis: the percentage of petrol stations with a fuel price higher than 150 cents is at most 25%, hence the fuel price is not expensive.
Comment by Badri Bhattarai: ???
It was found that 24 petrol stations had a fuel price higher than 150 cents. This translates to 30%. Therefore, we fail to reject the null hypothesis.
Interpretation of results: since we have failed to reject the null hypothesis, we conclude that the price of fuel is expensive as per the criterion used in question 2. The excel output is as shown below:
| Town/Suburb | Location | Unleaded 91 (Cents per Litre) | logic |
| Albury | Regional | 143.8 | 0.0 |
| Bathurst | Regional | 150.9 | 1.0 |
| Bermagui | Regional | 151.9 | 1.0 |
| Bourke | Regional | 155.9 | 1.0 |
| Broken Hill | Regional | 147.9 | 0.0 |
| Casino | Regional | 155.9 | 1.0 |
| Coffs Harbour | Regional | 153.9 | 1.0 |
| Coonabarabran | Regional | 153.9 | 1.0 |
| Dorrigo | Regional | 148.9 | 0.0 |
| Drake | Regional | 139.9 | 0.0 |
| Evans Head | Regional | 152.9 | 1.0 |
| Glen Innes | Regional | 152.9 | 1.0 |
| Goulburn | Regional | 145.8 | 0.0 |
| Gunnedah | Regional | 142.9 | 0.0 |
| Halfway Creek | Regional | 155.9 | 1.0 |
| Kempsey | Regional | 146.9 | 0.0 |
| Lismore | Regional | 155.9 | 1.0 |
| Manilla | Regional | 156.9 | 1.0 |
| Moree | Regional | 152.9 | 1.0 |
| Mudgee | Regional | 155.9 | 1.0 |
| Mungindi | Regional | 155.9 | 1.0 |
| Muswellbrook | Regional | 158.9 | 1.0 |
| Narrabri | Regional | 149.9 | 0.0 |
| Newcastle West | Regional | 154.9 | 1.0 |
| Port Kembla | Regional | 138.9 | 0.0 |
| Port Macquarie | Regional | 155.4 | 1.0 |
| Queanbeyan | Regional | 149.9 | 0.0 |
| Tamworth | Regional | 149.9 | 0.0 |
| Tenterfield | Regional | 146.7 | 0.0 |
| Tenterfield | Regional | 146.7 | 0.0 |
| Tweed Heads | Regional | 148.9 | 0.0 |
| Ulladulla | Regional | 144.7 | 0.0 |
| Uralla | Regional | 151.9 | 1.0 |
| Waga Waga | Regional | 154.9 | 1.0 |
| Walgett | Regional | 150.9 | 1.0 |
| Wauchope | Regional | 155.9 | 1.0 |
| West Armidale | Regional | 151.0 | 1.0 |
| Woolgoolga | Regional | 153.9 | 1.0 |
| Wyong | Regional | 141.7 | 0.0 |
| Yamba | Regional | 153.9 | 1.0 |
| Alexandria | Capital - Sydney | 143.9 | 0.0 |
| Arncliffe | Capital - Sydney | 136.9 | 0.0 |
| Bankstown | Capital - Sydney | 133.9 | 0.0 |
| Baulkham Hills | Capital - Sydney | 141.9 | 0.0 |
| Bexley North | Capital - Sydney | 135.9 | 0.0 |
| Blacktown | Capital - Sydney | 136.9 | 0.0 |
| Bondi Junction | Capital - Sydney | 148.4 | 0.0 |
| Brighton Le Sands | Capital - Sydney | 135.9 | 0.0 |
| Brookvale | Capital - Sydney | 146.4 | 0.0 |
| Cabramatta | Capital - Sydney | 142.9 | 0.0 |
| Casula | Capital - Sydney | 137.9 | 0.0 |
| Croydon Park | Capital - Sydney | 135.7 | 0.0 |
| Fairfield | Capital - Sydney | 135.9 | 0.0 |
| Five Dock | Capital - Sydney | 150.0 | 0.0 |
| Forestville | Capital - Sydney | 149.4 | 0.0 |
| Granville | Capital - Sydney | 132.9 | 0.0 |
| Homebush | Capital - Sydney | 135.8 | 0.0 |
| Leppington | Capital - Sydney | 135.9 | 0.0 |
| Lewisham | Capital - Sydney | 133.9 | 0.0 |
| Lidcombe | Capital - Sydney | 138.9 | 0.0 |
| Maroubra | Capital - Sydney | 143.9 | 0.0 |
| Marrickville | Capital - Sydney | 137.5 | 0.0 |
| Miranda | Capital - Sydney | 137.9 | 0.0 |
| Mona Vale | Capital - Sydney | 144.9 | 0.0 |
| Mortdale | Capital - Sydney | 136.9 | 0.0 |
| North Ryde | Capital - Sydney | 135.9 | 0.0 |
| Northwood | Capital - Sydney | 139.9 | 0.0 |
| Pagewood | Capital - Sydney | 148.4 | 0.0 |
| Pennant Hills | Capital - Sydney | 143.4 | 0.0 |
| Petersham | Capital - Sydney | 137.7 | 0.0 |
| Punchbowl | Capital - Sydney | 138.9 | 0.0 |
| Quakers Hill | Capital - Sydney | 139.9 | 0.0 |
| Revesby | Capital - Sydney | 133.9 | 0.0 |
| Ryde | Capital - Sydney | 140.9 | 0.0 |
| Sydney | Capital - Sydney | 138.7 | 0.0 |
| Tarren Point | Capital - Sydney | 140.4 | 0.0 |
| Villawood | Capital - Sydney | 134.7 | 0.0 |
| West Hoxton | Capital - Sydney | 145.9 | 0.0 |
| Woolloomooloo | Capital - Sydney | 146.9 | 0.0 |
| Yagoona | Capital - Sydney | 134.5 | 0.0 |
| | | | 24.0 |
Please do not deduct marks for late submission. My unit assessor approved an extension, but I was unable to upload my assignment again until the extension date, so she reset my submission link. A copy of the email is attached below. Thanks.
| | Max Marks | Recommended Marks |
| Cover sheet or sample incorrect | -2.0 | |
| Format incorrect, including name | -2.0 | |
| Statistical Calculations | | |
| Graph (Frequency Histogram or Polygon) | 4.0 | 4.0 |
| Descriptive Statistics | 4.0 | 4.0 |
| Total Descriptive Statistics | 8.0 | 8.0 |
| Written Answer (Component of a business report) | | |
| Introduction and data | 2.0 | 0.0 |
| Comments on graph | 3.0 | 3.0 |
| Comments on descriptive statistics | 4.0 | 3.0 |
| Difference in measures of central tendency | 1.0 | 1.0 |
| Structure, grammar and spelling | 2.0 | 2.0 |
| Total Report | 12.0 | 9.0 |
| Total | 20.0 | 17.0 |
| | Max Marks | Mark |
| Cover sheet or sample incorrect | -2 | |
| Format incorrect, including file name | -2 | |
| Self-Marking and Reflection Part A (5 marks) | ||
| Self-Marking Part A | 2 | 2.0 |
| Reflection | 3 | 2.0 |
| Part B Statistical Inference Tasks (19 marks) | ||
| Statistical Inference Question 1 | ||
| Choice of technique, assumptions & other required steps | 4 | 1.0 |
| Calculation (Excel output) | 3 | 0.0 |
| Conclusion | 2 | 0.0 |
| Statistical Inference Question 2 | ||
| Choice of technique, assumptions & other required steps | 5 | 0.0 |
| Calculation (Excel output) | 3 | 0.0 |
| Decision and conclusion | 2 | 0.0 |
| Written task - Discussion and results (6 marks) | ||
| Question 1 | 2 | 1.0 |
| Question 2 | 2 | 0.0 |
| Structure, grammar and spelling | 2 | 1.0 |
| Total Part B | 30 | 7.0 |