Case Study Report
Example
Statistics Investigation Report
Module: FC301- Statistics
Student Name: xxxxx
Student ID: xxxxx
Tutor(s) name(s): xxxxx and xxxxx
Date: dd/mm/yyyy
Pedestrians killed in collision with car, pick-up truck or van Deaths (US) (CDC)
correlates with
Drivers killed in collision with car, pick-up truck or van Deaths (US) (CDC)
x: Pedestrians killed in collision with car, pick-up truck or van Deaths (US) (CDC)
y: Drivers killed in collision with car, pick-up truck or van Deaths (US) (CDC)
Year x y
1999 2877 2988
2000 2909 2996
2001 3005 3288
2002 3078 3594
2003 2980 3412
2004 2790 3210
2005 2728 2839
2006 2764 2726
2007 2476 2429
2008 2112 1937
2009 1999 1672
2010 1838 1314
Total 31556 32405
Part 1: For your independent variable x:
(a)
Minimum Value 1838
Maximum Value 3078
Range 1240
Q1 2112
Median 2777
Q3 2980
Interquartile Range 868
Mode none (no value repeats)
Mean 2629.67
Standard Deviation 405.51 (population form, √(Σ(x - x̄)²/n))
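The summary statistics in (a) can be checked with a short script. This is a minimal sketch rather than the method used in the report: the helper `quartile_at_rounded_rank` is hypothetical and assumes the quartiles are read off at ranks (n+1)/4 and 3(n+1)/4, rounded to the nearest whole rank, which is one convention that reproduces the Q1 and Q3 listed above. Both the population and the sample forms of the standard deviation are computed so they can be compared.

```python
from statistics import mean, median, pstdev, stdev

# Pedestrian deaths per year, 1999-2010 (the x column of the data table).
x = [2877, 2909, 3005, 3078, 2980, 2790, 2728, 2764, 2476, 2112, 1999, 1838]


def quartile_at_rounded_rank(values, fraction):
    """Hypothetical helper: value at rank fraction*(n+1), rounded to a whole rank."""
    ordered = sorted(values)
    rank = round(fraction * (len(ordered) + 1))  # 1-based rank (assumed convention)
    return ordered[rank - 1]


q1 = quartile_at_rounded_rank(x, 0.25)
q3 = quartile_at_rounded_rank(x, 0.75)

summary = {
    "minimum": min(x),
    "maximum": max(x),
    "range": max(x) - min(x),
    "q1": q1,
    "median": median(x),
    "q3": q3,
    "iqr": q3 - q1,
    "mean": round(mean(x), 2),
    "sd_population": round(pstdev(x), 2),  # divides by n
    "sd_sample": round(stdev(x), 2),       # divides by n - 1
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Other quartile conventions (for example linear interpolation between ranks) give slightly different Q1 and Q3 for a sample of 12 values.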
(b)
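F = frequency, CF = cumulative frequency, RF = relative frequency, PRF = percentage relative frequency
Class         F    CF   RF     PRF
1500≤x<2000   2    2    0.17   17%
2000≤x<2500   2    4    0.17   17%
2500≤x<3000   6    10   0.5    50%
3000≤x<3500   2    12   0.17   17%
Total         12   12   1      100%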
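The grouped frequency table can be rebuilt directly from the raw x values. The sketch below assumes the same class boundaries as the table (lower bound inclusive, upper bound exclusive); the variable names are illustrative only.

```python
# Pedestrian deaths per year, 1999-2010 (the x column of the data table).
x = [2877, 2909, 3005, 3078, 2980, 2790, 2728, 2764, 2476, 2112, 1999, 1838]

# Class boundaries taken from the table: 1500<=x<2000, ..., 3000<=x<3500.
edges = [1500, 2000, 2500, 3000, 3500]

rows = []
cumulative = 0
for lower, upper in zip(edges, edges[1:]):
    frequency = sum(lower <= value < upper for value in x)
    cumulative += frequency
    relative = frequency / len(x)
    rows.append(
        (f"{lower}≤x<{upper}", frequency, cumulative, round(relative, 2), f"{relative:.0%}")
    )

print("Class F CF RF PRF")
for row in rows:
    print(*row)
```

Because each relative frequency is rounded separately, the rounded percentages need not add up to exactly 100%.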
(c)
[Pie chart: share of years in each class: 1500≤x<2000 17%, 2000≤x<2500 17%, 2500≤x<3000 50%, 3000≤x<3500 17%]
[Bar chart: one bar per class (1500≤x<2000, 2000≤x<2500, 2500≤x<3000, 3000≤x<3500); axis ticks at 0, 13, 25, 38, 50, 63]
(d) The bar chart shows that the third bar (2500≤x<3000) is the longest, while the
other three bars are shorter and equal in length. The pie chart shows that the number
of pedestrians killed in collisions with a car, pick-up truck or van was between 2,500
and 3,000 in half of the years (50%), whereas each of the other three classes accounts
for 17% of the years. Taken together, 8 of the 12 years (about 67%) recorded at least
2,500 pedestrian deaths.
Part 2: For your dependent variable y:
(a)
Minimum Value 1314
Maximum Value 3594
Range 2280
Q1 1937
Median 2913.5
Q3 3288
Interquartile Range 1351
Mode none (no value repeats)
Mean 2700.42
Standard Deviation 691.37 (population form, √(Σ(y - ȳ)²/n))
(b) The calculations show that the minimum number of driver deaths is 1,314 and the
maximum is 3,594, so the range is 3594 - 1314 = 2,280. The interquartile range between
the lower quartile (Q1 = 1937) and the upper quartile (Q3 = 3288) is 1351. The median
is 2913.5, meaning that in half of the years the number of deaths was above 2913.5 and
in the other half it was below 2913.5. From the quartiles, a quarter of the years
recorded 1,937 or fewer driver deaths and a quarter recorded 3,288 or more. The number
of deaths is different in every year, so there is no mode. The mean is 2700.42
(32405/12). The standard deviation of 691.37 is large, about a quarter of the mean,
which means that the number of deaths varies considerably from year to year. In
addition, the mean and the median are not equal (the mean is below the median), which
suggests that the data are not symmetric and so are unlikely to be normally
distributed. The distribution appears to be negatively skewed, because the distance
between Q3 and Q2 (374.5) is less than the distance between Q2 and Q1 (976.5).
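The quartile comparison used to judge skewness can be sketched as follows. This is an illustration under the assumption that Q1 and Q3 are the 3rd and 10th ordered values, as the values listed in (a) imply; it is a rough quartile-based check, not a formal skewness test.

```python
from statistics import mean, median

# Driver deaths per year, 1999-2010 (the y column of the data table).
y = [2988, 2996, 3288, 3594, 3412, 3210, 2839, 2726, 2429, 1937, 1672, 1314]

ordered = sorted(y)
q1 = ordered[2]       # 3rd ordered value, matching the Q1 listed in (a)
q2 = median(ordered)  # average of the 6th and 7th ordered values
q3 = ordered[9]       # 10th ordered value, matching the Q3 listed in (a)

lower_gap = q2 - q1   # spread of the lower middle quarter
upper_gap = q3 - q2   # spread of the upper middle quarter

if upper_gap < lower_gap:
    skew = "negative"
elif upper_gap > lower_gap:
    skew = "positive"
else:
    skew = "symmetric"

print(f"Q2 - Q1 = {lower_gap}, Q3 - Q2 = {upper_gap}, skew: {skew}")
print(f"mean = {mean(y):.2f}, median = {q2}")
```

A mean below the median points in the same direction as a longer lower gap, so the two checks can be read together.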
Part 3: Comments
x y xy x² y²
2877 2988 8596476 8277129 8928144
2909 2996 8715364 8462281 8976016
3005 3288 9880440 9030025 10810944
3078 3594 11062332 9474084 12916836
2980 3412 10167760 8880400 11641744
2790 3210 8955900 7784100 10304100
2728 2839 7744792 7441984 8059921
2764 2726 7534664 7639696 7431076
2476 2429 6014204 6130576 5900041
2112 1937 4090944 4460544 3751969
1999 1672 3342328 3996001 2795584
1838 1314 2415132 3378244 1726596
Total 31556 32405 88520336 84955064 93242971
(a) Sxy = Σxy - ΣxΣy/n
= 88520336 - 31556×32405/12
= 3305987.667
Sxx = Σx² - (Σx)²/n
= 84955064 - 31556²/12
= 1973302.667
Syy = Σy² - (Σy)²/n
= 93242971 - 32405²/12
= 5735968.917
r = Sxy/√(Sxx×Syy)
= 3305987.667/√(1973302.667×5735968.917)
= 0.983
So r = 0.98 (2 d.p.)
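The sums of squares and products, and the correlation coefficient, can be recomputed from the raw columns. The sketch below uses the same formulas as (a); the variable names are illustrative.

```python
from math import sqrt

# Raw data columns, 1999-2010.
x = [2877, 2909, 3005, 3078, 2980, 2790, 2728, 2764, 2476, 2112, 1999, 1838]
y = [2988, 2996, 3288, 3594, 3412, 3210, 2839, 2726, 2429, 1937, 1672, 1314]

n = len(x)
sum_x, sum_y = sum(x), sum(y)

# Sxy = Σxy - ΣxΣy/n, Sxx = Σx² - (Σx)²/n, Syy = Σy² - (Σy)²/n
s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
s_xx = sum(a * a for a in x) - sum_x ** 2 / n
s_yy = sum(b * b for b in y) - sum_y ** 2 / n

# Pearson's product-moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

print(f"Sxy = {s_xy:.3f}")
print(f"Sxx = {s_xx:.3f}")
print(f"Syy = {s_yy:.3f}")
print(f"r = {r:.2f}")
```

Recomputing each column total in code is a quick way to catch a mistyped square or product in the hand-built table.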
(b)
b = Sxy/Sxx
= 3305987.667/1973302.667
= 1.67536 (1.675 to 3 d.p.)
a = ȳ - bx̄, where x̄ = 31556/12 = 2629.67 and ȳ = 32405/12 = 2700.42
= 2700.42 - 1.67536×2629.67
= -1705.22
y = bx + a
y = 1.675x - 1705.22
Because b > 0, the slope of the regression line is positive, which agrees with the positive
correlation between x and y.
(c)
[Scatter diagram: horizontal axis = Pedestrians killed in collision with car, pick-up truck or van Deaths (US) (CDC); vertical axis = Drivers killed in collision with car, pick-up truck or van Deaths (US) (CDC)]
The horizontal axis represents the number of pedestrians killed in collisions with a car,
pick-up truck or van (US) (CDC), and the vertical axis represents the number of drivers
killed in collisions with a car, pick-up truck or van (US) (CDC). Since r = 0.98, there is
a strong positive linear correlation.
(d) Pearson's product-moment correlation coefficient (r = 0.98) shows a strong positive
correlation between the number of pedestrians killed in collisions with a car, pick-up
truck or van and the number of drivers killed in such collisions: in years with more
pedestrian deaths there also tend to be more driver deaths. In addition, the coefficient
of determination (r² ≈ 0.96, from r = 0.98) suggests that about 96% of the variation in
the number of driver deaths can be explained by the variation in the number of pedestrian
deaths. The y-intercept (a = -1705.22) suggests that if the number of pedestrians killed
were 0, the predicted number of drivers killed would be -1705.22. A negative count is not
meaningful, and x = 0 lies far outside the observed range (1838 to 3078), so the intercept
should not be interpreted literally. The gradient (b = 1.675) indicates that when the
number of pedestrians killed increases by one, the predicted number of drivers killed
increases by about 1.675.
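The least-squares line y = bx + a can be recomputed from the raw data with b = Sxy/Sxx and a = ȳ - bx̄. The sketch below is illustrative only; `predict_driver_deaths` is a hypothetical helper name, and predictions are only reasonable inside the observed range of x.

```python
# Raw data columns, 1999-2010.
x = [2877, 2909, 3005, 3078, 2980, 2790, 2728, 2764, 2476, 2112, 1999, 1838]
y = [2988, 2996, 3288, 3594, 3412, 3210, 2839, 2726, 2429, 1937, 1672, 1314]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sxy and Sxx, using the same formulas as in part (a).
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(a * a for a in x) - sum(x) ** 2 / n

b = s_xy / s_xx        # gradient of the regression line
a = y_bar - b * x_bar  # y-intercept


def predict_driver_deaths(pedestrian_deaths):
    """Hypothetical helper: predicted y for a given x on the fitted line."""
    return a + b * pedestrian_deaths


print(f"y = {b:.3f}x + ({a:.2f})")
print(f"prediction at the mean of x: {predict_driver_deaths(x_bar):.2f}")
```

A useful sanity check is that the fitted line passes through (x̄, ȳ), so the prediction at the mean of x should equal the mean of y.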