Statistics Data Project using RStudio


STT 315 Data Project

Student Name Here!

For the first part of this example project, I am going to use a dataset provided in the base R installation called mtcars to explore some graphical and numerical descriptive measures of variables in the dataset.

According to the help documentation for this dataset, the data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

The variables I am going to look at in my report are (list all variables in your data):

Variable   Type           Description
mpg        quantitative   Miles per gallon
wt         quantitative   Weight of the vehicle (1000 lbs)
am         qualitative    Automatic (0) or Manual (1) transmission
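The original code is not included in this example, so here is a minimal base-R sketch of one way to pull these three variables out of mtcars. The data frame name `cars3` and the factor labels are illustrative assumptions, not the author's code.

```r
# mtcars ships with base R (datasets package), so nothing needs to be installed
data(mtcars)

# Keep only the three variables used in this report (cars3 is a hypothetical name)
cars3 <- mtcars[, c("mpg", "wt", "am")]

# Recode am (0/1) as a labelled factor so tables and plots show readable categories
cars3$am <- factor(cars3$am, levels = c(0, 1),
                   labels = c("Automatic", "Manual"))

str(cars3)   # 32 obs. of 3 variables
```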

Below is a bar chart for the qualitative variable am, which identifies whether a car has an automatic (0) transmission or a manual (1) transmission.

[Figure: bar chart of transmission type (am)]

From the graph we can see that even in 1974, automatic transmissions were more common than manual transmissions. About 60% of the vehicles in the dataset (19 of 32) have an automatic transmission.
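A bar chart like this one can be sketched in base R by tabulating am and passing the table to `barplot()`. The labels and titles are assumptions; the proportion at the end checks the "about 60%" figure.

```r
# Label the 0/1 codes, then count the cars in each transmission group
am_labels <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))
am_counts <- table(am_labels)

barplot(am_counts,
        main = "Transmission type",
        xlab = "Transmission", ylab = "Number of cars")

# Share of each type: Automatic is 19/32 = 0.59375
prop.table(am_counts)
```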

The following graph is a frequency histogram of the quantitative variable mpg, which measures the miles per gallon.

We note here that the distribution of miles per gallon is skewed to the right: most cars cluster at low mpg values, with a few fuel-efficient cars forming a longer upper tail. This indicates that most cars from this period had poor fuel economy. In fact, most cars (26 of 32) got fewer than 25 miles per gallon.
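A minimal base-R sketch of the frequency histogram, with two quick numerical checks of the claims above. The plot titles are assumptions; `hist()` chooses its bins with Sturges' rule by default.

```r
# Frequency histogram of miles per gallon (default Sturges breaks)
hist(mtcars$mpg,
     main = "Histogram of mpg",
     xlab = "Miles per gallon", ylab = "Frequency")

# Mean above median is consistent with a right skew (20.09 vs 19.2)
mean(mtcars$mpg) > median(mtcars$mpg)

# How many cars got fewer than 25 mpg?
n_below_25 <- sum(mtcars$mpg < 25)
n_below_25                    # 26 of the 32 cars
n_below_25 / nrow(mtcars)     # 0.8125
```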

For the other quantitative variable, weight, I have constructed a relative frequency histogram, which shows the distribution of weights for the cars in our data.

[Figure: relative frequency histogram of weight (wt)]

From the histogram we can see that most cars (28 of 32) weighed less than 4000 pounds. The histogram also suggests a few unusually heavy cars at the upper end (possible outliers), while the bulk of the distribution has a slight left tail.
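Base R's `hist()` plots counts or densities, not relative frequencies. One common workaround, sketched below as an assumption about how such a plot could be made, is to compute the histogram without drawing it and divide the counts by their total before plotting.

```r
# Compute the histogram object without drawing it
h <- hist(mtcars$wt, plot = FALSE)

# Turn counts into proportions so the bar heights sum to 1
h$counts <- h$counts / sum(h$counts)

plot(h, main = "Relative frequency histogram of weight",
     xlab = "Weight (1000 lbs)", ylab = "Relative frequency")

# Share of cars under 4000 lbs (wt is recorded in 1000s of lbs)
mean(mtcars$wt < 4)   # 28/32 = 0.875
```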

The summary statistics for the variable weight are shown in the table below.

Statistic             Value
Mean                  3.21725
Median                3.325
Standard deviation    0.9784574
Variance              0.957379
Range                 3.911
Q1                    2.58125
Q3                    3.61
Interquartile range   1.02875
n                     32
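Every statistic in the table can be reproduced with base R functions. A minimal sketch (the names `wt` and `wt_summary` are illustrative); Q1 and Q3 use `quantile()` with R's default type-7 definition, which matches the values reported.

```r
wt <- mtcars$wt

# Collect the statistics reported in the summary table
wt_summary <- c(
  mean     = mean(wt),
  median   = median(wt),
  sd       = sd(wt),
  variance = var(wt),
  range    = diff(range(wt)),               # max - min
  Q1       = unname(quantile(wt, 0.25)),    # default type-7 quantile
  Q3       = unname(quantile(wt, 0.75)),
  IQR      = IQR(wt),                       # Q3 - Q1
  n        = length(wt)
)
round(wt_summary, 7)
```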

The box plot of weight below shows outliers at the upper end, and the median sits closer to the upper quartile than to the lower quartile, which suggests a left skew in the middle of the distribution.

[Figure: box plot of weight (wt)]

From the graph we can see that 2 vehicles were classified as outliers, both weighing over 5000 pounds.
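A sketch of the base-R box plot and how to list the flagged points. It assumes the figure was drawn with base `boxplot()`, whose return value includes the outliers in `$out`.

```r
# Base R box plot: whiskers extend to the most extreme points within
# 1.5 times the hinge spread of the hinges; points beyond are drawn individually
b <- boxplot(mtcars$wt, main = "Box plot of weight",
             ylab = "Weight (1000 lbs)")

b$out                                    # 5.424 5.345
rownames(mtcars)[mtcars$wt %in% b$out]   # "Lincoln Continental" "Chrysler Imperial"

# Hinges used by boxplot() come from fivenum(), not quantile()
fivenum(mtcars$wt)[c(2, 4)]              # 2.5425 3.65
```

A design note: `boxplot()` uses Tukey's hinges (2.5425 and 3.65) rather than the type-7 quartiles in the summary table (2.58125 and 3.61). That puts the upper fence at 3.65 + 1.5 × 1.1075 = 5.31125, so the Cadillac Fleetwood (5.250) is not flagged even though it also weighs over 5000 lbs. With the table's quartiles the fence would be 5.153125, and it would be.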

To investigate the relationship between mileage and transmission type, I created a side-by-side boxplot of mileage by transmission.

[Figure: side-by-side box plots of mpg by transmission type]

From the graph we can see that cars with manual transmissions got much better gas mileage: nearly all of the cars with automatic transmissions got lower mileage than the median mileage of vehicles with manual transmissions. To check whether this difference is statistically significant rather than due to chance, we will perform a formal test in the second part of this project.
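A minimal sketch of the side-by-side box plot, plus a check of the median comparison in the paragraph above. The group labels are assumptions.

```r
# Side-by-side box plots of mpg for each transmission group
boxplot(mpg ~ am, data = mtcars,
        names = c("Automatic (0)", "Manual (1)"),
        xlab = "Transmission", ylab = "Miles per gallon")

# Group medians: 17.3 (automatic) and 22.8 (manual)
med <- tapply(mtcars$mpg, mtcars$am, median)
med

# Automatic cars whose mpg exceeds the manual median: only 1 of 19
sum(mtcars$mpg[mtcars$am == 0] > med["1"])
```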

To investigate the relationship between weight and mileage, I created a scatter plot with weight as the x-variable and mpg as the y-variable.

[Figure: scatter plot of mpg versus weight]

From the graph we can see a negative linear relationship between weight and miles per gallon: heavier cars tend to get fewer miles per gallon. We will look at the strength of this relationship in Part II of the project and fit a regression model to predict the miles per gallon of a car with a given weight.
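A minimal base-R sketch of the scatter plot, with the correlation coefficient that Part II uses to measure the strength of the relationship. Titles and labels are assumptions.

```r
# Scatter plot: weight on the x-axis, mpg on the y-axis
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Miles per gallon vs. weight")

# Pearson correlation measures the strength of the linear trend
r <- cor(mtcars$wt, mtcars$mpg)
r     # about -0.868
r^2   # about 0.753, the R-squared of the simple regression in Part II
```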

Part II:

Based on a researcher’s claim that the average engine horsepower for cars in 1974 was less than 150, we carried out a one-sided, one-sample t-test of this claim (H0: μ = 150 versus Ha: μ < 150). The resulting p-value is 0.3932, well above 0.05, so we do not have sufficient evidence to conclude that the true mean horsepower for cars in 1974 was less than 150 hp. The output of our analysis is given below:

data:  mtcars$hp
t = -0.2733, df = 31, p-value = 0.3932
alternative hypothesis: true mean is less than 150
95 percent confidence interval:
     -Inf 167.2377
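The output above matches R's one-sample `t.test()`. Since the original call is not shown, this is a reconstruction inferred from the printed output, not the author's code.

```r
# One-sample t-test of H0: mu = 150 against Ha: mu < 150
hp_test <- t.test(mtcars$hp, mu = 150, alternative = "less")
hp_test

# Individual pieces of the result
hp_test$statistic   # t, about -0.2733
hp_test$p.value     # about 0.3932
mean(mtcars$hp)     # sample mean, 146.6875
```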

To test whether manual transmissions have significantly better MPG than automatic transmissions, we compared the mean MPG of the two groups with a two-sample t-test. The resulting p-value (0.000285) is far below 0.05, so the difference in mean MPG between automatic and manual transmissions is statistically significant at the 5% significance level. The sample means (17.15 mpg for automatic, 24.39 mpg for manual) show that the manual cars get the better mileage. The result of the test is given below:

Two Sample t-test

data:  mtcars$mpg by mtcars$am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.84837  -3.64151
sample estimates:
mean in group 0 mean in group 1
       17.14737        24.39231
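A hedged reconstruction of the call behind this output. The argument `var.equal = TRUE` is an inference: the header reads "Two Sample t-test" with df = 30, whereas R's default Welch test would print "Welch Two Sample t-test" with fractional degrees of freedom.

```r
# Pooled-variance two-sample t-test of mpg between automatic (0) and manual (1) cars
mpg_test <- t.test(mtcars$mpg ~ mtcars$am, var.equal = TRUE)
mpg_test

# Difference in sample means, automatic minus manual (negative: manual is higher)
mpg_test$estimate[1] - mpg_test$estimate[2]   # about -7.24
```

Dropping `var.equal = TRUE` gives the Welch version, which does not assume the two groups share a common variance.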

From the scatter diagram in Part I above, we observed a linear relationship between MPG and the weight of a car. We wish to investigate this further to justify a linear statistical model for predicting the MPG of a car with a given weight. There is a strong negative linear relationship between the two variables, with a correlation coefficient of -0.868, and about 75% of the variability in MPG is explained by the weight of a car (R-squared = 0.7528). The p-value for the slope is almost zero (1.29e-10), so the slope is significantly different from zero and weight is a useful linear predictor of MPG. The fitted line is predicted MPG = 37.2851 - 5.3445 × weight (in 1000 lbs). The graph below gives the fitted line of the model, and the associated output follows.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,  Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

With an R-squared of 0.7528 and a residual standard error of about 3.05 mpg, this is a good model for predicting the MPG of a car with a given weight.
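A minimal sketch of fitting this model with `lm()`, drawing the fitted line and using the model for a prediction. The 3000 lb car (wt = 3) is an illustrative value that does not appear in the report.

```r
# Simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)   # coefficient table, R-squared and F-test

# Draw the fitted line on the scatter plot
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit, col = "red")

# Predicted mpg for a hypothetical car weighing 3000 lbs (wt = 3)
predict(fit, newdata = data.frame(wt = 3))   # about 21.25
```

`predict()` can also return an interval (for example `interval = "prediction"`), which is worth reporting alongside the point estimate given the residual standard error of about 3 mpg.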