statistical analysis

term_project_example2.Rmd

---
title: "Better Mileage--Automatic or Manual?"
author: "Lorem Ipsum"
date: "February 4, 2019"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Research Question

Since the dawn of time, humans have argued about their automobiles. One of the questions that has been a particular sore spot is the issue of transmission type and gas mileage. Early cave paintings from the Loire Valley in France suggest that some felt strongly that automatic transmissions offered greater fuel efficiency, whereas others insisted that the manual, or standard, variety was better. This debate festered for centuries and finally boiled over in the 1500s with the outbreak of hostilities in the War of Eastern Provo. Unfortunately, that conflict did not bring about a peaceful resolution and the debate lingers. The purpose of this report is to settle this conundrum conclusively.

The data for this project is/are taken from the "mtcars" dataset, which ships with R in the `datasets` package. The first step is to look at a summary of the data:

```{r mtcars, echo = FALSE}
summary(mtcars)
```

## Including Plots

Plotting the data is an excellent way to get to know the relationships between the variables. R offers three approaches to doing this: base graphics, the lattice package, and the ggplot2 package. The base graphics option is probably the easiest to learn. It can produce high-quality images but is somewhat limited. The lattice package is used primarily in scientific research and publication, so we'll leave that for another day. The ggplot2 (gg stands for grammar of graphics) approach has a bit of a learning curve but offers greater flexibility in producing great images.

The following code shows you how to create a scatter plot in base graphics and in ggplot2. We'll keep it simple by plotting "miles per gallon" on the y-axis and engine displacement on the x-axis.

```{r, echo = FALSE, fig.height = 4, fig.width = 6}
# Base graphics: displacement on the x-axis, mpg on the y-axis
plot(mtcars$disp, mtcars$mpg,
     ylab = "Miles per Gallon",
     xlab = "Engine Displacement",
     main = "Scatterplot",
     pch = 19, col = "red")
```

And now for the ggplot2 equivalent. Notice it's a little more complicated, but you end up with more flexibility in creating your graphics.

```{r, echo = FALSE, fig.height = 4, fig.width = 6}
library(ggplot2)

# Same scatter plot, built up in layers; the theme() call centers the title
ggplot(data = mtcars, aes(x = disp, y = mpg)) +
  geom_point(color = "red") +
  labs(x = "Engine Displacement", y = "Miles per Gallon", title = "Scatterplot") +
  theme(plot.title = element_text(hjust = 0.5))
```

Another good visualization for comparing two groups is the boxplot. Here's a boxplot in ggplot2. The code is found in the R Markdown document. Please feel free to steal it.

```{r, echo = FALSE}
# Label the 0/1 transmission indicator (am) for plotting
mtcars$manual <- ifelse(mtcars$am == 1, "Manual", "Automatic")

# One box per transmission type; the legend identifies the groups,
# so the x-axis text is suppressed
ggplot(mtcars, aes(x = manual, y = mpg, fill = manual)) +
  geom_boxplot() +
  labs(x = "", y = "Miles per Gallon", title = "Miles per Gallon by Transmission Type") +
  theme(legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5),
        axis.text.x = element_blank())
```

\newpage

## Regression Analysis

The point of this exercise is to analyze whether there is a difference in gas mileage between the two transmission types. As we'll learn later this semester, linear regression is a great tool for looking at these types of problems. Although the details aren't discussed here, the results are given in the table below.

```{r, echo = FALSE, comment = NA, message = FALSE, warning = FALSE, tidy = TRUE, cache = TRUE, results = "asis"}
library(stargazer)

# Regress mpg on displacement, horsepower, and the transmission indicator
lm1 <- lm(mpg ~ disp + hp + am, data = mtcars)

# Print the regression table as LaTeX for the PDF output
stargazer(lm1, type = "latex",
          title = "Regression Analysis of MPG by Transmission Type",
          header = FALSE)
```

This analysis shows that, on average, a manual transmission gets about 3.796 more miles to the gallon than an automatic, holding engine displacement and horsepower constant.
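
The transmission estimate quoted above is the coefficient on `am` in the fitted model. As a rough sketch of how that number and its uncertainty could be pulled out directly, assuming the same formula as `lm1` (the names `fit`, `am_effect`, and `am_ci` are illustrative and do not appear in the original document):

```r
# Refit the model from the regression section (same formula as lm1).
# `fit`, `am_effect`, and `am_ci` are illustrative names.
fit <- lm(mpg ~ disp + hp + am, data = mtcars)

# Estimated mpg difference for manual (am = 1) versus automatic (am = 0),
# holding displacement and horsepower fixed.
am_effect <- coef(fit)[["am"]]

# 95% confidence interval for that difference.
am_ci <- confint(fit, "am", level = 0.95)

round(am_effect, 3)
round(am_ci, 3)
```

Looking at the interval alongside the point estimate gives a sense of how precisely the difference is pinned down with only 32 cars in the data.
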
This model omits several important variables (like number of cylinders), so it shouldn't be taken too seriously.

Hopefully this brief document gives you an idea of how to approach the term project. You are under no obligation to use R Markdown, but I would recommend it if you are thinking about "leveling up" your analytics skills.