Graphics Challenge

Due May 7 at 11:59 p.m.

The Challenge

Your challenge is to use the anorexia data set to obtain a set of plots similar to the ones below. These were done for a different data set, and the code to obtain them is given in this document.

I apologize for the mirror image; that is what happened when I took a screenshot from the website where I obtained the plots. To see what I really want, follow the link below and scroll down to the sub-heading “Is an interaction term significant?”

https://www.andrew.cmu.edu/user/achoulde/94842/lectures/lecture10/lecture10-94842.html#factors-in-linear-regression

The Deliverable

The deliverable is ONE Word document or PDF that contains plots like the ones above, but for the anorexia data set. Extra points will be given for use of different colors, plotting symbols, or background. You are to give an interpretation of the plots, including an assessment of the statistical significance of the interaction terms. Please also include your code.
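
As a starting point, it may help to look at how the anorexia data are laid out before adapting the plotting code. The short sketch below assumes the anorexia data set that ships with the MASS package (this handout does not say where the data come from). It only inspects the data; deciding which variable plays which role in the models is part of the challenge.

```r
library(MASS)   # assumption: anorexia is taken from the MASS package

data(anorexia)
str(anorexia)            # one factor (Treat) and two numeric weights (Prewt, Postwt)
levels(anorexia$Treat)   # treatment groups, analogous to race in the birthwt example
table(anorexia$Treat)    # number of patients in each group
```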

Grading (30 points possible)

10 points: Graphics, side by side, one with interaction and one without

10 points: Interpretation, including statistical significance of terms

5 points: Inclusion of code

5 points: Fudge factor for making the graphics prettier than the result using the default settings.

Points deducted for uploading more than one file.

The Code

The code is from a course at Carnegie Mellon University called “Programming in R for Analytics”. It uses a data set found in the MASS library for R. You can obtain the data using the R code below.

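# First load the necessary packages
library(MASS)
library(plyr)
library(ggplot2)
library(gridExtra) # provides grid.arrange() for side-by-side plots
library(knitr)

data(birthwt) # load the data
help(birthwt) # get information on the data set

# The plotting code uses the variable names from the CMU lecture, not the raw MASS names.
# Rename the two columns it needs and turn race into a labeled factor
# (in MASS, race is coded 1 = white, 2 = black, 3 = other).
names(birthwt)[names(birthwt) == "age"] <- "mother.age"
names(birthwt)[names(birthwt) == "bwt"] <- "birthwt.grams"
birthwt$race <- as.factor(mapvalues(birthwt$race, from = c(1, 2, 3), to = c("white", "black", "other")))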

# Code for graphics for model without interaction variable (adapted from the CMU website)

birthwt.lm <- lm(birthwt.grams ~ race + mother.age, data = birthwt)

# One intercept per race: the baseline level, then the baseline plus each offset
intercepts <- c(coef(birthwt.lm)["(Intercept)"],
                coef(birthwt.lm)["(Intercept)"] + coef(birthwt.lm)["raceother"],
                coef(birthwt.lm)["(Intercept)"] + coef(birthwt.lm)["racewhite"])
# Without an interaction term, all three lines share the same slope
lines.df <- data.frame(intercepts = intercepts,
                       slopes = rep(coef(birthwt.lm)["mother.age"], 3),
                       race = levels(birthwt$race))
# Save the plot so it can be placed next to the interaction plot
# (qplot() still works but is deprecated in ggplot2 3.4.0 and later)
plot.complex <- qplot(x = mother.age, y = birthwt.grams, color = race, data = birthwt) + 
  geom_abline(aes(intercept = intercepts, 
                  slope = slopes, 
                  color = race), data = lines.df)

# Code for graphics for model with interaction variable (adapted from the CMU website).

birthwt.lm.interact <- lm(birthwt.grams ~ race * mother.age, data = birthwt)
# stat_smooth fits a separate line for each race, so both intercepts and slopes can differ
plot.interact <- qplot(x = mother.age, y = birthwt.grams, color = race, data = birthwt) + stat_smooth(method = "lm", se = FALSE, fullrange = TRUE)
# Show the two plots side by side (grid.arrange comes from the gridExtra package)
gridExtra::grid.arrange(plot.interact, plot.complex, ncol = 2)
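
The plots show whether the fitted lines look parallel, but the interpretation also calls for a formal assessment of the interaction terms. One common way to do this, sketched below, is to compare the model without the interaction to the model with it using an F-test for nested models. The object names bw, fit.additive, fit.interact, and interaction.test are made up for this illustration, and the data preparation is repeated so the block runs on its own.

```r
library(MASS)   # provides the birthwt data set

# Same preparation the plotting code relies on, done on a copy named bw
data(birthwt)
bw <- birthwt
names(bw)[names(bw) == "age"] <- "mother.age"
names(bw)[names(bw) == "bwt"] <- "birthwt.grams"
# race is coded 1 = white, 2 = black, 3 = other; factor() sorts the levels alphabetically
bw$race <- factor(c("white", "black", "other")[bw$race])

fit.additive <- lm(birthwt.grams ~ race + mother.age, data = bw)   # parallel lines
fit.interact <- lm(birthwt.grams ~ race * mother.age, data = bw)   # separate slopes

# F-test of the nested models: a small p-value suggests the slopes differ by race
interaction.test <- anova(fit.additive, fit.interact)
print(interaction.test)

# Estimates and t-tests for each term, including the two interaction terms
print(summary(fit.interact)$coefficients)
```

The anova() comparison tests both interaction terms at once, while the coefficient table tests each one separately against the baseline group, so the two views can be reported together.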