
EXERCISE 3

Purpose: To learn how to use the “Validation Set” approach to compare the test performance of a cross-validated Lasso model, a full (OLS) regression model, and a best subset regression model. We will be using R for this exercise. Your starter R program is EX3_Lasso.R and can be found on Canvas. Use set.seed(1) to determine the 50% random split of the data into the training and test data sets. In addition to the hints that I provide in the program, I suggest that you look at Labs 1 and 2 in Chapter 6 of the downloadable PDF of the book An Introduction to Statistical Learning with Applications in R by James et al.
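The 50% validation-set split can be sketched as follows. This is a minimal illustration and not the code in EX3_Lasso.R. It uses R's built-in mtcars data with mpg as the response purely as a stand-in; substitute the data set that the starter program loads.

```r
# Stand-in data (assumption): mtcars, response mpg. Replace with the EX3 data.
dat <- mtcars

set.seed(1)                                     # fixes the random split
train <- sample(nrow(dat), nrow(dat) / 2)       # row indices of the 50% training half
test  <- setdiff(seq_len(nrow(dat)), train)     # remaining rows form the test half
```

If your split differs from a classmate's, compare R versions. R 3.6.0 changed the default algorithm behind sample(), so set.seed(1) produces a different split in earlier versions.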

a) Provide a graph of the Lasso regression coefficients plotted against their L1 norm, showing how the coefficients change as the L1 norm decreases. What is this graph showing?
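A sketch of one way to draw the coefficient paths, assuming the glmnet package and the same hypothetical mtcars stand-in as above. glmnet() with alpha = 1 fits the Lasso over its own decreasing λ sequence. plot() with xvar = "norm" puts the L1 norm of the coefficient vector on the horizontal axis and the number of nonzero coefficients along the top axis.

```r
library(glmnet)   # install.packages("glmnet") if needed

# Stand-in data (assumption): mtcars, response mpg. Replace with the EX3 data.
dat <- mtcars
x <- model.matrix(mpg ~ ., dat)[, -1]   # predictor matrix without the intercept column
y <- dat$mpg

set.seed(1)
train <- sample(nrow(x), nrow(x) / 2)

# alpha = 1 selects the Lasso penalty; fit on the training half only
lasso.mod <- glmnet(x[train, ], y[train], alpha = 1)

# One curve per input: coefficient value versus L1 norm of the coefficient vector
plot(lasso.mod, xvar = "norm", label = TRUE)
```

Reading the plot from right to left follows the L1 norm as it decreases (λ increasing), until every coefficient reaches zero at the left edge.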

b) Provide a cross-validation graph of the estimated test MSE versus Lambda. What is this graph showing?
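A possible sketch of the cross-validation step, again assuming glmnet and the mtcars stand-in. cv.glmnet() runs k-fold cross-validation on the training half, and plot() shows the cross-validated MSE (with error bars) against log(λ). The dotted vertical lines mark lambda.min and lambda.1se. nfolds = 5 is used here only because the stand-in training half has just 16 rows; the glmnet default is 10.

```r
library(glmnet)

# Stand-in data (assumption): mtcars, response mpg. Replace with the EX3 data.
dat <- mtcars
x <- model.matrix(mpg ~ ., dat)[, -1]
y <- dat$mpg

set.seed(1)                              # also fixes the random fold assignment below
train <- sample(nrow(x), nrow(x) / 2)

# k-fold cross-validation of the Lasso on the training rows only
cv.out <- cv.glmnet(x[train, ], y[train], alpha = 1, nfolds = 5)
plot(cv.out)                             # CV mean-squared error versus log(lambda)

bestlam <- cv.out$lambda.min             # lambda with the smallest CV error
bestlam
```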

c) The input variables chosen by the best Lasso on the training data set are

_______________________________________________________________.

d) Test MSE for Best Lasso = _________________.
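For parts c) and d), here is a sketch that reads the selected inputs and the validation-set MSE at lambda.min. It again assumes glmnet and the mtcars stand-in, and it treats "best" as lambda.min, which is an assumption; the starter program may define it differently. coef() and predict() on a cv.glmnet object use the Lasso fit to the full training half.

```r
library(glmnet)

# Stand-in data (assumption): mtcars, response mpg. Replace with the EX3 data.
dat <- mtcars
x <- model.matrix(mpg ~ ., dat)[, -1]
y <- dat$mpg

set.seed(1)
train <- sample(nrow(x), nrow(x) / 2)
test  <- -train                          # negative indices drop the training rows

cv.out <- cv.glmnet(x[train, ], y[train], alpha = 1, nfolds = 5)

# (c) inputs whose Lasso coefficients are nonzero at lambda.min
lasso.coef <- as.matrix(coef(cv.out, s = "lambda.min"))[, 1]
selected <- setdiff(names(lasso.coef)[lasso.coef != 0], "(Intercept)")
selected

# (d) validation-set (test) MSE at lambda.min
lasso.pred <- predict(cv.out, newx = x[test, ], s = "lambda.min")
lasso.mse  <- mean((lasso.pred - y[test])^2)
lasso.mse
```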

The first four parts should be directly derivable from the EX3_Lasso.R program that I have provided. The following parts will require some programming ingenuity on your part. I had to fumble around a little to get things to work, but the following parts are definitely doable.

e) Test MSE for OLS using the full set of inputs = _________________.
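A minimal base-R sketch of the full OLS comparison, assuming the mtcars stand-in. It fits lm() on the training rows only and scores it on the held-out rows. sample(nrow(dat), nrow(dat) / 2) after set.seed(1) draws the same indices as the glmnet sketches, so all three methods see the same split.

```r
# Stand-in data (assumption): mtcars, response mpg. Replace with the EX3 data.
dat <- mtcars

set.seed(1)
train <- sample(nrow(dat), nrow(dat) / 2)

# OLS on every input, fitted to the training half only
ols.fit  <- lm(mpg ~ ., data = dat, subset = train)

# Predict the held-out half and compute the validation-set MSE
ols.pred <- predict(ols.fit, newdata = dat[-train, ])
ols.mse  <- mean((dat$mpg[-train] - ols.pred)^2)
ols.mse
```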

f) Use the “regsubsets” routine (in the leaps package) to determine the 8 best inputs for a linear regression using the training data set. These variables are ___________________________ _________________________________________________________________.
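A sketch of the best-subset search, assuming the leaps package and the mtcars stand-in. regsubsets() performs an exhaustive search by default. nvmax = 8 limits it to models with at most 8 inputs, and coef(regfit, id = 8) returns the coefficients of the best 8-input model.

```r
library(leaps)   # provides regsubsets(); install.packages("leaps") if needed

# Stand-in data (assumption): mtcars, response mpg. Replace with the EX3 data.
dat <- mtcars

set.seed(1)
train <- sample(nrow(dat), nrow(dat) / 2)

# Exhaustive best-subset search on the training rows, models of size 1 to 8
regfit <- regsubsets(mpg ~ ., data = dat[train, ], nvmax = 8)

coef(regfit, id = 8)                       # coefficients of the best 8-input model
best8 <- names(coef(regfit, id = 8))[-1]   # drop "(Intercept)" to list the inputs
best8
```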

g) Test MSE for Best Subset regression = ___________________.
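The leaps package has no predict() method for regsubsets objects, so one workaround (the approach taken in the ISLR Chapter 6 lab) is to build the test design matrix with model.matrix() and multiply it by the chosen coefficients. Below is a sketch under the same mtcars stand-in assumption.

```r
library(leaps)

# Stand-in data (assumption): mtcars, response mpg. Replace with the EX3 data.
dat <- mtcars

set.seed(1)
train <- sample(nrow(dat), nrow(dat) / 2)
regfit <- regsubsets(mpg ~ ., data = dat[train, ], nvmax = 8)

# Design matrix for the held-out rows (includes an "(Intercept)" column)
test.mat <- model.matrix(mpg ~ ., data = dat[-train, ])

# Keep only the columns of the best 8-input model and form the predictions
coefi       <- coef(regfit, id = 8)
subset.pred <- test.mat[, names(coefi)] %*% coefi
subset.mse  <- mean((dat$mpg[-train] - subset.pred)^2)
subset.mse
```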

h) Which technique provided the best test (validation set) results? Were you surprised?

i) Report the R code that you used to generate the answers to the above sections.