EXERCISE 3
Purpose: To learn how to use the “Validation Set” approach to compare the test performance of a cross-validated Lasso model, a full regression model, and a best subset regression model. We will be using R for this exercise. Your starter R program is EX3_Lasso.R and can be found on Canvas. Use set.seed(1) to determine the 50% random split of the data into the training and test data sets. In addition to the hints that I provide in the program, I suggest that you look at Labs 1 and 2 in Chapter 6 of the downloadable pdf file of the book An Introduction to Statistical Learning with Applications in R by James et al.
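A minimal sketch of a reproducible 50% validation split, assuming the data sit in a data frame. The exercise data set is not named here, so the Boston data from the MASS package (which ships with R) is used purely as a hypothetical stand-in. The names train, test, train.data and test.data are illustrative and are not taken from EX3_Lasso.R.

```r
library(MASS)  # ships with R; Boston is a hypothetical stand-in for the exercise data

data(Boston)
set.seed(1)    # fix the random number stream so the split is reproducible

n <- nrow(Boston)
# draw half of the row indices, without replacement, for the training set
train <- sample(seq_len(n), size = floor(n / 2))
# the remaining rows form the validation (test) set
test <- setdiff(seq_len(n), train)

train.data <- Boston[train, ]
test.data <- Boston[test, ]
```

Re-running the script gives the same split only if set.seed(1) is called immediately before the same sample() call, so keep the two lines together.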
a) Provide an L1 Norm graph of the Lasso regression coefficients as the L1 Norm decreases. What is this graph showing?
b) Provide a Cross-Validation graph of the error versus Lambda. What is this graph showing?
c) The input variables chosen by the best Lasso on the training data set are
_______________________________________________________________.
d) Test MSE for Best Lasso = _________________.
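Parts a) through d) follow the glmnet workflow used in Lab 2 of Chapter 6. The sketch below assumes the glmnet package is installed and again uses MASS::Boston (response medv) as a hypothetical stand-in for the exercise data, so its plots and numbers are not the answers to this exercise. EX3_Lasso.R may organize these steps differently; the object names here are illustrative.

```r
library(MASS)    # Boston data: hypothetical stand-in for the exercise data set
library(glmnet)  # glmnet(), cv.glmnet()

data(Boston)
set.seed(1)
train <- sample(seq_len(nrow(Boston)), size = floor(nrow(Boston) / 2))

# glmnet needs a numeric predictor matrix; drop the intercept column
x <- model.matrix(medv ~ ., data = Boston)[, -1]
y <- Boston$medv

# (a) lasso path (alpha = 1) fit on the training rows, plotted against the L1 norm
lasso.mod <- glmnet(x[train, ], y[train], alpha = 1)
plot(lasso.mod, xvar = "norm", label = TRUE)

# (b) 10-fold cross-validation over lambda, training rows only
cv.out <- cv.glmnet(x[train, ], y[train], alpha = 1)
plot(cv.out)                 # CV error with error bars versus log(lambda)
bestlam <- cv.out$lambda.min # lambda with the smallest CV error

# (c) coefficients at the chosen lambda; inputs with zero coefficients are dropped
lasso.coef <- predict(lasso.mod, type = "coefficients", s = bestlam)[, 1]
selected <- setdiff(names(lasso.coef)[lasso.coef != 0], "(Intercept)")
print(selected)

# (d) validation-set MSE at the chosen lambda
lasso.pred <- predict(lasso.mod, s = bestlam, newx = x[-train, ])
lasso.mse <- mean((lasso.pred - y[-train])^2)
print(lasso.mse)
```

Using cv.out$lambda.1se instead of lambda.min would give a sparser model; which one counts as the “best Lasso” is a choice the starter program makes.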
The first four parts should be directly derivable from the EX3_Lasso.R program that I have provided. The remaining parts will require some programming ingenuity on your part. I had to fumble around a little to get things to work, but they are definitely doable.
e) Test MSE for OLS using the full set of inputs = _________________.
f) Use the “regsubsets” routine from the leaps package to determine the 8 best inputs for a linear regression using the training data set. These variables are ___________________________ _________________________________________________________________.
g) Test MSE for Best Subset regression = ___________________.
h) Which technique provided the best test (validation set) results? Were you surprised?
i) Report the R code that you used to generate the answers to the above sections.
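One possible approach to parts e) through g), following Lab 1 of Chapter 6: fit lm() on the training half for the full OLS model, and run regsubsets() from the leaps package with nvmax = 8. Because leaps provides no predict() method for regsubsets objects, the test predictions are formed by multiplying a model.matrix() of the test rows by the selected coefficients. As before, MASS::Boston is only a hypothetical stand-in for the exercise data, and all object names are illustrative.

```r
library(MASS)   # Boston data: hypothetical stand-in for the exercise data set
library(leaps)  # regsubsets()

data(Boston)
set.seed(1)
train <- sample(seq_len(nrow(Boston)), size = floor(nrow(Boston) / 2))
train.data <- Boston[train, ]
test.data <- Boston[-train, ]

# (e) OLS on all inputs, fit on the training half and scored on the held-out half
ols.fit <- lm(medv ~ ., data = train.data)
ols.pred <- predict(ols.fit, newdata = test.data)
ols.mse <- mean((test.data$medv - ols.pred)^2)

# (f) exhaustive best-subset search up to 8 inputs, training data only
regfit.best <- regsubsets(medv ~ ., data = train.data, nvmax = 8)
best8 <- coef(regfit.best, id = 8)   # intercept plus the 8 chosen inputs
print(names(best8)[-1])

# (g) no predict() method for regsubsets: build the test design matrix
#     and multiply the matching columns by the chosen coefficients
test.mat <- model.matrix(medv ~ ., data = test.data)
subset.pred <- test.mat[, names(best8)] %*% best8
subset.mse <- mean((test.data$medv - subset.pred)^2)

# side-by-side comparison of validation-set MSEs
print(c(OLS = ols.mse, BestSubset = subset.mse))
```

Indexing test.mat by names(best8) keeps the columns aligned with the coefficients, so the product works for any subset size passed to coef().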