Data mining programming assignment

CS 5630, Data Mining

Spring 2020 Homework 6

Use the OJ data set from the ISLR package, and use 123 as the random seed wherever a seed is needed.

(a) [10pts] Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
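
One possible way to set up the split in part (a) is sketched below. The object names train_idx, OJ_train, and OJ_test are illustrative assumptions, not names required by the assignment.

```r
library(ISLR)  # provides the OJ data set

set.seed(123)  # seed specified by the assignment

# Draw 800 row indices without replacement for the training set
# (train_idx, OJ_train, OJ_test are hypothetical names).
train_idx <- sample(nrow(OJ), 800)

OJ_train <- OJ[train_idx, ]   # the 800 sampled observations
OJ_test  <- OJ[-train_idx, ]  # all remaining observations

dim(OJ_train)
dim(OJ_test)
```

Negative indexing with -train_idx keeps every row that was not sampled, so the two sets do not overlap.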

(b) [10pts] Fit a Naive Bayes classifier to the training data with Purchase as the response and the other variables as predictors. What are the training error rate and test error rate?
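
An error rate here is the fraction of observations whose predicted class differs from the true class. The sketch below shows one way to compute it. It assumes the naiveBayes() function from the e1071 package (the assignment does not name a package for this part), and error_rate, nb_fit, and the other object names are hypothetical.

```r
library(ISLR)   # OJ data set
library(e1071)  # naiveBayes(); choice of package is an assumption

set.seed(123)
train_idx <- sample(nrow(OJ), 800)
OJ_train  <- OJ[train_idx, ]
OJ_test   <- OJ[-train_idx, ]

# Hypothetical helper: proportion of misclassified observations.
error_rate <- function(predicted, actual) {
  mean(as.character(predicted) != as.character(actual))
}

# Purchase as the response, all other variables as predictors.
nb_fit <- naiveBayes(Purchase ~ ., data = OJ_train)

train_err <- error_rate(predict(nb_fit, newdata = OJ_train), OJ_train$Purchase)
test_err  <- error_rate(predict(nb_fit, newdata = OJ_test),  OJ_test$Purchase)

train_err
test_err
```

The same helper can be reused for any fitted classifier whose predict() method returns class labels.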

(c) [10pts] Fit a support vector classifier to the training data in part (a) using cost=0.01 with Purchase as the response and the other variables as predictors. What are the training and test error rates?

(d) [10pts] Use the tune() function to select an optimal cost. Consider cost values 0.001, 0.01, 0.1, 1, 10, 100.
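
A minimal sketch of a tune() call over the cost grid in part (d) follows. It assumes tune() and svm() come from the e1071 package and that the linear kernel from part (c) is intended. The names tuned and best_fit are illustrative.

```r
library(ISLR)   # OJ data set
library(e1071)  # svm() and tune(); assumed source of tune()

set.seed(123)
train_idx <- sample(nrow(OJ), 800)
OJ_train  <- OJ[train_idx, ]

# tune() assigns cross-validation folds at random, so set the seed again.
set.seed(123)
tuned <- tune(
  svm, Purchase ~ ., data = OJ_train,
  kernel = "linear",  # assumption: same classifier as in part (c)
  ranges = list(cost = c(0.001, 0.01, 0.1, 1, 10, 100))
)

summary(tuned)         # cross-validation error for each cost value
tuned$best.parameters  # cost with the lowest cross-validation error
best_fit <- tuned$best.model  # model refit on the training data at that cost
```

By default tune() uses 10-fold cross-validation, and best.model can be passed directly to predict() when computing error rates.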

(e) [10pts] Compute the training and test error rates using the best model from part (d).

(f) [10pts] Repeat part (c) using a support vector machine with a radial kernel. Use the default value of gamma.

(g) [10pts] Repeat part (f) using a support vector machine with a polynomial kernel with degree=2.

(h) [10pts] Among the approaches in parts (e), (f), and (g), which has the lowest test error rate?