ITS 836 Data Science and Big Data Analytics

School of Computer & Information Sciences

ITS 836

Week 11 HW Questions

Question 1: Analyze the data set Social_Network_Ads.csv

# Importing the dataset

dataset = read.csv('Social_Network_Ads.csv')

# Keep only columns 3 to 5 (Age, EstimatedSalary, Purchased)

dataset = dataset[3:5]

# Encoding the target feature as factor

dataset$Purchased = factor(dataset$Purchased, levels = c(0, 1))

# Plot Age against EstimatedSalary, colored by purchase outcome

library(ggplot2)

ggplot(dataset, aes(x = Age, y = EstimatedSalary, color = Purchased)) + geom_point()

Improve this plot by:

1a) Adding appropriate labels (axis titles).

1b) Printing your first name on all graphs in R.

1c) What does the data say?
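
One possible way to approach 1a and 1b is sketched below, assuming the ggplot2 package is installed. The small simulated data frame `demo` is only a stand-in for Social_Network_Ads.csv so that the snippet runs on its own, the label wording is just a suggestion, and `first_name` is a placeholder to replace with your own name.

```r
library(ggplot2)

# Stand-in data (simulated, NOT the real Social_Network_Ads.csv) with the same three columns
set.seed(1)
demo = data.frame(
  Age = sample(18:60, 50, replace = TRUE),
  EstimatedSalary = sample(seq(15000, 150000, by = 1000), 50, replace = TRUE)
)
# Simulated label for illustration only
demo$Purchased = factor(as.integer(demo$Age > 40), levels = c(0, 1))

first_name = "YourFirstName"  # placeholder: replace with your own first name

p = ggplot(demo, aes(x = Age, y = EstimatedSalary, color = Purchased)) +
  geom_point() +
  labs(
    title = "Estimated salary vs. age by purchase outcome",  # 1a) plot title
    x = "Age",                                               # 1a) axis titles
    y = "Estimated salary",
    color = "Purchased",
    caption = paste("Prepared by", first_name)               # 1b) name on the graph
  )
print(p)
```

With the real data, `demo` would be replaced by the `dataset` object built above; the `labs()` call stays the same.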

Wk11 Question 2: Compare the classifiers

Question 2: Use the following classifiers:

Naïve Bayes

Logistic Regression

Decision Trees

KNN

Support Vector Machine

Random Forest

For each classifier, show:

2a) The classifier boundary for the training and test sets (use ggplot to graph the boundary)

2b) Print your first name on all graphs in R

2c) Compare all algorithms

2d) Compare the confusion matrices
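
As a rough illustration of 2a and 2d for one classifier, the sketch below fits a logistic regression with base R's `glm`, prints a confusion matrix with `table`, and draws the decision boundary by classifying every point on a grid. The data is simulated (not the real Social_Network_Ads.csv), and the 75/25 split, the 0.5 threshold, the helper `predict_class`, and the placeholder `first_name` are all assumptions made for the example.

```r
library(ggplot2)

# Stand-in data (simulated, NOT the real Social_Network_Ads.csv)
set.seed(42)
n = 200
demo = data.frame(
  Age = runif(n, 18, 60),
  EstimatedSalary = runif(n, 15000, 150000)
)
# Simulated label: older and higher-salary users purchase more often, plus noise
score = 0.08 * (demo$Age - 40) + (demo$EstimatedSalary - 80000) / 50000
demo$Purchased = factor(as.integer(score + rnorm(n) > 0), levels = c(0, 1))

# 75/25 train/test split (the ratio is an assumption)
train_idx = sample(seq_len(n), size = 0.75 * n)
training_set = demo[train_idx, ]
test_set = demo[-train_idx, ]

# Logistic regression; another classifier could be swapped in here
classifier = glm(Purchased ~ Age + EstimatedSalary, family = binomial, data = training_set)

# Hypothetical helper: class labels at a 0.5 probability threshold
predict_class = function(model, newdata) {
  prob = predict(model, newdata = newdata, type = "response")
  factor(as.integer(prob > 0.5), levels = c(0, 1))
}

# 2d) confusion matrix on the test set (rows = actual, columns = predicted)
cm = table(Actual = test_set$Purchased, Predicted = predict_class(classifier, test_set))
accuracy = sum(diag(cm)) / sum(cm)
print(cm)

# 2a) decision boundary: classify a grid of points, then overlay the test points
grid = expand.grid(
  Age = seq(min(demo$Age), max(demo$Age), length.out = 100),
  EstimatedSalary = seq(min(demo$EstimatedSalary), max(demo$EstimatedSalary), length.out = 100)
)
grid$Predicted = predict_class(classifier, grid)

first_name = "YourFirstName"  # placeholder: replace with your own first name

p = ggplot() +
  geom_tile(data = grid, aes(x = Age, y = EstimatedSalary, fill = Predicted), alpha = 0.3) +
  geom_point(data = test_set, aes(x = Age, y = EstimatedSalary, color = Purchased)) +
  labs(title = "Logistic regression (test set)", x = "Age", y = "Estimated salary",
       caption = paste("Prepared by", first_name))
print(p)
```

Repeating the same grid-and-table steps with each of the other classifiers gives one boundary plot and one confusion matrix per model, which can then be compared side by side for 2c and 2d.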

Wk11 Question 3: Comment on the following table

Wk11 Question 4: Springleaf Marketing Response (https://www.kaggle.com/c/springleaf-marketing-response)

Springleaf offers their customers personal and auto loans.

Direct mail is one way Springleaf's team connects with potential customers.

Direct mail is a fundamental part of Springleaf's marketing strategy.

Springleaf must be sure they are focusing on the right customers: those who are likely to respond and be good candidates for their services.

Springleaf is asking you to predict which customers will respond to a direct mail offer.

Question 4a: Clean the data for missing values

Question 4b: Statistically analyze the data

Question 4c: Apply logistic regression with a confusion matrix (CM)

Question 4d: Apply Random Forest

Question 4e: Apply XGBoost

Print your first name on all graphs in R.
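
A minimal base-R sketch of steps 4a to 4c follows. It runs on a small simulated data frame, not on the Springleaf files, and the column names (`x1`, `x2`, `segment`, `target`), the median and "Missing" imputation rules, the 25% hold-out, and the 0.5 threshold are all assumptions chosen for illustration; the real data will likely need different choices.

```r
# Stand-in data (simulated, NOT the Springleaf files); column names are hypothetical
set.seed(7)
n = 300
df = data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  segment = sample(c("A", "B", NA), n, replace = TRUE, prob = c(0.45, 0.45, 0.10)),
  stringsAsFactors = FALSE
)
df$target = as.integer(df$x1 + 0.5 * df$x2 + rnorm(n) > 0)
df$x1[sample(n, 30)] = NA  # inject missing values so there is something to clean

# 4a) impute: median for numeric columns, explicit "Missing" level for text columns
for (col in setdiff(names(df), "target")) {
  if (is.numeric(df[[col]])) {
    df[[col]][is.na(df[[col]])] = median(df[[col]], na.rm = TRUE)
  } else {
    df[[col]][is.na(df[[col]])] = "Missing"
    df[[col]] = factor(df[[col]])
  }
}

# 4b) basic statistical summary of every column
print(summary(df))

# 4c) logistic regression, then a confusion matrix on a held-out 25%
test_idx = sample(n, size = 0.25 * n)
fit = glm(target ~ ., family = binomial, data = df[-test_idx, ])
prob = predict(fit, newdata = df[test_idx, ], type = "response")
pred = factor(as.integer(prob > 0.5), levels = c(0, 1))
cm = table(Actual = factor(df$target[test_idx], levels = c(0, 1)), Predicted = pred)
print(cm)
```

Steps 4d and 4e are not shown here; they would typically rely on the separate randomForest and xgboost packages, evaluated on the same train/test split so that the confusion matrices are comparable.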

Questions?
