ITS 836 Data Science and Big Data Analytics
School of Computer & Information Sciences
Week 11 HW Questions
Question 1: Analyze the data set Social_Network_Ads.csv
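# Load ggplot2 for plotting
library(ggplot2)
# Importing the dataset
dataset = read.csv('Social_Network_Ads.csv')
# Keep columns 3 to 5 (Age, EstimatedSalary, Purchased)
dataset = dataset[3:5]
# Encoding the target feature as factor
dataset$Purchased = factor(dataset$Purchased, levels = c(0, 1))
# Scatter plot of salary against age, colored by purchase outcome
ggplot(dataset, aes(x = Age, y = EstimatedSalary, color = Purchased)) + geom_point()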
Improve this plot by:
1a) Adding appropriate labels and axis titles.
1b) Printing your first name on all graphs in R.
1c) Explaining what the data says.
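One possible approach to items 1a and 1b is sketched below. It assumes the ggplot2 package is installed. The data frame is synthetic stand-in data that only mimics the column names of Social_Network_Ads.csv, and student_name is a placeholder to be replaced with your own first name.

```r
library(ggplot2)

# Synthetic stand-in for Social_Network_Ads.csv (assumption: same column names)
set.seed(1)
dataset <- data.frame(
  Age = sample(18:60, 50, replace = TRUE),
  EstimatedSalary = sample(seq(15000, 150000, by = 1000), 50, replace = TRUE),
  Purchased = factor(sample(c(0, 1), 50, replace = TRUE), levels = c(0, 1))
)

# Placeholder: replace with your own first name
student_name <- "YourFirstName"

# labs() sets the title, axis titles, and legend title in one call;
# the subtitle carries the name so it is printed on the graph itself
p <- ggplot(dataset, aes(x = Age, y = EstimatedSalary, color = Purchased)) +
  geom_point() +
  labs(
    title = "Estimated salary vs. age by purchase decision",
    subtitle = paste("Prepared by", student_name),
    x = "Age (years)",
    y = "Estimated salary",
    color = "Purchased"
  )
print(p)
```

Putting the name in the subtitle (rather than the file name or a comment) keeps it visible in any exported image of the plot.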
Wk11 Question 2: Compare the classifiers
Question 2: Use the following classifiers:
Naïve Bayes
Logistic Regression
Decision Trees
KNN
Support Vector Machine
Random Forest
For each classifier, show:
2a) The classifier boundary for the training and test sets; use ggplot to graph the boundary.
2b) Print out your first name on all graphs in R.
2c) Compare all algorithms.
2d) Compare the confusion matrices.
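A minimal sketch of items 2a and 2d for one of the classifiers (logistic regression via base R glm) is shown below. The data is synthetic and only borrows the column names of Social_Network_Ads.csv; the 75/25 split, the 0.5 probability cutoff, and the student_name placeholder are assumptions. The same grid-prediction pattern can be reused for the other classifiers by swapping the model and its predict call.

```r
library(ggplot2)

# Synthetic stand-in data (assumption: same column names as the course data set)
set.seed(42)
n <- 200
dataset <- data.frame(
  Age = runif(n, 18, 60),
  EstimatedSalary = runif(n, 15000, 150000)
)
score <- 0.1 * (dataset$Age - 40) + (dataset$EstimatedSalary - 80000) / 30000
dataset$Purchased <- factor(as.integer(score + rnorm(n, sd = 0.5) > 0),
                            levels = c(0, 1))

# 75/25 train/test split (assumed ratio)
train_idx <- sample(n, size = 0.75 * n)
training_set <- dataset[train_idx, ]
test_set <- dataset[-train_idx, ]

# Fit logistic regression on the training set
classifier <- glm(Purchased ~ Age + EstimatedSalary,
                  family = binomial, data = training_set)

# Confusion matrix on the test set, using a 0.5 probability cutoff
prob_test <- predict(classifier, newdata = test_set, type = "response")
pred_test <- factor(as.integer(prob_test > 0.5), levels = c(0, 1))
cm <- table(Actual = test_set$Purchased, Predicted = pred_test)
print(cm)

# Predict over a regular grid to visualize the decision boundary
grid <- expand.grid(
  Age = seq(min(dataset$Age), max(dataset$Age), length.out = 100),
  EstimatedSalary = seq(min(dataset$EstimatedSalary),
                        max(dataset$EstimatedSalary), length.out = 100)
)
prob_grid <- predict(classifier, newdata = grid, type = "response")
grid$Predicted <- factor(as.integer(prob_grid > 0.5), levels = c(0, 1))

student_name <- "YourFirstName"  # placeholder
p <- ggplot() +
  geom_tile(data = grid,
            aes(x = Age, y = EstimatedSalary, fill = Predicted), alpha = 0.3) +
  geom_point(data = test_set,
             aes(x = Age, y = EstimatedSalary, color = Purchased)) +
  labs(title = "Logistic regression decision boundary (test set)",
       subtitle = paste("Prepared by", student_name),
       x = "Age (years)", y = "Estimated salary")
print(p)
```

Replacing test_set with training_set in geom_point gives the corresponding training-set plot.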
Wk11 Question 3: Comment on the following table
Wk11 Question 4: Springleaf Marketing Response https://www.kaggle.com/c/springleaf-marketing-response
Springleaf offers its customers personal and auto loans.
Direct mail is one way Springleaf's team connects with potential customers.
Direct mail is a fundamental part of Springleaf's marketing strategy.
Springleaf must be sure it is focusing on the right customers:
those who are likely to respond and be good candidates for its services.
Springleaf is asking you to predict
which customers will respond to a direct mail offer.
Question 4a: Clean the data for missing values
Question 4b: Statistically analyze the data
Question 4c: Apply logistic regression and report the confusion matrix (CM)
Question 4d: Apply Random Forest
Question 4e: Apply XGBoost
Print out your first name on all graphs in R.
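For Question 4a, one simple cleaning strategy is sketched below in base R. It is only an illustration: the small data frame and its column names are hypothetical stand-ins for the Springleaf training data, and the 50% missing-value threshold, median imputation, and the "MISSING" category label are assumed choices, not requirements of the assignment.

```r
# Hypothetical stand-in for the Springleaf training data
train <- data.frame(
  num_a = c(1.5, NA, 3.2, 4.8, NA, 6.1),
  num_b = c(10, 20, NA, 40, 50, 60),
  cat_a = c("x", NA, "y", "x", "", "y"),
  all_na = rep(NA_real_, 6),
  target = c(0, 1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Step 1: report the fraction of missing values per column
missing_frac <- colMeans(is.na(train))
print(round(missing_frac, 2))

# Step 2: drop columns that are mostly missing (0.5 is an assumed threshold)
keep <- missing_frac <= 0.5
train_clean <- train[, keep, drop = FALSE]

# Step 3: impute what remains
#   numeric columns   -> column median
#   character columns -> explicit "MISSING" level (also covers empty strings)
for (col in names(train_clean)) {
  values <- train_clean[[col]]
  if (is.numeric(values)) {
    values[is.na(values)] <- median(values, na.rm = TRUE)
  } else {
    values[is.na(values) | values == ""] <- "MISSING"
  }
  train_clean[[col]] <- values
}

print(train_clean)
```

Whatever imputation values are computed on the training data should be reused unchanged on the test data, so that both sets are cleaned consistently.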
Questions?