data mining ANLY600

Week 5 Assignment Instructions and Sample R codes

Classification Trees Analysis

This assignment gives you hands-on experience using R to build a classification tree on a real-world data set. Please refer to Chapter 9 in the reference textbook (through the link at the bottom under "Lessons") for details about how to generate classification tree models and evaluate their performance. Then open the website listed under Reference, go over the mushrooms.csv example, and use the same R code to reproduce the results step by step. Study how the model is explained and how the results are evaluated:

Step 1: Install and load libraries

Step 2: Import the data set

Step 3: Data Cleaning

Step 4: Data Exploration and Analysis

Step 5: Data Splicing

Step 6: Building a model

Step 7: Visualising the tree

Step 8: Testing the model

Step 9: Calculating accuracy

Now open the file mushrooms2.csv (slightly different from the sample data set) and repeat the same classification tree analysis as on the website, following the steps above. Please copy and paste screen images of your work in R into a Word document for submission. Be sure to provide a narrative of your answers (i.e., do not just copy and paste your answers without explaining what you did or what you found). Please include an Introduction, R code with outputs, and figures with explanations, along with cover and reference pages. A good conclusion to wrap up the assignment is also expected. Please follow APA format as well.

Reference

https://www.edureka.co/blog/decision-tree-algorithm/

#Installing libraries

install.packages('rpart')

install.packages('caret')

install.packages('rpart.plot')

install.packages('rattle')

#Loading libraries

library(rpart,quietly = TRUE)

library(caret,quietly = TRUE)

library(rpart.plot,quietly = TRUE)

library(rattle)

#Reading the data set as a dataframe

getwd() # shows which working directory you are in

# set the working directory to your desktop, for example

setwd("C:/Users/alpha/Desktop")

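# stringsAsFactors = TRUE: since R 4.0.0, read.csv no longer converts text columns to factors by default
mushrooms <- read.csv("mushrooms.csv", stringsAsFactors = TRUE)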

# structure of the data

str(mushrooms)

# number of rows with missing values

nrow(mushrooms) - sum(complete.cases(mushrooms))
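
The row count above says how many rows are incomplete but not which columns are responsible. A minimal sketch of a per-column check follows, using a small made-up data frame (`toy` and its values are hypothetical). It also counts a placeholder code, since some data sets mark missing entries with a symbol instead of NA; treating "?" as that placeholder is an assumption to verify against your own file.

```r
# Toy data frame standing in for the mushroom data (hypothetical values)
toy <- data.frame(
  class      = c("e", "p", "e", "p"),
  odor       = c("n", NA, "a", "f"),
  stalk.root = c("b", "?", "c", "?"),
  stringsAsFactors = TRUE
)

# Missing values per column, as R sees them (NA only)
na.per.column <- colSums(is.na(toy))

# Occurrences of a placeholder code per column
# ("?" is an assumption here; check which code, if any, your file uses)
placeholder.per.column <- sapply(toy, function(col) {
  sum(as.character(col) == "?", na.rm = TRUE)
})

print(na.per.column)
print(placeholder.per.column)
```

If a placeholder does turn up, it may be worth deciding in the Data Cleaning step whether to keep it as its own category or recode it as NA.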

# deleting redundant variable `veil.type`

mushrooms$veil.type <- NULL

# analyzing the odor variable

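table(mushrooms$class,mushrooms$odor)

# for each feature, count the zero cells in its table against class;
# a zero cell means that feature value occurs in only one class
number.perfect.splits <- apply(X=mushrooms[-1], MARGIN = 2, FUN = function(col){
  split.table <- table(mushrooms$class,col)
  sum(split.table == 0)
})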

# Descending order of perfect splits

# named split.order so that it does not mask the base function order()
split.order <- order(number.perfect.splits,decreasing = TRUE)

number.perfect.splits <- number.perfect.splits[split.order]

# Plot graph

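# wide bottom margin for the rotated feature names, left margin for the axis label
par(mar=c(10,4,2,2))

barplot(number.perfect.splits,main="Number of perfect splits vs feature", xlab="", ylab="Number of perfect splits", las=2, col="wheat")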
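
To make the "perfect split" count concrete, here is a small sketch on made-up data (the `toy` data frame and its values are hypothetical). It builds the class-by-feature table for one feature and counts the zero cells, which is the same quantity the apply() call computes for every column.

```r
# Six made-up mushrooms: class (e = edible, p = poisonous) and one feature
toy <- data.frame(
  class = c("e", "e", "p", "p", "e", "p"),
  odor  = c("n", "n", "f", "f", "a", "n"),
  stringsAsFactors = TRUE
)

# Cross-tabulate class against the feature
odor.table <- table(toy$class, toy$odor)
print(odor.table)

# A zero cell means that odor value never occurs with that class,
# so splitting on it separates those rows perfectly
zero.cells <- sum(odor.table == 0)
print(zero.cells)
```

In this toy table, odor "a" appears only among edible rows and odor "f" only among poisonous rows, so two cells are zero. Odor "n" appears in both classes and contributes none.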

#data splicing: split the data into a training set (80%) and a test set (20%)

set.seed(12345)

train <- sample(1:nrow(mushrooms),size = ceiling(0.80*nrow(mushrooms)),replace = FALSE)

# training set

mushrooms_train <- mushrooms[train,]

# test set

mushrooms_test <- mushrooms[-train,]
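
After splitting, it can help to confirm that the two sets are disjoint, that together they cover the whole data set, and that the class balance looks similar in both. The sketch below does this on simulated data (`toy`, `toy_train`, and `toy_test` are hypothetical stand-ins, not the mushroom data).

```r
set.seed(12345)

# Simulated stand-in for the data: 60 edible and 40 poisonous rows
toy <- data.frame(
  class = factor(rep(c("e", "p"), times = c(60, 40))),
  x     = rnorm(100)
)

# Same splitting recipe as in the script: 80% of the row indices, no replacement
train.rows <- sample(seq_len(nrow(toy)), size = ceiling(0.80 * nrow(toy)), replace = FALSE)
toy_train  <- toy[train.rows, ]
toy_test   <- toy[-train.rows, ]

# Sizes of the two sets
print(c(train = nrow(toy_train), test = nrow(toy_test)))

# No row should appear in both sets
shared.rows <- intersect(rownames(toy_train), rownames(toy_test))
print(length(shared.rows))

# Class proportions in each set, for a rough balance check
print(prop.table(table(toy_train$class)))
print(prop.table(table(toy_test$class)))
```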

# penalty matrix (rows: actual class, columns: predicted class)

penalty.matrix <- matrix(c(0,1,10,0), byrow=TRUE, nrow=2)
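
The bare matrix is easier to read with labels attached. The sketch below rebuilds the same matrix and names its rows and columns, assuming the class factor has its levels in the order e (edible), p (poisonous). That ordering is an assumption to confirm with levels(mushrooms$class). The labels are for display only and the labelled copy is not passed to rpart here.

```r
# Same values as in the script, filled row by row
penalty.matrix <- matrix(c(0, 1, 10, 0), byrow = TRUE, nrow = 2)

# Labelled copy for display (assumes class levels are ordered e, p)
labelled.penalty <- penalty.matrix
dimnames(labelled.penalty) <- list(actual = c("e", "p"), predicted = c("e", "p"))
print(labelled.penalty)

# Cost of calling a poisonous mushroom edible
cost.missed.poison <- labelled.penalty["p", "e"]

# Cost of calling an edible mushroom poisonous
cost.false.alarm <- labelled.penalty["e", "p"]

print(c(missed.poison = cost.missed.poison, false.alarm = cost.false.alarm))
```

Under that assumption, the tree is penalized ten times more for labelling a poisonous mushroom as edible than for the opposite mistake, which pushes it toward cautious predictions.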

# building the classification tree with rpart

tree <- rpart(class~.,
              data=mushrooms_train,
              parms = list(loss = penalty.matrix),
              method = "class")

# Visualize the decision tree with rpart.plot

rpart.plot(tree, nn=TRUE)

#Testing the model

pred <- predict(object=tree,mushrooms_test[-1],type="class")

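#Calculating accuracy

# confusionMatrix() expects predictions in rows and actual classes in columns
conf.table <- table(Predicted = pred, Actual = mushrooms_test$class)

confusionMatrix(conf.table)

# print the predicted class for each test row
pred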
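
For the written explanation, it may help to see how the accuracy figure follows from the confusion table. The sketch below uses made-up counts (the numbers are hypothetical, not results from the mushroom data) and computes accuracy as the share of test rows on the diagonal.

```r
# Hypothetical confusion table: predictions in rows, actual classes in columns
conf.table <- matrix(
  c(50,  2,
     0, 48),
  nrow = 2, byrow = TRUE,
  dimnames = list(Predicted = c("e", "p"), Actual = c("e", "p"))
)
print(conf.table)

# Correct predictions sit on the diagonal
correct <- sum(diag(conf.table))

# Accuracy = correct predictions / all predictions
accuracy <- correct / sum(conf.table)
print(accuracy)

# Poisonous mushrooms that were predicted edible (the costly error)
missed.poison <- conf.table["e", "p"]
print(missed.poison)
```

Accuracy alone treats both kinds of error equally, so reporting the count of poisonous mushrooms predicted as edible alongside it gives a fuller picture.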