Data Mining - Computer Science

Problem 1

Consider the following training dataset with fifteen entries. Each entry records one participant's answers to a series of questions asking whether they like a certain type of food, answered (1) for yes or (0) for no. The last column (“midwest?”) is the target column: once the decision tree is built, this is the class we are trying to predict, i.e., whether a person is from the Midwest.

Build the complete decision tree for this dataset using Information Gain as the attribute selection measure. Provide the entropy and the information gain of every attribute at each partitioning step, and highlight the attribute and value you chose to partition the dataset at each step, as in the example I provided.
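The entropy and information gain calculations requested above can be sketched in a few lines of Python. This is only an illustrative sketch: the rows and the column names `food_a` and `food_b` are hypothetical placeholders, not the assignment's dataset, and the helper names `entropy` and `information_gain` are my own.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy rows for illustration only (NOT the assignment's data).
toy_rows = [
    {"food_a": 1, "food_b": 1, "midwest": 1},
    {"food_a": 1, "food_b": 0, "midwest": 1},
    {"food_a": 0, "food_b": 1, "midwest": 0},
    {"food_a": 0, "food_b": 0, "midwest": 0},
]

for attribute in ("food_a", "food_b"):
    print(attribute, information_gain(toy_rows, attribute, "midwest"))
```

At each partitioning step, the attribute with the largest gain is chosen, and the same calculation is repeated on each resulting subset until a subset is pure or no attributes remain.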

Problem 2

Consider the dataset below.

age	income	student	credit_rating	buys_computer
<=30	high	no	fair	no
<=30	high	no	excellent	no
31…40	high	no	fair	yes
>40	medium	no	fair	yes
>40	low	yes	fair	yes
>40	low	yes	excellent	no
31…40	low	yes	excellent	yes
<=30	medium	no	fair	no
<=30	low	yes	fair	yes
>40	medium	yes	fair	yes
<=30	medium	yes	excellent	yes
31…40	medium	no	excellent	yes
31…40	high	yes	fair	yes
>40	medium	no	excellent	no

Classify the following data point using the Naïve Bayesian method. Show all relevant calculations, similar to the example given in the slide.

X = (age > 40, income = medium, student = no, credit_rating = fair)
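One way to check the hand calculation is a short Python sketch that counts directly from the table above. It assumes plain relative-frequency estimates with no Laplace smoothing; the function name `naive_bayes_scores` and the index-based `query` dictionary are my own choices, not part of the assignment.

```python
from collections import Counter

# Rows copied from the table: age, income, student, credit_rating, buys_computer
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def naive_bayes_scores(rows, query, target_index=-1):
    """Return P(class) * product of P(attribute value | class) for each class.

    query maps a column index to the attribute value of the data point.
    Probabilities are raw relative frequencies (no smoothing).
    """
    class_counts = Counter(row[target_index] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)  # prior P(class)
        for index, value in query.items():
            matches = sum(
                1 for row in rows
                if row[target_index] == label and row[index] == value
            )
            score *= matches / count  # likelihood P(value | class)
        scores[label] = score
    return scores


# X = (age > 40, income = medium, student = no, credit_rating = fair)
query = {0: ">40", 1: "medium", 2: "no", 3: "fair"}
scores = naive_bayes_scores(ROWS, query)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The scores are unnormalized: only their relative size matters for choosing the class, so the common denominator P(X) is omitted.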

Problem 3

Consider the following confusion matrix:

Actual Class \ Predicted Class	Cancer = yes	Cancer = no	Total
Cancer = yes	90	210	300
Cancer = no	140	9560	9700
Total	230	9770	10000

Find the following:

a. Accuracy

b. Sensitivity

c. Specificity

d. Precision

e. Recall

f. F1 measure
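The measures listed above follow directly from the four cells of the confusion matrix. The sketch below assumes "Cancer = yes" is the positive class and uses the standard definitions; the variable names are my own.

```python
# Cells of the confusion matrix, with "Cancer = yes" taken as the positive class.
tp = 90    # actual yes, predicted yes
fn = 210   # actual yes, predicted no
fp = 140   # actual no, predicted yes
tn = 9560  # actual no, predicted no

total = tp + fn + fp + tn

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
precision = tp / (tp + fp)
recall = sensitivity           # recall is another name for sensitivity
f1 = 2 * precision * recall / (precision + recall)

for name, value in [
    ("accuracy", accuracy),
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("precision", precision),
    ("recall", recall),
    ("f1", f1),
]:
    print(f"{name}: {value:.4f}")
```

Note that the high accuracy is driven mostly by the large "Cancer = no" class, which is why sensitivity and precision are reported separately.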