Data Mining - Computer Science


Problem 1

Consider the following training dataset with fifteen entries. Each entry records one participant's answers to a series of questions asking whether they like a certain type of food, answered (1) for yes or (0) for no. The last column (“midwest?”) is our target column: once the decision tree is built, this is the classification we are trying to predict, i.e., whether a person is from the Midwest.

Create the entire decision tree for this dataset using Information Gain as the attribute selection measure. Make sure that you provide me with the entropy and the information gain for the attributes at each partitioning step, and highlight the attribute and the value that you chose at each step to partition the dataset, as I did in the example that I provided.
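The entropy and information gain calculations asked for at each partitioning step can be sketched as follows. This is a minimal sketch, not the example referred to above: the function names `entropy` and `information_gain`, the attribute names `likes_pizza` and `likes_sushi`, and the four toy rows are all hypothetical and are not taken from the assignment's fifteen-entry dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy rows (NOT the assignment data), 1 = yes and 0 = no.
toy_rows = [
    {"likes_pizza": 1, "likes_sushi": 0, "midwest?": 1},
    {"likes_pizza": 1, "likes_sushi": 1, "midwest?": 1},
    {"likes_pizza": 0, "likes_sushi": 0, "midwest?": 0},
    {"likes_pizza": 0, "likes_sushi": 1, "midwest?": 0},
]

gains = {
    attribute: information_gain(toy_rows, attribute, "midwest?")
    for attribute in ("likes_pizza", "likes_sushi")
}
# The attribute with the largest gain is the one chosen for the split.
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

At each step of the tree, the same two functions would be applied to the subset of rows that reached that node, and the attribute with the highest gain would be selected.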

Problem 2

Consider the dataset below.

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no

Classify the following data point using the Naïve Bayesian method. Show all relevant calculations, similar to the example given in the slide.

X = (age > 40, income = medium, student = no, credit_rating = fair)
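The hand calculation can be cross-checked with a short script. The following is a minimal sketch that assumes plain frequency estimates with no Laplace smoothing; the names `ROWS`, `naive_bayes_scores`, `query`, and `prediction` are illustrative and do not come from the slide. It multiplies the class prior by the conditional probability of each attribute value in X.

```python
from collections import Counter

# The 14 training rows: (age, income, student, credit_rating, buys_computer).
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def naive_bayes_scores(rows, query):
    """Return P(class) * product of P(attribute value | class) for each class."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(1 for row in rows if row[-1] == label and row[i] == value)
            score *= matches / n_label  # conditional P(value | class), unsmoothed
        scores[label] = score
    return scores


# X = (age > 40, income = medium, student = no, credit_rating = fair)
query = (">40", "medium", "no", "fair")
scores = naive_bayes_scores(ROWS, query)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The scores are unnormalized, which is sufficient for classification because both classes share the same denominator P(X). The written answer should still show each prior and conditional probability explicitly.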

Problem 3

Consider the following confusion matrix

Actual Class \ Predicted Class    Cancer = yes    Cancer = no    Total
Cancer = yes                      90              210            300
Cancer = no                       140             9560           9700
Total                             230             9770           10000

Find the following:

a. Accuracy

b. Sensitivity

c. Specificity

d. Precision

e. Recall

f. F1 measure
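The six measures follow directly from the four cells of the confusion matrix. The sketch below uses the standard definitions and treats "Cancer = yes" as the positive class, which is an assumption the assignment does not state explicitly. The variable names are illustrative.

```python
# Cells of the confusion matrix, with "Cancer = yes" taken as the positive class.
tp = 90     # actual yes, predicted yes
fn = 210    # actual yes, predicted no
fp = 140    # actual no, predicted yes
tn = 9560   # actual no, predicted no

total = tp + fn + fp + tn

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
precision = tp / (tp + fp)
recall = sensitivity           # recall is another name for sensitivity
f1 = 2 * precision * recall / (precision + recall)

for name, value in [
    ("Accuracy", accuracy),
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Precision", precision),
    ("Recall", recall),
    ("F1 measure", f1),
]:
    print(f"{name}: {value:.4f}")
```

Sensitivity and recall share a formula, so parts (b) and (e) have the same value.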