Introduction to data mining

yk1993
QUESTION3.pdf

Table 5.10. Data set for Exercise 7.

Record  A  B  C  Class
1       0  0  0  +
2       0  0  1  -
3       0  1  1  -
4       0  1  1  -
5       0  0  1  +
6       1  0  1  +
7       1  0  1  -
8       1  0  1  -
9       1  1  1  +
10      1  0  1  +

318 Chapter 5 Classification: Alternative Techniques

(b) Given the information in part (a), is a randomly chosen college student more likely to be a graduate or undergraduate student?

(c) Repeat part (b) assuming that the student is a smoker.

(d) Suppose 30% of the graduate students live in a dorm but only 10% of the undergraduate students live in a dorm. If a student smokes and lives in the dorm, is he or she more likely to be a graduate or undergraduate student? You can assume independence between students who live in a dorm and those who smoke.
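Parts (b) through (d) all come down to comparing P(class) times the product of P(evidence | class) for the two classes. The sketch below assumes the independence in part (d) means smoking and dorm residence are conditionally independent given whether the student is a graduate or an undergraduate (the naive Bayes assumption). The function name `posterior_graduate` is hypothetical, and the prior and smoking rates must be taken from part (a), which is not reproduced here.

```python
def posterior_graduate(prior_grad, likelihoods_grad, likelihoods_undergrad):
    """Return P(graduate | evidence) for the two-class problem.

    prior_grad: P(graduate); P(undergraduate) is taken as 1 - prior_grad.
    likelihoods_grad / likelihoods_undergrad: P(feature | class) for each
    observed feature, assumed conditionally independent given the class.
    """
    score_grad = prior_grad
    score_undergrad = 1.0 - prior_grad
    # Multiply in one likelihood per observed feature (naive Bayes product).
    for p_g, p_u in zip(likelihoods_grad, likelihoods_undergrad):
        score_grad *= p_g
        score_undergrad *= p_u
    total = score_grad + score_undergrad
    if total == 0:
        raise ValueError("evidence has zero probability under both classes")
    # Normalize the two unnormalized scores so they sum to 1.
    return score_grad / total
```

For part (b), call it with empty evidence lists, so it simply returns the prior; for part (c), pass only the two smoking rates; for part (d), append 0.30 (graduate) and 0.10 (undergraduate) for dorm residence. A result above 0.5 favors a graduate student.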

7. Consider the data set shown in Table 5.10.

(a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-), and P(C|-).
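Each estimate in part (a) is a relative frequency: the number of records of the given class whose attribute equals 1, divided by the number of records in that class. A minimal sketch, assuming P(A|+) means P(A = 1|+) as in Exercise 8, with Table 5.10 hard-coded as tuples (`RECORDS` and `cond_prob` are illustrative names):

```python
from fractions import Fraction

# Table 5.10 as (A, B, C, class) tuples, records 1-10 in order.
RECORDS = [
    (0, 0, 0, "+"),
    (0, 0, 1, "-"),
    (0, 1, 1, "-"),
    (0, 1, 1, "-"),
    (0, 0, 1, "+"),
    (1, 0, 1, "+"),
    (1, 0, 1, "-"),
    (1, 0, 1, "-"),
    (1, 1, 1, "+"),
    (1, 0, 1, "+"),
]
ATTRS = ("A", "B", "C")


def cond_prob(attr, value, label):
    """Relative-frequency estimate of P(attr = value | class = label)."""
    idx = ATTRS.index(attr)
    in_class = [r for r in RECORDS if r[3] == label]
    matches = sum(1 for r in in_class if r[idx] == value)
    return Fraction(matches, len(in_class))


# Assumes P(A|+) denotes P(A = 1 | +), and so on.
for label in ("+", "-"):
    for attr in ATTRS:
        print(f"P({attr}=1|{label}) = {cond_prob(attr, 1, label)}")
```

This yields 3/5, 1/5 and 4/5 for class +, and 2/5, 2/5 and 1 for class -. Exact fractions are used so the estimates match hand calculations.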

(b) Use the estimates of the conditional probabilities from part (a) to predict the class label for a test sample (A = 0, B = 1, C = 0) using the naive Bayes approach.
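Naive Bayes scores each class as P(class) times the product of P(attribute value | class) over the attributes, and predicts the class with the larger score. The sketch below hard-codes the relative-frequency estimates of P(attr = 1 | class) obtained by counting Table 5.10 (each class has 5 of the 10 records, so each prior is 1/2); `nb_scores` is a hypothetical helper name.

```python
from fractions import Fraction as F

# Relative-frequency estimates of P(attr = 1 | class) from Table 5.10.
P_ONE = {
    "+": {"A": F(3, 5), "B": F(1, 5), "C": F(4, 5)},
    "-": {"A": F(2, 5), "B": F(2, 5), "C": F(5, 5)},
}
PRIOR = {"+": F(1, 2), "-": F(1, 2)}


def nb_scores(sample, p_one, prior):
    """Unnormalized naive Bayes scores P(class) * prod P(attr = v | class)."""
    scores = {}
    for label in prior:
        score = prior[label]
        for attr, value in sample.items():
            p = p_one[label][attr]
            # Binary attributes: P(attr = 0 | class) = 1 - P(attr = 1 | class).
            score *= p if value == 1 else 1 - p
        scores[label] = score
    return scores


sample = {"A": 0, "B": 1, "C": 0}
scores = nb_scores(sample, P_ONE, PRIOR)
for label, score in scores.items():
    print(label, score)
print("predicted:", max(scores, key=scores.get))
```

Because no record of class - has C = 0, P(C = 0|-) = 0 and the score for - is 0, so the sample is labeled + (its score is 1/125).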

(c) Estimate the conditional probabilities using the m-estimate approach, with p = 1/2 and m = 4.
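The m-estimate replaces n_c/n with (n_c + m*p)/(n + m), where n is the number of training records in the class, n_c the number of those with the given attribute value, p a prior estimate of the probability and m the equivalent sample size. A sketch with p = 1/2 and m = 4, using per-class counts read off Table 5.10 (the `COUNTS` layout is an illustrative choice):

```python
from fractions import Fraction


def m_estimate(n_c, n, m, p):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)


P = Fraction(1, 2)
M = 4
# (records in the class with attr = 1, records in the class) from Table 5.10.
COUNTS = {
    "+": {"A": (3, 5), "B": (1, 5), "C": (4, 5)},
    "-": {"A": (2, 5), "B": (2, 5), "C": (5, 5)},
}
for label, attrs in COUNTS.items():
    for attr, (n_c, n) in attrs.items():
        print(f"P({attr}=1|{label}) = {m_estimate(n_c, n, M, P)}")
```

With n = 5, m = 4 and p = 1/2 every estimate becomes (n_c + 2)/9, so none of the conditional probabilities is exactly 0 or 1.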

(d) Repeat part (b) using the conditional probabilities given in part (c).
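Repeating the naive Bayes computation with the m-estimates (p = 1/2, m = 4, so each estimate is (n_c + 2)/9) only changes the probability table. This sketch assumes P(attr = 0 | class) is obtained as 1 minus the m-estimate for attr = 1, which holds for binary attributes when p = 1/2 because (n - n_c + m/2)/(n + m) = 1 - (n_c + m/2)/(n + m). The names are illustrative.

```python
from fractions import Fraction as F

# m-estimates of P(attr = 1 | class) with p = 1/2 and m = 4.
P_ONE_M = {
    "+": {"A": F(5, 9), "B": F(3, 9), "C": F(6, 9)},
    "-": {"A": F(4, 9), "B": F(4, 9), "C": F(7, 9)},
}
PRIOR = {"+": F(1, 2), "-": F(1, 2)}


def nb_scores(sample, p_one, prior):
    """Unnormalized naive Bayes scores P(class) * prod P(attr = v | class)."""
    scores = {}
    for label in prior:
        score = prior[label]
        for attr, value in sample.items():
            p = p_one[label][attr]
            # Binary attribute with p = 1/2: estimate for 0 is 1 - estimate for 1.
            score *= p if value == 1 else 1 - p
        scores[label] = score
    return scores


sample = {"A": 0, "B": 1, "C": 0}
scores = nb_scores(sample, P_ONE_M, PRIOR)
for label, score in scores.items():
    print(label, score)
print("predicted:", max(scores, key=scores.get))
```

The scores are 2/81 for + and 20/729 for -; since 2/81 = 18/729, the sample is now labeled -, the opposite of the prediction made with the relative-frequency estimates.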

(e) Compare the two methods for estimating probabilities. Which method is better and why?
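One way to frame the comparison: the m-estimate is a weighted average of the relative frequency n_c/n and the prior p, with weight n/(n + m) on the data. Setting m = 0 recovers the relative-frequency estimate, and a large m pulls the estimate toward p. The sketch checks this identity and traces P(C = 0|-), whose relative-frequency estimate from Table 5.10 is 0, for a few values of m chosen only for illustration; `weighted_form` is a hypothetical helper.

```python
from fractions import Fraction


def m_estimate(n_c, n, m, p):
    """m-estimate (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)


def weighted_form(n_c, n, m, p):
    """Same value written as w * (n_c/n) + (1 - w) * p with w = n/(n + m)."""
    w = Fraction(n, n + m)
    return w * Fraction(n_c, n) + (1 - w) * p


p = Fraction(1, 2)
# P(C = 0 | -): none of the 5 class "-" records in Table 5.10 has C = 0.
for m in (0, 4, 20, 100):
    assert m_estimate(0, 5, m, p) == weighted_form(0, 5, m, p)
    print(f"m = {m:>3}: P(C=0|-) = {m_estimate(0, 5, m, p)}")
```

A zero estimate, as with m = 0, makes the whole naive Bayes product zero regardless of the other attributes; any m > 0 avoids this at the cost of biasing the estimate toward p.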

8. Consider the data set shown in Table 5.11.

(a) Estimate the conditional probabilities for P(A = 1|+), P(B = 1|+), P(C = 1|+), P(A = 1|-), P(B = 1|-), and P(C = 1|-) using the same approach as in the previous problem.
