Introduction to Data Mining


ITS 632 – Introduction to Data Mining

1. Short-answer questions (70 points, 10 points each)

a. Briefly describe why clustering is a form of unsupervised learning.

b. Briefly describe how K-means clustering works.

c. Briefly describe the main difference between K-means and K-medoid methods.

d. Outlier analysis is one of the fields of data mining. Explain what an outlier is. Are outliers the same as noisy data?

e. A good clustering method will produce high-quality clusters. What criteria can we use to judge whether clusters are of high quality?

f. List at least two drawbacks of the K-means clustering approach.

2. Given the following distance matrix of four data points 1, 2, 3, and 4:

(Requirement: Report all the partial trees and matrices for the intermediate steps.)

Perform hierarchical clustering using the single-linkage, complete-linkage, and average-linkage similarity measures (30 points).
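
For reference, the three linkage rules can be compared with a minimal Python sketch. The function name agglomerate and the matrix EXAMPLE are hypothetical: the assignment's own distance matrix is not reproduced here, so the values below are made up purely for illustration. Average linkage is assumed to mean the unweighted mean of all pairwise distances between two clusters.

```python
from itertools import combinations


def agglomerate(dist, linkage="single"):
    """Merge clusters bottom-up from a symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, merge_distance)
    tuples, with points labelled 1..n.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []

    def cluster_distance(a, b):
        # All point-to-point distances between the two clusters.
        pair_dists = [dist[i][j] for i in a for j in b]
        if linkage == "single":      # closest pair of points
            return min(pair_dists)
        if linkage == "complete":    # farthest pair of points
            return max(pair_dists)
        if linkage == "average":     # mean over all pairs (assumed definition)
            return sum(pair_dists) / len(pair_dists)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        # Pick the two clusters that are closest under the chosen linkage.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        d = cluster_distance(a, b)
        merges.append((tuple(i + 1 for i in a), tuple(i + 1 for i in b), d))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Hypothetical 4x4 distance matrix for points 1..4 (NOT the assignment's data).
EXAMPLE = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]

for rule in ("single", "complete", "average"):
    print(rule, agglomerate(EXAMPLE, rule))
```

With this made-up matrix all three rules merge points 1 and 2 first and points 3 and 4 second. They differ only in the height of the final merge, which is where the choice of linkage shows up in the dendrogram.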