Introduction to Data Mining
ITS 632 – Introduction to Data Mining
1. Short-answer questions (70 points, 10 points each)
a. Briefly describe why clustering is one kind of unsupervised learning.
b. Briefly describe how K-means clustering works.
c. Briefly describe the main difference between K-means and K-medoid methods.
d. In data mining, one of the fields is outlier analysis. Explain what an outlier is. Are outliers noise data?
e. A good clustering method will produce high-quality clusters. What criteria can we use to judge whether clusters are high-quality clusters?
f. List at least two drawbacks of the K-means clustering approach.
2. Given the following distance matrix of four data points 1, 2, 3, and 4:
(Requirement: Report all the partial trees and matrices for the intermediate steps.)
|   | 1  | 2  | 3  | 4 |
|---|----|----|----|---|
| 1 | 0  |    |    |   |
| 2 | 20 | 0  |    |   |
| 3 | 5  | 25 | 0  |   |
| 4 | 15 | 40 | 20 | 0 |
Perform hierarchical clustering using the single-linkage, complete-linkage, and average-linkage similarity measures (30 points).
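As a way to check hand-worked steps, the agglomerative procedure can be sketched in plain Python. This is a minimal illustration, not a reference solution: the names `DIST`, `LINKAGES`, `cluster_distance`, and `agglomerate` are hypothetical, and the sketch assumes the lower-triangular matrix above is symmetric. It recomputes each cluster-to-cluster distance from the original point distances instead of updating a reduced matrix, so the intermediate matrices still have to be written out by hand.

```python
from itertools import combinations

# Pairwise distances copied from the matrix above (assumed symmetric,
# so each unordered pair is stored once with the smaller label first).
DIST = {
    (1, 2): 20, (1, 3): 5, (1, 4): 15,
    (2, 3): 25, (2, 4): 40, (3, 4): 20,
}

# Each linkage reduces the list of point-to-point distances between
# two clusters to a single cluster-to-cluster distance.
LINKAGES = {
    "single": min,                            # closest pair
    "complete": max,                          # farthest pair
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}


def point_distance(a, b):
    """Distance between two original data points."""
    return 0 if a == b else DIST[(min(a, b), max(a, b))]


def cluster_distance(c1, c2, linkage):
    """Distance between two clusters (tuples of point labels)."""
    ds = [point_distance(a, b) for a in c1 for b in c2]
    return LINKAGES[linkage](ds)


def agglomerate(points, linkage):
    """Merge the two closest clusters until one remains.

    Returns a list of (cluster_a, cluster_b, merge_distance) tuples,
    one per merge, in the order the merges happen.
    """
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], linkage),
        )
        d = cluster_distance(c1, c2, linkage)
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        merges.append((c1, c2, d))
    return merges


if __name__ == "__main__":
    for name in LINKAGES:
        print(name)
        for c1, c2, d in agglomerate([1, 2, 3, 4], name):
            print(f"  merge {c1} + {c2} at distance {d}")
```

Each printed merge corresponds to one level of the dendrogram, so the merge order and heights can be compared against the partial trees drawn by hand.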