Data Mining Assignment

E_RK

Answer the following questions. Please be sure to use APA-style (Author, YYYY) citations for any content brought into the assignment.

For sparse data, discuss why considering only the presence of non-zero values might give a more accurate view of the objects than considering the actual magnitudes of values. When would such an approach not be desirable?
Describe the change in the time complexity of K-means as the number of clusters to be found increases.
Discuss the advantages and disadvantages of treating clustering as an optimization problem. Among other factors, consider efficiency, non-determinism, and whether an optimization-based approach captures all types of clusterings that are of interest.
What are the time and space complexities of fuzzy c-means? Of self-organizing maps (SOM)? How do these complexities compare to those of K-means?
Explain the difference between likelihood and probability.
Give an example of a set of clusters in which merging based on the closeness of clusters leads to a more natural set of clusters than merging based on the strength of connection (interconnectedness) of clusters.
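
As a point of reference for the first question, the distinction between "presence of non-zero values" and "actual magnitudes" can be made concrete with a small sketch. The three vectors below are hypothetical term-count vectors invented for illustration, and the choice of Jaccard similarity (for presence) and Euclidean distance (for magnitude) is an assumption; the question does not prescribe either measure.

```python
import math

# Hypothetical sparse term-count vectors (illustrative data only).
doc_a = [3, 0, 0, 1, 0, 0, 0, 0]
doc_b = [1, 0, 0, 4, 0, 0, 0, 0]  # same non-zero positions as doc_a, different counts
doc_c = [1, 1, 0, 0, 0, 1, 0, 0]  # mostly different non-zero positions


def jaccard_presence(x, y):
    """Jaccard similarity over the positions holding non-zero values.

    Magnitudes are ignored, and positions where both vectors are zero
    do not contribute to the result.
    """
    x_set = {i for i, v in enumerate(x) if v != 0}
    y_set = {i for i, v in enumerate(y) if v != 0}
    union = x_set | y_set
    if not union:
        return 0.0  # convention chosen here for two all-zero vectors
    return len(x_set & y_set) / len(union)


def euclidean(x, y):
    """Euclidean distance computed on the raw magnitudes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    for name, other in (("doc_b", doc_b), ("doc_c", doc_c)):
        print(
            f"doc_a vs {name}: "
            f"jaccard={jaccard_presence(doc_a, other):.2f}, "
            f"euclidean={euclidean(doc_a, other):.2f}"
        )
```

On this toy data the two views disagree: the presence-based measure rates doc_b as the closest match to doc_a, while the magnitude-based distance places doc_c closer. Which view is preferable depends on the data and is what the question asks you to discuss.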