Data Mining Questions
QUESTION 1
Suppose that you are employed as a data mining consultant for an Internet search engine company. Describe how data mining can help the company by giving specific examples of how techniques such as clustering, classification, association rule mining, and anomaly detection can be applied.
QUESTION 2
Identify at least two advantages and two disadvantages of using color to visually represent information.
QUESTION 3
Consider a group of documents that has been selected from a much larger set of diverse documents so that the selected documents are as dissimilar from one another as possible. If we consider documents that are not highly related (connected, similar) to one another as being anomalous, then all of the documents that we have selected might be classified as anomalies. Is it possible for a data set to consist only of anomalous objects, or is this an abuse of the terminology?
QUESTION 4
(a) Is there a difference between the two sets of points? Please explain.
(b) If so, which set of points will typically have a smaller SSE for K=10 clusters?
(c) What will be the behavior of DBSCAN on the uniform data set?
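One way to explore part (b) empirically is sketched below. Only a "uniform data set" is named in the sub-questions, so the sketch assumes the second set consists of points concentrated in a few tight clumps; that assumption, the helper names (`sse_kmeans`, `make_uniform`, `make_clumped`), the sample sizes, and the spread are all illustrative choices, not part of the question. It runs a plain Lloyd's K-means with K=10 on each set and reports the best SSE over a few random starts.

```python
import random

def sse_kmeans(points, k, iters=30, seed=0):
    """Plain Lloyd's K-means on 2-D points; returns the final SSE."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k data points
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # SSE: squared distance from every point to its closest centroid.
    return sum(
        min((x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids)
        for x, y in points
    )

def make_uniform(n, rng):
    """n points scattered uniformly over the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def make_clumped(n, n_clumps, spread, rng):
    """n points drawn around n_clumps random centers (assumed second data set)."""
    centers = [(rng.random(), rng.random()) for _ in range(n_clumps)]
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centers)
        points.append((cx + rng.gauss(0, spread), cy + rng.gauss(0, spread)))
    return points

rng = random.Random(42)
datasets = {
    "uniform": make_uniform(500, rng),
    "clumped": make_clumped(500, 10, 0.02, rng),
}
results = {}
for name, data in datasets.items():
    # Best of a few random initializations, since K-means is sensitive to the start.
    results[name] = min(sse_kmeans(data, k=10, seed=s) for s in range(3))
    print(name, round(results[name], 3))
```

Comparing the two printed values for different seeds and spreads gives a feel for how the point distribution affects SSE at a fixed K. The sketch does not address part (c); DBSCAN would need a separate experiment.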
QUESTION 5
Give an example of a data set consisting of three natural clusters for which K-means would almost always find the correct clusters, but bisecting K-means would not.