DATA MINING

Nikroshitha
Description.docx

Please click on the link above to submit this week's assignment.

Find a dataset suitable for clustering and use Orange, Weka, or IPython Notebook to find a good set of clusters. Try at least two different clustering methods and compare the outcomes. Experiment with different parameters of each clustering method to see which yield the best results. For each method's clustering output, compute descriptive statistics for each cluster and visualize the results. Describe how the clusters differ from each other and how they are alike. Attempt to describe each cluster with a short label that would fit the instances you are likely to find in that cluster. Note: you do not need to use all features to do the clustering. Sometimes it is more appropriate to cluster on only a few features (columns). Remember that if there are too many features, you will encounter the curse of dimensionality and clustering will be difficult.
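The parameter experiments and per-cluster descriptive statistics described above can be sketched in an IPython Notebook roughly as follows. This is a minimal illustration only, not a required approach: the toy points, the function names (farthest_first, kmeans, wcss, describe_clusters), and the deterministic farthest-first seeding are all assumptions made for this example, and it uses only the Python standard library. A real submission would load an actual dataset and would normally rely on the clustering methods built into Orange, Weka, or a notebook library.

```python
import math
import statistics

# Hypothetical toy data (not from the assignment): two features per instance.
points = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.2),
    (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.2),
]

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(statistics.fmean(col) for col in zip(*members))
    return labels, centroids

def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def describe_clusters(points, labels, k):
    """Size, per-feature mean, and per-feature population std. dev. per cluster."""
    summary = {}
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        cols = list(zip(*members))
        summary[c] = {
            "size": len(members),
            "mean": [statistics.fmean(col) for col in cols],
            "stdev": [statistics.pstdev(col) for col in cols],
        }
    return summary

# Vary the parameter k and compare the outcomes.
results = {}
for k in (2, 3):
    labels, centroids = kmeans(points, k)
    results[k] = (labels, centroids)
    print(f"k={k}  WCSS={wcss(points, labels, centroids):.3f}")
    for c, stats in describe_clusters(points, labels, k).items():
        print("  cluster", c, stats)
```

The per-cluster means are what usually suggest a short label (for example, "low on both features" versus "high on both features"). WCSS always tends to shrink as k grows, so it should be read alongside the cluster descriptions and visualizations, not used alone to pick k.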

Describe the data, methodology, and results in a formal technical report. Use the attached template. Make sure to include figures and tables that describe the process and the outcomes, and reference them from the text. Submit your report in PDF format.

Grading Rubric (25 points total):
0-5: Data (suitable for problem, sufficiently large, non-trivial)
0-5: Methodology (appropriate methods and metrics used)
0-5: Results (non-trivial, interesting, data-driven results)
0-5: Presentation (well written report, good use of figures and tables, used references when appropriate, no spelling or grammar mistakes)
0-5: Following directions (submission format, software used, etc.)

Due NEXT Sunday by 11:59 PM CST.