HW: Clustering
MCIS-6273: Data Mining
This assignment is intended to give you a basic understanding of clustering, especially K-Means, using Python programming and Anaconda.
Please download the dataset Mall_Customers.csv from Blackboard; it will be used throughout this assignment.
Using K-Means
Part 1: [10 points] First, read the dataset into your code and save two of its features to X: the fourth feature (Annual Income (k$)) and the fifth feature (Spending Score (1-100)). With only two features, the clusters can be visualized in a 2D plot. Then do the following:
1. Use the elbow method to find the optimal number of clusters.
2. Fit K-Means to the dataset using the optimal number of clusters found by the elbow method.
3. Predict the clustering results y for the dataset X.
4. Visualize the clustering results, using a different color for each cluster:
   1. The title, x label, and y label should be specified.
   2. A legend should be included.
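A minimal sketch of the Part 1 workflow, assuming scikit-learn, pandas, and matplotlib are installed in the Anaconda environment and that Mall_Customers.csv keeps its columns in the order CustomerID, Gender, Age, Annual Income (k$), Spending Score (1-100), so the fourth and fifth features sit at positions 3 and 4. The function names, the range k = 1..10, random_state=42, the placeholder k = 5, and the output file names are illustrative choices, not requirements of the assignment.

```python
import os

import matplotlib
matplotlib.use("Agg")  # render to image files; no display window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def load_income_spending(path):
    """Return the 4th and 5th columns (Annual Income, Spending Score) as X."""
    df = pd.read_csv(path)
    return df.iloc[:, [3, 4]].values


def elbow_wcss(X, k_max=10, seed=42):
    """Within-cluster sum of squares (KMeans.inertia_) for k = 1..k_max."""
    wcss = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
        km.fit(X)
        wcss.append(km.inertia_)
    return wcss


def plot_elbow(wcss, path):
    """Plot WCSS against k and save the figure to path."""
    ks = range(1, len(wcss) + 1)
    plt.figure()
    plt.plot(ks, wcss, marker="o")
    plt.title("The Elbow Method")
    plt.xlabel("Number of clusters k")
    plt.ylabel("WCSS")
    plt.savefig(path)
    plt.close()


def fit_predict(X, k, seed=42):
    """Fit K-Means with k clusters and return (fitted model, predicted labels y)."""
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
    y = km.fit_predict(X)
    return km, y


def plot_clusters(X, y, km, path, title, xlabel, ylabel):
    """Scatter each cluster in its own color, mark centroids, add title/labels/legend."""
    plt.figure()
    for label in np.unique(y):
        pts = X[y == label]
        plt.scatter(pts[:, 0], pts[:, 1], s=20, label=f"Cluster {label + 1}")
    centers = km.cluster_centers_
    plt.scatter(centers[:, 0], centers[:, 1], s=150, c="black", marker="X",
                label="Centroids")
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.legend()
    plt.savefig(path)
    plt.close()


if __name__ == "__main__" and os.path.exists("Mall_Customers.csv"):
    X = load_income_spending("Mall_Customers.csv")
    plot_elbow(elbow_wcss(X), "elbow_part1.png")
    k = 5  # hypothetical placeholder: use the k you read off your own elbow plot
    km, y = fit_predict(X, k)
    plot_clusters(X, y, km, "clusters_part1.png",
                  "Clusters of Customers (Part 1)",
                  "Annual Income (k$)", "Spending Score (1-100)")
```

elbow_wcss records inertia_, the within-cluster sum of squares WCSS(k) = sum over clusters j of sum over x in C_j of ||x - mu_j||^2. WCSS generally shrinks as k grows, so the elbow is not the minimum but the k after which extra clusters give only small further drops. Saving figures to PNG files is one convenient way to get snapshots for the report; plt.show() works just as well inside Anaconda.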
Part 2: [10 points] Repeat the steps in Part 1, but now pick the second feature (Gender) and the third feature (Age) to visualize the clusters. [This part may be trickier.]
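One likely source of the extra difficulty is that Gender is stored as text (for example "Male"/"Female"), while K-Means works on numeric distances, so Gender has to be encoded as a number first. The sketch below assumes the same column order as in Part 1 (Gender at position 1, Age at position 2) and encodes Gender with pandas.factorize; that encoding, the optional standardization, and the function names are illustrative assumptions, not the required approach.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def gender_age_features(df, scale=False):
    """Return (X, gender_labels) with X = [encoded Gender, Age].

    Gender text values are mapped to integer codes 0, 1, ... in order of first
    appearance (pandas.factorize). With scale=True both columns are
    standardized to zero mean and unit variance.
    """
    gender_codes, gender_labels = pd.factorize(df.iloc[:, 1])
    age = df.iloc[:, 2].to_numpy(dtype=float)
    X = np.column_stack([gender_codes.astype(float), age])
    if scale:
        X = StandardScaler().fit_transform(X)
    return X, list(gender_labels)


def cluster_gender_age(df, k, scale=False, seed=42):
    """Encode the features, fit K-Means with k clusters, return (X, y, gender_labels)."""
    X, gender_labels = gender_age_features(df, scale=scale)
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
    y = km.fit_predict(X)
    return X, y, gender_labels
```

The same elbow, fit, predict, and colored-scatter steps from Part 1 then apply to this X, with the Gender code on one axis and Age on the other. Without scaling, Euclidean distance is dominated by Age (spread over decades) rather than the 0/1 Gender column, so clusters tend to split along Age; whether to standardize is a design choice worth stating in the report.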
Guidelines:
• This assignment is to be solved in groups of two students, not more.
• You only need to deliver a nicely formatted PDF report containing: [5 points]
  ◦ Title page: title and group names
  ◦ ToC page
  ◦ Page numbers, which must also appear in the ToC
  ◦ A snapshot of each of the figures described below (see the Notes)
    ▪ Each snapshot must have a 10-word caption describing the picture.
  ◦ Only one report per group should be submitted.
  ◦ No code needs to be submitted.
Notes:
• To help you read and handle the data and to guide your work, you will be given the code example_3D.py and the data file 3D_network.csv.
  ◦ Run the code and understand what it does first.
  ◦ You will also be given a code file named practice_blobs.py. Run it in Anaconda to see the output and how the different steps should be performed, so you know what to do.
  ◦ The provided code runs without issues, so resolving any problems you have running it is your responsibility.
• To learn more about the elbow method mentioned above for choosing the right number of clusters, see: https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/
• The report you submit should include the figures described below.
  ◦ To give you an idea, running practice_blobs.py gives the following output:
  [Figure: practice_blobs.py output; arrows mark the order in which the outputs appear]
predicted group: 2
distance from center 0 is: 3.731771999479638
distance from center 1 is: 6.290334770382815
distance from center 2 is: 3.382224740457218
distance from center 3 is: 7.132308122920062
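The output above lists one point's predicted group together with its distance to each of the four cluster centers; the predicted group is the center with the smallest distance (center 2, at about 3.38). The actual contents of practice_blobs.py are not reproduced here, so the following is only a hypothetical sketch of how output in this format could be produced with scikit-learn, using KMeans.transform, which returns the Euclidean distance from each sample to every cluster center. The blob parameters, the query point [0.0, 2.0], and the helper name predict_with_distances are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def predict_with_distances(km, point):
    """Return (predicted cluster, distances to every center) for one point."""
    p = np.asarray(point, dtype=float).reshape(1, -1)
    group = int(km.predict(p)[0])
    distances = km.transform(p)[0]  # Euclidean distance to each cluster center
    return group, distances


# Hypothetical data: 4 synthetic blobs (the output above shows 4 centers).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

group, distances = predict_with_distances(km, [0.0, 2.0])
print("predicted group:", group)
for i, d in enumerate(distances):
    print(f"distance from center {i} is: {d}")
```

Because K-Means assigns each point to its nearest center, the printed group always matches the index of the smallest printed distance, which is a quick sanity check when reading output like the one above.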