Need help with quiz
Final Term Exam ANLY 506 Late Summer - 2 hours 50 pts
Last Name _____________ First Name_____________ Course (circle) 51 / 52
Instructions: Please Read!
· Phones are not allowed during the exam
· Materials: You can use any course material in digital or printed form (lectures, books). All code must be executed in R. You are NOT allowed to browse Kaggle, Medium blogs...
· Code: All questions that require code are marked with *CODE*. Note: they will not be graded if no code is provided.
· Graphs: All questions that require a graph are marked with *GRAPH*. For submission, use 1) knitted Markdown or 2) a Word document with all graphs inserted. Please clearly identify the question number for each graph.
· Code: All code should be written in one file (R script or Rmarkdown). Make sure each piece of code clearly identifies the question it belongs to (e.g. #question 1)
· Submission: 1) R code (R script or Rmarkdown), 2) knitted HTML or PDF, or a Word document with inserted figures, 3) exam with clearly written answers [if I cannot read it, I cannot grade it]
#### Question 1
Name three differences between hierarchical and partitional (K-Means) clustering:
1.
2.
3.
#### Question 2 *CODE*
b <- c(3,17,10,5)
a <- c(5, 14, 7, 8)
Find the Euclidean and Manhattan distances between the two vectors.
Euclidean:
Manhattan:
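
A minimal sketch of how both distances could be computed in base R. The names `euclidean` and `manhattan` are illustrative and not part of the exam; the `dist()` calls are only a cross-check.

```r
# Vectors as given in the question
b <- c(3, 17, 10, 5)
a <- c(5, 14, 7, 8)

# Euclidean distance: square root of the sum of squared differences
euclidean <- sqrt(sum((a - b)^2))

# Manhattan distance: sum of absolute differences
manhattan <- sum(abs(a - b))

print(euclidean)
print(manhattan)

# Cross-check with stats::dist(), which treats each row as one observation
dist(rbind(a, b), method = "euclidean")
dist(rbind(a, b), method = "manhattan")
```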
#### Question 3 *CODE* and *GRAPH*
Download the dataset late_summer.csv.
Perform the Elbow method and identify the optimal number of clusters. Describe how you decided on the optimal number.
Answer:
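
One way the Elbow method could be sketched in base R. The contents of late_summer.csv are not shown here, so this example assumes `iris` as a stand-in; the range of 1 to 10 clusters, the seed, and `nstart = 25` are also assumptions.

```r
# ASSUMPTION: iris stands in for late_summer.csv, whose columns are not shown.
# With the real file, read.csv("late_summer.csv") would replace this line,
# keeping only the numeric columns.
dat <- scale(iris[, 1:4])

set.seed(123)  # k-means starts from random centers, so fix the seed
k_values <- 1:10

# Total within-cluster sum of squares for each candidate k
wss <- sapply(k_values, function(k) {
  kmeans(dat, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The optimal number is usually read off as the point where adding another cluster no longer reduces the within-cluster sum of squares by much.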
#### Question 4 *CODE* and *GRAPH*
Use the same dataset as in question 3, late_summer.csv.
Run a cluster analysis with the optimal number of clusters you established in question 3. Create a graph.
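
A hedged sketch of what this step might look like with K-Means. Both the data (`iris` as a stand-in for late_summer.csv) and the value `k_opt <- 3` are placeholders, not the answer for the real dataset; projecting onto two principal components is just one possible way to draw the clusters.

```r
# ASSUMPTIONS: iris stands in for late_summer.csv, and k_opt is a placeholder
# for whatever number the Elbow plot suggests on the real data.
dat <- scale(iris[, 1:4])
k_opt <- 3

set.seed(123)
km <- kmeans(dat, centers = k_opt, nstart = 25)

# Project the observations onto the first two principal components
# so the clusters can be shown in a 2-D scatter plot
pcs <- prcomp(dat)$x[, 1:2]
plot(pcs, col = km$cluster, pch = 19, main = "K-means clusters")
```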
#### Question 5 *CODE* and *GRAPH*
Use the iris dataset. Perform agglomerative clustering with agnes, using Euclidean distance and the complete linkage method. Plot the dendrogram and report the agglomerative coefficient.
Answer: Agglomerative Coefficient =
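
A minimal sketch using `agnes()` from the `cluster` package. It assumes the four numeric columns of `iris` are used without scaling, since the question does not mention scaling; the object name `hc` is illustrative.

```r
library(cluster)  # provides agnes() and pltree()

# Keep only the numeric measurements; Species is a factor
dat <- iris[, 1:4]

# Agglomerative clustering, Euclidean distance, complete linkage
hc <- agnes(dat, metric = "euclidean", method = "complete")

# Dendrogram
pltree(hc, cex = 0.6, hang = -1, main = "Dendrogram of agnes (complete linkage)")

# Agglomerative coefficient (values closer to 1 suggest stronger structure)
print(hc$ac)
```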
#### Question 6
Complete the correct statements about clusters:
1. Intra-cluster distance must be ________________________________
2. Inter-cluster distance must be ________________________________
#### Question 7
Which clustering method computes the dissimilarity based on the largest distance between two clusters?
Answer:
#### Question 8 *CODE*
Use the iris data. Run K-Means with 3 clusters. Provide the size of each of the three clusters. Note: scale the data.
Answer:
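
A sketch of how the cluster sizes could be obtained. The seed and `nstart = 25` are assumptions; different random starts can change how the observations are split, so the sizes may vary with those settings.

```r
# Scale the four numeric columns, as the question requires
dat <- scale(iris[, 1:4])

set.seed(123)  # assumed seed, for reproducibility only
km <- kmeans(dat, centers = 3, nstart = 25)

# Number of observations in each of the three clusters
print(km$size)
```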
#### Question 9 *CODE*
Use the iris data. Calculate the eigenvalues and eigenvectors.
Select the largest eigenvalue.
Answer:
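
The question does not say which matrix to decompose. The sketch below assumes the correlation matrix of the four numeric columns (equivalent to the covariance matrix of the scaled data); using `cov()` on unscaled data would give different values.

```r
# ASSUMPTION: eigen-decomposition of the correlation matrix of the numeric columns
cor_mat <- cor(iris[, 1:4])

ev <- eigen(cor_mat)

print(ev$values)   # eigenvalues, returned in decreasing order
print(ev$vectors)  # eigenvectors, one per column

# Largest eigenvalue
largest <- max(ev$values)
print(largest)
```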
#### Question 10 *CODE* and *GRAPH*
Create a biplot of the iris PCA results.
In which component is Sepal.Length negative?
Answer:
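
A minimal sketch with `prcomp()` and base `biplot()`. Scaling the variables is an assumption here. Note that the sign of a principal component is arbitrary up to a flip, so the signs should be read from the loadings produced by the same function used in the course.

```r
# PCA on the four numeric columns; scale. = TRUE is an assumption
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Biplot of the first two components
biplot(pca, cex = 0.6)

# Loadings of Sepal.Length on each component; inspect the signs
print(pca$rotation["Sepal.Length", ])
```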