need help with quiz

Chikki
FinalTermprintExamANLY506LateSummer.docx

Final Term Exam ANLY 506 Late Summer - 2 hours 50 pts

Last Name _____________ First Name_____________ Course (circle) 51 / 52

Instructions: Please Read!

· Phones are not allowed during the exam

· Materials: You may use any course material in digital or printed form (lectures, books). All code must be executed in R. You are NOT allowed to browse Kaggle, Medium blogs...

· Code: All answers that require code will be marked with *CODE*. Note - they will not be graded if code is not provided.

· Graphs: All answers that require a graph will be marked with *GRAPH*. For submission: use 1) knitted Markdown or 2) Word Document with all graphs inserted, please clearly identify a question number for each graph.

· Code: All code should be written in one file (R script or R Markdown). Label each piece of code with the question it answers (e.g. #question 1).

· Submission: 1) R code (R script or Rmarkdown), 2) Knitted HTML or PDF or a Word Document with inserted figures, 3) exam with clearly written answers [if I cannot read, I cannot grade]

#### Question 1

Name three differences between hierarchical and partitional (K-Means) clustering:

1.

2.

3.

#### Question 2 *CODE*

b <- c(3,17,10,5)

a <- c(5, 14, 7, 8)

Find the Euclidean and Manhattan distances between the two vectors.

Euclidean:

Manhattan:
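A minimal sketch of the two computations in base R, using the vectors given in the question. The manual formulas are cross-checked against `dist()`; the variable names `euclidean`, `manhattan`, and the `*_check` variables are illustrative only.

```r
# question 2
a <- c(5, 14, 7, 8)
b <- c(3, 17, 10, 5)

# Euclidean distance: square root of the sum of squared differences
euclidean <- sqrt(sum((a - b)^2))

# Manhattan distance: sum of absolute differences
manhattan <- sum(abs(a - b))

# Cross-check with base R's dist(), which measures distances between matrix rows
euclidean_check <- as.numeric(dist(rbind(a, b), method = "euclidean"))
manhattan_check <- as.numeric(dist(rbind(a, b), method = "manhattan"))

print(euclidean)  # sqrt(31), about 5.568
print(manhattan)  # 11
```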

#### Question 3 *CODE* and *GRAPH*

Download the dataset late_summer.csv.

Perform the Elbow method and identify the optimal number of clusters. Describe how you decided on the optimal number.

Answer:
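One way to approach the Elbow method is sketched below in base R. The contents of late_summer.csv are not shown in the post, so the sketch assumes the file holds only numeric columns and uses the built-in iris measurements as a stand-in so that it runs. `elbow_wss` is a hypothetical helper, and the seed, `max_k`, and `nstart` values are arbitrary choices.

```r
# question 3
# Total within-cluster sum of squares (WSS) for k = 1..max_k.
# `elbow_wss` is a hypothetical helper, not part of any package.
elbow_wss <- function(data, max_k = 10, seed = 123) {
  scaled <- scale(data)   # put all variables on a comparable scale
  set.seed(seed)          # k-means starts from random centres
  sapply(seq_len(max_k), function(k) {
    kmeans(scaled, centers = k, nstart = 25)$tot.withinss
  })
}

# Assumption: late_summer.csv contains only numeric columns.
# late_summer <- read.csv("late_summer.csv")
# wss <- elbow_wss(late_summer)

# Stand-in data so the sketch runs without the exam file
wss <- elbow_wss(iris[, 1:4])

plot(seq_along(wss), wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow method")
```

The optimal k is usually read off as the point where the curve bends, that is, where adding another cluster no longer reduces the WSS by much.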

#### Question 4 *CODE* and *GRAPH*

Use the same dataset from Question 3 (late_summer.csv).

Run a cluster analysis with the optimal number of clusters you established in Question 3. Create a graph.

#### Question 5 *CODE* and *GRAPH*

Use the iris dataset. Perform agglomerative clustering with agnes, using Euclidean distance and the complete linkage method. Plot the dendrogram and report the agglomerative coefficient.

Answer: Agglomerative Coefficient =
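A possible sketch using `agnes()` from the `cluster` package. The question does not say whether to scale the data or how to handle the `Species` column, so this assumes the four numeric measurements are clustered unscaled.

```r
# question 5
library(cluster)  # provides agnes() and pltree()

# Assumption: cluster on the four numeric measurements only, unscaled
iris_num <- iris[, 1:4]

hc <- agnes(iris_num, metric = "euclidean", method = "complete")

# Agglomerative coefficient: values closer to 1 suggest stronger clustering structure
print(hc$ac)

# Dendrogram
pltree(hc, cex = 0.6, hang = -1,
       main = "Dendrogram of agnes (complete linkage)")
```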

#### Question 6

Complete the correct statements about clusters:

1. Intra-cluster distance must be ________________________________

2. Inter-cluster distance must be ________________________________

#### Question 7

Which clustering method computes the dissimilarity based on the largest distance between two clusters?

Answer:

#### Question 8 *CODE*

Use the iris data. Run K-Means with 3 clusters. Provide the size of each of the three clusters. Note: scale the data.

Answer:
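A minimal sketch in base R, assuming the `Species` column is dropped before scaling. K-means starts from random centres, so the seed and `nstart` below are arbitrary choices, and the order of the clusters (and occasionally their sizes) can differ between runs.

```r
# question 8
iris_scaled <- scale(iris[, 1:4])   # standardise; Species is dropped

set.seed(123)                       # arbitrary seed, for reproducibility only
km <- kmeans(iris_scaled, centers = 3, nstart = 25)

print(km$size)                      # number of observations in each cluster
```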

#### Question 9 *CODE*

Use the iris data. Calculate the eigenvalues and eigenvectors.

Select the largest eigenvalue.

Answer:
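The question does not say which matrix to decompose. The sketch below assumes the covariance matrix of the scaled measurements, which is the same as the correlation matrix of the raw measurements; using the unscaled covariance matrix would give different values.

```r
# question 9
iris_scaled <- scale(iris[, 1:4])
cov_mat <- cov(iris_scaled)   # same as cor(iris[, 1:4])

eig <- eigen(cov_mat)
print(eig$values)             # eigenvalues, in decreasing order
print(eig$vectors)            # eigenvectors, one per column

# eigen() sorts the eigenvalues in decreasing order, so the first is the largest
largest <- eig$values[1]
print(largest)
```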

#### Question 10 *CODE* and *GRAPH*

Create a biplot of the iris PCA results.

In which component is Sepal.Length negative?

Answer:
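A sketch using base R's `prcomp()` and `biplot()`, assuming the PCA is run on the four centred and scaled measurements. The sign of a principal component is arbitrary and can flip between implementations, so the answer should be read from the output of the code actually submitted.

```r
# question 10
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

biplot(pca, scale = 0)

# Loadings of Sepal.Length on each principal component
sl <- pca$rotation["Sepal.Length", ]
print(sl)

# Components on which the Sepal.Length loading is negative
print(names(sl)[sl < 0])
```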