Economics Help part 1
Question 1 Consider two people, characterized by the following features. What is the Euclidean distance between them?
SQRT((191-180)^2 + (24-18)^2) = SQRT(121 + 36) = SQRT(157) = 12.53
ABS(191-180) + ABS(24-18) = 17
SQRT((191-24)^2 + (180-18)^2) = SQRT(167^2+162^2) = 232.66
None of the other answers are correct.
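For reference, the Euclidean distance between two people is the square root of the summed squared differences, taken feature by feature. The Python sketch below uses made-up two-feature vectors, not the people in Question 1. The function name `euclidean` is purely illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("both vectors need the same number of features")
    # Pair each feature with the same feature of the other person,
    # square the differences, sum them, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical people described by two numeric features each.
person_a = (3, 4)
person_b = (0, 0)
print(euclidean(person_a, person_b))  # 5.0
```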
Question 2 According to the rating data in the training spreadsheet, what rating did user 2 give to song 138?
1
2
3
4
5
Question 3 For a user who resides in New York, what values would you find in the three columns user_LA, user_NY and user_MI, respectively?
0, 1, 0
1, 0, 0
0, 0, 0
0, 0, 1
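Columns such as user_LA, user_NY and user_MI are typically indicator (one-hot) variables: each is 1 if the user lives in that location and 0 otherwise. The sketch below assumes these three are the only locations, that each user has exactly one, and that no location was dropped as a baseline category. The helper name `one_hot_city` is hypothetical, and the example encodes a hypothetical user whose location is MI.

```python
# Column order assumed to match user_LA, user_NY, user_MI.
LOCATIONS = ["LA", "NY", "MI"]


def one_hot_city(location):
    """Return the (user_LA, user_NY, user_MI) indicator values for one user."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    # Exactly one indicator is 1; every other column is 0.
    return tuple(1 if loc == location else 0 for loc in LOCATIONS)


print(one_hot_city("MI"))  # (0, 0, 1)
```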
Question 4 How many observations have been allocated to the test partition?
20
21
10
Require more information to answer this question.
Exercise Instructions
Open the Excel workbook. It contains several tabs. The first two tabs contain historical user-song ratings, randomly partitioned into training data and test data, respectively. The third tab contains a partially populated template for generating k-NN predictions. Each remaining tab contains the pairwise distances between one test observation and every training observation. For example, the tab named “29-167” contains the distances between test observation 29-167 and every training observation.
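Each distance tab can be thought of as one test observation's feature vector compared against every training row, then sorted from nearest to farthest. A minimal Python sketch, assuming each training row is a hypothetical (row_id, features, rating) tuple and that the features are already numeric (for example, one-hot location columns):

```python
import math


def distance_table(test_features, training_rows):
    """Distances from one test observation to every training observation.

    training_rows: list of (row_id, features, rating) tuples.
    Returns (distance, row_id, rating) tuples, nearest first.
    """
    table = []
    for row_id, features, rating in training_rows:
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(test_features, features)))
        table.append((d, row_id, rating))
    # Sort by distance only, so equal distances keep their training order.
    table.sort(key=lambda row: row[0])
    return table


# Hypothetical training rows and test observation.
training_rows = [
    ("row-a", (1.0, 0.0), 4),
    ("row-b", (0.0, 0.0), 5),
    ("row-c", (5.0, 5.0), 2),
]
for d, row_id, rating in distance_table((0.0, 0.0), training_rows):
    print(row_id, round(d, 3), rating)
```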
On the k-NN predictions tab, you will find that three sets of predictions have been pre-populated for you: i) a popularity-based predictor, i.e., the average rating given to the song in question in the training data (a common, intuitive approach, but also an unsophisticated one); ii) a continuous k-nearest-neighbor prediction (kNN regression); and iii) a discrete k-nearest-neighbor prediction (kNN classification). All three sets of predictions are provided for 20 test observations that were randomly drawn from the available rating data. The kNN predictions are based on the k nearest neighbors of each test observation, where k is the number of neighbors to consider and ‘near’ versus ‘far’ is defined in terms of Euclidean distance. As you modify k, you will see the kNN predictions change for each test observation, while the popularity-based predictions remain fixed.
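The three predictors differ only in which ratings get averaged or voted on. The Python sketch below is one plausible reading, not the workbook's exact formulas: the popularity predictor averages every training rating for the song, kNN regression averages the ratings of the k nearest neighbors, and kNN classification takes the most common of those ratings. It assumes hypothetical (user, song, rating) training tuples and (distance, rating) neighbor pairs. Voting ties go to the tied rating seen first among the nearest neighbors, because `statistics.mode` returns the first mode it encounters (Python 3.8 and later); the spreadsheet may resolve ties differently.

```python
import statistics


def popularity_prediction(training, song_id):
    """Average training rating for the song, regardless of who the user is."""
    ratings = [rating for _, song, rating in training if song == song_id]
    return statistics.mean(ratings)


def knn_predictions(neighbors, k):
    """Return (regression, classification) predictions from the k nearest neighbors.

    neighbors: (distance, rating) pairs in any order.
    """
    nearest = sorted(neighbors, key=lambda n: n[0])[:k]
    ratings = [rating for _, rating in nearest]
    regression = statistics.mean(ratings)      # continuous: average rating
    classification = statistics.mode(ratings)  # discrete: most common rating
    return regression, classification


# Hypothetical data.
training = [("u1", "s1", 4), ("u2", "s1", 2), ("u3", "s2", 5)]
neighbors = [(0.5, 4), (0.2, 5), (1.0, 4), (3.0, 1)]
print(popularity_prediction(training, "s1"))  # 3
print(knn_predictions(neighbors, 3))
```

Note that the popularity prediction ignores the neighbors entirely, which is why it does not move when k changes.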
In addition to the predictions, placeholders have been provided for you to capture performance (error) metrics for all three approaches: MAE and RMSE for the continuous predictions (popularity-based and kNN regression), and accuracy, error rate and a confusion matrix for the discrete predictions (kNN classification).
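For the continuous predictions, MAE averages the absolute errors, while RMSE takes the square root of the average squared error, so RMSE penalizes large misses more heavily. A minimal Python sketch with made-up ratings (the workbook's own cell formulas are not reproduced here):

```python
import math


def _check(actual, predicted):
    """Both metrics need equal-length, non-empty rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty rating lists")


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical actual and predicted ratings for three test observations.
actual = [4, 2, 5]
predicted = [3, 2, 3]
print(mae(actual, predicted))  # 1.0
print(rmse(actual, predicted))
```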
As you adjust the value of k, you will see the predictions change, along with the individual error values for each test observation.
Question 5 Vary the value of k from 1 through 10. Based on the continuous (kNN regression) prediction error measures, what is the optimal number of nearest neighbors to employ?
1
2
3
4
5
6
7
8
9
10
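Choosing k amounts to computing the error measures for each candidate k and picking the smallest. The Python sketch below does this on a hypothetical two-observation test set, where each test observation is stored as (actual_rating, neighbors) and the neighbors are (distance, rating) pairs. Ties between k values go to the smaller k, which is an assumption.

```python
import math


def knn_regression(neighbors, k):
    """Average rating of the k nearest (distance, rating) neighbors."""
    nearest = sorted(neighbors)[:k]  # nearest first; equal distances fall back to rating order
    return sum(rating for _, rating in nearest) / len(nearest)


def error_for_k(test_set, k, metric):
    """MAE or RMSE of kNN regression over (actual, neighbors) pairs."""
    errors = [actual - knn_regression(nb, k) for actual, nb in test_set]
    if metric == "mae":
        return sum(abs(e) for e in errors) / len(errors)
    if metric == "rmse":
        return math.sqrt(sum(e * e for e in errors) / len(errors))
    raise ValueError(f"unknown metric: {metric}")


def best_k(test_set, k_values, metric):
    """k with the lowest error; min() keeps the first (smallest) k on ties."""
    return min(k_values, key=lambda k: error_for_k(test_set, k, metric))


# Hypothetical test set: (actual rating, neighbors).
test_set = [
    (4, [(0.1, 4), (0.2, 4), (0.3, 1)]),
    (2, [(0.1, 5), (0.2, 2), (0.3, 2)]),
]
for k in range(1, 4):
    print(k, error_for_k(test_set, k, "mae"), error_for_k(test_set, k, "rmse"))
print(best_k(test_set, range(1, 4), "mae"))   # 2
print(best_k(test_set, range(1, 4), "rmse"))  # 3
```

On this toy data MAE favors k = 2 while RMSE favors k = 3, so it is worth checking whether both measures point to the same k on the workbook's data.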
Question 6 Based on the confusion matrix you observe when k = 5, calculate the prediction accuracy of the kNN classifier in cell B30. (Hint: overall accuracy is the proportion of the 20 predictions that were correct, i.e., those on the diagonal of the confusion matrix.)
65%
35%
40%
We cannot answer this question without more information.
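Accuracy can be read straight off a confusion matrix: sum the diagonal (where the actual rating equals the predicted rating) and divide by the total count; the error rate is 1 minus that. A minimal Python sketch, assuming a 1 to 5 rating scale with rows for actual ratings and columns for predicted ratings (the workbook's orientation may differ, but accuracy is the same either way). The ratings below are made up.

```python
from collections import Counter

LABELS = [1, 2, 3, 4, 5]  # assumed rating scale


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are actual ratings, columns are predicted ratings."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of all predictions that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


m = confusion_matrix([5, 4, 4, 3], [5, 4, 3, 3])
for row in m:
    print(row)
print(accuracy(m), 1 - accuracy(m))  # 0.75 0.25
```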
Question 7 Vary the value of k from 1 through 9. Considering the discrete kNN classifier’s prediction accuracy, what is the optimal number of nearest neighbors to employ, i.e., what value of k maximizes the prediction accuracy?
1
2
3
4
5
6
7
8
9
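The same sweep works for the classifier, except that the score is accuracy (to be maximized) and even values of k can produce tied votes. The Python sketch below breaks ties in favor of the tied rating held by the nearest neighbor, which is an assumption and not necessarily what the spreadsheet does. The test set format, (actual_rating, list of (distance, rating)), and the data are hypothetical.

```python
from collections import Counter


def knn_classify(neighbors, k):
    """Majority vote among the k nearest (distance, rating) pairs.

    Assumed tie rule: the tied rating held by the nearest neighbor wins.
    """
    nearest = sorted(neighbors)[:k]
    votes = Counter(rating for _, rating in nearest)
    top = max(votes.values())
    for _, rating in nearest:  # walk outward from the nearest neighbor
        if votes[rating] == top:
            return rating


def accuracy_for_k(test_set, k):
    """Share of test observations whose predicted rating matches the actual one."""
    hits = sum(actual == knn_classify(nb, k) for actual, nb in test_set)
    return hits / len(test_set)


def best_k(test_set, k_values):
    """k with the highest accuracy; max() keeps the first (smallest) k on ties."""
    return max(k_values, key=lambda k: accuracy_for_k(test_set, k))


# Hypothetical test set: (actual rating, neighbors).
test_set = [
    (4, [(0.1, 3), (0.2, 4), (0.3, 4)]),
    (5, [(0.1, 5), (0.2, 2), (0.3, 5)]),
]
for k in range(1, 4):
    print(k, accuracy_for_k(test_set, k))
print(best_k(test_set, range(1, 4)))  # 3
```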
Question 8 Yes or no? Would either of these kNN predictive models be useful for generating recommendations for a new song that has just been released (i.e., one that has not yet been rated by any users) or for a new user (i.e., one who has not yet entered any song ratings)? (Hint: you can assume that you would have details on the new user or the new song.)
Yes
No