Machine Learning
Supervised learning (6 points total)
1) Does she like the song marked ‘?’? Red circles are the songs she likes and diamonds are the
ones she doesn’t. (3 points)
Yes
No
Unclear
2) Does she like the song marked ‘?’? Red circles are the songs she likes and diamonds are the
ones she doesn’t. (3 points)
Yes
No
Unclear
Scatter Plots (19 points total)
1) Match the road bumpiness and slope with the colored X markings in the figure. (7 points)
1) green        A) very steep and some bumpiness
2) blue         B) low slope and no bumpiness
3) red          C) low slope and extreme bumpiness
2) Is ‘?’ more like circles or more like diamonds? (3 points)
circles
diamonds
unclear
3) What is ‘?’ more like? (3 points)
circles
diamonds
unclear
4) What is ‘?’ more like? (3 points)
circles
diamonds
unclear
5) Which line separates the classes best? (3 points)
Steepest
Moderately steep
Least steep
Basic Probability (60 points total)
Question 1: A die is rolled. Find the probability that an even number is obtained. (2 points)
Question 2: Two coins are tossed. Find the probability that two heads are obtained. (2 points)
Question 3: Which of these numbers cannot be a probability? (4 points)
a) -0.00001
b) 0.5
c) 1.001
d) 0
e) 1
f) 20%
Question 4: If two dice are rolled, what is the probability that the sum is:
a) equal to 1 (2 points)
b) equal to 4 (2 points)
c) less than 13 (2 points)
Question 5: A die is rolled and a coin is tossed. Find the probability that the die shows an odd number
and the coin shows a head. (4 points)
Question 6: A card is drawn at random from a deck of cards. Find the probability of getting the 3 of
diamonds. (4 points)
Question 7: A card is drawn at random from a deck of cards. Find the probability of getting a queen. (4
points)
Question 8: A jar contains 3 red marbles, 7 green marbles and 10 white marbles. If a marble is drawn
from the jar at random, what is the probability that this marble is white? (4 points)
Question 9: The blood groups of 200 people are distributed as follows: 50 have type A blood, 65 have
type B blood, 70 have type O blood and 15 have type AB blood. If a person from this group is selected at
random, what is the probability that this person has type O blood? (6 points)
Question 10: A die is rolled. Find the probability that the number obtained is greater than 4. (6 points)
Question 11: Two coins are tossed. Find the probability that exactly one head is obtained. (6 points)
Question 12: Two dice are rolled. Find the probability that the sum is equal to 5. (4 points)
Question 13: A card is drawn at random from a deck of cards. Find the probability of getting the King of hearts.
(4 points)
Question 14: Is the message ‘Love life’ more likely to be from Chris or Sara? (2 points)
Question 15: Is the message ‘Love deal’ more likely to be from Chris or Sara? (2 points)
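The dice and coin questions in this section can be checked by listing every equally likely outcome and counting the favorable ones. Below is a minimal sketch in Python, assuming fair six-sided dice and fair coins. The helper name `probability` and the two example events are illustrative only and are not taken from the questions.

```python
from fractions import Fraction
from itertools import product


def probability(outcomes, event):
    """Fraction of equally likely outcomes for which event(outcome) is true."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# Sample spaces, assuming a fair six-sided die and a fair coin.
two_dice = list(product(range(1, 7), repeat=2))  # 36 ordered pairs
two_coins = list(product("HT", repeat=2))        # HH, HT, TH, TT

# Illustrative events (not quiz questions): the two dice sum to 7,
# and at least one of the two coins shows a tail.
p_sum_7 = probability(two_dice, lambda roll: sum(roll) == 7)
p_any_tail = probability(two_coins, lambda toss: "T" in toss)
print(p_sum_7, p_any_tail)
```

Using `Fraction` keeps the results exact, which makes them easy to compare with hand-computed answers.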
Support Vector Machines (26 points total)
1) Mark the line that separates the clusters the best.
A) (3 points)
Steepest
Moderately steep
Least steep
B) (3 points)
Vertical
Positive
Negative
C) (3 points)
Left
Middle
Right
D) (3 points)
Positive slope line
Negative slope line
E) (3 points)
Vertical
Positive
Negative
Unclear
2) What is the criterion for picking the line in the picture below? (3 points)
1) Simple
2) Random
3) Something else
3) Can you separate the clusters with a line? (1 point)
4) Is the transformed data below linearly separable? (1 point)
5) How would you transform the data below to make it linearly separable? (6 points)
Decision tree (10 points total)
1) Is the data below linearly separable? (1 point)
2) Construct a decision tree that classifies data below. (9 points)
Data sets (15 points total)
1) What type of data is ‘job title’? (3 points)
2) What type of data is the ‘time stamp’ on an email? (3 points)
3) What type of data is the content of an email? (3 points)
4) What type of data is ‘number of emails sent by a person’? (3 points)
5) What type of data are the ‘to/from’ fields in an email? (3 points)
Regression (41 points total)
1) Pick the two points that, when connected, give the line that best approximates the data. (3 points)
A
B C
D
E
F
2) What is the most reasonable slope and intercept for the line that approximates the data below? (6
points)
3) Which line has the greatest slope? (3 points)
Left-most
Middle
Right-most
4) Which line has the greatest intercept? (3 points)
Upper
Middle
Lower
5) For the line y = 6.25 * x + 30, which predicts wealth in thousands of dollars given age x, what is the
predicted wealth of a person at age 36? (3 points)
6) Which figure is a good candidate for linear regression? (5 points)
1st (from left and top to bottom)
2nd
3rd
4th
5th
7) What is a good formula of the form y = a*x1 + b*x2 + c for the data below? What are good values of a, b,
and c? (9 points)
8) What is a good formula of the form y = a*x1 + b*x2 + c for the data below? What are good values of a, b,
and c? (9 points)
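Several questions in this section come down to evaluating a linear formula, or checking a candidate formula against a table of values. Below is a minimal sketch in Python. The function names, coefficients, and data rows are made up for illustration and are not the answers to any question.

```python
def predict_line(x, slope, intercept):
    """Evaluate y = slope * x + intercept."""
    return slope * x + intercept


def predict_plane(x1, x2, a, b, c):
    """Evaluate y = a*x1 + b*x2 + c."""
    return a * x1 + b * x2 + c


def max_abs_error(rows, a, b, c):
    """Largest |y - (a*x1 + b*x2 + c)| over rows of (x1, x2, y)."""
    return max(abs(y - predict_plane(x1, x2, a, b, c)) for x1, x2, y in rows)


# Hypothetical rows generated from y = 3*x1 + 4*x2 + 5.
rows = [(0, 0, 5), (1, 0, 8), (0, 1, 9), (2, 3, 23)]

print(predict_line(10, slope=2.0, intercept=1.0))
print(max_abs_error(rows, a=3, b=4, c=5))  # a perfect fit gives zero error
print(max_abs_error(rows, a=3, b=4, c=0))  # a wrong intercept shows up as error
```

A candidate (a, b, c) that reproduces every row exactly gives a maximum error of zero, so the check doubles as a way to compare guesses.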
Outliers (15 points total)
1) What is the best fit line for the data? (3 points)
Top horizontal
Bottom horizontal
Sloped
Unclear
2) Which data sets have outliers? (6 points)
1st
2nd
3rd
4th
5th
6th
3) Which point below causes the largest error in the linear regression model? (3 points)
4) What is the effect on the slope in linear regression when the outlier below is removed? (3 points)
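The error a point contributes to a linear regression model is usually measured by its residual, the vertical distance between the point and the line. Below is a minimal sketch in Python, assuming the line is already given rather than fitted. The function name and the data are hypothetical and do not come from the figures.

```python
def largest_residual(points, slope, intercept):
    """Return the (x, y) point with the largest vertical distance
    from the line y = slope * x + intercept."""
    return max(points, key=lambda p: abs(p[1] - (slope * p[0] + intercept)))


# Hypothetical data: points close to y = x, plus one outlier at x = 3.
points = [(0, 0.1), (1, 0.9), (2, 2.2), (3, 10.0), (4, 4.1)]
print(largest_residual(points, slope=1.0, intercept=0.0))
```

In ordinary least squares the residuals are squared before summing, so a single distant point can dominate the total error and pull the fitted slope toward itself.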
Clustering (3 points total)
1) How many clusters do you see in the data? (2 points)
2) Given 3 initial cluster centers, is it possible for clustering to group the data below into only 2
clusters? (1 point)