Machine Learning

Supervised learning (6 points total)

1) Does she like the song under ‘?’? Red circles mark the songs she likes and diamonds mark the ones she doesn’t. (3 points)

Yes

No

Unclear

2) Does she like the song under ‘?’? Red circles mark the songs she likes and diamonds mark the ones she doesn’t. (3 points)

Yes

No

Unclear

Scatter Plots (19 points total)

1) Match the road bumpiness and slope with the colored X markings in the figure. (7 points)

1) green A) very steep and some bumpiness

2) blue B) low slope and no bumpiness

3) red C) low slope and extreme bumpiness

2) Is ‘?’ more like circles or more like diamonds? (3 points)

circles

diamonds

unclear

3) What is ‘?’ more like? (3 points)

circles

diamonds

unclear

4) What is ‘?’ more like? (3 points)

circles

diamonds

unclear

5) Which line separates the classes best? (3 points)

Steepest

Moderately steep

Least steep

Basic Probability (60 points total)

Question 1: A die is rolled, find the probability that an even number is obtained. (2 points)

Question 2: Two coins are tossed, find the probability that two heads are obtained. (2 points)

Question 3: Which of these numbers cannot be a probability? (4 points)

a) -0.00001

b) 0.5

c) 1.001

d) 0

e) 1

f) 20%

Question 4: If two dice are rolled, what is the probability that the sum is:

a) equal to 1 (2 points)

b) equal to 4 (2 points)

c) less than 13 (2 points)
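Answers to Question 4 can be checked by enumerating the sample space. The sketch below assumes two fair six-sided dice; the names `OUTCOMES` and `prob_sum` are illustrative and not part of the test.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice
# (fairness is an assumption; the question does not state it explicitly).
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob_sum(condition):
    """Probability that the sum of the two dice satisfies `condition`."""
    favorable = sum(1 for a, b in OUTCOMES if condition(a + b))
    return Fraction(favorable, len(OUTCOMES))

p_equal_1 = prob_sum(lambda s: s == 1)   # part a
p_equal_4 = prob_sum(lambda s: s == 4)   # part b
p_less_13 = prob_sum(lambda s: s < 13)   # part c
```

The same helper covers any other question about the sum of two dice by changing the condition.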

Question 5: A die is rolled and a coin is tossed, find the probability that the die shows an odd number and the coin shows a head. (4 points)

Question 6: A card is drawn at random from a deck of cards. Find the probability of getting the 3 of diamonds. (4 points)

Question 7: A card is drawn at random from a deck of cards. Find the probability of getting a queen. (4 points)

Question 8: A jar contains 3 red marbles, 7 green marbles and 10 white marbles. If a marble is drawn from the jar at random, what is the probability that this marble is white? (4 points)

Question 9: The blood groups of 200 people are distributed as follows: 50 have type A blood, 65 have type B blood, 70 have type O blood and 15 have type AB blood. If a person from this group is selected at random, what is the probability that this person has type O blood? (6 points)
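When every individual is equally likely to be picked, as in Question 9, the probability of a category is its count divided by the total. A minimal sketch using the counts from the question; the function name `prob_from_counts` is hypothetical.

```python
from fractions import Fraction

def prob_from_counts(counts, category):
    """Probability of drawing `category` when each individual is equally likely."""
    total = sum(counts.values())
    return Fraction(counts[category], total)

# Counts taken from Question 9 (200 people in total).
blood_groups = {"A": 50, "B": 65, "O": 70, "AB": 15}
p_type_o = prob_from_counts(blood_groups, "O")
```

The marble counts from Question 8 can be passed to the same function.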

Question 10: A die is rolled, find the probability that the number obtained is greater than 4. (6 points)

Question 11: Two coins are tossed, find the probability that exactly one head is obtained. (6 points)

Question 12: Two dice are rolled, find the probability that the sum is equal to 5. (4 points)

Question 13: A card is drawn at random from a deck of cards. Find the probability of getting the King of hearts. (4 points)

Question 14: Is the message ‘Love life’ more likely from Chris or Sara? (2 points)

Question 15: Is the message ‘Love deal’ more likely from Chris or Sara? (2 points)

Support Vector Machines (26 points total)

1) Mark the line that separates the clusters the best.

A) (3 points)

Steepest

Moderately steep

Least steep

B) (3 points)

Vertical

Positive

Negative

C) (3 points)

Left

Middle

Right

D) (3 points)

Positive slope line

Negative slope line

E) (3 points)

Vertical

Positive

Negative

Unclear

2) What is the criterion for picking the line in the picture below? (3 points)

1) Simple

2) Random

3) Something else

3) Can you separate the clusters with a line? (1 point)

4) Is the transformed data below linearly separable? (1 point)

5) How would you transform the data below to make it linearly separable? (6 points)

Decision tree (10 points total)

1) Is the data below linearly separable? (1 point)

2) Construct a decision tree that classifies data below. (9 points)

Data sets (15 points total)

1) What type of data is ‘job title’? (3 points)

2) What type of data is ‘time stamp’ on email? (3 points)

3) What type of data is content of email? (3 points)

4) What type of data is ‘number of emails sent by a person’? (3 points)

5) What type of data is ‘to/from’ fields in email? (3 points)

Regression (41 points total)

1) Pick the two points that, when connected, give the line that best approximates the data. (3 points)

A

B C

D

E

F

2) What is the most reasonable slope and intercept for the line that approximates the data below? (6 points)

3) Which line has the greatest slope? (3 points)

Left-most

Middle

Right-most

4) Which line has the greatest intercept? (3 points)

Upper

Middle

Lower

5) For the line y = 6.25 * x + 30, which predicts wealth in thousands of dollars from age x, what is the predicted wealth of a person at age 36? (3 points)

6) Which figure is a good candidate for linear regression? (5 points)
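A prediction from a fitted line is obtained by substituting the input into the line's equation. A small sketch, using the slope and intercept given in question 5; the function name `predicted_wealth` is illustrative only.

```python
def predicted_wealth(age, slope=6.25, intercept=30.0):
    """Predicted wealth in thousands of dollars for a given age (slope * age + intercept)."""
    return slope * age + intercept

wealth_at_36 = predicted_wealth(36)  # result is in thousands of dollars
```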

1st (from left and top to bottom)

2nd

3rd

4th

5th

7) What is a good formula of the form y = a*x1 + b*x2 + c for the data below? What are good values for a, b, and c? (9 points)

8) What is a good formula of the form y = a*x1 + b*x2 + c for the data below? What are good values for a, b, and c? (9 points)

Outliers (15 points total)

1) What is the best fit line for the data? (3 points)

Top horizontal

Bottom horizontal

Sloped

Unclear

2) Which data sets have outliers? (6 points)

1st

2nd

3rd

4th

5th

6th

3) Which point below causes the largest error in the linear regression model? (3 points)

4) What is the effect on the slope in linear regression when the outlier below is removed? (3 points)

Clustering (3 points total)

1) How many clusters do you see in the data? (2 points)

2) Is it possible for clustering to group the data below into 2 clusters, given 3 initial cluster centers? (1 point)