MIDTERM EXAM

CMPS 530

Machine Learning & Pattern Recognition

Thursday 6-9 pm

Name: ________________________________

Notes:

20 questions, 5 points each: 100 points total

Multiple Choice Questions: There may be more than one correct answer; choose all that apply

Describe Questions: Please describe briefly – no more than 500 words

Pseudo Code: A piece of code that does not have to be syntactically correct or runnable on a machine; it can be a combination of plain English and programming-language syntax

**This is a take-home test, but work and submit individually. Please submit as an uploaded PDF document to itslearning and/or email to: [email protected]

1) Machine Learning is:

A. Optimizing a piece of code to recognize patterns in data

B. Useful in data driven analysis

C. Grouping of data to reveal clusters if any

D. Applied in web searches

2) Which one of the following is not a category of machine learning algorithms?

A. Supervised

B. Unsupervised

C. Semi-Supervised

D. Descriptive

E. Reinforcement

3) (T/F) In Unsupervised Machine Learning, labeled data is useful in training a model so that unknown values can be given a label (category)

4) A problem in which past students’ study times and grades are used to predict an incoming student’s likely grade can be solved using one or more of these ML techniques:

A. Regression

B. Neural Networks

C. Naïve Bayes

D. K-Means

E. KNN

5) (T/F) Unsupervised Learning is sometimes also referred to as Clustering

6) Test data is used for:

A. Generating new data

B. Identifying issues with the data

C. Validating the model

D. Writing a hypothesis

7) (T/F) An estimator is a function from sample data to some estimand, such as a value of a parameter

8) (T/F) Prediction is the common theme between the disciplines of Statistics and Machine Learning

9) A probability distribution provides:

A. Knowledge of the properties of distribution families

B. Usefulness in analyzing data

C. Representation of probability values for data points in an event

D. A way to assume anything about the data

10) A classifier algorithm will help to:

A. Correlate input data to a class category

B. Predict the continuous values of a variable

C. Define the underlying relationship between only two variables

D. All of the above

11) The Naïve Bayes classifier is based on:

A. Markov model

B. KNN method

C. Neural Network

D. Bayes Theorem

12) The assumption of conditional independence refers to:

A. Independently working on solving problems

B. Creating a Data Model with categorical variables

C. Creating a Data Model with only Binary variables

D. Non-dependence or mutual exclusivity of two events

13) (T/F) In the k-means clustering algorithm, a subjective choice is made for the value of K

14) (T/F) K-means and KNN are both examples of clustering algorithms

15) ANN is a type of:

A. Data Model

B. Data Structure

C. Machine Learning Algorithm

D. Unsupervised machine learning algorithm

16) Which Python library/package did you use to encode categorical variables as numeric values?

A. Matplotlib

B. Sklearn

C. Numpy

D. LabelEncoder

17-19) Write Python code/pseudocode that performs the following tasks:

a) Import the relevant libraries from the machine learning package sklearn

b) Create an array of two variables (features: Size & Color) and one label (Sales):

Size: Small, Medium, Large

Color: Red, Green, Blue, Orange

Sales: Low, Medium, High

c) Populate the training data with the values seen below

Size     Color    Sales
Small    Red      Medium
Small    Blue     Low
Medium   Orange   High
Large    Blue     High
Large    Red      Low
Small    Blue     High
Medium   Red      High
Small    Blue     Medium
Large    Orange   Low
Small    Blue     High

d) Invoke a KNN classifier and predict the Sales value when Size is Small and Color is Blue
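
For reference, tasks (a) through (d) could be sketched as follows. This is one possible approach, not an official answer key. The variable names, the use of LabelEncoder for both features and label, and n_neighbors=5 (scikit-learn's default) are assumptions made for illustration. Label encoding imposes an arbitrary alphabetical ordering on the categories, which affects the distances KNN computes, so the result should be read with that caveat in mind.

```python
# (a) Import the relevant pieces of scikit-learn
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# (b), (c) Training data copied from the table in the question
sizes = ["Small", "Small", "Medium", "Large", "Large",
         "Small", "Medium", "Small", "Large", "Small"]
colors = ["Red", "Blue", "Orange", "Blue", "Red",
          "Blue", "Red", "Blue", "Orange", "Blue"]
sales = ["Medium", "Low", "High", "High", "Low",
         "High", "High", "Medium", "Low", "High"]

# Fit one encoder per column on the full list of allowed categories,
# so that unseen-but-valid values (e.g. Green) still get a code.
size_enc = LabelEncoder().fit(["Small", "Medium", "Large"])
color_enc = LabelEncoder().fit(["Red", "Green", "Blue", "Orange"])
sales_enc = LabelEncoder().fit(["Low", "Medium", "High"])

# Feature matrix with two columns (Size, Color) and the label vector (Sales)
X = np.column_stack([size_enc.transform(sizes), color_enc.transform(colors)])
y = sales_enc.transform(sales)

# (d) Train a KNN classifier; n_neighbors=5 is an assumed choice
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)

# Encode the query (Size=Small, Color=Blue) the same way as the training data
query = np.column_stack([size_enc.transform(["Small"]),
                         color_enc.transform(["Blue"])])
prediction = sales_enc.inverse_transform(clf.predict(query))[0]
print("Predicted sales:", prediction)
```

With this data, four training rows match the query exactly (High, High, Low, Medium) and the fifth nearest row is Large/Blue (High), so the majority vote among the five neighbors is High.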

20) Briefly describe any application where you would use a machine learning algorithm

- Write a problem statement

- Method recommended

- Steps involved and analysis