CSCI 333 – Fall 2019 Final Project

100 points + 10 bonus points

Due Date: Friday, December 6, 2019, 11:59 pm.

Note: This is an individual assignment. Each student MUST complete the work independently. Code sharing and plagiarism will not be tolerated.

Overview

This project consists of two tasks. The goal is to apply what we have learned to solve real problems in Data Science. Glance at “What to Submit” when you start working on a task so that you know what information to provide from each task.

Submission Example

csci333-project-XX

csci333-project-XX.doc

task1XX.py

task2XX.py

README.txt

What to Submit

1. One doc file “csci333-project-XX.doc” containing the source code (as text) and screenshots of the output of every program. Replace XX with your first name and last name. You can copy and paste the source code from PyCharm or another IDE into the doc file. The screenshots should show that your programs ran correctly and passed your tests.

2. Python files for all programs. Programs must be well structured and properly commented; programs without comments will lose a significant number of points.

3. If any program or piece of code does not work, describe its current status and explain the problem in a file named “README.txt”.

4. Optional: anything else you want to bring to the instructor’s attention during grading.

Task 1 (50 points): (Intro to Data Science: Survey Response Statistics) Twenty students were asked to rate on a scale of 1 to 5 the quality of the food in the student cafeteria, with 1 being “awful” and 5 being “excellent”. Place the 20 responses in a list.

1, 2, 5, 4, 3, 5, 2, 1, 3, 3, 1, 4, 3, 3, 3, 2, 3, 3, 2, 5

Write a program that does the following:

(a) Determine and display the frequency of each rating.

(b) Use built-in functions, statistics module functions, and NumPy or pandas functions covered in the course materials to display the following response statistics: minimum, maximum, range, mean, median, variance and standard deviation.

(c) Display a bar chart showing the response frequencies and their percentages of the total responses.
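One possible way to organize parts (a) through (c) into functions is sketched below. It is an illustration under stated assumptions, not a reference solution: the function names are hypothetical, the statistics come from the built-in functions and the statistics module only (the task also asks for NumPy or pandas equivalents), and the variance shown is the population variance, since the handout does not say whether population or sample variance is expected.

```python
from collections import Counter
import statistics

# The 20 survey responses listed in the task statement.
RESPONSES = [1, 2, 5, 4, 3, 5, 2, 1, 3, 3, 1, 4, 3, 3, 3, 2, 3, 3, 2, 5]


def rating_frequencies(responses):
    """Return {rating: count} for ratings 1 through 5, including zero counts."""
    counts = Counter(responses)
    return {rating: counts[rating] for rating in range(1, 6)}


def rating_percentages(frequencies):
    """Return {rating: percentage of all responses}."""
    total = sum(frequencies.values())
    return {rating: 100 * count / total for rating, count in frequencies.items()}


def response_statistics(responses):
    """Return the statistics requested in part (b) as a dictionary."""
    return {
        'minimum': min(responses),
        'maximum': max(responses),
        'range': max(responses) - min(responses),
        'mean': statistics.mean(responses),
        'median': statistics.median(responses),
        # Assumption: population variance and standard deviation.
        # statistics.variance and statistics.stdev give the sample versions.
        'variance': statistics.pvariance(responses),
        'standard deviation': statistics.pstdev(responses),
    }


def plot_frequencies(frequencies, percentages):
    """Draw a bar chart with each bar labeled 'count (percent)'."""
    # Imported here so the functions above run without Matplotlib installed.
    import matplotlib.pyplot as plt

    ratings = list(frequencies)
    bars = plt.bar(ratings, [frequencies[r] for r in ratings])
    for bar, rating in zip(bars, ratings):
        plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                 f'{frequencies[rating]} ({percentages[rating]:.0f}%)',
                 ha='center', va='bottom')
    plt.xlabel('Rating')
    plt.ylabel('Frequency')
    plt.title('Cafeteria Food Ratings')
    plt.show()


# Parts (a) and (b): display the frequencies and the statistics.
frequencies = rating_frequencies(RESPONSES)
for rating, count in frequencies.items():
    print(f'Rating {rating}: {count}')
for name, value in response_statistics(RESPONSES).items():
    print(f'{name}: {value}')
```

Calling plot_frequencies(frequencies, rating_percentages(frequencies)) would then open the chart for part (c); it is left out of the script body so the statistics can be checked without a display.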

Grading Rubric

– 10 points for defining functions.

– 15 points for finishing Task1(a)-(c).

– 5 points for appropriate comments.

– 10 points for a runnable Python program with correct data visualization.

– 10 points for screenshots of the program.

Task 2 (50 points): (Classification with k-Nearest Neighbors and the Digits Dataset) Read the file “09-02-MachineLearning.pdf” and the Python program “CaseStudy1.py” to learn how the k-Nearest Neighbors algorithm is applied to the Digits dataset to recognize handwritten digits.

Rewrite the Python program by completing the following subtasks:

(a) Write code to display the two-dimensional array representing the sample image at index 24 and the numeric value of the digit the image represents.

(b) Write code to display the image for the sample image at index 24 of the Digits dataset.

(c) For the Digits dataset, what numbers of samples would the following statement reserve for training and testing purposes?

X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=11, test_size=0.60)

(d) Write code to get and display these numbers.

(e) Rewrite the list comprehension in snippet [50] using a for loop. Hint: create an empty list and then use the list method “append”.
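The sketch below illustrates the ideas behind subtasks (a) through (e); it is not the contents of “CaseStudy1.py”, and all function names are hypothetical. Three assumptions are worth stating. First, the Digits dataset bundled with scikit-learn has 1797 samples. Second, for a fractional test_size, train_test_split is assumed to round the test count up and give the remaining samples to the training set; split_sizes_from_sklearn can be run to confirm this on your machine. Third, snippet [50] is not reproduced in this handout, so the loop shown stands in for a comprehension that collects mismatched (predicted, expected) pairs.

```python
import math

DIGITS_SAMPLE_COUNT = 1797  # samples in scikit-learn's bundled Digits dataset


def expected_split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever samples remain.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def mismatches(predicted, expected):
    """Collect (predicted, expected) pairs that differ, using a for loop.

    Loop form of a comprehension such as
    [(p, e) for (p, e) in zip(predicted, expected) if p != e].
    """
    wrong = []  # start with an empty list
    for p, e in zip(predicted, expected):
        if p != e:
            wrong.append((p, e))  # add each mismatched pair
    return wrong


def show_sample(index=24):
    """Print the 8-by-8 pixel array and label of one sample, then draw it."""
    # Imported here so the helpers above run without these packages.
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits

    digits = load_digits()
    print(digits.images[index])            # subtask (a): the pixel array
    print('Digit:', digits.target[index])  # subtask (a): the label
    plt.imshow(digits.images[index], cmap=plt.cm.gray_r)  # subtask (b)
    plt.show()


def split_sizes_from_sklearn():
    """Return (n_train, n_test) measured from the actual split, subtask (d)."""
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, random_state=11, test_size=0.60)
    return X_train.shape[0], X_test.shape[0]


# Subtask (c): sizes predicted by the rounding assumption above.
print(expected_split_sizes(DIGITS_SAMPLE_COUNT, 0.60))
```

Under these assumptions the arithmetic gives 0.60 × 1797 = 1078.2, rounded up to 1079 test samples, which leaves 718 training samples. Comparing that pair with the result of split_sizes_from_sklearn() is a quick way to check the answer to (c) against the code written for (d).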

Grading Rubric

– 15 points for finishing Task2(a)-(e).

– 5 points for appropriate comments.

– 20 points for a runnable rewritten Python program.

– 10 points for screenshots of the program.

Challenges in This Project

1. For 10% extra credit, you are welcome to explore the design of each task. Note: You still have to finish all tasks required by this project.

2. You should configure your machine and PyCharm properly to facilitate the project development.

—————x———— Good Luck ————x————–