# Data science

## Week12_Assignment_Data_Wrangling1.pdf

CS628 - Data Science

Week 12 Assignment

Monroe College

Note: Read the resources posted in the week 12 folder, including chapter 7 of the ebook (Python for Data Analysis).

For problems 1 to 3, work with the Nutrition_subset data set. The data set contains the weight in grams, along with the amount of saturated fat and the amount of cholesterol, for a set of 961 foods. Use Python.

1. The elements in the data set are food items of various sizes, ranging from a teaspoon of cinnamon to an entire carrot cake.

   a. Sort the data set by the saturated fat (saturated_fat) and produce a listing of the five food items highest in saturated fat.

   b. Comment on the validity of comparing food items of different sizes.

2. Derive a new variable, saturated_fat_per_gram, by dividing the amount of saturated fat by the weight in grams.

   a. Sort the data set by saturated_fat_per_gram and produce a listing of the five food items highest in saturated fat per gram.

   b. Which food has the most saturated fat per gram?

3. Derive a new variable, cholesterol_per_gram, by dividing the amount of cholesterol by the weight in grams.

   a. Sort the data set by cholesterol_per_gram and produce a listing of the five food items highest in cholesterol per gram.

   b. Which food has the most cholesterol per gram?
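The sorting and per-gram steps in problems 1 to 3 could be sketched with pandas roughly as follows. This is only an illustration under stated assumptions: the small table below is an invented stand-in for Nutrition_subset (its values are not from the real data), and the column names `food_item` and `weight_in_grams` are guesses that may differ from the actual file.

```python
import pandas as pd

# Invented stand-in for Nutrition_subset; replace with the real file, e.g.
# pd.read_csv(...) on your copy. Column names other than saturated_fat,
# saturated_fat_per_gram and cholesterol_per_gram are assumptions.
nutrition = pd.DataFrame({
    "food_item": ["carrot cake (whole)", "butter (1 tbsp)", "cinnamon (1 tsp)",
                  "egg yolk (1)", "cheddar (1 oz)", "apple (1)"],
    "weight_in_grams": [1536.0, 14.0, 2.3, 17.0, 28.0, 138.0],
    "saturated_fat": [66.0, 7.1, 0.0, 1.7, 6.0, 0.1],
    "cholesterol": [1183.0, 31.0, 0.0, 213.0, 30.0, 0.0],
})

# Problem 1a: five items highest in raw saturated fat.
top_sat_fat = nutrition.sort_values("saturated_fat", ascending=False).head(5)

# Problems 2 and 3: derive per-gram variables by dividing by the weight.
nutrition["saturated_fat_per_gram"] = (
    nutrition["saturated_fat"] / nutrition["weight_in_grams"]
)
nutrition["cholesterol_per_gram"] = (
    nutrition["cholesterol"] / nutrition["weight_in_grams"]
)

# Problems 2a and 3a: five items highest in each per-gram variable.
top_sat_fat_per_gram = nutrition.sort_values(
    "saturated_fat_per_gram", ascending=False
).head(5)
top_chol_per_gram = nutrition.sort_values(
    "cholesterol_per_gram", ascending=False
).head(5)

# Problems 2b and 3b: the single item with the largest per-gram value.
most_sat_fat_per_gram = nutrition.loc[
    nutrition["saturated_fat_per_gram"].idxmax(), "food_item"
]
most_chol_per_gram = nutrition.loc[
    nutrition["cholesterol_per_gram"].idxmax(), "food_item"
]

print(top_sat_fat[["food_item", "saturated_fat"]])
print(most_sat_fat_per_gram, most_chol_per_gram)
```

In the toy table, the item that leads on raw saturated fat is simply the largest item, while the per-gram ranking puts a much smaller item first, which is the kind of contrast problem 1b asks about.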

For problems 4 to 6, work with the adult_ch3_training data set. The response variable is whether income exceeds \$50,000. Use Python.

4. Add a record index field to the data set.

5. Determine whether any outliers exist for the education field.

6. Do the following for the age field.

   a. Standardize the variable.

   b. Identify how many outliers there are and identify the most extreme outlier.
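One possible way to approach problems 4 to 6 with pandas is sketched below. Several things here are assumptions rather than part of the assignment: the data frame is a small invented stand-in for adult_ch3_training, the `education` field is assumed to be numeric, the column names `record_index` and `age_z` and the helper `zscore` are hypothetical, and an outlier is taken to be a value whose z-score lies beyond ±3, which is a common rule of thumb and not the only reasonable definition.

```python
import pandas as pd

# Invented stand-in for adult_ch3_training; values are for illustration only.
adult = pd.DataFrame({
    "age": [30, 32, 34, 36, 38, 40, 42, 44, 46, 48,
            30, 32, 34, 36, 38, 40, 42, 44, 46, 90],
    "education": [9, 10, 13, 9, 10, 13, 14, 9, 10, 13,
                  9, 10, 13, 14, 9, 10, 13, 9, 10, 16],
})


def zscore(series):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    return (series - series.mean()) / series.std()


# Problem 4: add a record index field (numbered from 1 here; 0-based also works).
adult["record_index"] = range(1, len(adult) + 1)

# Problem 5: flag education values whose z-score is beyond +/-3 (assumed rule).
education_outliers = adult[zscore(adult["education"]).abs() > 3]

# Problem 6a: standardize age into a new column.
adult["age_z"] = zscore(adult["age"])

# Problem 6b: count the age outliers and find the most extreme one.
age_outliers = adult[adult["age_z"].abs() > 3]
n_age_outliers = len(age_outliers)
most_extreme = adult.loc[adult["age_z"].abs().idxmax()]

print("education outliers:", len(education_outliers))
print("age outliers:", n_age_outliers)
print(most_extreme)
```

Keeping the record index alongside the z-scores makes it possible to trace each flagged row back to its original position after sorting or filtering.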