Lab 5 - CSS 300
CSS 300 Module 5 Activity Worksheet
Use this worksheet to complete your lab activity. Submit it to the applicable assignment submission folder when complete. Please ensure that your answers are easy to distinguish from the questions. You may also submit your work in a fresh Word document if you prefer.
Deliverable:
· A Word document answering the following questions (or you may work directly in this worksheet; make sure your answers are easy to distinguish from the questions and prompts).
Download the avocado.csv dataset. You can find out more about this dataset on Kaggle's Avocado Prices page.
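Every sample below assumes a DataFrame named df already holds the dataset. A minimal way to load it with pandas is sketched here, assuming avocado.csv sits in the notebook's working directory. `load_avocado` is a hypothetical helper name, and dropping an "Unnamed" leading column is an assumption about how the CSV was saved, not something this worksheet specifies.

```python
import pandas as pd


def load_avocado(path="avocado.csv"):
    """Read the avocado CSV into a DataFrame (hypothetical helper).

    Assumes the file was downloaded into the working directory;
    pass a different path (or file-like object) if it lives elsewhere.
    """
    df = pd.read_csv(path)
    # Assumption: a CSV saved with its row index gets a header-less first
    # column, which pandas names "Unnamed: 0"; drop it so it is not
    # mistaken for a numeric field in the statistics and plots.
    unnamed = [c for c in df.columns if str(c).startswith("Unnamed")]
    return df.drop(columns=unnamed)
```

Usage: `df = load_avocado()` followed by `df.info()` lists each column's dtype and non-null count, which also helps with the missing-values question in step 2.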
1. For each numeric field, what are the count, mean, standard deviation, and five-number summary (minimum, first quartile, median, third quartile, and maximum)? Use the following sample code:
df.describe()
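df.describe() reports the quartiles under the labels 25%, 50% and 75%, and the 50% row is the median. The sketch below, using a hypothetical helper `summary_table`, renames those rows and transposes the result so each numeric field becomes one row, which can make the step 1 answers easier to copy into the Word document.

```python
import pandas as pd


def summary_table(df):
    """Count, mean, std and five-number summary, one row per numeric column."""
    # describe() covers only numeric columns when the frame has any
    stats = df.describe()
    # Rename describe()'s percentile labels to five-number-summary names
    stats = stats.rename(index={"25%": "Q1", "50%": "median", "75%": "Q3"})
    order = ["count", "mean", "std", "min", "Q1", "median", "Q3", "max"]
    # Transpose so each field is a row and each statistic a column
    return stats.loc[order].T
```

Usage: `summary_table(df).round(2)` gives a compact table of every numeric field.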
2. Create a histogram for each numeric field. Based on the histograms, do any of the fields show signs of skewness? Are there any outliers? Do any columns have missing values? The sample code below plots AveragePrice; adapt it for the other numeric fields:
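# Assumes df holds the avocado data loaded from avocado.csv
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter-only magic: render plots inside the notebook
%matplotlib inline
sns.set_theme(style="whitegrid")
# Drop rows with a missing AveragePrice so they are not binned
filter_data = df.dropna(subset=['AveragePrice'])
plt.figure(figsize=(14, 8))
# histplot replaces distplot, which is deprecated in recent seaborn releases
sns.histplot(filter_data['AveragePrice'])
plt.show()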
3. For the categorical fields, find out how many observations fall into each category by creating a bar chart. Use the following sample code, which counts observations per region:
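# Count the rows in each region; repeat with y set to each other categorical column
sns.set_theme(style='darkgrid')
plt.figure(figsize=(20, 10))
ax = sns.countplot(y='region', data=df)
plt.show()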
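countplot draws the counts but does not print them. The sketch below, with hypothetical helper names, returns the counts behind each bar and orders the bars from most to least frequent. It treats every non-numeric column as categorical, which is an assumption: a Date column read as plain text would also be counted unless you parse it or skip it.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def category_counts(df):
    """Return a dict mapping each non-numeric column to its value counts."""
    categorical = df.select_dtypes(exclude="number")
    return {col: categorical[col].value_counts() for col in categorical.columns}


def plot_category_counts(df, column):
    """Horizontal count plot with bars sorted from most to least frequent."""
    order = df[column].value_counts().index
    # Scale the figure height with the number of categories so labels stay readable
    fig, ax = plt.subplots(figsize=(12, max(4, 0.3 * len(order))))
    sns.countplot(y=column, data=df, order=order, ax=ax)
    return fig
```

Usage: `category_counts(df)["region"]` prints the exact numbers shown by the region bar chart.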
4. Create a heat map of the pairwise Pearson correlations between the numeric fields. Are any of the numeric fields correlated, and how strongly?
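import seaborn as sns
import matplotlib.pyplot as plt
# numeric_only=True skips text columns; pandas 2.0+ raises an error on them otherwise
corr = df.corr(method='pearson', numeric_only=True)
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.show()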
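Reading exact values off a heat map can be error-prone. The sketch below lists each pair of numeric fields whose Pearson correlation magnitude meets a threshold. The helper name and the 0.7 cutoff are arbitrary assumptions, so adjust them to whatever your course treats as a strong correlation.

```python
import numpy as np
import pandas as pd


def strong_correlations(df, threshold=0.7):
    """Pairs of numeric columns with |Pearson r| >= threshold, strongest first."""
    corr = df.corr(method="pearson", numeric_only=True)
    # Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    strong = pairs[pairs.abs() >= threshold]
    # Sort by magnitude so strong negative correlations rank alongside positive ones
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)
```

Usage: `strong_correlations(df, threshold=0.9)` narrows the list to near-duplicate fields; each entry is indexed by the pair of column names.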
5. Take a look at the “Matplotlib Exercises.ipynb” file and answer the questions in the Jupyter Notebook as noted. These exercises do not require any data.
6. Submit both the Word document with your answers about the avocado dataset and the Matplotlib Exercises Jupyter Notebook for grading.