python 3..
eda-starter.ipynb
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# EDA Basics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import required packages\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set matplotlib inline and use seaborn \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 1\n", "There are 8 basic functions as described at http://mathonweb.com/help_ebook/html/functions_4.htm. Do the following:\n", "1. create 8 subplots in a 4x2 grid (4 rows with two subplots in each row) with figure size (15, 15)\n", "2. plot a function of your choice for each type, one per subplot; add the function type as the title of the subplot; the quadratic function must be in the subplot in the second row and second column\n", "3. use at least 4 colors, 4 different line styles, and 4 different markers for the 8 functions\n", "4. set the x label to x and the y label to the function\n", "5. add a legend to each subplot showing the function, such as y = 3x + 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 2\n", "Create an account on kaggle.com and read the overview of the Titanic competition at https://www.kaggle.com/c/titanic/overview, then do the following:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- download the training dataset, rename it to titanic-train.csv, and load it using pandas" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# download the training dataset, rename it to titanic-train.csv, and load it using pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- how many rows and features are in the dataset?\n", "- which variable is the target?\n", "- are there any null values?
if yes, in which columns and how many?\n", "- what is the data type of each feature?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- drop the columns with null values from the dataframe. You can find solutions at https://stackoverflow.com/questions/13411544/delete-column-from-pandas-dataframe; the best solution may not be the top-ranked one, so dig out the one that you like best\n", "- print out the info of the dataframe after dropping the columns. How many columns are left?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- plot histograms for all numerical features with figure size 10x10\n", "- describe each histogram and state any findings you may have for each" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- calculate the survival rate using `value_counts()`\n", "- you cannot hardcode any number" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- create the histogram for gender\n", "- calculate the survival rate based on gender using `groupby()`\n", "- you cannot hardcode any number" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- based on the histograms above, choose features that you think may have outliers and create boxplots for those features to verify" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- create a scatter plot for the `Fare` feature to take a closer look at the data points.
Hint: use `df.index` as x and use values of `Fare` as y\n", "- figure size 10 x 10, opacity set to 0.4\n", "- by eye-balling the scatter chart - how many tickets are over $500?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create scatter plots to answer the following questions:\n", "- are there any correlations among fare, # of siblings/spouses aboard, and survival?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }