My sql project

profileSutony
ChicagoTaxiCabProject1.docx

Chicago Taxi Cab Project

Project Description:

This project uses data from several million Chicago taxi rides in 2016. In this project you will look for trends based on the time and location of the taxi ride. In this project, you will produce a semi-professional report answering some detailed questions about driving a cab in Chicago. The report can be in a Word, markdown, or other document format and should include charts and table illustrating relevant metrics. All graphs can be made in Excel and all tables can be made in Word or Excel. You should also turn in your SQL scripts so that you can receive partial credit for any mistakes.

The goal of this project is to create a document that can be used as a talking point for a job interview.

The report should contain the following sections:

· Executive Summary: A summary of the results of the analysis. This should be less than 1 page. A few sentences and some bullet point would suffice.

· Analysis and Results: This section should be broken up in a logical manner according to the questions below. Include answers to the questions in a narrative form using supporting charts and tables where needed*. Based on the analysis, what conclusions can you draw about the data?

*Do not include your code in your report. Turn that in separately through canvas for potential partial credit.

Questions to answer:

1. Seasonality of cab fares:

a. Create a table that gives the total paid for all fares in the month, average per fare, the standard deviation (std), and the number of fares by month.

b. Which months have the highest total fare for all rides? Does there seem to be any seasonality?

c. Which months have the highest average fare per ride? Does there appear to be any seasonality? As part of answering this question, produce a plot with error bars (+/- 1 standard deviation) of the average fare per month.

d. Which months have the largest variance in cab rides?

2. Fares by day of week:

a. Create a table that gives the total paid for all fares in a given week day, average per fare, the standard deviation (std), and the number of fares by day.

b. Which days have the highest total fare for all rides?

c. Which days have the highest fares per ride? Do weekends differ from weekdays in a statistically significant way? As part of answering this question, produce a plot with error bars (+/- 1 standard deviation) of the average fare by day.

3. Fares by time of day:

a. Create a table that gives the total paid for all fares in a given hour day, average per fare, the standard deviation (std), and the number of fares by hour.

b. Is it better to work the night shift or the day shift in terms of earned income?

4. Payment types:

a. Create a table that shows the average total fare and standard deviation by payment type.

b. Which payment types are the most lucrative?

c. Are disputed payments more likely to occur at one time of day than another?

5. Conclusion:

a. If you were to become a cab driver. When would you want to work? Why?

b. Is there anything else that stands out as worth mentioning?

Extra Credit Questions (worth up to 15 points total):

For 1-4, create a table or plot that helps answer each of the following question:

1. (2 pt) Do people tip better on the weekends or during the week?

2. (2 pt) Do people tip better in the morning or in the evening?

3. (2 pt) Do people take longer trips during the week or on the weekends?

4. (2 pt) Do people take longer trips in the morning or the evening?

5. (2 pt) Write a query that returns a receipt for each row with the following content. Items in bold should be dynamically populated from database columns. You only need to use the taxi_01 table. Remember CHAR(10,13) is used for carriage returns and line feeds.

Thank you for riding taxi #2776 on January 22, 2016 9:30 am.

Your ride lasted 6 minutes and covered a distance 0.7 miles.

The total for this trip is $4.45.

Please remember to tip your driver.

6. (5 pt) Which cab driver (taxi_id) is the best? Explain how you reached that conclusion. This should involve multiple lines of evidence. It is not necessarily the one who makes the most money.

Grading:

The project will be worth 30 points and graded as follows:

Accuracy of Report Results (20 points)

· Are the SQL queries correct and producing the correct results?

· Are the tables and graphs correct?

· Does the narrative agree with the data shown?

Readability and Formatting of Report – (10 points)

· Is the report laid out in a logical order?

· Does the report contain all the necessary parts?

· Is it grammatically and logically correct?

· It is visually appealing?

Example Tables and Chart:

Month

Total Number of Rides

1

1,705,805

2

1,751,192

3

1,975,108

4

1,952,152

5

1,943,584

6

1,934,479

7

1,745,387

8

1,523,947

9

1,294,020

10

1,499,771

11

1,295,000

12

1,245,712

Table 1: Total number of rides by month

About the Data

This data from this project comes from Kaggle. (https://www.kaggle.com/chicago/chicago-taxi-rides-2016) and has been loaded into the taxi database on 10.41.4.35. The data is divided into 12 tables, one for each month of 2016. Look at the link above for information about the data set. The column definitions are given below.

Column

Datatype

Description

taxi_id

INTEGER

A unique identifier for the taxi.

trip_start_timestamp

TIMESTAMP

When the trip started, rounded to the nearest 15 minutes.

trip_end_timestamp

TIMESTAMP

When the trip ended, rounded to the nearest 15 minutes.

trip_seconds

INTEGER

Time of the trip in seconds.

trip_miles

FLOAT

Distance of the trip in miles.

pickup_census_tract

INTEGER

The Census Tract where the trip began. For privacy, this Census Tract is not shown for some trips.

dropoff_census_tract

INTEGER

The Census Tract where the trip ended. For privacy, this Census Tract is not shown for some trips.

pickup_community_area

INTEGER

The Community Area where the trip began.

dropoff_community_area

INTEGER

The Community Area where the trip ended.

fare

FLOAT

The fare for the trip.

tips

FLOAT

The tip for the trip. Cash tips generally will not be recorded.

tolls

FLOAT

The tolls for the trip.

extras

FLOAT

Extra charges for the trip.

trip_total

FLOAT

Total cost of the trip, the total of the fare, tips, tolls, and extras.

payment_type

STRING

Type of payment for the trip.

company

INTEGER

The id code for the taxi company.

pickup_latitude

INTEGER

The id code for the latitude of the center of the pickup census tract or the community area if the census tract has been hidden for privacy.

pickup_longitude

INTEGER

The id code for the center of the pickup census tract or the community area if the census tract has been hidden for privacy.

dropoff_latitude

INTEGER

The id code for the center of the dropoff census tract or the community area if the census tract has been hidden for privacy.

dropoff_longitude

INTEGER

The id code for the center of the dropoff census tract or the community area if the census tract has been hidden for privacy.

Total fares by month in 2016

0.1 0.5 0.3 0.2 0.1 0.15 0.35 0.225 0.45 0.5 0.1455 0.2245 0.1 0.5 0.3 0.2 0.1 0.15 0.35 0.225 0.45 0.5 0.1455 0.2245 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 1.705805 1.751192 1.975108 1.952152 1.943584 1.934479 1.745387 1.523947 1.29402 1.499771 1.295 1.245712

Month

Total Fares (in millions)