4-6 paragraph Analytics project introduction

profileJasperZ511626
Requirements.docx

Flight delay prediction.

Abstract:

Flight delay rate at Ronald Reagan Washington National Airport

Flight delays are one of the most important issues for travelers. According to the Bureau of Transportation Statistics, 836,017 flights were delayed in 2015, which is equivalent to 19.1%[1]. Delays not only sharply reduce the efficiency of airport operations but can also disrupt every passenger’s subsequent plans. However, by digging into historical flight data, we can get a better sense of a flight's on-time rate based on date, airline, and destination. We gathered the 2015 flight data for large domestic air carriers provided by the U.S. Department of Transportation[2], which contains each flight's departure and arrival cities, carrier, date and time, and arrival delay in minutes. We conducted a series of analyses of flight delay probabilities at the Washington, DC airport. With these results, airport administrators could schedule flights more smoothly and ordinary passengers could arrange their plans more wisely.

Our analysis consists of three parts. In the first part, we extracted all flights departing DC each day and calculated the conditional probability of delay given the airline and route. This part also tells people the likely range of delay time for a flight. The second part uses the date factor. We also constructed a time series model to obtain short-term forecasts of delay time for different airlines. The third part contains a series of hypothesis tests. We will use a chi-square test to determine the factors that might affect the length of delay, a one-sample t-test to compare average delay time across airlines, a two-sample t-test to compare each pair of specific airlines, and a permutation test to check our previous results. Moreover, confidence intervals will be included to give an interval for delay time. Regression models are also applied to predict the future delay rate and delay time of a specific airline, which might give passengers a basic sense of likely delays and help them arrange their travel schedules.
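The first part of the analysis, the conditional probability of delay given airline and route, could be sketched roughly as follows. This is only a minimal illustration, not the method actually used: the sample records, the `delay_probabilities` helper, and the 15-minute cutoff for counting a flight as delayed are all assumptions made for the example, and the values are not taken from the 2015 data.

```python
from collections import defaultdict

# Hypothetical sample records: (carrier, destination, arrival delay in minutes).
# These values are invented for illustration and are NOT from the 2015 dataset.
FLIGHTS = [
    ("AA", "ORD", 22),
    ("AA", "ORD", -3),
    ("AA", "ORD", 40),
    ("AA", "BOS", 5),
    ("DL", "ATL", 0),
    ("DL", "ATL", 18),
    ("DL", "ATL", -7),
    ("DL", "ATL", 2),
]

# Assumption: a flight counts as delayed when it arrives 15+ minutes late.
DELAY_THRESHOLD_MIN = 15


def delay_probabilities(flights, threshold=DELAY_THRESHOLD_MIN):
    """Estimate P(delayed | carrier, destination) from flight records.

    Returns a dict mapping (carrier, destination) to a tuple of
    (delay probability, (shortest, longest) delay among delayed flights),
    where the range is None if no flight on that route was delayed.
    """
    totals = defaultdict(int)          # flights per (carrier, destination)
    delayed = defaultdict(list)        # delay minutes of delayed flights
    for carrier, dest, arr_delay in flights:
        key = (carrier, dest)
        totals[key] += 1
        if arr_delay >= threshold:
            delayed[key].append(arr_delay)

    result = {}
    for key, n in totals.items():
        late = delayed.get(key, [])
        delay_range = (min(late), max(late)) if late else None
        result[key] = (len(late) / n, delay_range)
    return result


probs = delay_probabilities(FLIGHTS)
for (carrier, dest), (p, delay_range) in sorted(probs.items()):
    print(f"{carrier} -> {dest}: P(delay) = {p:.2f}, delay range = {delay_range}")
```

Each probability here is simply the share of delayed flights within one airline and route group, so groups with very few flights give unstable estimates; the confidence intervals mentioned above are one way to express that uncertainty.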

The Introduction: (4-6 paragraphs)

•An Introduction is about the data science question. It is about the topics you plan to explore.

•The Introduction is not about the datasets, variables, methods or models.

•The introduction helps the reader to understand what the data science question is, what the supporting topics and issues are, and what the overall area is about.

•An introduction allows the reader to “get to know” the data science question and related areas of interest.

•Ideally, an introduction helps the reader to *care* about the topics and to want to read more.

•The Intro should not contain any information about the dataset or the data cleaning, prep, processing, etc. Everything about the dataset goes into the Analysis section under the “About the Data” subsection.

•Introductions can and should include basis, background, history, the state of the art, images, references, etc.

•An introduction will also help the reader to understand who the topics affect and why the topics matter.

•The Introduction should not include words like I, we, you, he, her, they, us, our, your.