Transport data analysis

Institute for Transport Studies FACULTY OF ENVIRONMENT

TRAN5032M

Transport Data Collection & Analysis

Data Collection – Railway stream

Dr Chiara Calastri

This lecture

• Coursework briefing

o Overview

o Data sources

o Structure and format

• Other rail data sources

Coursework briefing

• Learning outcomes (or assessment criteria...):

• Reflection on what research questions can be answered with different types of data

• Data analysis, including choosing adequate statistical tools

• Clear presentation of a report where a reader can follow your reasoning and understand your results easily

• Reflection on implications of obtained results and what the next steps should be

Coursework briefing

• Where to find it:

Learning ResourcesCourseworkCoursework MSc Railway OMP

• Individual work, no team activity!

• Two datasets:

• Primary data: data collected during the fieldwork (intercept questionnaire)

• Secondary data: Network Rail Cancellations and Significant Lateness (CaSL) multi-year data, provided by the Office of Rail and Road.

Primary data

• Introduce the dataset

• Which technique was used to collect the data?

• Who answered the survey?

• Analyse the data

• Present some general descriptive statistics

• Include graphs if and where appropriate

• State clearly what you will analyse and why: what is your hypothesis and how will you test it? Why is it relevant?

• Discuss and conclude

• Present and interpret your results clearly, also comparing what you obtained with your hypothesis/any relevant literature

• Identify strengths and weaknesses in data, the usefulness of results and potential suggestions for future analyses/data collection
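
As a hedged illustration of the "general descriptive statistics" step, the sketch below summarises one numeric and one categorical survey variable using only the Python standard library. The variable names (`journey_time_min`, `journey_purpose`) and all values are invented for illustration; they are not taken from the fieldwork questionnaire.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical survey responses (invented values, not the fieldwork data).
responses = [
    {"journey_time_min": 35, "journey_purpose": "commute"},
    {"journey_time_min": 50, "journey_purpose": "leisure"},
    {"journey_time_min": 20, "journey_purpose": "commute"},
    {"journey_time_min": 65, "journey_purpose": "business"},
    {"journey_time_min": 40, "journey_purpose": "commute"},
]


def describe_numeric(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


def describe_categorical(values):
    """Return the share of each category as a fraction of all answers."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


times = [r["journey_time_min"] for r in responses]
purposes = [r["journey_purpose"] for r in responses]

time_summary = describe_numeric(times)
purpose_shares = describe_categorical(purposes)

print(time_summary)
print(purpose_shares)
```

Reporting the sample size and the units (here, minutes) next to each statistic keeps the summary readable and lets the reader judge how much weight the numbers can bear.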

Primary data - analysis

• Identify one or more variables of interest to you

• Select a statistical technique for your analysis

• The statistical tools and analysis need to be suitable for the research question

• The research question is chosen by you
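
One possible pairing of research question and technique, sketched under stated assumptions: if the question is whether two categorical survey variables are associated, a chi-square test of independence is a common choice. The counts and category labels below are hypothetical, and the p-value shortcut used here is valid only for a 2x2 table (one degree of freedom); it is an illustration, not the method the coursework requires.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    The p-value uses erfc(sqrt(x / 2)), which holds for 1 degree of
    freedom only, so this function must not be reused for larger tables.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = commuters / non-commuters,
# columns = season ticket / other ticket type.
observed = [[30, 10],
            [15, 25]]

statistic, p_value = chi_square_2x2(observed)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

A common rule of thumb is that the test is only appropriate when the expected counts are not too small (often quoted as at least 5 per cell), so it is worth checking the expected counts before relying on the result.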

Primary data - Handling missing data

• Your dataset might have missing data

o Respondents who dropped out

o Respondents who did not answer certain questions

o Errors in transferring the data from forms to Excel (can you run checks?)

• Missing data occur frequently in real datasets

• If too many data points are missing, you can consider excluding the respondent as not reliable

• If only one or two answers are missing, the data is still usable. Either:

o Do not use the specific observation

o Treat missing observations as a separate category
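
The options above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the question names, the responses, and in particular the 50% threshold for "too many" missing answers, which you would need to choose and justify yourself.

```python
MISSING = None  # marker used here for an unanswered question

# Hypothetical responses; the question names are invented.
responses = [
    {"id": 1, "age_group": "25-34", "purpose": "commute", "ticket": "season"},
    {"id": 2, "age_group": MISSING, "purpose": "leisure", "ticket": "advance"},
    {"id": 3, "age_group": MISSING, "purpose": MISSING, "ticket": MISSING},
    {"id": 4, "age_group": "45-54", "purpose": "business", "ticket": MISSING},
]
QUESTIONS = ["age_group", "purpose", "ticket"]
MAX_MISSING_SHARE = 0.5  # assumed threshold, not a rule from the lecture


def missing_share(response):
    """Fraction of questions this respondent left unanswered."""
    missing = sum(1 for q in QUESTIONS if response[q] is MISSING)
    return missing / len(QUESTIONS)


# Option 1: exclude respondents with too many missing answers.
kept = [r for r in responses if missing_share(r) <= MAX_MISSING_SHARE]


def available_cases(rows, question):
    """Option 2a: drop only the observations missing for this question."""
    return [r[question] for r in rows if r[question] is not MISSING]


def with_missing_category(rows, question, label="Not answered"):
    """Option 2b: keep every row and recode missing as its own category."""
    return [r[question] if r[question] is not MISSING else label for r in rows]


print([r["id"] for r in kept])
print(available_cases(kept, "ticket"))
print(with_missing_category(kept, "ticket"))
```

Whichever option is used, reporting how many respondents or observations were dropped makes the choice transparent to the reader.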

Primary data - reporting

• Use your experience (fieldwork, previous experience?)

• Use background knowledge/lectures/relevant literature

• Build the report with a clear structure that shows not only your understanding but also your ability to communicate your work. Help is available!

Secondary data

• Introduce the dataset

• Who collected the data?

• Which technique was used to collect the data?

• Is it a representative study?

• Why was the data collected?

• Analyse the data

• Present your research question and descriptive statistics

• Include graphs if and where appropriate

• How will you test your hypothesis or hypotheses? Why are they relevant?

• Discuss and conclude

• Present and interpret your results clearly, also comparing what you obtained with your hypothesis/any relevant literature

• Identify strengths and weaknesses in data, the usefulness of results and potential suggestions for future analyses/data collection

The Office of Rail and Road

• The Office of Rail and Road (ORR) is the independent economic and safety regulator for Britain’s railways, and monitor of performance and efficiency for England’s Strategic Road Network. They:

• Regulate & set targets for Network Rail

• Report on performance

• Regulate health & safety standards across rail

• Oversee competition and consumer rights

• Regulate HS1 (link to the Channel Tunnel)

Network Rail is the owner and infrastructure manager of most of the railway network in Great Britain

ORR Statistics and reports

• ORR publishes a range of statistics about railway performance, rail usage and safety

• Data on many topics are presented as reports

• They give you an idea of how data can be described/visualised

ORR datasets

• ORR also publishes source data, which may or may not be used in the reports. The datasets are accessible here:

https://tinyurl.com/y44j8v7s

The dataset (1/2)

• The file contains 5 sheets:

1. Delay minutes by Category of Delay and Train Operating Company

2. PPM* failures by Category of Delay and Train Operating Company

3. CaSL** failures by Category of Delay and Train Operating Company

4. Full Cancellations by Category of Delay and Train Operating Company

5. Part Cancellations by Category of Delay and Train Operating Company

*: Public Performance Measure

**: Cancelled or Significantly Late

The dataset (2/2)

• Data cover 2011-12 to 2018-19 and are divided into periods

• Broken down by train operating company (TOC) and category of delay

• Plenty of opportunities for analysis:

• Analysis at year or period level

• Comparison between different TOCs

• Change in different types of delays over time

• Types of delays within a given TOC

• .....

• Up to you!
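
A minimal sketch of how the analysis options listed above could be set up, assuming the chosen sheet has first been reshaped into one record per year, period, TOC and delay category. The operator names, category labels and delay minutes below are invented, and the real workbook layout may differ.

```python
from collections import defaultdict
from typing import NamedTuple


class DelayRecord(NamedTuple):
    year: str            # financial year, e.g. "2017-18"
    period: int          # reporting period within the year
    toc: str             # train operating company
    category: str        # category of delay
    delay_minutes: float


# Hypothetical records (invented numbers, operators and categories).
records = [
    DelayRecord("2017-18", 1, "TOC A", "Track faults", 1200.0),
    DelayRecord("2017-18", 1, "TOC A", "Weather", 300.0),
    DelayRecord("2017-18", 2, "TOC A", "Track faults", 900.0),
    DelayRecord("2018-19", 1, "TOC A", "Track faults", 1500.0),
    DelayRecord("2018-19", 1, "TOC A", "Weather", 500.0),
    DelayRecord("2017-18", 1, "TOC B", "Track faults", 700.0),
    DelayRecord("2018-19", 1, "TOC B", "Track faults", 650.0),
]


def total_by(rows, key_func):
    """Sum delay minutes over groups defined by key_func(record)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_func(row)] += row.delay_minutes
    return dict(totals)


def category_shares(rows, toc):
    """Share of each delay category within one TOC's total delay minutes."""
    totals = total_by([r for r in rows if r.toc == toc], lambda r: r.category)
    grand_total = sum(totals.values())
    return {cat: minutes / grand_total for cat, minutes in totals.items()}


# Year-level analysis and comparison between TOCs.
by_year_toc = total_by(records, lambda r: (r.year, r.toc))
# Types of delay within a given TOC.
shares_a = category_shares(records, "TOC A")

print(by_year_toc)
print(shares_a)
```

Raw delay minutes are not directly comparable between operators of very different sizes, so a comparison between TOCs usually needs some form of normalisation; which one to use is part of the analysis you would need to justify.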

Format

• Typed written report including

o Front cover

o Index/List of contents

o Structured sections

o List of (used) references

• Accuracy matters

o Clear captions on figures and equation numbers

o Clear reference to figures, tables and cited work in the text

o Specify units of measurement

• Make sure you address all requirements in the text of the task assignment

Marking criteria

• Marks will be awarded as follows:

1. Applying statistics (30%)

2. Reflecting on data quality (30%)

3. Recommending future work or improvements (30%)

4. Presentation of coursework (10%)

Referencing

• Make sure you support your statements either with your experience or existing work.

• As a rule of thumb, include between 10 and 20 references.

• References can be books, conference and journal articles, government/technical/project reports.

• Use the Leeds Harvard referencing style in the References section.

• References ≠ Bibliography!

Sourcing journal articles via Google Scholar

• Use “Advanced search”, or use the regular search tool and refine later

Additional search criteria

• Be specific

• Consider if different names are used in the literature

• Depending on the subject, time matters!

Citing and link to PDF versions

• Copy the citation to include in your references

• List of papers citing this one

• Direct link to a PDF

Word count

• Maximum word count is 2000 (not including figures, tables and references)

• You should not bypass the word count by making extra text into a figure or table. Figures and tables have to be explained in the text!

• You can write fewer than 2000 words, but from previous experience this is the right word count for the task. If you have a lot fewer than this, go back over your work and check that you have made enough distinct points/arguments

Make it yours

• You will find reports and commentaries on this data

• Remember you shouldn’t simply report descriptive statistics like most reports do

• Tip: do your analysis first

• Try to be creative with your hypothesis

• You might find something that existing analyses have not found!

Other rail data sources

A quick overview

Network Rail

• Network Rail owns and operates the railway infrastructure in England, Wales and Scotland. This includes

• 20,000 miles of track

• 30,000 bridges, tunnels and viaducts

• thousands of signals, level crossings and stations.

• They manage 20 of the UK's largest stations while all the others, over 2,500, are managed by the country’s train operating companies.

Network Rail data

• On its website, Network Rail provides many reports as well as datasets

Network Rail datasets and reports (among others)

• Annual expenditure (2012-2018)

• Bridge strikes

• Business expenses (also travel)

• Business performance

• Cable theft report

• Carbon and energy use

• Close calls

• Environmental incidents

• Equality, diversity and inclusion, and family policies

• Lost property

• Public and passenger accidents

• Safety

• Spend on information technology

• Suicide statistics

• Workforce count

Station                  2013  2014  2015  2016
Birmingham New Street     136   555   601   666
Edinburgh                 265   764   784   837
London Euston             870   864   934   899
Glasgow                   275   655   651   661
Kings Cross               132   231   424   325
Leeds                     154   526   526   566
Liverpool                  66   257   233   219
Liverpool Street          431   416    53   329
Manchester                144   618   646   641
Paddington                296   401   454   389
London Victoria           910  1695  1418  1527
Totals                   3679  6982  6724  7059
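
As a small worked example of the kind of check worth running on any published table, the sketch below recomputes the column totals from the station rows and derives the year-on-year change in the totals. The slide does not say what is being counted, so the values are treated as generic annual counts; the variable names are chosen here for illustration.

```python
years = [2013, 2014, 2015, 2016]

# Values copied from the table above; one list per station, ordered by year.
counts = {
    "Birmingham New Street": [136, 555, 601, 666],
    "Edinburgh": [265, 764, 784, 837],
    "London Euston": [870, 864, 934, 899],
    "Glasgow": [275, 655, 651, 661],
    "Kings Cross": [132, 231, 424, 325],
    "Leeds": [154, 526, 526, 566],
    "Liverpool": [66, 257, 233, 219],
    "Liverpool Street": [431, 416, 53, 329],
    "Manchester": [144, 618, 646, 641],
    "Paddington": [296, 401, 454, 389],
    "London Victoria": [910, 1695, 1418, 1527],
}
reported_totals = [3679, 6982, 6724, 7059]

# Recompute each year's total by summing down the columns.
computed_totals = [sum(column) for column in zip(*counts.values())]

# Percentage change in the total from each year to the next.
yoy_change_pct = [
    (later - earlier) / earlier * 100
    for earlier, later in zip(computed_totals, computed_totals[1:])
]

print(computed_totals == reported_totals)
for year, change in zip(years[1:], yoy_change_pct):
    print(f"{year}: {change:+.1f}% vs previous year")
```

The same row-by-row view is also a quick way to spot values that break the pattern of their neighbours, which is the sort of observation that belongs in a reflection on data quality.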

Suicide figures 2009-2019

Office of Rail and Road: wide range of datasets

Rail ticket sales data

• The rail industry’s central ticketing system is LENNON

• LENNON holds information on the vast majority of national rail tickets purchased in Great Britain

• This dataset is used for (among others):

o Disaggregate demand analysis

o Time series analysis

o Rail passenger kilometres

• LENNON is a fundamental tool for rail forecasting (see Passenger Demand Forecasting Handbook)

• It is not publicly available

Timetabling

• Rail Delivery Group (National Rail)

o Fares data

o Routeing data

o Timetable data

• These data are of a “technical” nature and not open-source but can be accessed

• Detailed documentation to help users

• Reports to learn the basic statistics and workings of the system

National Rail Travel Survey (NRTS)

• Self-completion questionnaire handed out in rail stations

• Main survey component is “at station”

• Small number of surveys conducted on-train

• Some of the information collected:

o origin and destination station;

o true origin and destination (postcode);

o time of departure and arrival;

o ticket type;

o season tickets and railcards held;

o journey purpose;

o day of week of journey; and

o access and egress mode to/from the station.

National Travel Survey

• Ongoing survey collecting information on all journeys made in a week by a sample of households

• Household characteristics also collected

• Limited information on long journeys

• Valuable data to assess modal competition

Other government data

• Living Costs and Food Survey

• International Passenger Survey

• Both are conducted over time

• Such surveys collect information such as:

o journey length

o number of rail trips made

o mode share

o expenditure on travel

o access modes to airports