I need a dissertation written on the topics below
10 months ago
dissertationpoints.pptx
NCGDissertation_USDatasets_ExampleProjects.pdf
Description2.html
dissertationpoints.pptx
Developing research ideas and formulating research questions
Chris Brunsdon
Research Projects in general
Most research projects share the same general structure
Broad area of interest may be too broad to study in any single research project
From http://www.socialresearchmethods.net/kb/strucres.php
Hourglass
You have done some background research
Reading the literature around your topic
Talking to supervisors
The aim is to narrow down your question
Formulate a hypothesis or research question
Then for the actual dissertation (not the proposal)
You will be engaged in direct measurement or observation of the question of interest
At the narrowest point
After basic data is collected you will
Begin to try to understand it
Perform analysis
After analysis you will
Start to formulate some initial conclusions
Generalize from the results of your study to other work
Hourglass
Background research
Narrow down your research question
Direct measurement
Analysis after basic data is collected
Reflections
Research Question
Initial research ideas are often very general
But good research questions need to be quite specific, so that:
- appropriate methods can be chosen
- required resources can be identified
- criteria for success are clear
- work can be planned realistically
- useful objectives can be met
Research ideas
General ideas may flow from a number of basic aims
Professional - ‘I want to develop a new spatial analysis technique’
Methodological - ‘I want to use R-based image processing software’
Ethical - ‘I want to do something that makes a difference to the environment’
Turning ideas into questions
A good research question is clear and specific
Answerable using data / resources available
Consider the subject matter:
who, what & why?
Consider the practical:
where, when & how?
Then review the question to see if it really requires research
The subject matter
‘Who’ are the ‘subjects’?
General: land cover change
Specific: tropical rainforest
‘What’ is the test?
General: remote sensing
Specific: fuzzy classification vs. other methods
‘Why’ is this a good thing to do?
General: to manage remote sensing uncertainty
Specific: to improve monitoring of specific forest resources
Practical issues
‘Where’ is it possible to do this?
General: Tropical rainforest areas, Data
Specific: Bolivian jungle
‘When’ - is there time to do this analysis?
General: this year
Specific: complete research by August 12th
‘How’ can the plans be implemented?
General: I have the necessary skills
Specific: I will have to learn to use new RS software
Research question
If the general idea was to analyse land cover change, then the specific question might be: ‘Does fuzzy classification of remotely sensed imagery improve land cover change monitoring in tropical rainforest environments?’
It should be clear from thinking about the practical issues that it is not realistic to answer this question robustly using an analysis of just one rainforest area!
Research Question
Your final research question should be realistic
The methods should be appropriate to meet the stated objectives
‘Comparison of Fuzzy and Boolean classification methods for analysing tropical rainforest change: a Bolivian case study’
Evaluate Your Own Research Question
Is the topic interesting to you?
Does the question deal with a topic or issue that interests me enough to spark my own thoughts and opinions?
Is the question easily and fully researchable?
What type of data do I need to answer the research question?
The research question “What impact has the ‘Agenda for Change’ had on the provision of Primary care in Nottingham?” will require certain data:
statistics on Primary care before and after (illness, location)
statistics on Primary care providers, (type, location)
information about practice before and after ‘Agenda for Change’ recommendations
Evaluate Your Own Research Question
Is the scope of this information reasonable?
Given the above (the type and scope of the information needed), is the question too broad, too narrow, or okay?
What sources will have the information that I need to answer the research question?
journals, books, Internet resources, government documents, people
Can I access these sources?
Given my answers to the above questions, do I have a good quality research question that I actually will be able to answer by doing research?
NCGDissertation_USDatasets_ExampleProjects.pdf
US Datasets
- LODES
  - LODES documentation
- Census data
  - ACS documentation
- Chicago Transportation Network Providers (TNP)
- Chicago Building Permits
- Chicago Business Licenses
  - Business License documentation
LODES
- LEHD Origin-Destination Employment Statistics (LODES), from the Longitudinal Employer-Household Dynamics (LEHD) program
- 2002-2017
- Census block level
  1. Origin-Destination (OD) commuting: job totals are associated with both a home block and a work block
  2. Workplace Area Characteristics (WAC): jobs totaled by work block
  3. Residence Area Characteristics (RAC): jobs totaled by home block
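The three LODES products are related by aggregation. If you sum OD job counts over home blocks, you get a total for each work block. If you sum over work blocks, you get a total for each home block. The minimal base R sketch below shows this on a made-up OD table.

The column names `w_geocode`, `h_geocode` and `S000` (total jobs) are our reading of the LODES file layout and should be checked against the LODES documentation. As we understand that documentation, real OD files are also split into "main" and "aux" files depending on whether the home block is in the same state. Totals built from one OD file may therefore not match the published WAC/RAC files exactly.

```r
# Made-up origin-destination table: each row links a work block to a home block.
# Column names are assumed to follow the LODES OD layout (w_geocode, h_geocode, S000).
od <- data.frame(
  w_geocode = c("W1", "W1", "W2", "W2", "W2"),
  h_geocode = c("H1", "H2", "H1", "H2", "H3"),
  S000      = c(10, 5, 3, 7, 2)
)

# WAC-style totals: for each work block, sum jobs over all home blocks
wac_like <- aggregate(S000 ~ w_geocode, data = od, FUN = sum)

# RAC-style totals: for each home block, sum jobs over all work blocks
rac_like <- aggregate(S000 ~ h_geocode, data = od, FUN = sum)

print(wac_like)
print(rac_like)
```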
US Census
- American Community Survey (ACS): US Census Bureau
- Population, demographic, social, housing and economic data
- 1-, 3- and 5-year rolling averages: 2005–2018
- Census tract and larger geographies
- Estimates are published with margins of error
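ACS margins of error (MOEs) matter when comparing small areas such as tracts. The Census Bureau publishes ACS MOEs at the 90% confidence level, so a standard error can be recovered as MOE/1.645. Its guidance also gives an approximate MOE for a sum of estimates: the square root of the sum of the squared MOEs.

The minimal sketch below applies both formulas. The tract names and values are made up.

```r
# Made-up ACS-style estimates and their published 90% margins of error
est <- c(tract_a = 1200, tract_b = 800, tract_c = 450)
moe <- c(tract_a = 150,  tract_b = 120, tract_c = 90)

# ACS MOEs are at the 90% level, so SE = MOE / 1.645
se <- moe / 1.645

# Coefficient of variation (%) as a rough indicator of reliability
cv <- 100 * se / est

# Approximate MOE for the sum of the three estimates
# (this approximation ignores any covariance between the estimates)
total_est <- sum(est)
total_moe <- sqrt(sum(moe^2))

round(cv, 1)
c(total = total_est, moe = round(total_moe, 1))
```

Aggregating tracts this way reduces the relative error (the CV of the total is smaller than that of any single tract here). This is one reason small-area ACS estimates are often combined before analysis.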
Chicago TNP
- Chicago Open Data Portal
- Transportation Network Providers (TNP)
- Individual-level data on Uber/Lyft/Via trips geocoded to Census tract centroids
- November 2019 – current
Chicago Building Permits
- Chicago Open Data Portal
- Building permits
- All building permits issued by City of Chicago
- 2006 - 2019
Chicago Business Licenses
- Chicago Open Data Portal
- Business licenses
- All business licenses issued by City of Chicago
- 2006 - 2019
Example projects
1. What are the commuting characteristics of the Chicago metropolitan area (2002-2015)?
   - Visualize commuting pattern change over time
   - Build a spatial interaction model to predict future travel patterns given some planned change, e.g., additional flows generated for projected population increase or based on new transportation investments
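The slides do not specify which spatial interaction model to use. One common starting point, assumed here, is an unconstrained gravity model fitted as a Poisson regression: flows rise with origin and destination size and fall with distance.

The minimal sketch below works through this on simulated data:
- It simulates flows between eight hypothetical zones using a known distance-decay parameter.
- It recovers that parameter with `glm()`.
- It predicts the change in total flows for a hypothetical 20% job increase in one zone.

With LODES, origin sizes could come from residence totals, destination sizes from workplace totals, and observed flows from the OD file.

```r
set.seed(42)

n_zones <- 8
# Hypothetical zone centroids on a 10 x 10 grid (all data here are simulated)
xy <- cbind(runif(n_zones, 0, 10), runif(n_zones, 0, 10))
O  <- round(runif(n_zones, 500, 2000))   # workers living in each origin zone
D  <- round(runif(n_zones, 500, 2000))   # jobs in each destination zone

# All origin-destination pairs, excluding flows within a zone
pairs <- expand.grid(i = seq_len(n_zones), j = seq_len(n_zones))
pairs <- pairs[pairs$i != pairs$j, ]
pairs$d <- sqrt(rowSums((xy[pairs$i, ] - xy[pairs$j, ])^2))
pairs$O <- O[pairs$i]
pairs$D <- D[pairs$j]

# Simulate flows from a known model with distance-decay parameter beta = 1.5
true_mu <- exp(-6 + log(pairs$O) + log(pairs$D) - 1.5 * log(pairs$d))
pairs$flow <- rpois(nrow(pairs), true_mu)

# Fit log(mu) = k + a*log(O) + b*log(D) + c*log(d); c estimates -beta
fit <- glm(flow ~ log(O) + log(D) + log(d), family = poisson, data = pairs)
coef(fit)

# Hypothetical scenario: 20% more jobs in zone 1, then re-predict all flows
scenario <- pairs
scenario$D[scenario$j == 1] <- scenario$D[scenario$j == 1] * 1.2
extra_trips <- sum(predict(fit, newdata = scenario, type = "response")) - sum(fitted(fit))
extra_trips
```

When origin and/or destination totals are known, production- or doubly-constrained versions are often preferred. These can be fitted by replacing `log(O)` and `log(D)` with origin and destination factors. The unconstrained form above is only the simplest variant.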
2. Do we see an increase in workplace jobs in blocks around a transit line after it is built?
   - Seattle Central Link light rail opened in 2009
   - Use quasi-experimental modeling to estimate the influence of transit line construction on the number of station-adjacent jobs
3. Do we see an increase in gentrification in blocks around a pedestrian/bike investment after it is built?
   - Look at how counts of new building permits and other socio-demographic indicators (from ACS data) change before and after the construction of a new transportation investment
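For projects 2 and 3, one common quasi-experimental design is difference-in-differences (DiD). This is our assumption: the slides only say "quasi-experimental". DiD compares two before/after changes:
- the change in blocks near the new line or facility;
- the change in comparable blocks further away.

The minimal sketch below uses four made-up blocks and two periods. The coefficient on the `treated:post` interaction equals the difference between the two before/after changes.

```r
# Made-up block-level job counts: treated = near the new line, post = after opening
jobs_panel <- data.frame(
  block   = rep(c("b1", "b2", "b3", "b4"), each = 2),
  treated = rep(c(1, 1, 0, 0), each = 2),
  post    = rep(c(0, 1), times = 4),
  jobs    = c(100, 140, 120, 150, 90, 100, 110, 118)
)

# The interaction coefficient is the DiD estimate
did_fit <- lm(jobs ~ treated * post, data = jobs_panel)
coef(did_fit)[["treated:post"]]

# The same number from group means:
# (change in treated blocks) - (change in control blocks)
did_means <- with(jobs_panel,
  (mean(jobs[treated == 1 & post == 1]) - mean(jobs[treated == 1 & post == 0])) -
  (mean(jobs[treated == 0 & post == 1]) - mean(jobs[treated == 0 & post == 0])))
did_means
```

The estimate is only causal under the parallel-trends assumption: treated and control blocks would have changed by the same amount without the investment. A real analysis would use many blocks and years, plus appropriate standard errors.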
4. Are increases in CTA station ridership correlated with increases in crime around that station?
5. What is the spatial impact of a hospital closure on the residences and workplaces of healthcare workers?
Other ideas:
- Employment impact (in surrounding areas) of construction of a new stadium
- Impact of the implementation of a community development program in Chicago on building permits
- Characteristics of TNP travel patterns by neighborhood
- Economic impact of TNP accessibility in Chicago
- Retail market area delineation
Description2.html
Pollution Equity in London
A Brief Description
These datasets were obtained from the London Datastore (https://data.london.gov.uk). Each one relates to a kind of pollution: either potentially problematic materials in the air, or levels of noise. The key files are:
- Rail_Lden_London.geojson
- Road_Lden_London.geojson
These give road- and rail-originated noise levels in London as Lden values in decibels. Lden is a day-evening-night average over the 24-hour day in which evening and night-time noise is weighted more heavily. In addition, the file pm25.csv (https://data.london.gov.uk/dataset/pm2-5-map-and-exposure-data) gives average levels of \(\textsf{PM}_{\textsf{2.5}}\) per Output Area (OA) in London. OAs are the smallest geographical units for which UK census statistics are published. \(\textsf{PM}_{\textsf{2.5}}\) is the concentration of solid particles and liquid droplets with a diameter of less than 2.5 micrometres. It is thought to be the air pollutant with the greatest impact on human health.
Finally, the file LOAC.geojson contains a map layer with the boundary of each OA in London, together with its OAC (Output Area Classification). The OAC is a geodemographic classification applied to each OA to describe the social and economic composition of its population. There are three levels of detail for the classification (Supergroup, Group and Subgroup), stored in the columns Super, Group and Sub respectively. Typical examples of the codes are:
- Supergroup H (Urban Settlements): The areas are characterised by a slightly younger age structure than nationally, with higher proportions of all groups aged 45 and under (covering the age groups 0 to 4 years, 5 to 14 years and 25 to 44 years). Ethnic groups are over-represented compared with the national picture and households are more likely to live in semi-detached or terraced housing.
- Group H2 (Suburban Traits): This group has a higher proportion of people aged 25 to 44 years than the supergroup and a higher proportion have Chinese, Black, African, Caribbean or Black British ethnicity, and are likely to live in flats.
- Subgroup H2a (City Periphery): The subgroup has a slightly older age profile than its parent group and a larger proportion were born in the UK. Residents are more likely to live in a terraced property or flats.
Thus, ‘drilling down’ to more detailed levels gives a greater level of description. Full details can be found in the ONS pen portraits (https://www.ons.gov.uk/methodology/geography/geographicalproducts/areaclassifications/2011areaclassifications/penportraitsandradialplots). Note that in our data the Supergroups are labelled A-H but are referred to in the description as 1-8. The same applies to the Groups (but with the roles of letters and numbers reversed) and to the Subgroups.
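The Subgroup code contains the Group and Supergroup codes as prefixes. In the `head(oac)` output further down, for example, Sub C3d sits in Group C3 and Supergroup C. The coarser levels can therefore be recovered with `substr()`.

The sketch below also includes a hypothetical helper, `to_ons()`. It converts codes to the numbering used in the ONS pen portraits by reading the note above literally, so that C3d becomes 3c4. That mapping is our interpretation and should be checked against the pen portraits before use.

```r
# Subgroup codes as they appear in the head(oac) output
sub_codes <- c("C3d", "F1a", "B1a", "G2b")

# Group and Supergroup are prefixes of the Subgroup code
group_codes <- substr(sub_codes, 1, 2)   # e.g. "C3"
super_codes <- substr(sub_codes, 1, 1)   # e.g. "C"

# Hypothetical helper: a literal reading of "letters and numbers reversed"
#   Supergroup letter -> number, Group number -> letter, Subgroup letter -> number
to_ons <- function(sub) {
  s <- match(substr(sub, 1, 1), LETTERS)        # A..H -> 1..8
  g <- letters[as.integer(substr(sub, 2, 2))]   # 1, 2, 3 -> a, b, c
  u <- match(substr(sub, 3, 3), letters)        # a, b, c -> 1, 2, 3
  paste0(s, g, u)
}

data.frame(sub_codes, group_codes, super_codes, ons_style = to_ons(sub_codes))
```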
The noise pollution data can be read into R using the following code:
library(tidyverse)
library(sf)
rail_noise <- st_read('Rail_Lden_London.geojson')
head(rail_noise)
Note that the noise levels (NoiseClass) are categorical and stored as characters. They can be converted into ordered factors.
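# Noise bands, from quietest to loudest
NoiseLevels <- c("<=54.9", "55.0-59.9", "60.0-64.9", "65.0-69.9", "70.0-74.9", ">=75.0")
# Convert a character vector of bands into an ordered factor
Order_Levs <- function(x) ordered(x, levels = NoiseLevels)
rail_noise <- rail_noise %>% mutate(NoiseClass = Order_Levs(NoiseClass))
# Map the noise bands, colouring both fill and outline by band
ggplot(rail_noise) + geom_sf(mapping = aes(fill = NoiseClass, col = NoiseClass))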
Here, the colour coding corresponds to the noise level category. The lowest category (\(\le \textsf{54.9}\)) is left out of the geographical layer here.
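Declaring the order matters beyond the plot legend: ordered factors sort and compare by band rather than as text. A minimal base R sketch on a made-up vector of bands, using the same noise levels as above:

```r
NoiseLevels <- c("<=54.9", "55.0-59.9", "60.0-64.9", "65.0-69.9", "70.0-74.9", ">=75.0")
# Made-up band values for four locations
noise <- ordered(c("65.0-69.9", "<=54.9", ">=75.0", "55.0-59.9"), levels = NoiseLevels)

sort(noise)              # sorted from quietest to loudest band
noise >= "65.0-69.9"     # TRUE for bands of 65 dB and above
table(noise)             # counts per band, in band order, with empty bands as 0

# As plain strings, sorting follows (locale-dependent) character order instead
sort(as.character(noise))
```

Because `table()` keeps empty bands as zero counts, it is convenient when summarising how many areas fall in each band.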
The OAC data can also be read in and visualised. First, here is a quick glance at the data:
oac <- st_read('LOAC.geojson')
## Reading layer `LOAC' from data source `/Users/chrisbrunsdon/Dropbox/NCG603_sandbox/NCG616/London/LOAC.geojson' using driver `GeoJSON'
## Simple feature collection with 25053 features and 21 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
## Projected CRS: OSGB 1936 / British National Grid
head(oac)
## Simple feature collection with 6 features and 21 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 526546.8 ymin: 170665 xmax: 544029.6 ymax: 193902
## Projected CRS: OSGB 1936 / British National Grid
##      OA11CD  LSOA11CD WD11CD_BF      WD11NM_BF   LAD11CD    LAD11NM   RGN11CD
## 1 E00023264 E01004612 E05000626        Tooting E09000032 Wandsworth E12000007
## 2 E00003359 E01000692 E05000111    Chislehurst E09000006    Bromley E12000007
## 3 E00023266 E01004615 E05000626        Tooting E09000032 Wandsworth E12000007
## 4 E00020264 E01004027 E05000548      Riverside E09000028  Southwark E12000007
## 5 E00023263 E01004613 E05000626        Tooting E09000032 Wandsworth E12000007
## 6 E00007412 E01001492 E05000204 Lower Edmonton E09000010    Enfield E12000007
##           LSOA11NM USUALRES HHOLDRES COMESTRES POPDEN HHOLDS AVHHOLDSZ PunCare
## 1 Wandsworth 032C      462      459         3  115.2    143       3.2      42
## 2    Bromley 002D      269      259        10   36.7    133       1.9      24
## 3 Wandsworth 034B      277      277         0  183.4    133       2.1      26
## 4  Southwark 003E      415      415         0   96.1    191       2.2      22
## 5 Wandsworth 033D      304      304         0  165.2    131       2.3      22
## 6    Enfield 025E      427      427         0  165.5    150       2.8      39
##   PLMTACT  area        OA Sub Super Group                       geometry
## 1   15.36 0.041 E00023264 C3d     C    C3 MULTIPOLYGON (((527635.5 17...
## 2   12.64 0.073 E00003359 F1a     F    F1 MULTIPOLYGON (((543581.5 17...
## 3   25.27 0.015 E00023266 B1a     B    B1 MULTIPOLYGON (((526613.5 17...
## 4   13.98 0.044 E00020264 B3a     B    B3 MULTIPOLYGON (((533563.5 17...
## 5   15.46 0.019 E00023263 G1b     G    G1 MULTIPOLYGON (((527822 1720...
## 6   13.11 0.027 E00007412 G2b     G    G2 MULTIPOLYGON (((535109.5 19...
There are quite a few variables. Probably the key ones are:
- the Supergroup, Group and Subgroup OAC classifications;
- the Output Area ID code (to link with other data);
- some of the census-derived variables (e.g. POPDEN, population density).

Here the pattern of Supergroup classification is mapped:
ggplot() + geom_sf(data=oac,mapping=aes(fill=Super),col=NA) + scale_fill_brewer(type='qual',palette = 'Set3')
Clearly there is a geographical pattern.
The file pm25.csv contains average \(\textsf{PM}_{\textsf{2.5}}\) concentrations for 2013, one value per Output Area. This can be read in and joined to the OAC file.
pm25 <- read_csv('pm25.csv')
head(pm25)
## # A tibble: 6 × 3
##   OA11CD    LAD11NM     PM252013me
##   <chr>     <chr>            <dbl>
## 1 E00024024 Westminster       18.0
## 2 E00023833 Westminster       18.2
## 3 E00023830 Westminster       18.7
## 4 E00023831 Westminster       17.9
## 5 E00024021 Westminster       17.2
## 6 E00023887 Westminster       17.5
The variable OA11CD is the Output Area code. LAD11NM is the name of the local authority district (from the 2011 census) and PM252013me is the average level of \(\textsf{PM}_{\textsf{2.5}}\). Note that OA11CD is also in the oac object, and so one can join this table to that one, and map the \(\textsf{PM}_{\textsf{2.5}}\) levels.
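# Keep only the join key and the PM2.5 column: oac already has LAD11NM,
# so joining it again would create LAD11NM.x and LAD11NM.y
oac <- oac %>%
  left_join(select(pm25, OA11CD, PM252013me), by = 'OA11CD') %>%
  rename(pm25 = PM252013me)
ggplot() +
  geom_sf(data = oac, mapping = aes(fill = pm25), col = NA) +
  scale_fill_viridis_c()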
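A left join keeps every OA in `oac`, so any OA without a matching row in pm25.csv ends up with an `NA` value. Rows in pm25.csv whose code is not in the boundary layer are silently dropped. Before mapping, it is worth checking for both cases, and for duplicated codes, which would duplicate OA polygons in a join.

A minimal base R sketch of these checks follows. The OA codes and values are made up and stand in for `oac$OA11CD` and `pm25$OA11CD`.

```r
# Made-up OA codes and PM2.5 values
oac_codes  <- c("E00000001", "E00000002", "E00000003", "E00000004")
pm25_codes <- c("E00000001", "E00000002", "E00000004", "E00000009")
pm25_vals  <- c(18.0, 18.2, 17.5, 16.9)

# OAs with no PM2.5 value (these become NA after a left join)
missing_in_pm25 <- setdiff(oac_codes, pm25_codes)

# PM2.5 rows whose OA is not in the boundary layer (dropped by a left join)
unused_pm25 <- setdiff(pm25_codes, oac_codes)

# Duplicated codes on the PM2.5 side
dup_codes <- pm25_codes[duplicated(pm25_codes)]

# Left-join-style lookup with match(): one value per OA, NA where unmatched
joined <- pm25_vals[match(oac_codes, pm25_codes)]

missing_in_pm25
unused_pm25
joined
```

With the real objects, the equivalent checks would be `setdiff(oac$OA11CD, pm25$OA11CD)`, `setdiff(pm25$OA11CD, oac$OA11CD)` and `any(duplicated(pm25$OA11CD))`.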
Suggested Research Questions
- Are some social groups more exposed to high \(\textsf{PM}_{\textsf{2.5}}\) levels?
- Are some social groups more exposed to rail or road based noise levels than others?
- Are different groups subject to different kinds of pollution?
- Are there links between the different kinds of pollution?
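For the first question, one simple starting point is to compare population-weighted mean \(\textsf{PM}_{\textsf{2.5}}\) across Supergroups. This is our suggestion, not prescribed by the brief. Each OA is weighted by its residents, assuming USUALRES is the usual resident count. Weighting matters because OAs differ in population, so an unweighted mean over OAs can misstate what a typical resident is exposed to.

A minimal base R sketch on made-up data:

```r
# Made-up OA-level data: Supergroup, usual residents and PM2.5 level
oa <- data.frame(
  Super    = c("A", "A", "B", "B", "C"),
  USUALRES = c(300, 100, 200, 200, 400),
  pm25     = c(14, 18, 16, 20, 15)
)

# Population-weighted mean: sum(pm25 * residents) / sum(residents) within each group
pw_mean <- sapply(split(oa, oa$Super), function(g) weighted.mean(g$pm25, g$USUALRES))

# Unweighted mean over OAs, for comparison
oa_mean <- tapply(oa$pm25, oa$Super, mean)

round(pw_mean, 2)
oa_mean
```

On the joined `oac` object, the same idea in the tidyverse style used above could be written as:

`oac %>% st_drop_geometry() %>% group_by(Super) %>% summarise(pw = weighted.mean(pm25, USUALRES, na.rm = TRUE))`

Here `na.rm = TRUE` drops OAs with no PM2.5 value.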