Business Problem Solving
Sophisticated analysis: The big guns of analysis
Dr. Stephen Hills
Learning objective
To undertake sophisticated analysis that can answer research questions about the issues identified in your problem disaggregation and prioritization to inform problem solutions.
Sophisticated analysis
The seven-step process
How do you define a problem in a precise way to meet the decision maker’s needs?
How do you disaggregate the issues and develop hypotheses to be explored?
How do you prioritize what to do and what not to do?
How do you develop a workplan and assign analytical tasks?
How do you decide on the fact gathering and analysis to resolve the issues, while avoiding cognitive biases?
How do you go about synthesizing the findings to highlight insights?
How do you communicate them in a compelling way?
Step 5: Conduct critical analyses
Start with heuristics – short cuts or rules of thumb – to get an order of magnitude understanding of each component and assess priorities
Understand where there is a need for more work and for more complex techniques
Make frequent use of one-day answers
Sophisticated analysis
You may be faced with a complex problem that really does require a robustly quantified solution:
Have you adequately framed the problem you face, and the hypothesis you want to test, so that it’s clear you do need more firepower?
Is there data available to support using an advanced analytic tool?
Which tool is the right one for your problem?
Is there user-friendly software available to help you use some of these tools?
Which big gun to choose?
Selecting an analysis approach
Are you primarily trying to understand the drivers of causation of your problem (how much each element contributes and in what direction), or are you primarily trying to predict a state of the world in order to make a decision?
The first question leads you mostly down the left-hand branch into various statistical analyses, including creating or discovering experiments.
The second question leads you mostly down the right-hand side of the tree into forecasting models, the family of machine or deep learning algorithms, and game theory.
Data visualization: Where to live due to air quality
Data visualization: Clusters and hotspots – London air quality
Problem: Where to live in London for good health.
The air that we breathe is key to our health.
Mapping accident and emergency admissions by postcode against air quality data (measurement of particles in the air that affect respiratory health) shows a positive correlation between particle levels and admissions, which can inform decision-making on where to live in London.
Free tool from UK Government: https://dataingovernment.blog.gov.uk/2016/03/30/free-tools-to-quickly-show-postcode-data-on-a-map/
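The positive correlation described above can be quantified with a Pearson correlation coefficient. The following is a minimal sketch only: the function name `pearson_r` and both lists of numbers are hypothetical placeholders, not the London data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values for six postcode areas (illustration only).
particle_level = [8.0, 10.5, 12.0, 14.5, 16.0, 18.5]
admissions_per_10k = [21, 25, 24, 31, 33, 38]

r = pearson_r(particle_level, admissions_per_10k)
print(f"Correlation between particle level and admissions: {r:.2f}")
```

A value of r close to +1 indicates a strong positive association. On its own it does not show that poor air quality causes the admissions.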
Regression models: Obesity
Regression models for cause: Obesity
Regression analysis is a powerful analytic tool to understand the underlying drivers of the problem of obesity.
It shows us where to look for solutions.
Data was gathered on 68 US cities for the outcome variable of obesity prevalence and hypothesised predictors of educational attainment, median household income, city walkability and climate comfort score (suitability of weather to physical activity).
Regression analysis found that education, income, walkability, and comfort score are all negatively associated with obesity prevalence.
In other words, as these predictors increase, obesity prevalence decreases.
Income accounts for 71% of the variance in obesity prevalence.
Having a household income of $80k vs. $60k is associated with a 7 percentage point drop in obesity prevalence (a 30% relative reduction in obesity).
A model explaining 82% of variance in obesity included income, education, comfort, walkability, and an income/education term (used because income and education were highly correlated).
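To make "accounts for X% of the variance" concrete, here is a minimal sketch of a one-predictor regression fitted by ordinary least squares, with R-squared as the share of variance explained. The data points and the names `fit_line` and `r_squared` are hypothetical. The study's multi-predictor model would need a statistics package and the real 68-city data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical city-level data (illustration only).
income_k = [45, 52, 58, 63, 70, 78, 85]      # median household income, $k
obesity_pct = [33, 31, 30, 27, 26, 22, 21]   # obesity prevalence, %

slope, intercept = fit_line(income_k, obesity_pct)
r2 = r_squared(income_k, obesity_pct, slope, intercept)
print(f"Change in obesity prevalence per extra $1k of income: {slope:.2f} points")
print(f"Share of variance explained: {r2:.0%}")
```

A negative slope corresponds to the negative association described above: as income rises, predicted obesity prevalence falls.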
Experiments: RCTs and A/B Testing
Randomised controlled trials (RCTs) involve randomly assigning participants to one of two conditions (A or B).
Because participants are randomly assigned to a condition, we can be confident that the two groups are, on average, equivalent in all respects other than experiencing A or B.
Therefore, we can be confident that any differences in the measured outcome (e.g., purchasing behaviour) are due to having experienced A or B (e.g., the presence of a special offer).
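One common way to check whether the difference between A and B is larger than chance alone would produce is a two-proportion z-test. This is a minimal sketch, assuming a binary outcome (purchased or not). The counts and the function name `two_proportion_z_test` are hypothetical.

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical A/B test: A = no special offer, B = special offer.
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference in purchasing is unlikely to be due to chance alone. Random assignment is what allows that difference to be read as an effect of the offer.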
Natural Experiments: Voter prejudice
RCTs are not always possible, but a natural variation can act like random allocation to a condition.
Whether or not a participant is presented with a delegate with a minority name is effectively random.
Therefore, we can be confident that any difference between expected values and actual votes can be attributed to prejudice.
Simulations/Regression: Climate change
Simulation/Regression models for prediction: Climate change
Previously we looked at how a regression model can be used to determine the causes of obesity.
However, the same method (used exactly the same way) can also be used to predict an outcome.
We get a regression equation with coefficients (i.e., the estimated change in the outcome per unit change in each predictor) based upon the observed data.
We can then input hypothetical data into that equation to model what would happen in that hypothetical situation.
We could input different climate scenarios into a model fitted to observed data to predict the outcome under each scenario.
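Using a regression equation for prediction amounts to plugging hypothetical predictor values into the fitted equation. In this minimal sketch, the intercept, the coefficients, the predictor names and the scenario values are all placeholders invented for illustration, not results from any climate model.

```python
# Placeholder regression equation:
#   outcome = INTERCEPT + sum(coefficient * predictor value)
INTERCEPT = 0.2
COEFFICIENTS = {"emissions_index": 0.01, "forest_cover_index": -0.5}

def predict(inputs):
    """Evaluate the regression equation for one set of predictor values."""
    return INTERCEPT + sum(
        coef * inputs[name] for name, coef in COEFFICIENTS.items()
    )

# Hypothetical scenarios to feed into the equation.
scenarios = {
    "low emissions": {"emissions_index": 450, "forest_cover_index": 0.3},
    "high emissions": {"emissions_index": 700, "forest_cover_index": 0.2},
}

predictions = {name: predict(values) for name, values in scenarios.items()}
for name, value in predictions.items():
    print(f"{name}: predicted outcome = {value:.2f}")
```

Predictions are most trustworthy when the scenario values lie within the range of the observed data used to fit the model.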
Game theory: Going to court
We can use game theory to work through our own choices and competitor choices.
To assess how we should respond to an opponent’s move, we can simulate the position of competing parties and make a series of moves that the opposing party has to respond to.
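Working through our move and the opponent's likely reply can be sketched as backward induction on a small two-stage game. The moves, payoffs and function names below are hypothetical. The idea is that we choose our move by first working out which reply serves the opponent best.

```python
# Payoffs are (ours, opponent's) in hypothetical units.
GAME = {
    "offer settlement": {"accept": (50, -50), "reject": (10, -70)},
    "go to court": {"concede": (100, -100), "defend": (30, -60)},
}

def opponent_best_reply(replies):
    """The opponent picks the reply that maximises its own payoff."""
    return max(replies, key=lambda reply: replies[reply][1])

def solve(game):
    """Backward induction: anticipate each reply, then pick our best move."""
    outcomes = {}
    for move, replies in game.items():
        reply = opponent_best_reply(replies)
        outcomes[move] = (reply, replies[reply][0])
    best_move = max(outcomes, key=lambda move: outcomes[move][1])
    expected_reply, our_payoff = outcomes[best_move]
    return best_move, expected_reply, our_payoff

best_move, expected_reply, our_payoff = solve(GAME)
print(best_move, expected_reply, our_payoff)  # offer settlement accept 50
```

Going to court has the largest possible payoff (100), but only if the opponent concedes. Once we assume the opponent will defend, offering a settlement becomes the better move in this example.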
Conclusions
Getting to a solution for many complex problems may require sophisticated analytic tools.
To do this you need to understand your research question and the nature of your data.
RCTs are the gold standard for determining cause and effect, but where these are not possible you might be able to use a natural experiment or model causes using regression.
Regression can also be used to predict an outcome by constructing a model with observed data and inputting hypothetical data.
Game theory encourages you to think through different scenarios depending on the move of a competitor.
Bayesian statistics calculate the probability of an event under different conditions.
Workshop: Bayesian statistics and the Space Shuttle Challenger Disaster
Bayesian statistics: The Space Shuttle Challenger Disaster
The Space Shuttle Challenger Disaster was a problem-solving failure: a failure to accurately assess the risk of O-ring damage, which could have been best assessed with Bayesian statistics.
Bayesian statistics are useful in incomplete data environments as a way of assessing conditional probability.
Conditional probability occurs in situations where a set of probable outcomes depends in turn on another set of conditions that are also probabilistic.
O-ring failure under low temperatures was the probable cause of the Challenger space shuttle breaking apart shortly after lift-off.
There was an unusually low temperature at launch of 31 degrees Fahrenheit (22 degrees below the minimum of previous launches).
In other words, there was incomplete data.
The disaster investigation subsequently found that O-rings are five times more responsive at 75 degrees than at 30 degrees.
Engineers only had data on flights where the temperature at launch was in the range of 53 to 81 degrees.
The data that they should have been looking at was the data on all flights in relation to temperature and O-ring damage.
This provides a different picture:
For temperatures below 65 degrees, all four (100%) flights had incidents.
Above 65 degrees only three out of twenty (15%) had damage incidents.
On the basis of this data, in groups, estimate the likelihood of O-ring failure.
From the data, we can conclude that the overall prior probability of failure is about 29% (incidents on seven of 24 flights): 7/24*100 = 29.2%
However, the conditional probability of failure given a launch temperature of 31 degrees (i.e., allowing for incidents being more likely as temperature declines) is 99.8%.
Although four out of four (100%) launches below 65 degrees were failures, this is a small sample size, so we need to fit a distribution to the data, thus reducing the estimated likelihood of failure to below 100%.
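The arithmetic behind these figures can be sketched from the flight counts given above. The last step uses a simple Beta-Binomial estimate with an assumed uniform Beta(1, 1) prior to show how fitting a distribution pulls a four-out-of-four result below 100%. That choice is an illustrative assumption, not the method behind the 99.8% figure, which depends on modelling failure against temperature.

```python
from fractions import Fraction

# Counts as stated in the text: flights, and flights with O-ring incidents.
COUNTS = {
    "below 65F": {"incidents": 4, "flights": 4},
    "above 65F": {"incidents": 3, "flights": 20},
}

def incident_rate(incidents, flights):
    """Observed share of flights with an incident."""
    return Fraction(incidents, flights)

def smoothed_rate(incidents, flights, prior_a=1, prior_b=1):
    """Posterior mean under a Beta(prior_a, prior_b) prior (uniform by default)."""
    return Fraction(incidents + prior_a, flights + prior_a + prior_b)

total_incidents = sum(group["incidents"] for group in COUNTS.values())
total_flights = sum(group["flights"] for group in COUNTS.values())

overall = incident_rate(total_incidents, total_flights)
cold = COUNTS["below 65F"]
cold_raw = incident_rate(cold["incidents"], cold["flights"])
cold_smoothed = smoothed_rate(cold["incidents"], cold["flights"])

print(f"Overall (prior) incident rate: {float(overall):.1%}")      # 29.2%
print(f"Below 65F, raw rate:           {float(cold_raw):.1%}")     # 100.0%
print(f"Below 65F, smoothed estimate:  {float(cold_smoothed):.1%}")  # 83.3%
```

With only four cold-weather flights, the smoothed estimate sits well below the raw 100%. It moves towards the raw rate as more flights are observed.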