Discussion Week 11
Hiding in Plain Sight:
Pattern Based Analysis and What Your Current Data Are Trying to Tell You
HCAD 610
Agenda
The Evolving Environment
Big Data
The Cloud
Opening the floodgates
New Computational Power
Standard Approaches, New Demands, New Capabilities
The Healthcare Data Challenge
Pattern Based Analytics
Health Applications
Executive Summary
This session will identify emerging analytical capabilities and their application, both to new technologies and for use with existing data.
During this session, case studies of identifying cost outliers and root causes for adverse outcomes are presented.
Students will gain an understanding of advances in analytics and their application to current operations.
Learning Objectives
Following this session, students will:
Be familiar with emerging concepts in data collection, processing, and analysis
Understand the applicability of novel data technologies
Recognize ways to enhance processes, reduce costs, and improve quality
1946: ENIAC (19,000 vacuum tubes)
2007: iPhone
Framing the Environment
Introductions
Within the span of a lifetime, computational power has gone from science fiction and a university project to everyday use
Today’s smartphones are more powerful than all the computers that placed a man on the moon, combined
Technology continues to evolve at an ever-increasing pace
It’s All About Big Data
What is big data?
It’s everywhere: Hx, Sx, Dx, Tx, Rx, procedure codes, billing, sensors used to gather climate information, social media posts, digital pictures and videos, census charts, lab results, purchase transaction records, genome mapping, and mobile phone GPS signals…
Spans 4 dimensions:
Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.
Velocity: Sometimes 2 minutes is too late.
Variety: Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more.
Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions.
Source: http://www-01.ibm.com/software/data/bigdata/
And Big Data in Healthcare
Everybody is talking about it
In Healthcare:
Structured data for analytics
Average length of stay = 3 days
approx. 120 patients per occupied bed per year
Each patient record could be as much as ~ 10,000 characters
A 1,000 bed facility = Data size ~ 1.2 GB per year
The vast majority of data created, as much as 80 percent, is unstructured (text, voice annotations, images)
Structured data SIZE for individual providers is not a major problem in this context
The key challenges are data sourcing, extraction, consolidation, cleaning, and transformation
Source: http://www-01.ibm.com/software/data/bigdata/
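The sizing estimate on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming full occupancy, one byte per character, and decimal gigabytes (10^9 bytes); the function name and parameters are illustrative and not from the slide.

```python
# Back-of-the-envelope check of the structured-data estimate.
# Assumptions (not stated on the slide): full occupancy, 1 byte per
# character, and 1 GB = 10**9 bytes.

DAYS_PER_YEAR = 365


def yearly_structured_bytes(beds, patients_per_bed, chars_per_record,
                            bytes_per_char=1):
    """Bytes of structured records produced by a facility in one year."""
    return beds * patients_per_bed * chars_per_record * bytes_per_char


# A 3-day average stay gives roughly 122 patients per bed per year,
# which the slide rounds to 120.
patients_per_bed_exact = DAYS_PER_YEAR / 3

size_bytes = yearly_structured_bytes(beds=1_000,
                                     patients_per_bed=120,
                                     chars_per_record=10_000)
size_gb = size_bytes / 1e9

print(f"patients per bed per year: about {patients_per_bed_exact:.0f}")
print(f"structured data per year: {size_gb:.1f} GB")
```

At this scale the structured records fit comfortably on a single workstation, which is why the slide treats sourcing and cleaning, not size, as the hard part.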
And The Cloud
Cloud Computing
Originally, all data and processes resided on a central platform (the mainframe)
Advances in processing, speed, size, and storage drove functions to individual platforms
These same continuing developments eventually overwhelmed the local platforms
Cloud computing characteristics (NIST):
On-demand self-service.
Broad network access.
Resource pooling.
Rapid elasticity.
Measured service.
Big Data and the Cloud
“Cloud computing has opened another door to conducting data science on a large scale. With cloud computing, initial costs are minimised, the scaling of capacity is flexible, and access is more open and widespread. These characteristics look likely to generate an explosion of new understanding.”*
Absent a robust analytical mechanism, the cloud is simply fog at a higher elevation.
*http://www.newscientist.com/cloudup/article/in426
Big Data Challenges
“Along with the many opportunities, data-intensive science will also bring complex challenges. Many scientists are concerned that the data deluge will make it increasingly difficult to find data of relevance and to understand the context of shared data. Today, it is difficult for a person even to track and maintain their own health records. Now envision the magnitude, diversity and dispersed nature of data generated by life sciences, genetics and bioinformatics over the next 10 years.”
*http://www.newscientist.com/cloudup/article/in426
Drowning in Data
Source: President’s Council of Advisors on Science and Technology, Leadership Under Challenge: Information Technology R&D in a Competitive World An Assessment of the Federal Networking and Information Technology R&D Program 35 (Aug. 2007)
The data deluge refers to the situation where the sheer volume of new data being generated is overwhelming the capacity of institutions to manage it and researchers to make use of it.
Elementary, my dear Watson
Evolving clinical decision support system
February 2013:
first commercial application
utilization management decisions in lung cancer treatment
Memorial Sloan-Kettering Cancer Center and WellPoint
90 IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM.
Hardware cost in the $ millions.
Widespread availability when?
We generate tremendous amounts of data
90% of all the world’s data has been generated over the last two years [Total 3.2 zettabytes (3.2×10^21 bytes)]
In 5 years, the amount of digital information is expected to grow to 40 zettabytes
That’s 40,000,000,000,000,000,000,000 bytes
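The two volume figures also imply a growth rate. The sketch below works it out, assuming smooth compound growth over exactly five years (an assumption made for illustration; the slide states only the start and end values).

```python
# Implied growth between the two figures quoted on the slide.
# Assumption: steady compound growth over exactly 5 years.

ZETTABYTE = 10**21  # bytes

current_zb = 3.2
projected_zb = 40
years = 5

growth_factor = projected_zb / current_zb        # overall multiple
annual_rate = growth_factor ** (1 / years) - 1   # compound rate per year
projected_bytes = projected_zb * ZETTABYTE

print(f"overall growth: {growth_factor:.1f}x")
print(f"implied annual growth: {annual_rate:.0%}")
print(f"projected volume: {projected_bytes:,} bytes")
```

Under that assumption the volume grows about 12.5-fold overall, or roughly two-thirds more each year.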
The Challenge
How to effectively use the data?
How do we make sense of data to help drive our decisions?
In the meantime…
Data Analysis Zone
Analytic Approaches
Who’s driving?
General purpose solutions (tools/platforms)
Excel, SAS, SPSS, Cognos, Tableau, QlikView…
New offerings emerge on a regular basis
Data mining
Business Intelligence (BI)
Statistics/Advanced Analytics
Prediction/Forecasting
Visual Exploration
Specialized solutions
Predictive applications
Forecasting
Scheduling
Process optimization
Both have relevance in healthcare
REPORTING
WHAT happened?
PREDICTING
WHAT WILL happen?
ANALYZING
WHY did it happen?
ACTING
MAKE IT happen!
Descriptive
Predictive
Prescriptive
The Goal is Prescriptive Use
Descriptive
Patient population management
Clinical quality and efficacy
Outliers for providers/patients
Coding errors and fraud
Predictive
Revenues in 30 and 90 days
Readmissions (e.g., whether a patient with CHF will be readmitted to the hospital within 30 or 90 days)
Patient groups for risk adjustment
Prescriptive
Patient flow management
Accurate costing
Asset management
Norm
Outliers
Median
Average
For Example
Cost Forecast
2011: 187 188 182 191 139 192 153 142 144 153 182 169
2012: 142 140 133 128 165 124 125 159 156 125 192 197
2013: 153 167 163 161 188 191 146 229 203 213 201 214
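The descriptive measures named on this slide (average, median, outliers) can be computed directly from the Cost Forecast figures. This is a minimal sketch using Python's standard library. Treating the twelve readings per year as monthly costs, and flagging values more than two standard deviations from the mean, are both assumptions made for illustration, not the method used in the presentation.

```python
from statistics import mean, median, pstdev

# Values copied from the Cost Forecast slide, twelve readings per year.
# Reading them as monthly costs is an assumption.
costs = {
    2011: [187, 188, 182, 191, 139, 192, 153, 142, 144, 153, 182, 169],
    2012: [142, 140, 133, 128, 165, 124, 125, 159, 156, 125, 192, 197],
    2013: [153, 167, 163, 161, 188, 191, 146, 229, 203, 213, 201, 214],
}


def summarize(values):
    """Return the average and median of one year's readings."""
    return {"mean": mean(values), "median": median(values)}


def flag_outliers(values, k=2.0):
    """Return readings more than k standard deviations from the mean.

    The threshold k=2.0 is an illustrative choice, not a rule from the slides.
    """
    centre = mean(values)
    spread = pstdev(values)
    return [v for v in values if abs(v - centre) > k * spread]


summary = {year: summarize(vals) for year, vals in costs.items()}
all_values = [v for vals in costs.values() for v in vals]
outliers = flag_outliers(all_values)

for year, stats in summary.items():
    print(year, round(stats["mean"], 1), stats["median"])
print("flagged:", outliers)
```

Summaries like these describe what happened; they do not explain why a reading is unusual, which is the gap the later slides address.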
A Common Shortcoming
“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
Sir Arthur Conan Doyle
(Sherlock Holmes, A Scandal in Bohemia)
25 June 1891
The Answer is often hiding in plain sight
Knowing Where To Look
How to discover the regions in the data that are most information-rich to identify targets of interest?
The Curse of Dimensionality
Exponential growth increases dimensionality of the data
Humans have difficulty visualizing, organizing, and analyzing data beyond 3 or 4 dimensions (data attributes)
Two Challenges of Big Data
Pattern Based Analytics applies Shannon’s Information Theory, combined with robust and scalable Machine Learning methods, to solve these two problems by
Finding the most information-rich, yet lower dimensionality, regions in the data that are characterized by “patterns”
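To make "information-rich" concrete, the sketch below computes Shannon entropy and the information gain of an attribute with respect to an outcome. It illustrates the general idea from information theory only; it is not the algorithm behind the Pattern Based Analytics product, and the toy attribute names and values are invented for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Reduction in label entropy once the attribute value is known."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Invented toy data: four encounters with a cost label and two attributes.
cost_label = ["high", "high", "normal", "normal"]
attr_a = ["x", "x", "y", "y"]   # lines up perfectly with the cost label
attr_b = ["p", "q", "p", "q"]   # unrelated to the cost label

gain_a = information_gain(attr_a, cost_label)
gain_b = information_gain(attr_b, cost_label)
print(f"gain from attr_a: {gain_a:.2f} bits, from attr_b: {gain_b:.2f} bits")
```

An attribute with high gain marks a region of the data worth examining first; one with zero gain can be set aside, which reduces the dimensions a person has to consider.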
Overcoming the Big Data Challenges
Cognitive Science and Patterns
“The decision-making processes of a human being are somewhat related to the recognition of patterns; for example, the next move in a chess game is based upon the present patterns on the board, and buying or selling stocks is decided by a complex pattern of information. The goal of pattern recognition is to clarify these complicated mechanisms of decision-making processes and to automate these functions.”
Source: Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, San Diego, CA: Academic Press, p. 1
Why Patterns?
Patterns are means for organizing large volumes of data
Patterns are shorthand for identifying complex, meaningful relationships involving multiple variables
They provide information on the underlying entity
The human brain functions in large part by identifying patterns
Patterns?
Cognitive Science on Patterns
“The ability to recognize patterns in the environment is critical for an organism’s survival.” (Sinha, 2002).
“The ability to spot existing or emerging patterns is one of the most (if not the most) critical skills in intelligent decision making, though we’re mostly unaware that we do it all the time.” (Miemis, 2010)
“Pattern recognition is the fundamental human cognition or intelligence.” (Pi et al., 2008)
Searching for Patterns
We do this for fun, but…
Not always for critical decision making.
Pattern Based Analytics
Pattern Based Analytics amplifies human capability by
Automatically identifying patterns considering all dimensions
Reducing the complexity of understanding the data
Structuring results in prioritized, readily comprehensible graphic representation
Expanding a Natural Function
Pattern Based Analytics
Utility of Patterns
Pattern Based Analytics identifies patterns
To understand what has happened, and why
To predict what might happen, and understand why
To explore alternative actions in the context of possible future scenarios
Pattern Based Analytics
Pattern Based Analytics points to what is meaningful in a large, complex data set
Pattern Based Analytics is a “hypothesis generator”; it discovers the hypotheses that are not initially apparent
Encompassing all data points, including remote outliers
Eliminating process bias
Rapidly recognizing and prioritizing millions of patterns
Discovery
Identify poor providers
Compare costs versus satisfaction
Collect and prepare data
Identify outlier data with exploratory analysis
Discover key patterns automatically with a discovery engine to understand outlier behavior: why is performance poor?
Drill into the patterns to identify possible solutions: is poor performance driven by too many unnecessary procedures, or by expensive discharge dispositions?
Take action and monitor outlier behavior over time and assess impact of changes
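The "discover key patterns" step above can be sketched as a simple comparison: which attribute values are over-represented among the outliers relative to the whole population? The code below ranks attribute values by that ratio (often called lift). It is a simplified stand-in for a discovery engine, not the engine used in the case study, and the records, field names, and cost threshold are hypothetical.

```python
from collections import Counter


def rank_patterns(records, is_outlier, attributes):
    """Rank (attribute, value) pairs by how over-represented they are
    among outlier records compared with all records."""
    outliers = [r for r in records if is_outlier(r)]
    if not outliers:
        return []
    results = []
    for attr in attributes:
        overall = Counter(r[attr] for r in records)
        among_outliers = Counter(r[attr] for r in outliers)
        for value, count in among_outliers.items():
            support = count / len(outliers)           # share of outliers
            baseline = overall[value] / len(records)  # share of everyone
            results.append((attr, value, support, support / baseline))
    return sorted(results, key=lambda item: item[3], reverse=True)


# Hypothetical toy records, invented for this example only.
records = [
    {"procedures": "10+", "payer": "Medicare", "cost": 900},
    {"procedures": "10+", "payer": "Private", "cost": 850},
    {"procedures": "2-4", "payer": "Medicare", "cost": 200},
    {"procedures": "2-4", "payer": "Private", "cost": 180},
    {"procedures": "1", "payer": "Private", "cost": 100},
    {"procedures": "1", "payer": "Medicare", "cost": 120},
]

ranked = rank_patterns(records,
                       is_outlier=lambda r: r["cost"] > 500,  # assumed cutoff
                       attributes=["procedures", "payer"])
for attr, value, support, lift in ranked:
    print(f"{attr}={value}: support {support:.2f}, lift {lift:.2f}")
```

A real engine would also consider combinations of attributes and test whether a pattern is statistically meaningful; this sketch looks at single attributes only.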
Case Study
Example Data Characteristics
Data drawn from de-identified medical records
Source data contain ~4 million records, collected over a period of 5 years from ~100 health care providers
Data descriptors (183 attributes)
Person specific information such as gender, age, ethnicity, etc.
Encounter information such as:
Provider ID
Multiple diagnoses and codes
Multiple procedures and codes
Length of stay, total costs, disposition, medical coverage type, etc.
Patient satisfaction quality indicator
Focus on cardiovascular disease
Zooming in (next slide)…
A Typical Management Tool
Hard to gain insight from this table!
A Typical Management Tool
Ranking attributes against cost
Quality has highest correlation
Quality measured by Patient Satisfaction Level
1 (lowest) to 5 (highest)
Then examine the (Quality, Cost) relationship
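Ranking attributes against cost can be sketched with a plain Pearson correlation. The example below ranks numeric attributes by the absolute value of their correlation with cost. The column names and values are hypothetical, and Pearson correlation is an assumption: the slides do not say which measure the tool uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Rank every column except the target by |correlation with target|."""
    scores = {name: pearson(values, columns[target])
              for name, values in columns.items() if name != target}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy columns, invented for this example only.
columns = {
    "cost": [10, 20, 30, 40, 50],
    "quality": [5, 4, 3, 2, 1],    # patient satisfaction, 1 to 5
    "age": [60, 30, 50, 40, 45],
}

ranked = rank_by_correlation(columns, target="cost")
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

Ranking by absolute value matters here: a strong negative relationship (cost rising as satisfaction falls) is as informative as a strong positive one.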
But Viewed via Patterns
High cost outliers (blue) identified with low quality
Graphic Representation Illustrates Areas of Concern
Low frequency, high impact outliers
48 patients
98 patients
Examining patterns for high cost, lowest quality
Top Pattern: Number of Procedures = 10 or more
Second Pattern: 2 - 4 procedures under Medicare
Delve into Contributing Factors
Patterns start to emerge.
Further Exploration: The “Who”
What are the key procedures involved in these outcomes?
Zooming In…
Top 4 providers (out of 111) account for 46% of poor outcomes within this pattern
Percutaneous transluminal coronary angioplasty (PTCA) is the dominant principal procedure!
How about secondary procedures?
Looking for Contributing Factors
Contributing Factors
Next:
Examine patient data to gain insight on additional procedures
Dominant secondary procedure
“Ins drug-elut coronary stent” (insertion of a drug-eluting coronary stent)
Patient Data for more specific sub-population may suggest options...
Contributing Factors
Interesting Findings
Analysis of high cost patients in top pattern
Average cost for 65 yrs+ PTCA females: $421K
Average cost for 65 yrs+ PTCA males: $212K
RESULT:
Proactively monitor this sub-population at high cost providers as they enter the system to potentially reduce costs and increase patient satisfaction
Secondary Findings
Post discharge Costs:
Home Health Service: $499,862.00
Skilled Nursing/Intermediate Care within admitting hospital: $282,779.50
In-house care costs about 57% of Home Health Service, a saving of about 43%
Summary of Example
Data slicing and dicing did not reveal any insights
Exploration identified an unusual and non-optimal (Quality, Cost) relationship for Quality Level 1
Pattern-Based Discovery identified two dominant outlier patterns that explain this relationship
Drilling down into the top pattern provided a view into:
the dominant providers, principal and secondary procedures
the underlying data that provide insight into the role of additional procedures and other factors
Monitor 65 yrs+ females undergoing PTCA at high cost providers to reduce costs and improve satisfaction
What-if analysis on Discharge Disposition suggested an option of transferring patients to SN/IC facilities to reduce costs
Advanced Applications
Precision Medicine
Understanding Efficacy of Treatment
What molecules in patient cells are important to specific treatment of a disease?
What combination and concentration of molecules are important for
Patient triaging as to efficacy of treatment
Additional understanding of the disease mechanism for further improvement in drug development
Dataset: Patient molecular data related to treatment of inflammatory disease as supplied by a biotech startup working with Big Pharma
Results:
Patterns detected in 10 seconds with 88% accuracy
Traditional analysis takes 2 days.
References
Eastwood, B. (2013). Big Data Analytics Use Cases for Healthcare IT. CIO, http://www.cio.com/slideshow/detail/126493?goback=%252Egde_2712281_member_5801855037156114432#slide1
Eastwood, B. (2013). Can Healthcare Big Data Reality Live Up to Its Promise? CIO, http://www.cio.com/article/738121/Can_Healthcare_Big_Data_Reality_Live_Up_to_Its_Promise_?page=3&taxonomyId=3147
Hey, T. (n.d.). Big Data is Transforming Science, NewScientist, http://www.newscientist.com/cloudup/article/in426
IBM (n.d.). IBM Big Data Platform, http://www-01.ibm.com/software/data/bigdata
Miemis, V. (2010). Essential Skills for 21st Century Survival: Part I: Pattern Recognition, emergent by design. http://emergentbydesign.com/2010/04/05/essential-skills-for-21st-century-survival-part-i-pattern-recognition/
Pi, Y., Liao, W., Liu, M., & Lu, J. (2008). Theory of Cognitive Pattern Recognition, Pattern Recognition Techniques, Technology and Applications, Peng-Yeng Yin (Ed.), ISBN: 978-953-7619-24-4, InTech, http://www.intechopen.com/books/pattern_recognition_techniques_technology_and_applications/theory_of_cognitive_pattern_recognition
President’s Council of Advisors on Science and Technology (2007). Leadership Under Challenge: Information Technology R&D in a Competitive World - An Assessment of the Federal Networking and Information Technology R&D Program 35
Sinha, P. (2002). Recognizing complex patterns. Nature Neuroscience Supplement, Vol. 5, pp. 1093-1097. doi: 10.1038/nn949