Discussion Week 11

Week11-PatternBasedAnalytics.pptx

Hiding in Plain Sight:

Pattern Based Analysis and What Your Current Data Are Trying to Tell You

HCAD 610


Agenda

The Evolving Environment

Big Data

The Cloud

Opening the floodgates

New Computational Power

Standard Approaches, New Demands, New Capabilities

The Healthcare Data Challenge

Pattern Based Analytics

Health Applications


Executive Summary

This session will identify emerging analytical capabilities and their application, both to new technologies and for use with existing data. 

During this session, case studies of identifying cost outliers and root causes for adverse outcomes are presented. 

Students will gain an understanding of advances in analytics and their application to current operations. 


Learning Objectives


Following this session, students will:

Be familiar with emerging concepts in data collection, processing, and analysis

Understand the applicability of novel data technologies

Recognize ways to enhance processes, reduce costs, and improve quality

1946: ENIAC (about 18,000 vacuum tubes)

2007: iPhone

Framing the Environment

Introductions

Within the span of a lifetime, computational power has gone from science fiction and a university project to everyday use

Today’s smartphones are more powerful than all the computers that placed a man on the moon, combined

Technology continues to evolve at an ever-increasing pace

It’s All About Big Data

What is big data?

It’s everywhere: Hx, Sx, Dx, Tx, Rx, procedure codes, billing, sensors used to gather climate information, social media posts, digital pictures and videos, census charts, lab results, purchase transaction records, genome mapping, and mobile phone GPS signals…

Spans 4 dimensions:

Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.

Velocity: Sometimes 2 minutes is too late.

Variety: Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more.

Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions.

Source: http://www-01.ibm.com/software/data/bigdata/


And Big Data in Healthcare

Everybody is talking about it

In Healthcare:

Structured data for analytics

Average length of stay = 3 days

approx. 120 patients per occupied bed per year

Each patient record could be as much as ~ 10,000 characters

A 1,000 bed facility = Data size ~ 1.2 GB per year

The vast majority of data created, as much as 80 percent, is unstructured (text, voice annotations, images)

Structured data SIZE for individual providers is not a major problem in this context

The key challenges are data sourcing, extraction, consolidation, cleaning, and transformation

Source: http://www-01.ibm.com/software/data/bigdata/
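
The sizing estimate on this slide can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes full occupancy and one byte per character; the function name and its parameters are illustrative and do not come from the original slides.

```python
def structured_data_bytes_per_year(beds, avg_stay_days, chars_per_record,
                                   days_per_year=365):
    """Estimate yearly structured-data volume for a facility.

    Assumes every bed is occupied all year and one byte per character.
    """
    # About 122 patients per bed per year for a 3-day average stay.
    patients_per_bed = days_per_year / avg_stay_days
    return beds * patients_per_bed * chars_per_record


estimate = structured_data_bytes_per_year(beds=1000, avg_stay_days=3,
                                          chars_per_record=10_000)
print(f"{estimate / 1e9:.2f} GB per year")
```

Even with generous assumptions the result stays near one gigabyte, which supports the point that structured data size is not the bottleneck for an individual provider.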


And The Cloud


Cloud Computing

Originally, all data and processes resided on a central platform (the mainframe)

Advances in processing, speed, size, and storage drove functions to individual platforms

These same continuing developments eventually overwhelmed the local platforms

Cloud computing characteristics (NIST):

On-demand self-service.

Broad network access.

Resource pooling.

Rapid elasticity.

Measured service.


Big Data and the Cloud

“Cloud computing has opened another door to conducting data science on a large scale. With cloud computing, initial costs are minimised, the scaling of capacity is flexible, and access is more open and widespread. These characteristics look likely to generate an explosion of new understanding.”*

Absent a robust analytical mechanism, the cloud is simply fog at a higher elevation.


*http://www.newscientist.com/cloudup/article/in426

Big Data Challenges

“Along with the many opportunities, data-intensive science will also bring complex challenges. Many scientists are concerned that the data deluge will make it increasingly difficult to find data of relevance and to understand the context of shared data. Today, it is difficult for a person even to track and maintain their own health records. Now envision the magnitude, diversity and dispersed nature of data generated by life sciences, genetics and bioinformatics over the next 10 years.”

*http://www.newscientist.com/cloudup/article/in426


Drowning in Data


Source: President’s Council of Advisors on Science and Technology, Leadership Under Challenge: Information Technology R&D in a Competitive World An Assessment of the Federal Networking and Information Technology R&D Program 35 (Aug. 2007)

The data deluge refers to the situation where the sheer volume of new data being generated is overwhelming the capacity of institutions to manage it and researchers to make use of it.

Elementary, my dear Watson

Evolving clinical decision support system

February 2013:

first commercial application

utilization management decisions in lung cancer treatment

Memorial Sloan–Kettering Cancer Center and WellPoint

90 IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM.

Hardware cost in the millions of dollars.

Widespread availability when?


We generate tremendous amounts of data

90% of all the world’s data has been generated over the last two years [Total 3.2 zettabytes (3.2 × 10^21 bytes)]

In 5 years, the amount of digital information is expected to grow to 40 zettabytes

That’s 40,000,000,000,000,000,000,000 bytes
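
The two volumes quoted above imply a compound growth rate, which can be worked out directly. This is a minimal sketch that assumes steady year-over-year growth from 3.2 to 40 zettabytes across the five years; the function name is illustrative.

```python
def implied_annual_growth(start, end, years):
    """Compound annual growth rate implied by growing from start to end."""
    return (end / start) ** (1 / years) - 1


# Volumes in zettabytes, taken from the figures quoted on this slide.
rate = implied_annual_growth(start=3.2, end=40.0, years=5)
print(f"Implied growth: {rate:.0%} per year")
```

Under that assumption the total grows by roughly two thirds every year.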

The Challenge

How to effectively use the data?

How do we make sense of data to help drive our decisions?

In the meantime…

Data Analysis Zone


Analytic Approaches

Who’s driving?

General purpose solutions (tools/platforms)

Excel, SAS, SPSS, Cognos, Tableau, QlikView…

New offerings emerge on a regular basis

Data mining

Business Intelligence (BI)

Statistics/Advanced Analytics

Prediction/Forecasting

Visual Exploration

Specialized solutions

Predictive applications

Forecasting

Scheduling

Process optimization

Both have relevance in healthcare


REPORTING: WHAT happened?

PREDICTING: WHAT WILL happen?

ANALYZING: WHY did it happen?

ACTING: MAKE IT happen!

Descriptive

Predictive

Prescriptive


The Goal is Prescriptive Use


Descriptive

Patient population management

Clinical quality and efficacy

Outliers for providers/patients

Coding errors and fraud

Predictive

Revenues in 30 and 90 days

Readmissions (whether a patient with CHF will be readmitted to the hospital within 30 or 90 days)

Patient groups for risk adjustment

Prescriptive

Patient flow management

Accurate costing

Asset management

[Figure: distribution chart labeled with the norm, outliers, median, and average]

For Example


Cost Forecast

2011: 187 188 182 191 139 192 153 142 144 153 182 169

2012: 142 140 133 128 165 124 125 159 156 125 192 197

2013: 153 167 163 161 188 191 146 229 203 213 201 214
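
The difference between the average and the median, and a naive trend forecast, can be sketched with the series above. The slide does not state the period or units, so treating the twelve values per year as monthly costs is an assumption. The least-squares trend line is a generic illustration, not necessarily the forecasting method used in the original chart.

```python
from statistics import mean, median

# Values from the Cost Forecast slide; monthly spacing is assumed.
costs = {
    2011: [187, 188, 182, 191, 139, 192, 153, 142, 144, 153, 182, 169],
    2012: [142, 140, 133, 128, 165, 124, 125, 159, 156, 125, 192, 197],
    2013: [153, 167, 163, 161, 188, 191, 146, 229, 203, 213, 201, 214],
}


def linear_trend(values):
    """Ordinary least-squares slope and intercept against t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = mean(values)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


for year, values in costs.items():
    print(year, "average:", round(mean(values), 1), "median:", median(values))

series = [v for year in sorted(costs) for v in costs[year]]
slope, intercept = linear_trend(series)
# Naive extrapolation one period past the end of the series.
next_value = intercept + slope * len(series)
print("trend per period:", round(slope, 2), "next period:", round(next_value, 1))
```

A straight-line fit ignores seasonality and outliers, which is part of why summary statistics and simple forecasts can hide what the underlying records are showing.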

A Common Shortcoming

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

Sir Arthur Conan Doyle

(Sherlock Holmes, A Scandal in Bohemia)

25 June 1891


The Answer is often hiding in plain sight


Knowing Where To Look

How to discover the regions in the data that are most information-rich to identify targets of interest?

The Curse of Dimensionality

Exponential growth increases dimensionality of the data

Humans have difficulty visualizing, organizing, and analyzing data beyond 3 or 4 dimensions (data attributes)

Two Challenges of Big Data

Challenges of Big Data
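
The curse of dimensionality can be illustrated by counting how many distinct cells appear when every attribute is split into a few bins. This sketch borrows the scale of the case study later in the deck (about 4 million records, 183 attributes). The choice of 5 bins per attribute is an illustrative assumption.

```python
def cells(num_attributes, bins_per_attribute):
    """Number of distinct cells when each attribute is split into equal bins."""
    return bins_per_attribute ** num_attributes


records = 4_000_000  # order of magnitude of the case-study data set
for d in (1, 2, 3, 10, 183):
    n = cells(d, bins_per_attribute=5)  # 5 bins is an illustrative assumption
    print(f"{d:>3} attributes: {n:.3e} cells, {records / n:.3e} records per cell")
```

With only a handful of attributes each cell holds many records, but the cell count grows exponentially, so an exhaustive search of all attribute combinations quickly becomes impractical.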


Pattern Based Analytics applies Shannon’s Information Theory, combined with robust and scalable Machine Learning methods, to solve these two problems by

Finding the most information-rich, yet lower dimensionality, regions in the data that are characterized by “patterns”
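
The slides do not describe the specific algorithm. One common way to quantify how information-rich an attribute is uses Shannon entropy and information gain, sketched below. The toy records and attribute names are hypothetical and not taken from the case study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of categorical labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(attribute, outcome):
    """Reduction in outcome entropy after splitting the records on an attribute."""
    total = len(outcome)
    groups = {}
    for a, y in zip(attribute, outcome):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(outcome) - remainder


# Hypothetical toy records, not data from the case study.
high_cost = ["yes", "yes", "no", "no"]
many_procedures = ["y", "y", "n", "n"]  # perfectly separates the outcome
gender = ["f", "m", "f", "m"]           # carries no information here

print(information_gain(many_procedures, high_cost))
print(information_gain(gender, high_cost))
```

Attributes, or combinations of attribute values, with high information gain are candidates for the "patterns" the slide refers to; attributes with a gain near zero can be set aside.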

Overcoming the Big Data Challenges

Cognitive Science and Patterns

“The decision-making processes of a human being are somewhat related to the recognition of patterns; for example, the next move in a chess game is based upon the present patterns on the board, and buying or selling stocks is decided by a complex pattern of information. The goal of pattern recognition is to clarify these complicated mechanisms of decision-making processes and to automate these functions.”

Source: Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, San Diego, CA: Academic Press, p. 1

Why Patterns?

Patterns are means for organizing large volumes of data

Patterns are shorthand for identifying complex, meaningful relationships involving multiple variables

They provide information on the underlying entity

The human brain functions in large part by identifying patterns

Patterns?

Cognitive Science on Patterns


“The ability to recognize patterns in the environment is critical for an organism’s survival.” (Sinha, 2002).

“The ability to spot existing or emerging patterns is one of the most (if not the most) critical skills in intelligent decision making, though we’re mostly unaware that we do it all the time.” (Miemis, 2010)

“Pattern recognition is the fundamental human cognition or intelligence.” (Pi et al., 2008)

Searching for Patterns

We do this for fun, but…

Not always for critical decision making.


Pattern Based Analytics

Pattern Based Analytics amplifies human capability by

Automatically identifying patterns considering all dimensions

Reducing the complexity of understanding the data

Structuring results in prioritized, readily comprehensible graphic representation

Expanding a Natural Function

Pattern Based Analytics

Utility of Patterns

Pattern Based Analytics identifies patterns

To understand what has happened, and why

To predict what might happen, and understand why

To explore alternative actions in the context of possible future scenarios

Pattern Based Analytics

Pattern Based Analytics points to what is meaningful in a large, complex data set

Pattern Based Analytics is a “hypothesis generator”; it discovers the hypotheses that are not initially apparent

Encompassing all data points, including remote outliers

Eliminating process bias

Rapidly recognizing and prioritizing millions of patterns

Discovery

Identify poor providers

Compare costs versus satisfaction

Collect and prepare data

Identify outlier data with exploratory analysis

Discover key patterns automatically with a discovery engine to understand outlier behavior: why is performance poor?

Drill into the patterns to identify possible causes and solutions, such as too many unnecessary procedures or expensive discharge dispositions

Take action and monitor outlier behavior over time and assess impact of changes
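
The exploratory step of identifying outlier data can be sketched with a simple interquartile-range rule. This is one generic technique, not necessarily the one used in the case study, and the cost values are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical per-encounter costs in thousands of dollars.
encounter_costs = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]
print(iqr_outliers(encounter_costs))
```

A rule like this only flags which encounters are unusual. Explaining why they are unusual is the job of the pattern discovery and drill-down steps.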


Case Study


Example Data Characteristics

Data drawn from de-identified medical records

Source data contain ~4 million records, collected over a period of 5 years from ~100 health care providers

Data descriptors (183 attributes)

Person specific information such as gender, age, ethnicity, etc.

Encounter information such as:

Provider ID

Multiple diagnoses and codes

Multiple procedures and codes

Length of stay, total costs, disposition, medical coverage type, etc.

Patient satisfaction quality indicator

Focus on cardiovascular disease


Zooming in (next slide)…

A Typical Management Tool


Hard to gain insight from this table!

A Typical Management Tool


Ranking attributes against cost

Quality has highest correlation

Quality measured by Patient Satisfaction Level

1 (lowest) to 5 (highest)

Then examine the (Quality, Cost) relationship

But Viewed via Patterns
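
Ranking attributes against cost can be sketched by computing each attribute's correlation with cost and sorting by absolute value. The records, column names, and the use of Pearson correlation are illustrative assumptions; the slides do not say which correlation measure the tool used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical encounters; column names and values are illustrative only.
cost = [310, 120, 95, 80, 60, 55]
attributes = {
    "quality": [1, 2, 3, 4, 5, 5],          # satisfaction level, 1 to 5
    "length_of_stay": [4, 3, 4, 2, 3, 2],
    "age": [71, 64, 80, 58, 69, 75],
}

# Strongest relationship first, regardless of sign.
ranking = sorted(attributes, key=lambda name: abs(pearson(attributes[name], cost)),
                 reverse=True)
for name in ranking:
    print(name, round(pearson(attributes[name], cost), 2))
```

In this toy data the quality score has the strongest (negative) relationship with cost, mirroring the kind of finding that prompts a closer look at the (Quality, Cost) relationship.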


High cost outliers (blue) identified with low quality

Graphic Representation Illustrates Areas of Concern

Low frequency, high impact outliers


48 patients

98 patients

Examining patterns for high cost, lowest quality

Top Pattern: Number of Procedures = 10 or more

Second Pattern: 2 - 4 procedures under Medicare

Delve into Contributing Factors


Patterns start to emerge.

Further Exploration: The “Who”


What are the key procedures involved in these outcomes?

Zooming In…

Top 4 providers (out of 111) account for 46% of poor outcomes within this pattern


Percutaneous transluminal coronary angioplasty (PTCA) is the dominant principal procedure!

How about secondary procedures?

Looking for Contributing Factors


Contributing Factors

Next:

Examine patient data to gain insight on additional procedure

Dominant secondary procedure

“Ins drug-elut coronary stent” (insertion of a drug-eluting coronary stent)


Patient Data for more specific sub-population may suggest options...

Contributing Factors


Interesting Findings


Analysis of high cost patients in top pattern

Average cost for 65 yrs+ PTCA females: $421K

Average cost for 65 yrs+ PTCA males: $212K

RESULT:

Proactively monitor this sub-population at high cost providers as they enter the system to potentially reduce costs and increase patient satisfaction


Secondary Findings

Post discharge Costs:

Home Health Service: $499,862.00

Skilled Nursing/Intermediate Care within admitting hospital: $282,779.50

In-house cost is about 57% of the home health cost (a saving of about 43%)
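
The two post-discharge figures can be compared directly. A quick check (variable names are illustrative) suggests the in-house cost is about 57% of the home health cost, which corresponds to a reduction of roughly 43%.

```python
home_health = 499_862.00
skilled_nursing_in_house = 282_779.50

# Share of the home health cost that the in-house option represents.
ratio = skilled_nursing_in_house / home_health
# Relative reduction achieved by choosing the in-house option.
saving = 1 - ratio

print(f"in-house cost as share of home health: {ratio:.0%}")
print(f"reduction relative to home health: {saving:.0%}")
```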


Summary of Example

Data slicing and dicing did not reveal any insights

Exploration identified an unusual and non-optimal (Quality, Cost) relationship for Quality Level 1

Pattern-Based Discovery identified two dominant outlier patterns that explain this relationship

Drilling down into the top pattern provided a view into:

the dominant providers, principal and secondary procedures

the underlying data that provide insight into the role of additional procedures and other factors

Monitor 65 yrs+ females undergoing PTCA at high cost providers to reduce costs and improve satisfaction

What-if analysis on Discharge Disposition suggested an option of transferring patients to SN/IC facilities to reduce costs


Advanced Applications

Precision Medicine

Understanding Efficacy of Treatment

What molecules in patient cells are important to specific treatment of a disease?

What combination and concentration of molecules are important for

Patient triaging as to efficacy of treatment

Additional understanding of the disease mechanism for further improvement in drug development

Dataset: Patient molecular data related to treatment of inflammatory disease as supplied by a biotech startup working with Big Pharma

Results:

Patterns detected in 10 seconds with 88% accuracy

Traditional analysis takes 2 days.

References

Eastwood, B. (2013). Big Data Analytics Use Cases for Healthcare IT. CIO, http://www.cio.com/slideshow/detail/126493?goback=%252Egde_2712281_member_5801855037156114432#slide1

Eastwood, B. (2013). Can Healthcare Big Data Reality Live Up to Its Promise? CIO, http://www.cio.com/article/738121/Can_Healthcare_Big_Data_Reality_Live_Up_to_Its_Promise_?page=3&taxonomyId=3147

Hey, T. (n.d.). Big Data is Transforming Science, NewScientist, http://www.newscientist.com/cloudup/article/in426

IBM (n.d.). IBM Big Data Platform, http://www-01.ibm.com/software/data/bigdata

Miemis, V. (2010). Essential Skills for 21st Century Survival: Part I: Pattern Recognition, emergent by design. http://emergentbydesign.com/2010/04/05/essential-skills-for-21st-century-survival-part-i-pattern-recognition/


Pi, Y., Liao, W., Liu, M., & Lu, J. (2008). Theory of Cognitive Pattern Recognition, Pattern Recognition Techniques, Technology and Applications, Peng-Yeng Yin (Ed.), ISBN: 978-953-7619-24-4, InTech, http://www.intechopen.com/books/pattern_recognition_techniques_technology_and_applications/theory_of_cognitive_pattern_recognition

President’s Council of Advisors on Science and Technology (2007). Leadership Under Challenge: Information Technology R&D in a Competitive World - An Assessment of the Federal Networking and Information Technology R&D Program 35

Sinha, P. (2002). Recognizing complex patterns. Nature Neuroscience Supplement, Vol. 5, pp. 1093-1097. doi:10.1038/nn949
