Chapter 2: Healthcare Data, Information and Knowledge

Elmer Bernstam MD

Todd Johnson PhD

Trevor Cohen MD PhD

After reviewing these slides the viewer should be able to:

Define data, information, and knowledge

Understand how vocabularies convert data to information

Describe methods that convert information to knowledge

Distinguish informatics from other computational disciplines, particularly computer science

Describe the differences between data-centric and information-centric technology

Learning Objectives

Data are symbols or observations reflecting differences in the world. Example = 250.00 (Note: data is the plural of datum)

Information is data with meaning. Example = ICD-9-CM code 250.00 means type 2 (or unspecified type) diabetes mellitus without complication

Knowledge is information that is justifiably believed to be true. Example = obese patients are more likely to develop type 2 diabetes
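A minimal Python sketch of how a vocabulary turns a datum into information. It assumes a hypothetical toy dictionary, `TOY_ICD9_VOCABULARY`, holding a single ICD-9-CM entry, and a hypothetical `interpret` helper; a real system would load the complete code set.

```python
from typing import Dict, Optional

# Hypothetical toy vocabulary: maps a raw code (data) to its meaning (information).
# Only one ICD-9-CM entry is included, for illustration.
TOY_ICD9_VOCABULARY: Dict[str, str] = {
    "250.00": "Diabetes mellitus without mention of complication, "
              "type II or unspecified type, not stated as uncontrolled",
}


def interpret(code: str, vocabulary: Dict[str, str]) -> Optional[str]:
    """Return the meaning the vocabulary assigns to a code, or None if it has none."""
    return vocabulary.get(code)


datum = "250.00"                                      # data: a symbol without meaning
information = interpret(datum, TOY_ICD9_VOCABULARY)   # data + vocabulary = information
print(information)
print(interpret("250.01", TOY_ICD9_VOCABULARY))       # not in the toy vocabulary: None
```

Without the vocabulary, "250.00" is just a string of characters; it could equally be a price or a lab value.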

Introduction

Computers store and process binary data: zero (off) and one (on). Each zero or one is a bit; a sequence of 8 bits is a byte. Note that these bits and bytes have no meaning per se

Bits can be interpreted as various data types

Integers such as 345 or 669988

Floating point numbers such as 14.1 or -1.23

Characters such as a or z

Character strings such as “hello” or “goodbye”
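The point that bits have no meaning until a data type is assumed can be shown with Python's standard `struct` module. The byte values below are chosen for illustration: the same four bytes yield an integer or a floating point number depending only on how they are read.

```python
import struct

# Four raw bytes: on their own, just bits with no meaning.
raw = bytes([0x00, 0x00, 0x80, 0x3F])

# Read as a little-endian 32-bit unsigned integer...
as_int = struct.unpack("<I", raw)[0]
# ...or as a little-endian 32-bit IEEE 754 floating point number.
as_float = struct.unpack("<f", raw)[0]

# A different run of bytes read as an ASCII character string.
as_text = bytes([0x68, 0x65, 0x6C, 0x6C, 0x6F]).decode("ascii")

print(as_int)    # 1065353216
print(as_float)  # 1.0
print(as_text)   # hello
```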

Introduction

Data can be aggregated into a variety of formats such as image files (JPG, GIF, PNG), text files, sound files (WAV, MP3) or video files (WMV, MP4)

Recognize that these formats define only the category of data (image, sound, etc.), not what information the file contains

Data are the domain of computer scientists, but information is the domain of informatics and informaticians
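A file format can be recognized from its first few bytes (its "magic number") without saying anything about the content. The sketch below uses a hypothetical `detect_format` helper that checks the well-known PNG, JPEG, and GIF signatures; it can tell that a file is a PNG image, but not whether the image is a chest X-ray or a holiday photo.

```python
# Well-known leading byte signatures ("magic numbers") for a few image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
}


def detect_format(header: bytes) -> str:
    """Guess a file's category format from its first bytes (hypothetical helper)."""
    for signature, name in SIGNATURES.items():
        if header.startswith(signature):
            return name
    return "unknown"


print(detect_format(b"\x89PNG\r\n\x1a\n" + bytes(8)))  # PNG image
print(detect_format(b"hello"))                          # unknown
```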

Introduction

Information retrieval involves both computer science (data) and informatics (information). See image below

Introduction

Computer data not only lack meaning on their own but must also include dates, units, and other qualifiers to become significant. For example, blood glucose = 127: was that in mg/dL, was the sample drawn fasting, and when was it drawn?

Everything must be standardized; otherwise, computer B will not understand data transmitted from computer A (i.e., the data won’t be interoperable)
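To make the blood glucose example concrete, the sketch below attaches the qualifiers to the value and sends the result as JSON so that a second system can rebuild it. The `GlucoseResult` fields, the unit strings, the timestamp, and the conversion factor of 18 mg/dL per mmol/L (an approximation for glucose) are assumptions for illustration; real systems depend on shared standards rather than ad hoc field names like these.

```python
import json
from dataclasses import asdict, dataclass

# Approximate factor for converting glucose from mmol/L to mg/dL (assumption: 18.0).
MG_DL_PER_MMOL_L = 18.0


@dataclass
class GlucoseResult:
    """Hypothetical record: a value is only interpretable together with its qualifiers."""
    value: float
    unit: str           # "mg/dL" or "mmol/L"
    collected_at: str   # ISO 8601 timestamp of the specimen draw
    fasting: bool


def to_mg_dl(result: GlucoseResult) -> float:
    """Normalize a result to mg/dL so that values from different sources compare."""
    if result.unit == "mg/dL":
        return result.value
    if result.unit == "mmol/L":
        return result.value * MG_DL_PER_MMOL_L
    raise ValueError(f"unrecognized unit: {result.unit!r}")


# Computer A serializes the value with all of its qualifiers...
sent = GlucoseResult(value=127.0, unit="mg/dL", collected_at="2024-01-15T07:30:00", fasting=True)
message = json.dumps(asdict(sent))

# ...and computer B, which agrees on the same field names and units, rebuilds it.
received = GlucoseResult(**json.loads(message))
print(received.fasting, to_mg_dl(received))
print(to_mg_dl(GlucoseResult(7.0, "mmol/L", "2024-01-15T07:30:00", True)))
```

If computer B did not share the field names or the unit convention, the bare number 127 would arrive but its meaning would not.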

Data and Information

A modern way to convert medical information to knowledge is to use a clinical data warehouse (CDW)

EHRs are now a huge source of healthcare data and information. They contain both structured data (coded, e.g., ICD-9 codes) and unstructured text (free text or natural language)

Interpreting free text requires natural language processing (NLP)
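A toy illustration of the simplest form of concept extraction: matching phrases from a small lexicon against a note. The `LEXICON` entries and the `extract_concepts` helper are hypothetical, and this dictionary lookup is far simpler than real clinical NLP systems such as cTAKES or MetaMap. Its failure on the second note shows why: it has no notion of negation.

```python
import re
from typing import List

# Hypothetical mini-lexicon: surface phrases (including synonyms) mapped to concept labels.
LEXICON = {
    "diabetes": "Diabetes mellitus",
    "hypertension": "Hypertension",
    "high blood pressure": "Hypertension",
    "breast cancer": "Malignant neoplasm of breast",
}


def extract_concepts(note: str) -> List[str]:
    """Return the sorted, de-duplicated concepts whose phrases appear in the note."""
    text = note.lower()
    found = set()
    for phrase, concept in LEXICON.items():
        # Word boundaries restrict matches to whole words or phrases.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(concept)
    return sorted(found)


print(extract_concepts("Pt with high blood pressure and type 2 diabetes."))
print(extract_concepts("Patient denies hypertension."))  # still matched: no negation handling
```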

Information to Knowledge

Data from EHRs, radiology, pathology, etc. are copied into a staging database, where they are cleaned; they are then loaded into a common database and associated with metadata (data that describe data). ICD-type data are an example of metadata

Tools can be applied to the data in the CDW, such as simple descriptive analytics that report the number of patients with breast cancer, their ages, menopausal status, etc. More about this in Chapter 3

CDWs do a better job of analyzing and reporting aggregate healthcare data than the average EHR, which tends to focus on the individual patient
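A minimal sketch of the simple descriptive analytics described above, run over a hypothetical list of de-identified patient rows. The field names and values are invented for illustration and are not real CDW output.

```python
from collections import Counter
from statistics import mean

# Hypothetical, de-identified rows, as they might come back from a CDW query.
patients = [
    {"id": 1, "diagnosis": "breast cancer", "age": 52, "menopausal_status": "post"},
    {"id": 2, "diagnosis": "breast cancer", "age": 41, "menopausal_status": "pre"},
    {"id": 3, "diagnosis": "type 2 diabetes", "age": 63, "menopausal_status": "post"},
    {"id": 4, "diagnosis": "breast cancer", "age": 67, "menopausal_status": "post"},
]

# Select the cohort, then summarize it.
cohort = [p for p in patients if p["diagnosis"] == "breast cancer"]
count = len(cohort)
mean_age = mean(p["age"] for p in cohort)
by_status = Counter(p["menopausal_status"] for p in cohort)

print(count, round(mean_age, 1), dict(by_status))
```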

Clinical Data Warehouse

CDWs can be used to evaluate critical clinical processes and cost estimates, and to analyze potential solutions

CDWs are highly valuable for informatics and evidence-based medical research

CDWs can help track infections and report trends to public health

The next slide shows a typical CDW schema

Clinical Data Warehouse

ETL = extract, transform and load
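A minimal ETL (extract, transform, load) sketch using Python's built-in `sqlite3` module with an in-memory database. The two source systems, their rows, and the single `observations` table are hypothetical, and the staging step is collapsed into an in-memory list; a production CDW would also attach metadata and keep a separate staging area.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, Optional[str]]


def extract() -> List[Row]:
    """Extract: copy raw rows from two hypothetical source systems."""
    lab_system = [("1", "GLUCOSE ", " 127 ", "mg/dL"), ("2", "Glucose", "", "mg/dL")]
    pathology_system = [("1", "Breast biopsy", "malignant", None)]
    return lab_system + pathology_system


def transform(rows: List[Row]) -> List[Tuple[int, str, str, Optional[str]]]:
    """Transform: clean the staged rows (trim, lowercase item names, drop empty values)."""
    cleaned = []
    for patient_id, item, value, unit in rows:
        value = value.strip()
        if not value:  # a missing result is dropped rather than loaded
            continue
        cleaned.append((int(patient_id), item.strip().lower(), value, unit))
    return cleaned


def load(rows, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a single warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(patient_id INTEGER, item TEXT, value TEXT, unit TEXT)"
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
rows = conn.execute(
    "SELECT patient_id, item, value, unit FROM observations ORDER BY patient_id, item"
).fetchall()
print(rows)
```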

Informatics for Integrating Biology and the Bedside (i2b2) is a Harvard project now used by many academic institutions in the US

The platform is open source and modular, and it integrates genomic and clinical information for research purposes

The database consists of facts (diagnoses, lab results, etc.) that users query and dimensions that describe the facts

With this model, data can be aggregated from multiple hospitals

i2b2 platform https://www.i2b2.org

i2b2 star schema
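A much-simplified sketch of the star-schema idea using SQLite. The table names and the few columns borrow i2b2's naming (`observation_fact`, `patient_dimension`, `concept_dimension`), but this is not the actual i2b2 schema, which has many more columns and dimensions. The patients, dates, labels, and the `ICD9:` concept-code prefix are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the facts.
    CREATE TABLE patient_dimension (
        patient_num      INTEGER PRIMARY KEY,
        sex_cd           TEXT,
        age_in_years_num INTEGER
    );
    CREATE TABLE concept_dimension (
        concept_cd TEXT PRIMARY KEY,
        name_char  TEXT
    );
    -- The central fact table: one row per observation.
    CREATE TABLE observation_fact (
        patient_num INTEGER REFERENCES patient_dimension (patient_num),
        concept_cd  TEXT    REFERENCES concept_dimension (concept_cd),
        start_date  TEXT
    );

    INSERT INTO patient_dimension VALUES (1, 'F', 52), (2, 'F', 41), (3, 'M', 63);
    INSERT INTO concept_dimension VALUES
        ('ICD9:174.9', 'Breast cancer'),
        ('ICD9:250.00', 'Type 2 diabetes');
    INSERT INTO observation_fact VALUES
        (1, 'ICD9:174.9', '2023-02-01'),
        (1, 'ICD9:174.9', '2023-06-05'),
        (2, 'ICD9:174.9', '2023-03-10'),
        (3, 'ICD9:250.00', '2023-04-22');
""")

# Count distinct patients per concept by joining facts to the concept dimension.
counts = conn.execute("""
    SELECT c.name_char, COUNT(DISTINCT f.patient_num)
    FROM observation_fact AS f
    JOIN concept_dimension AS c ON c.concept_cd = f.concept_cd
    GROUP BY c.name_char
    ORDER BY c.name_char
""").fetchall()

# Use the patient dimension to describe the breast cancer cohort.
mean_age = conn.execute("""
    SELECT AVG(p.age_in_years_num)
    FROM patient_dimension AS p
    WHERE p.patient_num IN (
        SELECT patient_num FROM observation_fact WHERE concept_cd = 'ICD9:174.9'
    )
""").fetchone()[0]

print(counts)
print(mean_age)
```

Because every fact points to shared dimension tables, rows loaded from several hospitals can be queried together in the same way.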

Several systems have been developed to extract concepts from free text in EHRs or CDWs. See below

Concept Extraction

Concept extractor   Gold standard        Precision   Recall   F-score (F1)
cTAKES [17]         Mayo Clinic          0.80        0.65     0.72
MetaMap [20]        NLM (500 articles)   0.32        0.53     0.40
MedLEE [21]         Proprietary          0.86        0.77     0.81
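The F-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R), where precision is TP / (TP + FP) and recall is TP / (TP + FN). A short Python check, using only the precision and recall values in the table, reproduces the reported F-scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
reported = {"cTAKES": (0.80, 0.65), "MetaMap": (0.32, 0.53), "MedLEE": (0.86, 0.77)}
for system, (p, r) in reported.items():
    print(f"{system}: F1 = {f1_score(p, r):.2f}")
```

The harmonic mean is dominated by the smaller of the two values, which is why MetaMap's low precision pulls its F-score down to 0.40 despite moderate recall.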

In other industries, such as banking, data and information are much closer together (a smaller semantic gap).

For example, banking data such as $100.50 map directly to an account balance of $100.50, leaving little room for a different interpretation

In healthcare, there are subjective factors (“I feel sick”) that are difficult to measure and vary from patient to patient and physician to physician. Lab results are more objective and easier to interpret

What Makes Informatics Difficult?

It is difficult to model all of healthcare. View the HL7 RIM model on the next slide

Biomedical information is difficult to model because it is often incomplete, imprecise, vague, inconsistent, and uncertain

Humans can adapt to dynamic and vague information, but computers cannot. Clinical decision support rules in EHRs are precise, whereas in reality they may need to be flexible and change over time

HL7 version 3 RIM model

Health IT is an attractive solution to our troubled healthcare system, but is it realistic?

Other IT fields, such as artificial intelligence, have experienced serious “ups and downs”

There is a large gap between the healthcare data generated and the information they convey (the semantic gap)

Is it too early to expect EHRs and computerization to change healthcare?

Why Health IT Fails Sometimes

Computer scientists focus on data, while informaticists focus on information

There is a gap between healthcare data and information (semantic gap)

The transformation of information into knowledge is a primary goal of informaticists

Clinical data warehouses are increasingly used to research clinical questions and generate knowledge from information

Conclusions
