Computable Phenotypes: Standardized Ways to Classify People Using Electronic Health Record Data

by Lilia Verchinina, PhD; Lisa Ferguson, MSI; Allen Flynn, PharmD; Michelle Wichorek, PhD; and Dorene Markel, MS, MHSA

Abstract

Computable phenotypes (CPs) are an increasingly important, structured, and reproducible method of using electronic health record data to classify people. CPs have the potential to provide important benefits to health information management (HIM) professionals in their everyday work. A CP is a precise algorithm, with inclusion and exclusion criteria, that can be used to identify a cohort of patients with a specific set of observable and measurable traits. With the use of CPs, a series of technical steps can be taken to automatically identify people with specific traits, such as people with a particular disease or condition. CPs were first used outside of the HIM domain for clinical trials and network-based research. Because CPs are becoming more easily shareable, they have the potential to be used by HIM professionals to help improve coding, reporting, management, sharing, and reuse of clinical information.

Keywords: computable phenotype, electronic health record (EHR), cohort, patient data

Introduction

Health information management (HIM) professionals are stewards of health information. They have expertise in the use of information from electronic health records (EHRs) to meet business objectives and to solve problems for clinicians, caregivers, patients, and families. As different methods of using patient data from EHRs arise, HIM professionals strive to understand and support them. Here, we discuss the use of a type of specification incorporating EHR data called a computable phenotype (CP). A CP is a precise, shareable, reproducible, and documented method for using EHR data to categorize people for a variety of purposes. CPs are currently being used in biomedical research. Because CPs specify how to identify cohorts of individuals using EHR data, they can be useful in the work of HIM professionals.

To better explain CPs, we begin by defining what a phenotype is, and then we discuss the definition of the term computable phenotype. According to the glossary at genome.gov, a phenotype is a composite of an individual person’s “observable traits, such as their height, eye color, and blood type.”1 The prefix pheno- means “showing.” A phenotype is something that can be observed. The suffix -type suggests a typology, or classification scheme. Hence, a phenotype is a way of classifying organisms, including people, based on characteristics they have that are observable and measurable.

Building on the definition of phenotype above, the term computable phenotype refers to a shareable and reproducible algorithm precisely defining a condition, disease, complex patient characteristic, or clinical event using only data processed by a computer, principally EHR data.2 As a shareable and reproducible algorithm, a CP is a tool that can be used to identify patient cohorts with a specific medical condition of interest associated with a specific set of observable and measurable traits. Further, because they are shareable and provide formal specifications of diseases and events, CPs can serve as standards, allowing patient cohorts to be compared and combined more easily than they are today.

Perspectives in Health Information Management, Fall 2018

Although this definition of a CP is a good start, the definition is not fully settled. Robinson asserts that a CP is a standardized method for capturing phenotypic manifestations of disease.3 Richesson et al. suggest that CPs are intended to be based solely on data that can be processed by a computer, that is, EHR data.4 For our purposes here, we consider CPs to be formally specified, written algorithms that list the EHR data elements they use and define inclusion and exclusion criteria for a specific disease cohort.
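The inclusion and exclusion structure in this working definition can be sketched in a few lines of Python. This is a minimal illustration, not a published CP format: the `ComputablePhenotype` class, the patient-record field names, and the placeholder diagnosis code `DX-EXAMPLE` are all hypothetical, and the lambdas stand in for criteria that a real CP would state in a written specification. A patient joins the cohort only if every inclusion criterion holds and no exclusion criterion does.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A patient record is modeled as a dictionary of EHR-derived fields (hypothetical names).
PatientRecord = Dict[str, Any]
Criterion = Callable[[PatientRecord], bool]


@dataclass
class ComputablePhenotype:
    """Minimal sketch of a CP: a name plus inclusion and exclusion criteria."""
    name: str
    inclusion: List[Criterion] = field(default_factory=list)
    exclusion: List[Criterion] = field(default_factory=list)

    def matches(self, patient: PatientRecord) -> bool:
        # Every inclusion criterion must hold, and no exclusion criterion may hold.
        return all(c(patient) for c in self.inclusion) and not any(
            c(patient) for c in self.exclusion
        )

    def cohort(self, patients: List[PatientRecord]) -> List[PatientRecord]:
        """Return the patients who meet the CP, in their original order."""
        return [p for p in patients if self.matches(p)]


# Hypothetical CP: adults with a placeholder diagnosis code, excluding deceased patients.
example_cp = ComputablePhenotype(
    name="example-condition",
    inclusion=[
        lambda p: p.get("age", 0) >= 18,
        lambda p: "DX-EXAMPLE" in p.get("diagnosis_codes", set()),
    ],
    exclusion=[lambda p: bool(p.get("deceased", False))],
)

patients = [
    {"id": 1, "age": 40, "diagnosis_codes": {"DX-EXAMPLE"}},
    {"id": 2, "age": 16, "diagnosis_codes": {"DX-EXAMPLE"}},
    {"id": 3, "age": 70, "diagnosis_codes": {"DX-EXAMPLE"}, "deceased": True},
    {"id": 4, "age": 55, "diagnosis_codes": set()},
]
print([p["id"] for p in example_cp.cohort(patients)])  # [1]
```

Real CPs can combine criteria in richer ways than a simple “all of” and “none of” rule, as the peripheral arterial disease example later in this paper shows.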

We believe that CPs are growing in importance beyond their original use of establishing patient cohorts for clinical trials with EHR data. We anticipate that HIM professionals will soon encounter CPs in their work, because CPs can also be used to determine treatment eligibility and to assess the quality of record coding, among other possible clinical, research, public health, and business uses.

A Look at Computable Phenotypes in General

To further explain CPs and how they are used, we outline a three-step process for generating patient cohorts using a CP. On the basis of a literature review5–9 and our experience using CPs to inform EHR data queries for research,10–13 we have developed a process diagram to show how CPs have generally been used to classify patients (see Figure 1).

The process described in Figure 1 begins on the far left with a CP, that is, an algorithm that precisely describes a condition, a disease, or a set of traits. Like all CPs, this CP provides a standard algorithm for including individuals in, or excluding them from, a cohort based on criteria documented in the EHR.

To implement a CP locally, the user must transform the CP algorithm into a database query. The query is created by mapping the inclusion and exclusion criteria documented in the CP to a defined set of data elements and logical expressions that are specific to the EHR system, clinical data repository, or other data source being used.14 This query is then run against the source database to generate a patient cohort consisting of the individuals who meet the CP’s criteria. Because different source databases have different schemas, a CP cannot supply a single database query that runs everywhere. Instead, the role of the CP is to provide a standard algorithm, with inclusion and exclusion criteria, so that HIM professionals, database administrators, clinicians, and others can construct equivalent database queries that fit their own source databases.
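The mapping step can be illustrated with Python’s built-in `sqlite3` module. In this sketch, one database-independent criterion (a diagnosis code beginning with a listed prefix) is translated into SQL for two hypothetical sites whose tables and columns have different names. The site configurations, table names, rows, and the ICD-9-CM prefix `282.6` are assumptions chosen for illustration, not drawn from any published CP.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

# Hypothetical CP criterion, stated without reference to any database schema:
# "the patient has at least one diagnosis code beginning with one of these prefixes."
CP_DIAGNOSIS_PREFIXES = ["282.6"]  # illustrative prefix, not an official code list


def build_query(schema: Dict[str, str], prefixes: List[str]) -> Tuple[str, List[str]]:
    """Map the CP criterion onto one site's table and column names.

    `schema` is a hypothetical site configuration with keys "table",
    "patient_col", and "code_col". Table and column names are assumed to come
    from trusted configuration; code values are bound as query parameters.
    """
    if not prefixes:
        raise ValueError("at least one code prefix is required")
    conditions = " OR ".join(f"{schema['code_col']} LIKE ?" for _ in prefixes)
    sql = (
        f"SELECT DISTINCT {schema['patient_col']} FROM {schema['table']} "
        f"WHERE {conditions} ORDER BY {schema['patient_col']}"
    )
    # In LIKE patterns only % and _ are wildcards, so the "." in a code is literal.
    return sql, [prefix + "%" for prefix in prefixes]


def run_site(schema: Dict[str, str], ddl: str, rows: Sequence[Tuple[str, str]]) -> List[str]:
    """Load made-up rows into an in-memory database and run the site-specific query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        conn.executemany(f"INSERT INTO {schema['table']} VALUES (?, ?)", rows)
        sql, params = build_query(schema, CP_DIAGNOSIS_PREFIXES)
        return [row[0] for row in conn.execute(sql, params)]
    finally:
        conn.close()


# Two hypothetical sites storing the same kind of data under different names.
site_a = {"table": "dx", "patient_col": "mrn", "code_col": "icd9"}
site_b = {"table": "problem_list", "patient_col": "person_id", "code_col": "dx_code"}

cohort_a = run_site(site_a, "CREATE TABLE dx (mrn TEXT, icd9 TEXT)",
                    [("A1", "282.61"), ("A2", "250.00"), ("A3", "282.69")])
cohort_b = run_site(site_b, "CREATE TABLE problem_list (person_id TEXT, dx_code TEXT)",
                    [("B7", "282.60"), ("B8", "401.9")])
print(cohort_a)  # ['A1', 'A3']
print(cohort_b)  # ['B7']
```

Only the small `schema` mapping differs between sites; the criterion itself is shared. Identifiers such as table and column names cannot be bound as SQL parameters, which is why this sketch assumes they come from trusted site configuration rather than user input.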

Hence, as a specified algorithm, the CP communicates a set of standard inclusion and exclusion criteria precisely, but in general terms that do not depend on any one database. Because these criteria define a condition or cohort consistently, the results of local database queries are directly comparable across time and place. For this reason, CPs are particularly helpful for research.

Because source databases typically include many tables of data, queries that correspond to CPs often involve complex “join” operations that combine data from many tables in a source database according to the criteria included in a CP. The complexity of this work increases further when data from multiple sources are queried. In such cases, patient data from the various sources must first be merged with the use of unique patient identifiers. Such unique identifiers have been described previously in the HIM literature.15 This paper reports on work that has taken place over the past several years to develop and share CPs in ways that support improved standardization of patient cohort development and clinical event definitions across diverse sites for a variety of purposes.
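The merge step can be sketched as follows, assuming a hypothetical crosswalk (such as one produced by a master patient index) that maps each source system’s local patient ID to a single enterprise-wide identifier. The source names, IDs, and codes are made up. Rows from all sources are grouped under the shared identifier before any CP criteria are applied.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical crosswalk: (source system, local patient ID) -> enterprise patient ID.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("clinic_emr", "C-001"): "E100",
    ("hospital_emr", "H-9"): "E100",  # same person as C-001, different local ID
    ("hospital_emr", "H-12"): "E200",
}


def merge_sources(sources: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Set[str]]:
    """Group (local_id, diagnosis_code) rows from several sources by enterprise ID.

    Rows whose local ID has no crosswalk entry are skipped in this sketch.
    """
    merged: Dict[str, Set[str]] = defaultdict(set)
    for source_name, rows in sources.items():
        for local_id, code in rows:
            enterprise_id = CROSSWALK.get((source_name, local_id))
            if enterprise_id is not None:
                merged[enterprise_id].add(code)
    return dict(merged)


sources = {
    "clinic_emr": [("C-001", "282.61")],
    "hospital_emr": [("H-9", "282.69"), ("H-12", "401.9"), ("H-99", "250.00")],
}
merged = merge_sources(sources)
print({pid: sorted(codes) for pid, codes in sorted(merged.items())})
# {'E100': ['282.61', '282.69'], 'E200': ['401.9']}
```

Patient H-99 has no crosswalk entry and is dropped here; how unmatched records are handled is a local policy decision that a real pipeline would need to make explicitly.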

Types of Data for Which CPs Can Be Used

Within source databases, which contain data extracted from EHRs, several types of data are particularly well suited for use in CPs. These types of data include vital signs, prescribed medications, coded diagnoses and procedures, and clinical notes. Coded diagnoses and procedures are commonly recorded as International Classification of Diseases (ICD) codes and Current Procedural Terminology (CPT) codes. What makes CPs different from the locally developed queries that HIM professionals currently use is that CPs have the potential to be widely reproduced and to be shared as standards that formally specify patient cohorts and clinical events.

Laboratory data are often used in CPs. For instance, to generate a list of patients who have Type 1 diabetes and not Type 2 diabetes, a CP may include the results of blood tests to improve the accuracy of its criteria for Type 1 diabetes.

When medication data are used in a CP, pharmacy dispensing data can sometimes help to infer whether a prescribed medication is actually being taken by a patient.
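A dispensing check of this kind might look like the following sketch. It assumes dispensing records have already been filtered to the medication in question, and the 30-day window is an arbitrary illustrative default, not a standard; the function name is hypothetical. A fill shows only that the medication was picked up, which is why the text speaks of inferring, rather than confirming, that it is being taken.

```python
from datetime import date, timedelta
from typing import Iterable


def filled_within_window(prescribed_on: date, fill_dates: Iterable[date], window_days: int = 30) -> bool:
    """Return True if a dispensing record falls within `window_days` after the prescription.

    The 30-day default is an illustrative assumption, not a clinical or coding standard.
    `fill_dates` are assumed to be pharmacy fills for the same medication.
    """
    deadline = prescribed_on + timedelta(days=window_days)
    return any(prescribed_on <= filled <= deadline for filled in fill_dates)


prescribed = date(2018, 1, 10)
print(filled_within_window(prescribed, [date(2018, 1, 15)]))  # True
print(filled_within_window(prescribed, [date(2018, 3, 1)]))   # False
print(filled_within_window(prescribed, []))                   # False
```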

Data from patient surveys and screening questionnaires may also be included in CPs. For example, the American Diabetes Association recommends screening the mental health of youth who have difficulty achieving treatment goals, including screening for depression and assessing coping skills.16 Providers collect these screening data. Once these data are stored in EHR databases, they become observable and measurable traits that can be included in CPs and queried to identify patient cohorts and classify individual patients.

Finally, organizational and provider-specific information is another type of EHR data that can be used to identify and classify patients.17, 18 These data help to distinguish patients by the medical services they received19 and by the characteristics of the providers who delivered those services.20

Common Uses of CPs

CPs have traditionally been used in domains outside of HIM. In clinical research, especially, investigators regularly use CPs to identify a specific population, or cohort, of patients they wish to study. With the use of CPs, cohorts are selected systematically according to a specified set of criteria that can be checked using EHR data.

Widespread sharing and dissemination of CPs make them more effective for identifying research cohorts. Sharing CPs can accelerate their refinement and validation by multiple healthcare organizations. The Phenotype KnowledgeBase, or PheKB, is an online collaborative environment that provides tools for editing and improving CPs, as well as a place for storing and sharing them publicly.21, 22
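Sharing is easiest when a definition travels as data rather than as site-specific queries. The sketch below expresses the two-part sickle cell disease CP described later in this paper as a JSON document using Python’s standard `json` module. The field names and structure are hypothetical and do not reproduce PheKB’s format or any other published schema, and the `282.6` code prefix is illustrative only.

```python
import json

# Hypothetical machine-readable CP definition. The field names are
# illustrative only and do not follow PheKB's or any other published format.
sickle_cell_cp = {
    "name": "Sickle cell disease (illustrative)",
    "version": "1.0",
    "inclusion": [
        {"type": "diagnosis_code", "system": "ICD-9-CM", "prefixes": ["282.6"],
         "source": "problem_or_diagnosis_list"},
        {"type": "any_of", "criteria": [
            {"type": "encounters", "setting": "outpatient", "related_to": "sickle cell disease",
             "min_count": 2, "min_days_apart": 30},
            {"type": "encounters", "setting": "inpatient", "related_to": "sickle cell disease",
             "min_count": 1},
        ]},
    ],
    "exclusion": [],
}

# Serialize for sharing (for example, posting to a repository), then read it back.
shared_text = json.dumps(sickle_cell_cp, indent=2, sort_keys=True)
received = json.loads(shared_text)
print(received == sickle_cell_cp)  # True
```

Because the definition is plain data, each receiving site still has to translate it into queries against its own schema, which is the mapping step described earlier.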

CPs have become increasingly relevant in this era of network-based clinical research. An example of network-based research is the Electronic Medical Records and Genomics (eMERGE) national network, which is organized and funded by the National Human Genome Research Institute. The eMERGE network brings together data from multiple institutions, combining genetic research with EHR data to support research on the genetic determinants of disease. A significant part of the eMERGE network’s work is developing CPs for multiple conditions with genetic determinants, including cancer, epilepsy, chronic kidney disease, and hearing loss.23, 24

CPs are also used by a national research network called PCORnet. Funded in 2014 by the Patient-Centered Outcomes Research Institute, PCORnet comprises 13 smaller regional Clinical Data Research Networks, 20 disease-specific Patient-Powered Research Networks, and two Health Plan Research Networks.25 At the national level, PCORnet has developed CPs to identify clinical obesity and other medical conditions. PCORnet members must demonstrate their ability to use common CPs for network-wide cohort identification in support of patient-centered research.26

Two Examples of CPs

Sickle Cell Disease

To provide an example of an actual CP, here we describe a CP for the identification of patients with sickle cell disease. To develop this CP, Michalik et al. conducted a retrospective study using EHR data from the Children’s Hospital of Wisconsin.27

The sickle cell disease CP has just two criteria:

1. ICD-9 diagnosis codes for sickle cell disease or “other sickle cell disease” in the patient’s EHR medical problem or diagnosis lists; and

2. a documented history of two outpatient visits, at least 30 days apart, or one hospitalization, related to sickle cell disease.
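The two criteria above can be sketched in Python as follows. The `Patient` and `Encounter` classes and their field names are hypothetical, and treating an encounter as “related to sickle cell disease” when it carries a sickle cell diagnosis code is an assumption of this sketch. The ICD-9-CM prefix `282.6` is used for illustration only; the published PheKB definition should be consulted for the exact code list.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set

# Illustrative ICD-9-CM prefix only; consult the published definition for the real code list.
SCD_CODE_PREFIXES = ("282.6",)


def is_scd_code(code: str) -> bool:
    return code.startswith(SCD_CODE_PREFIXES)


@dataclass
class Encounter:
    day: date
    setting: str  # hypothetical labels: "outpatient" or "inpatient"
    codes: Set[str] = field(default_factory=set)


@dataclass
class Patient:
    problem_list_codes: Set[str]  # codes on the problem or diagnosis lists
    encounters: List[Encounter]


def meets_scd_cp(p: Patient) -> bool:
    # Criterion 1: a sickle cell disease code on the problem or diagnosis list.
    if not any(is_scd_code(c) for c in p.problem_list_codes):
        return False
    # Assumption: an encounter is SCD-related if it carries an SCD code.
    scd_visits = [e for e in p.encounters if any(is_scd_code(c) for c in e.codes)]
    # Criterion 2, first option: one SCD-related hospitalization.
    if any(e.setting == "inpatient" for e in scd_visits):
        return True
    # Criterion 2, second option: two SCD-related outpatient visits at least 30 days apart.
    days = sorted(e.day for e in scd_visits if e.setting == "outpatient")
    return len(days) >= 2 and (days[-1] - days[0]).days >= 30


p1 = Patient({"282.61"}, [Encounter(date(2017, 1, 5), "outpatient", {"282.61"}),
                          Encounter(date(2017, 3, 1), "outpatient", {"282.61"})])
p2 = Patient({"282.61"}, [Encounter(date(2017, 1, 5), "outpatient", {"282.61"}),
                          Encounter(date(2017, 1, 20), "outpatient", {"282.61"})])
p3 = Patient({"282.61"}, [Encounter(date(2017, 6, 1), "inpatient", {"282.62"})])
p4 = Patient({"250.00"}, [Encounter(date(2017, 6, 1), "inpatient", {"282.62"})])
print([meets_scd_cp(p) for p in (p1, p2, p3, p4)])  # [True, False, True, False]
```

Comparing only the earliest and latest outpatient visits is enough for the 30-day rule, because if any two visits are at least 30 days apart, the first and last visits are too.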

The researchers ran queries based on the sickle cell disease CP against their institutional research data warehouse, which contains EHR data, to generate a list of qualifying patients. The resulting list demonstrated a 99.4 percent positive predictive value (PPV) for confirmed sickle cell disease, meaning that nearly every patient the CP identified was confirmed to have the disease.28

To further validate these results, the researchers asked a neighboring health system to run the sickle cell disease CP as well. In this second validation, 415 of the 433 patients identified were confirmed to have sickle cell disease, resulting in a PPV of 95.8 percent. This relatively high PPV, in combination with the first PPV, signaled that the sickle cell disease CP could be more widely adopted and used to identify patients with sickle cell disease.29 Its developers subsequently made the sickle cell disease CP available to others by depositing it in the online PheKB database.30
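PPV is the share of CP-identified patients who are confirmed to have the condition, that is, true positives divided by true positives plus false positives. The second validation’s figure can be reproduced with a small helper; the function name is illustrative.

```python
def positive_predictive_value(confirmed_cases: int, identified: int) -> float:
    """PPV = confirmed (true-positive) cases / all patients the CP identified."""
    if identified <= 0:
        raise ValueError("identified must be positive")
    if not 0 <= confirmed_cases <= identified:
        raise ValueError("confirmed_cases must be between 0 and identified")
    return confirmed_cases / identified


ppv = positive_predictive_value(confirmed_cases=415, identified=433)
print(f"{ppv:.1%}")  # 95.8%
print(433 - 415)     # 18 patients identified by the CP but not confirmed
```

PPV says nothing about patients with the disease whom the CP missed; estimating that would require sensitivity, which needs a separate reference set of confirmed cases.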

Peripheral Arterial Disease

An example of a more complex CP is one developed by the Mayo Clinic to identify patients with peripheral arterial disease (PAD).31 The PAD CP consists of five complex criteria, which draw on a variety of data domains. Meeting the first criterion is definitive for identifying a patient with PAD. Alternatively, meeting at least two of the second through fifth criteria also establishes a definitive case of PAD. The five criteria that constitute the PAD CP are as follows:

1. Ankle brachial index below 0.9 OR ankle systolic blood pressure greater than 255 mmHg as a result of nonatherosclerotic causes of PAD.

2. A diagnosis code for PAD from one of three ICD-9 code families is found in the EHR.

3. One of these three subcriteria:

   a. One of the ICD-9-CM/CPT-4 codes for lower extremity artery angiography plus one concurrent code for noncoronary vessel stents.

   b. One of the ICD-9-CM/CPT-4 codes for lower extremity artery surgical and percutaneous vascular interventions, excluding cases in which a code for an alternate reason for this surgery is present.

   c. One of the ICD-9-CM/CPT-4 codes for lower extremity amputation, excluding cases in which a code for nonvascular amputation is also present.

4. One of these two subcriteria:

   a. A lower extremity arteries phrase is found through natural language processing (NLP).

   b. An occlusive arterial disease phrase is found through NLP.

5. One of two medications for claudication is prescribed and in use by the individual.

This PAD CP is being used in a research project to identify patients with PAD.
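The rule for combining the five PAD criteria, as described above, can be sketched as follows. Evaluating each criterion against real codes, NLP output, or medication data is deliberately left out; the `is_pad_case` function and its input format are hypothetical.

```python
from typing import Mapping


def is_pad_case(criteria: Mapping[int, bool]) -> bool:
    """Combine the five PAD criteria as the text describes them.

    `criteria` maps a criterion number (1-5) to whether the patient met it.
    Evaluating each criterion (codes, NLP phrases, medications) is outside this sketch.
    """
    if criteria.get(1, False):
        return True  # criterion 1 alone is definitive
    # Otherwise, at least two of criteria 2 through 5 must be met.
    return sum(bool(criteria.get(n, False)) for n in range(2, 6)) >= 2


# Criteria 3 and 4 are themselves "one of" subcriteria, so each reduces to any().
criterion_3 = any([False, True, False])  # e.g., only subcriterion 3b met
criterion_4 = any([False, False])        # neither NLP phrase found

print(is_pad_case({1: True}))                         # True
print(is_pad_case({3: criterion_3, 4: criterion_4}))  # False
print(is_pad_case({2: True, 3: criterion_3}))         # True
```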

How and Why CPs Are Relevant in HIM

CPs may be relevant for HIM professionals for several reasons. The first reason is the most obvious. As the use of CPs expands from clinical research into treatment, payment, and operations, HIM professionals will likely be asked to work with CPs to standardize their everyday efforts. Because CPs provide common and precise definitions of diseases, CPs could become part of the general tool kit that all HIM professionals use for coding, reporting, managing, and sharing clinical information.

In addition, CPs have the potential to improve coding quality by automating comparisons between what is coded and what a CP suggests could be coded. By applying CPs automatically, software applications can highlight gaps between what is documented, what is coded, and a CP’s precise definition of a disease or condition. Because coded data drive reimbursement, using tools such as CPs to help standardize, and thereby potentially improve, the quality of coding has obvious financial implications.
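At its simplest, such a comparison is a set difference between the patients a CP identifies and the patients who carry the corresponding diagnosis code. The sketch below assumes both lists have already been produced locally; the function name and patient IDs are hypothetical. Each gap is a prompt for human review, not an automatic coding change.

```python
from typing import Dict, Set


def coding_gap_report(cp_cohort: Set[str], coded: Set[str]) -> Dict[str, Set[str]]:
    """Compare patients identified by a CP with patients coded for the same condition."""
    return {
        "cp_only": cp_cohort - coded,     # meet the CP but lack the code: possible undercoding
        "coded_only": coded - cp_cohort,  # coded but not meeting the CP: check documentation
        "both": cp_cohort & coded,
    }


report = coding_gap_report(cp_cohort={"P1", "P2", "P3"}, coded={"P2", "P3", "P4"})
print({key: sorted(ids) for key, ids in report.items()})
# {'cp_only': ['P1'], 'coded_only': ['P4'], 'both': ['P2', 'P3']}
```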

We anticipate that many organizations will share CPs in the future. If this sharing occurs and CPs begin to form a standard set of specifications for patient cohorts and clinical events, then we would expect HIM professionals to have new opportunities to help define CPs and to collaborate with database and information technology experts to implement and test CPs in HIM practice. This work could result in significant efficiency gains if CPs can be developed once and reused by many organizations.

Furthermore, in an age of genomics, disease classification is expected to become increasingly complex.32–35 Hence, CPs may eventually play a role in directly guiding the coding process as clinically relevant genetic variants are identified for more diseases.

CPs are also relevant for HIM record searching and sampling procedures. CPs can be used to find the records of individuals who meet the criteria expressed in a CP and to draw samples from that precisely defined population.
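Drawing a reproducible random sample from a CP-defined cohort, for example for an audit, could be sketched with Python’s `random.Random`. The cohort IDs, the sample size, and the seed are made up, and `sample_cohort` is a hypothetical helper; recording the seed lets a reviewer regenerate the same sample later.

```python
import random
from typing import List, Sequence


def sample_cohort(cohort_ids: Sequence[str], k: int, seed: int) -> List[str]:
    """Draw a reproducible simple random sample of k record IDs from a CP-defined cohort."""
    if not 0 <= k <= len(cohort_ids):
        raise ValueError("k must be between 0 and the cohort size")
    rng = random.Random(seed)  # a dedicated generator, so the global random state is untouched
    # Sorting first makes the sample depend only on cohort membership, not on row order.
    return rng.sample(sorted(cohort_ids), k)


cohort = [f"P{n:03d}" for n in range(1, 201)]
audit_sample = sample_cohort(cohort, k=10, seed=2018)
print(len(audit_sample), audit_sample == sample_cohort(list(reversed(cohort)), k=10, seed=2018))
# 10 True
```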

Finally, CPs can support the HIM professional goal of sharing only the minimum necessary data. With the use of CPs, it becomes possible to explain precisely why an individual’s documented observations either include or exclude the individual as a person with a specific disease. For this reason, the predefined and limited data that a CP relies on may often be the minimum necessary data to share for the purpose of documenting or evaluating a specific condition.
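Because a CP enumerates the data elements its criteria use, limiting a shared record to those elements can be sketched as a simple field filter. The field names, the sample record, and the `SCD_CP_FIELDS` list are hypothetical, and deciding what is actually the minimum necessary remains a policy judgment that this code does not make.

```python
from typing import Any, Dict, Iterable


def minimum_necessary(record: Dict[str, Any], cp_fields: Iterable[str]) -> Dict[str, Any]:
    """Return only the fields that a CP's criteria reference (field names are hypothetical)."""
    allowed = set(cp_fields)
    return {name: value for name, value in record.items() if name in allowed}


# Hypothetical list of the data elements a sickle-cell-style CP uses.
SCD_CP_FIELDS = ["problem_list_codes", "encounters"]

record = {
    "name": "Jane Doe",
    "address": "123 Main St",
    "problem_list_codes": ["282.61"],
    "encounters": [{"day": "2017-06-01", "setting": "inpatient", "codes": ["282.62"]}],
}
print(sorted(minimum_necessary(record, SCD_CP_FIELDS)))  # ['encounters', 'problem_list_codes']
```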

Conclusion

CPs are standardized, shareable, and reproducible algorithms that precisely define a condition or disease. They can be used to guide the development of database queries that result in cohorts of patients who either do or do not exhibit certain observable and measurable traits. Although CPs have typically been used to support research using EHR data, they are likely to become tools that HIM professionals will use in their everyday work. HIM professionals now have the opportunity to begin evaluating how CPs can support record coding, record sampling, appropriate EHR data sharing, and other aspects of their professional work. Expansion of the use of CPs could further advance the field of HIM and increase the value of EHR data in many aspects of treatment, payment, and operations as well as medical/clinical research.

Lilia Verchinina, PhD, is a data analyst at the University of Michigan Brehm Center for Diabetes Research in Ann Arbor, MI.

Lisa Ferguson, MSI, is a program manager at the University of Michigan Medical School Department of Learning Health Sciences in Ann Arbor, MI.

Allen Flynn, PharmD, is a research analyst and technology lead at the University of Michigan Medical School Department of Learning Health Sciences in Ann Arbor, MI.

Michelle Wichorek, PhD, is a project manager at the University of Michigan Brehm Center for Diabetes Research in Ann Arbor, MI.

Dorene Markel, MS, MHSA, is the director of the University of Michigan Medical School Department of Learning Health Sciences in Ann Arbor, MI.

Notes

1. National Human Genome Research Institute. “Phenotype.” Available at https://www.genome.gov/glossary/index.cfm?id=152 (accessed January 18, 2018).

2. Richesson, R. L., et al. “Electronic Health Records Based Phenotyping in Next-Generation Clinical Trials: A Perspective from the NIH Health Care Systems Collaboratory.” Journal of the American Medical Informatics Association 20, no. e2 (2013): e226–e231.

3. Robinson, P. N. “Deep Phenotyping for Precision Medicine.” Human Mutation 33, no. 5 (2012): 777–80.

4. Richesson, R. L., et al. “Electronic Health Records-based Phenotyping.” Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. 2014. Available at https://sites.duke.edu/rethinkingclinicaltrials/ehr-phenotyping/ (accessed May 13, 2018).

5. Richesson, R. L., M. M. Smerek, and C. Blake Cameron. “A Framework to Support the Sharing and Reuse of Computable Phenotype Definitions across Health Care Delivery and Clinical Research Applications.” EGEMS 4, no. 3 (2016): 1232.

6. Newton, K. M., et al. “Validation of Electronic Medical Record-based Phenotyping Algorithms: Results and Lessons Learned from the eMERGE Network.” Journal of the American Medical Informatics Association 20, no. e1 (2013): e147–e154.

7. Mo, H., et al. “Desiderata for Computable Representations of Electronic Health Records-driven Phenotype Algorithms.” Journal of the American Medical Informatics Association 22, no. 6 (2015): 1220–30.

8. Denny, J. C. “Chapter 13: Mining Electronic Health Records in the Genomics Era.” PLoS Computational Biology 8, no. 12 (2012): e1002823.

9. Jensen, P. B., L. J. Jensen, and S. Brunak. “Mining Electronic Health Records: Towards Better Research Applications and Clinical Care.” Nature Reviews Genetics 13, no. 6 (2012): 395–405.

10. Goodrich, D. E., et al. “Sex Differences in Weight Loss among Veterans with Serious Mental Illness: Observational Study of a National Weight Management Program.” Women’s Health Issues 26, no. 4 (2016): 410–19.

11. Littman, A. J., et al. “National Evaluation of Obesity Screening and Treatment among Veterans with and without Mental Health Disorders.” General Hospital Psychiatry 37, no. 1 (2015): 7–13.

12. Janney, C. A., et al. “The Influence of Sleep Disordered Breathing on Weight Loss in a National Weight Management Program.” Sleep 39, no. 1 (2016): 59–65.

13. Farmer, M. M., et al. “Depression Quality of Care: Measuring Quality over Time Using VA Electronic Medical Record Data.” Journal of General Internal Medicine 31, no. 1 (2016): 36–45.

14. Richesson, R. L., et al. “Electronic Health Records-based Phenotyping.”

15. Godlove, T., and A. W. Ball. “Patient Matching within a Health Information Exchange.” Perspectives in Health Information Management 12 (Spring 2015).

16. Silverstein, J., et al. “Care of Children and Adolescents with Type 1 Diabetes: A Statement of the American Diabetes Association.” Diabetes Care 28, no. 1 (2005): 186–212.

17. Harrold, L. R., T. S. Field, and J. H. Gurwitz. “Knowledge, Patterns of Care, and Outcomes of Care for Generalists and Specialists.” Journal of General Internal Medicine 14, no. 8 (1999): 499–511.

18. Krauss, J. C., et al. “Is the Problem List in the Eye of the Beholder? An Exploration of Consistency across Physicians.” Journal of the American Medical Informatics Association 23, no. 5 (2016): 859–65.

19. Harrold, L. R., T. S. Field, and J. H. Gurwitz. “Knowledge, Patterns of Care, and Outcomes of Care for Generalists and Specialists.”

20. Krauss, J. C., et al. “Is the Problem List in the Eye of the Beholder? An Exploration of Consistency across Physicians.”

21. Kirby, J. C., et al. “PheKB: A Catalog and Workflow for Creating Electronic Phenotype Algorithms for Transportability.” Journal of the American Medical Informatics Association 23, no. 6 (2016): 1046–52.

22. PheKB. “What Is the Phenotype KnowledgeBase?” Available at https://phekb.org/ (accessed January 18, 2018).

23. McCarty, C. A., et al. “The eMERGE Network: A Consortium of Biorepositories Linked to Electronic Medical Records Data for Conducting Genomic Studies.” BMC Medical Genomics 4 (2011): 13.

24. eMERGE. “Phenotyping: Cohort Discovery Using EHR Data.” Available at https://emerge.mc.vanderbilt.edu/phenotyping-cohort-discovery-using-ehr-data/ (accessed January 18, 2018).

25. PCORnet. “About PCORnet.” Available at http://pcornet.org/about-pcornet/ (accessed January 18, 2018).

26. PCORnet. “Data Network Request.” Available at http://pcornet.org/data-network-request/ (accessed January 18, 2018).

27. Michalik, D. E., B. W. Taylor, and J. A. Panepinto. “Identification and Validation of a Sickle Cell Disease Cohort within Electronic Health Records.” Academic Pediatrics 17, no. 3 (2017): 283–87.

28. PheKB. “Computable Phenotype for Identification of Patients with Sickle Cell Disease.” Available at https://phekb.org/sites/phenotype/files/Computable%20Phenotype%20Description.pdf (accessed January 18, 2018).

29. Michalik, D. E., B. W. Taylor, and J. A. Panepinto. “Identification and Validation of a Sickle Cell Disease Cohort within Electronic Health Records.”

30. PheKB. “Phenotype 615: Sickle Cell Disease.” Available at https://phekb.org/phenotype/sickle-cell-disease-0 (accessed January 18, 2018).

31. PheKB. “Peripheral Arterial Disease – 2012.” Available at https://phekb.org/phenotype/16 (accessed May 14, 2018).

32. Jones, D. T., et al. “Dissecting the Genomic Complexity Underlying Medulloblastoma.” Nature 488, no. 7409 (2012): 100–105.

33. Choi, M., et al. “Genetic Diagnosis by Whole Exome Capture and Massively Parallel DNA Sequencing.” Proceedings of the National Academy of Sciences of the United States of America 106, no. 45 (2009): 19096–101.

34. Curtis, C., et al. “The Genomic and Transcriptomic Architecture of 2,000 Breast Tumours Reveals Novel Subgroups.” Nature 486, no. 7403 (2012): 346–52.

35. Hinoue, T., et al. “Genome-Scale Analysis of Aberrant DNA Methylation in Colorectal Cancer.” Genome Research 22, no. 2 (2012): 271–82.

Figure 1. Using Computable Phenotypes to Generate Patient Lists

Copyright of Perspectives in Health Information Management is the property of American Health Information Management Association and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.