Chapter 2 Health Care Data

Central to health care information systems is the actual health care data that is collected and subsequently transformed into useful health care information. In this chapter we will examine key aspects of health care data. In particular, this chapter is divided into four main sections:

Health care data and information defined (What are health data and health information?)
Health care data and information sources (Where do health data originate and why? When do health care data become health care information?)
Health care data uses (How do health care organizations use data? What is the impact of the trend toward analytics and big data on health care data?)
Health care data quality (How does the quality of health data affect their use?)

Health Care Data and Information Defined

Often the terms health care data and health care information are used interchangeably. However, there is a distinction, if somewhat blurred in current use. What, then, is the difference between health data and health information? The simple answer is that health information is processed health data. (We interpret processing broadly to cover everything from formal analysis to explanations supplied by the individual decision maker's brain.)

Health care data are raw health care facts, generally stored as characters, words, symbols, measurements, or statistics. One thing apparent about health care data is that they are generally not very useful for decision making. Health care data may describe a particular event, but alone and unprocessed they are not particularly helpful. Take, for example, this figure: 79 percent. By itself, what does it mean? If we process this datum further by indicating that it represents the average bed occupancy for a hospital for the month of January, it takes on more meaning. With the additional facts attached, is this figure now information? That depends. If all a health care executive wants or needs to know is the bed occupancy rate for January, this could be considered information. However, for the hospital executive who is interested in knowing the trend of the bed occupancy rate over time or how the facility's bed occupancy rate compares to that of other, similar facilities, this is not yet the information he or she needs.

A clinical example of raw data would be the lab value, hematocrit (HCT) = 32 or a diagnosis, such as diabetes. These are single facts, data at the most granular level. They take on meaning when assigned to particular patients in the context of their health care status or analyzed as components of population studies.
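
The bed occupancy example can be made concrete with a short sketch. It uses the standard percent-of-occupancy calculation (inpatient days divided by available bed days). The census counts, the bed count of 100, and the function names are illustrative assumptions, not figures from any real facility.

```python
from typing import Dict, List


def occupancy_rate(daily_census: List[int], staffed_beds: int) -> float:
    """Percent occupancy = inpatient days / (beds * days in period) * 100."""
    if staffed_beds <= 0 or not daily_census:
        raise ValueError("need at least one census count and a positive bed count")
    inpatient_days = sum(daily_census)                   # raw data, summed
    available_bed_days = staffed_beds * len(daily_census)
    return 100.0 * inpatient_days / available_bed_days


def monthly_rates(census_by_month: Dict[str, List[int]],
                  staffed_beds: int) -> Dict[str, float]:
    """Process raw daily counts into one comparable figure per month."""
    return {month: round(occupancy_rate(counts, staffed_beds), 1)
            for month, counts in census_by_month.items()}


# Hypothetical raw data: one midnight census count per day.
census = {
    "January": [79] * 31,
    "February": [82] * 28,
}
rates = monthly_rates(census, staffed_beds=100)
print(rates)
```

A single month's rate answers the executive who only wants January's figure. The month-by-month dictionary is closer to what the executive interested in the trend over time would need.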

Knowledge is seen by some as the highest level in a hierarchy with data at the bottom and information in the middle (Figure 2.1). Knowledge is defined by Johns (1997, p. 53) as “a combination of rules, relationships, ideas, and experience.” Another way of thinking about knowledge is that it is information applied to rules, experiences, and relationships with the result that it can be used for decision making. Data analytics applied to health care information and research studies based on health care information are examples of transforming health care information into new knowledge. To carry out our example from previous paragraphs, the 79 percent occupancy rate could be related to additional information to lead to knowledge that the health care facility's referral strategy is working.

Figure 2.1 Health care data to health care knowledge

Where do health care data end and where does health care information begin? Information is an extremely valuable asset at all levels of the health care community. Health care executives, clinical staff members, and others rely on information to get their jobs accomplished. The goal of this discussion is not to pinpoint where data end and information begins but rather to further an understanding of the relationship between health care data and information—health care data are the beginnings of health care information. You cannot create information without data. Through the rest of this chapter the terms health care data and health care information will be used to describe either the most granular components of health care information or data that have been processed, respectively (Lee, 2002).

The first several sections of this chapter focus primarily on the health care data and information levels, but the content of the section on health care data quality takes on new importance when applied to processes for seeking knowledge from health care data. We will begin the chapter exploring where some of the most common health care data originate and describe some of the most common organizational and provider uses of health care information, including patient care, billing and reimbursement, and basic health care statistics. Please note there are many other uses for health information that go beyond these basics that will be explored throughout this text.

Health Care Data and Information Sources

The majority of health care information created and used in health care information systems within and across organizations can be found as an entry in a patient's health record or claim, and this information is readily matched to a specific, identifiable patient.

The Health Insurance Portability and Accountability Act (HIPAA), the federal legislation that includes provisions to protect patients' health information from unauthorized disclosure, defines health information as any information, whether oral or recorded in any form or medium, that does the following:

Is created or received by a health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse
Relates to the past, present, or future physical or mental health or condition of an individual, the provision of health care to an individual, or the past, present, or future payment for the provision of health care to an individual

HIPAA refers to this type of identifiable information as protected health information (PHI).

The Joint Commission, the major accrediting agency for many types of health care organizations in the United States, has adopted the HIPAA definition of protected health information as the definition of “health information” listed in their accreditation manuals' glossary of terms (The Joint Commission, 2016). Creating, maintaining, and managing quality health information is a significant factor in health care organizations, such as hospitals, nursing homes, rehabilitation centers, and others, who want to achieve Joint Commission accreditation. The accreditation manuals for each type of facility contain dozens of standards that are devoted to the creation and management of health information. For example, the hospital accreditation manual contains two specific chapters, Record of Care, Treatment, and Services (RC) and Information Management (IM). The RC chapter outlines specific standards governing the components of a complete medical record, and the IM chapter outlines standards for managing information as an important organizational resource.

Medical Record versus Health Record

The terms medical record and health record are often used interchangeably to describe a patient's clinical record. However, with the advent and subsequent evolution of electronic versions of patient records these terms actually describe different entities. The Office of the National Coordinator for Health Information Technology (ONC) distinguishes the electronic medical record and the electronic health record as follows.

Electronic medical records (EMRs) are a digital version of the paper charts. An EMR contains the medical and treatment history of the patients in one practice (or organization). EMRs have advantages over paper records. For example, EMRs enable clinicians (and others) to do the following:

Track data over time
Easily identify which patients are due for preventive screenings or checkups
Check how their patients are doing on certain parameters, such as blood pressure readings or vaccinations
Monitor and improve overall quality of care within the practice

But the information in EMRs doesn't travel easily out of the practice (or organization). In fact, the patient's record might even have to be printed out and delivered by mail to specialists and other members of the care team. In that regard, EMRs are not much better than a paper record.
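
One of the EMR capabilities listed above, identifying patients who are due for a preventive screening, can be sketched in a few lines. The record layout, the patient identifiers, and the 365-day interval are hypothetical choices made for illustration. A real EMR would draw these from its own data model and from clinical guidelines.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ScreeningRecord:
    patient_id: str
    last_screening: Optional[date]  # None means no screening on file


def patients_due(records: List[ScreeningRecord],
                 as_of: date,
                 interval_days: int = 365) -> List[str]:
    """Return IDs of patients never screened or screened too long ago."""
    due = []
    for record in records:
        if record.last_screening is None:
            due.append(record.patient_id)
        elif (as_of - record.last_screening).days >= interval_days:
            due.append(record.patient_id)
    return due


# Hypothetical practice data.
records = [
    ScreeningRecord("A", date(2023, 1, 1)),
    ScreeningRecord("B", date(2024, 6, 1)),
    ScreeningRecord("C", None),
]
due_list = patients_due(records, as_of=date(2024, 7, 1))
print(due_list)
```

A query like this is straightforward once the data are structured and electronic. With paper charts it would require a manual chart review.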

Electronic health records (EHRs) do all those things and more. EHRs focus on the total health of the patient, going beyond standard clinical data collected in the provider's office (or during episodes of care), and are inclusive of a broader view of a patient's care. EHRs are designed to reach out beyond the health organization that originally collects and compiles the information. They are built to share information with other health care providers (and organizations), such as laboratories and specialists, so they contain information from all the clinicians involved in the patient's care (Garrett & Seidman, 2011). Another distinguishing feature of the EHR (discussed in more detail in Chapter Three) is the inclusion of decision-support capabilities beyond those of the EMR.

Patient Record Purposes

Health care organizations maintain patient clinical records for several key purposes. As we move into the discussion on clinical information systems in subsequent chapters, it will be important to remember these purposes, which remain constant regardless of the format or infrastructure supporting the records. In considering the purposes listed, the scope of care is also important. Records support not only managing a single episode of care but also a patient's continuum of care and population health. Episode of care generally refers to the services provided to a patient with a specific condition for a specific period of time. Continuum of care, as defined by HIMSS (2014), is a concept involving a system that guides and tracks patients over time through a comprehensive array of health services spanning all levels and intensity of care. Population health is a relatively new term and definitions vary. However, the concept behind managing population health is to improve health outcomes within defined communities (Stoto, 2013). The following list comprises the most commonly recognized purposes for creating and maintaining patient records.

Patient care. Patient records provide the documented basis for planning patient care and treatment, for a single episode of care and across the care continuum. This purpose is considered the number-one reason for maintaining patient records. As our health care delivery system moves toward true population health management and patient-focused care, the patient record becomes a critical tool for documenting each provider's contribution to that care.

Communication. Patient records are an important means by which physicians, nurses, and others, whether within a single organization or across organizations, can communicate with one another about patient needs. The members of the health care team generally interact with patients at different times during the day, week, or even month or year. Information from the patient's record plays an important role in facilitating communication among providers across the continuum of care. The patient record may be the only means of communication among various providers. It is important to note that patients also have a right to access their records, and their engagement in their own care is often reflected in today's records.

Legal documentation. Patient records, because they describe and document care and treatment, are also legal records. In the event of a lawsuit or other legal action involving patient care, the record becomes the primary evidence for what actually took place during the care. An old but absolutely true adage about the legal importance of patient records says, “If it was not documented, it was not done.”

Billing and reimbursement. Patient records provide the documentation patients and payers use to verify billed services. Insurance companies and other third-party payers insist on clear documentation to support any claims submitted. The federal programs Medicare and Medicaid have oversight and review processes in place that use patient records to confirm the accuracy of claims filed. Filing a claim for a service that is not clearly documented in the patient record may be construed as fraud.

Research and quality management. Patient records are used in many facilities for research purposes and for monitoring the quality of care provided. Patient records can serve as source documents from which information about certain diseases or procedures can be taken, for example. Although research is most prevalent in large academic medical centers, studies are conducted in other types of health care organizations as well.

Population health. Information from patient records is used to monitor population health, assess health status, measure utilization of services, track quality outcomes, and evaluate adherence to evidence-based practice guidelines. Health care payers and consumers are increasingly demanding to know the cost-effectiveness and efficacy of different treatment options and modalities. Population health focuses on prevention as a means of achieving cost-effective care.

Public health. Federal and state public health agencies use information from patient records to inform policies and procedures to ensure that they protect citizens from unhealthy conditions.

Patient Records as Legal Documents

The importance of maintaining complete and accurate patient records cannot be overstated. They serve not only as a basis for planning patient care but also as the legal record documenting the care that was provided to patients. The data captured in a patient record become a permanent record of that patient's diagnoses, treatments, response to treatments, and case management. Patient records provide much of the source data for health care information that is created, maintained, and managed within and across health care organizations.

When the patient record was a file folder full of paper housed in the health information management department of the hospital, identifying the legal health record (LHR) was fairly straightforward. Records kept in the usual course of business (in this case, providing care to patients) represent an exception to the hearsay rule, are generally admissible in a court, and therefore can be subpoenaed—they are legal documentation of the care provided to the patients. With the implementation of comprehensive EHR systems the definition of an LHR remains the same, but the identification of the boundaries for it may be harder to determine. In 2013, the ONC's National Learning Consortium published the Legal Health Record Policy Template to guide health care organizations and providers in defining which records and record sets constitute their legal health record for administrative, business, or evidentiary purposes. The media on which the records are maintained does not determine the legal status; rather, it is the purpose for which the record was created and is maintained. The complete template can be found at www.healthit.gov/sites/default/files/legal_health_policy_template.docx.

Because of the legal nature of patient records, the majority of states have specific retention requirements for information contained within them. These state requirements should be the basis for the health care organization's formal retention policy. (The Joint Commission and other accrediting agencies also address retention but generally refer organizations back to their own state regulations for specifics.) When no specific retention requirement is made by the state, all patient information that is a part of the LHR should be maintained for at least as long as the state's statute of limitations or other regulation requires. In the case of minor children the LHR should be retained until the child reaches the age of majority as defined by state law, usually eighteen or twenty-one. Health care executives should be aware that statutes of limitations may allow a patient to bring a case as long as ten years after the patient learns that his or her care caused an injury (Lee, 2002). Although some specific retention requirements and general guidelines exist, it is becoming increasingly popular for health care organizations to keep all LHR information indefinitely, particularly if the information is stored in an electronic format. If an organization does decide to destroy LHR information, this destruction must be carried out in accordance with all applicable laws and regulations.
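
The retention reasoning in the preceding paragraph can be expressed as a small date calculation. This is only a sketch under stated assumptions: the retention period and age of majority are hypothetical parameters that an organization would set from its own state's law, and the function names are invented for illustration. It is not legal guidance.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def earliest_destruction_date(last_encounter: date,
                              birth_date: date,
                              retention_years: int,
                              age_of_majority: int = 18) -> date:
    """Earliest date a record could be considered for destruction.

    retention_years and age_of_majority are assumptions supplied by the
    caller; actual values depend on state statutes and regulations.
    """
    retention_end = add_years(last_encounter, retention_years)
    majority_date = add_years(birth_date, age_of_majority)
    if last_encounter < majority_date:
        # Patient was a minor at the encounter: keep at least until majority.
        return max(retention_end, majority_date)
    return retention_end


adult = earliest_destruction_date(date(2015, 3, 10), date(1980, 1, 1), 10)
minor = earliest_destruction_date(date(2015, 3, 10), date(2012, 6, 1), 2)
print(adult, minor)
```

An organization that keeps all LHR information indefinitely, as the paragraph notes many now do, would never need this calculation. It matters only when destruction is being considered.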

Another important aspect related to the legal nature of patient records is the need for them to be authenticated. State and federal laws and accreditation standards require that medical record entries be authenticated to ensure that the legal document shows the person or persons responsible for the care provided. Generally, authentication of an LHR entry is accomplished when the physician or other health care professional signs it, either with a handwritten signature or an electronic signature.

Personal Health Records

An increasingly common type of patient record is maintained by the individual to track personal health care information: the personal health record (PHR). According to the American Health Information Management Association (AHIMA, 2016), a PHR “is a tool . . . to collect, track and share past and current information about your health or the health of someone in your care.” A PHR is not the same as a health record managed by a health care organization or provider, and it does not constitute a legal document of care, but it should contain all pertinent health care information contained in an individual's health records. PHRs are an effective tool enabling patients to be active members of their own health care teams (AHIMA, 2016).

Patient Record Content

The following components are common to most patient records, regardless of facility type or record system (AHIMA, 2016). Specific patient record content is determined to a large extent by external requirements, standards, and regulations (discussed in Chapter Nine). Keep in mind that a patient record may contain some or all of the documentation listed. Depending on the patient's illness or injury and the type of treatment facility, he or she may need additional specialized health care services. These services may require specific documentation. For example, long-term care facilities and behavioral health facilities have special documentation requirements. Our list is intended to introduce the common components of patient records, not to provide a comprehensive list of all possible components. The following provides a general overview of record content and the person or persons responsible for capturing the content during a single episode of care. It reveals that the patient record is a repository for a variety of health care data and information that is captured by many different individuals involved in the care of the patient.

Identification screen. Information found on the identification screen of a health or medical record originates at the time of registration or admission. The identification data generally includes at least the patient name, address, telephone number, insurance carrier, and policy number, as well as the patient's diagnoses and disposition at discharge. These diagnoses are recorded by the physicians and coded by administrative personnel. (Diagnosis coding is discussed later in this chapter.) The identification component of the data is used as a clinical and an administrative document. It provides a quick view of the diagnoses that required care during the encounter. The codes and other demographic information are used for reimbursement and planning purposes.

Problem list. Patient records frequently contain a comprehensive problem list, which identifies significant illnesses and operations the patient has experienced. This list is generally maintained over time. It is not specific to a single episode of care and may be maintained by the attending or primary care physician or collectively by all the health care providers involved in the patient's care.

Medication record. Sometimes called a medication administration record (MAR), this record lists medicines prescribed for and subsequently administered to the patient. It often also lists any medication allergies the patient may have. Nursing personnel are generally responsible for documenting and maintaining medication information in acute care settings, because they are responsible for administering medications according to physicians' written or verbal orders.

History and physical. The history component of the report describes any major illnesses and surgeries the patient has had, any significant family history of disease, patient health habits, and current medications. The information for the history is provided by the patient (or someone acting on his or her behalf) and is documented by the attending physician or other care provider at the beginning of or immediately prior to an encounter or treatment episode. The physical component of this report states what the physician found when he or she performed a hands-on examination of the patient. The history and physical together document the initial assessment of the patient for the particular care episode and provide the basis for diagnosis and subsequent treatment. They also provide a framework within which physicians and other care providers can document significant findings. Although obtaining the initial history and physical is a one-time activity during an episode of care, continued reassessment and documentation of that reassessment during the patient's course of treatment is critical. Results of reassessments are generally recorded in progress notes.

Progress notes. Progress notes are made by the physicians, nurses, therapists, social workers, and other staff members caring for the patient. Each provider is responsible for the content of his or her notes.
Progress notes should reflect the patient's response to treatment along with the provider's observations and plans for continued treatment. There are many formats for progress notes. In some organizations all care providers use the same note format; in others each provider type uses a customized format. A commonly used format for a progress note is the SOAP format. Providers are expected to enter notes divided into four components:
Subjective findings
Objective findings
Assessment
Plan

Consultation. A consultation note or report records opinions about the patient's condition made by another health care provider at the request of the attending physician or primary care provider. Consultation reports may come from physicians and others inside or outside a particular health care organization, but this information is maintained as part of the patient record.

Physician's orders. Physician's orders are a physician's directions, instructions, or prescriptions given to other members of the health care team regarding the patient's medications, tests, diets, treatments, and so forth. In the current US health care system, procedures and treatments must be ordered by the appropriate licensed practitioner; in most cases this will be a physician.

Imaging and X-ray reports. The radiologist is responsible for interpreting images produced through X-rays, mammograms, ultrasounds, scans, and the like and for documenting his or her interpretations or findings in the patient's record. These findings should be documented in a timely manner so they are available to the appropriate provider to facilitate the appropriate treatment. The actual digital images are generally maintained in the radiology or imaging departments in specialized computer systems. These images are typically not considered part of the legal patient record, per se, but in modern EHRs they are available through the same interface.

Laboratory reports. Laboratory reports contain the results of tests conducted on body fluids, cells, and tissues. For example, a medical lab might perform a throat culture, urinalysis, cholesterol level, or complete blood count. There are hundreds of specific lab tests that can be run by health care organizations or specialized labs. Lab personnel are responsible for documenting the lab results into the patient record. Results of the lab work become part of the permanent patient record. However, lab results must also be available during treatment. Health care providers rely on accurate lab results in making clinical decisions, so there is a need for timely reporting of lab results and a system for ensuring that physicians and other appropriate care providers receive the results. Physicians or other primary care providers are responsible for documenting any findings and treatment plans based on the lab results.

Consent and authorization forms. Copies of consents to admission, treatment, surgery, and release of information are an important component of the patient record related to its use as a legal document. The practitioner who actually provides the treatment must obtain informed consent for the treatment. Patients must sign informed consent documents before treatment takes place. Forms authorizing release of information must also be signed by patients before any patient-specific health care information is released to parties not directly involved in the care of the patient.

Operative report. Operative reports describe any surgery performed and list the names of surgeons and assistants. The surgeon is responsible for documenting the information found in the operative report.

Pathology report. Pathology reports describe tissue removed during any surgical procedure and the diagnosis based on examination of that tissue. The pathologist is responsible for documenting the information contained within the pathology report.

Discharge summary. Each acute care patient record contains a discharge summary. The discharge summary summarizes the hospital stay, including the reason for admission, significant findings from tests, procedures performed, therapies provided, responses to treatments, condition at discharge, and instructions for medications, activity, diet, and follow-up care. The attending physician is responsible for documenting the discharge summary at the conclusion of the patient's stay in the hospital.

With the passage of the Affordable Care Act (ACA) and other health care payment reform measures, organizations and communities have begun to shift focus from episodic care to population health. By definition, population health focuses on maintaining health and managing health care utilization for a defined population of patients or community with the goal of decreasing costs. Along with other key components, successful population health will require extensive care coordination across care providers and community organizations. Care managers are needed to interact with patients on a regular basis during and in between clinical encounters (Institute for Health Technology Transformation, 2012). Needless to say, this will have a significant impact on the form and structure of the future EHRs. These care managers will document all plan findings, clinical and social, within the patient's record and rely on other providers' notes and findings to effectively coordinate care. Baker, Cronin, Conway, DeSalvo, Rajkumar, and Press (2016), for example, describe a new tool to support “person-centered care by a multidisciplinary team,” the comprehensive shared care plan (CSCP), which will rely on HIT to enable collaboration across settings. A stakeholder group organized by the US Department of Health and Human Services developed key goals for the CSCP as they envision it:

It should enable a clinician to electronically view information that is directly relevant to his or her role in the care of the person, to easily identify which clinician is doing what, and to update other members of an interdisciplinary team on new developments.
It should put the person's goals (captured in his or her own words) at the center of decision making and give that individual direct access to his or her information in the CSCP.
It should be holistic and describe clinical and nonclinical (including home- and community-based) needs and services.
It should follow the person through high-need episodes (e.g., acute illness) as well as periods of health improvement and maintenance (Baker et al., 2016).

Figures 2.2 through 2.5 display screens from one organization's EHR.
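
Returning to the record components described earlier in this section, the SOAP progress-note format lends itself to a simple structured representation. The sketch below is one possible way to model a note and a minimal patient record. The class names, field names, and sample note text are hypothetical and do not reflect any particular EHR vendor's data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoapNote:
    author: str       # provider responsible for the note's content
    subjective: str   # what the patient reports
    objective: str    # what the provider observes or measures
    assessment: str   # the provider's interpretation
    plan: str         # plans for continued treatment

    def is_complete(self) -> bool:
        """A note is complete when all four SOAP components are filled in."""
        parts = (self.subjective, self.objective, self.assessment, self.plan)
        return all(part.strip() for part in parts)


@dataclass
class PatientRecord:
    patient_id: str
    problem_list: List[str] = field(default_factory=list)
    progress_notes: List[SoapNote] = field(default_factory=list)

    def incomplete_notes(self) -> List[SoapNote]:
        """Notes missing one or more SOAP components."""
        return [n for n in self.progress_notes if not n.is_complete()]


# Hypothetical example record.
record = PatientRecord(patient_id="12345", problem_list=["diabetes"])
record.progress_notes.append(SoapNote(
    author="RN Lee",
    subjective="Reports fatigue.",
    objective="HCT = 32.",
    assessment="Possible anemia.",
    plan="",  # plan not yet documented
))
print(len(record.incomplete_notes()))
```

Structuring notes this way makes gaps in documentation easy to detect, which bears on the legal and billing purposes of the record discussed earlier.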

Claims Content

As we have seen in the previous section, health care information is captured and stored as a part of the patient record. However, there is more to the story: health care organizations and providers must be paid for the care they provide. Generally, the health care organization's accounting or billing department is responsible for processing claims, an activity that includes verifying insurance coverage; billing third-party payers (private insurance companies, Medicare, or Medicaid); and processing the payments as they are received. The Centers for Medicare and Medicaid Services (CMS) currently requires health care providers to submit claims electronically using a set of standard elements. As early as the 1970s the health care community strived to develop standard insurance claim forms to facilitate payment collection. With the nearly universal adoption of electronic billing and government-mandated transaction standards, standard claims content has become essential.

Figure 2.5 Sample EHR lab report

Source: Epic.

Depending on the type of service provided to the patient, one of two standard data sets will be submitted to the third-party payer. The UB-04, or CMS-1450, is submitted for inpatient, hospital-based outpatient, home health care, and long-term care services. The CMS-1500 is submitted for health care provider services, such as those provided by a physician's office. It is also used for billing by some Medicaid state agencies. The standard requirements for the parallel electronic counterparts to the CMS-1450 and CMS-1500 are defined by ANSI ASC X12N 837I (Institutional) and ANSI ASC X12N 837P (Professional), respectively. Therefore, the claims standards are frequently referred to as 837I and 837P.
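
The choice between the two standard data sets can be summarized as a lookup. In the sketch below, the service-type labels and the function name are hypothetical and chosen only for illustration. The form and transaction names come from the paragraph above.

```python
from typing import NamedTuple


class ClaimStandard(NamedTuple):
    paper_form: str
    electronic_transaction: str


INSTITUTIONAL = ClaimStandard("UB-04 (CMS-1450)", "837I")
PROFESSIONAL = ClaimStandard("CMS-1500", "837P")

# Hypothetical service-type labels mapped to the claim type they use.
_SERVICE_TO_STANDARD = {
    "inpatient": INSTITUTIONAL,
    "hospital_outpatient": INSTITUTIONAL,
    "home_health": INSTITUTIONAL,
    "long_term_care": INSTITUTIONAL,
    "physician_office": PROFESSIONAL,
}


def claim_standard_for(service_type: str) -> ClaimStandard:
    """Look up which standard data set applies to a service type."""
    try:
        return _SERVICE_TO_STANDARD[service_type]
    except KeyError:
        raise ValueError(f"unrecognized service type: {service_type!r}")


print(claim_standard_for("inpatient"))
print(claim_standard_for("physician_office"))
```

A real billing system would determine the claim type from many more factors, such as payer rules and provider enrollment, so this mapping should be read only as a summary of the distinction drawn in the text.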

UB-04/CMS-1450/837I

In 1975, the American Hospital Association (AHA) formed the National Uniform Billing Committee (NUBC), bringing the major national provider and payer organizations together for the purpose of developing a single billing form and standard data set that could be used for processing health care claims by institutions nationwide. The first uniform bill was the UB-82. It has since been modified and improved on, resulting, first, in the UB-92 data set and now in the currently used UB-04, also known as CMS-1450. UB-04 is the de facto institutional provider claim standard. Its content is required by CMS and has been widely adopted by other government and private insurers. In addition to hospitals, UB-04 or 837I is used by skilled nursing facilities, end stage renal disease providers, home health agencies, hospices, rehabilitation clinics and facilities, community mental health centers, critical access hospitals, federally qualified health centers, and others to bill their third-party payers. The NUBC is responsible for maintaining and updating the specifications for the data elements and codes that are used for the UB-04/CMS-1450 and 837I. A full description of the elements required and the specifications manual can be found on the NUBC website, www.nubc.org (CMS, 2016a; NUBC, 2016).

CMS-1500/837P

The National Uniform Claim Committee (NUCC) was created by the American Medical Association (AMA) to develop a standardized data set for the noninstitutional or “professional” health care community to use in the submission of claims (much as the NUBC has done for institutional providers). Members of this committee represent key provider and payer organizations, with the AMA appointing the committee chair. The standardized claim form developed and overseen by NUCC is the CMS-1500 and its electronic counterpart is the 837P. This standard has been adopted by CMS to bill Medicare fee-for-service, and similar to UB-04 and 837I for institutional care, it has become the de facto standard for all types of noninstitutional provider claims, such as those for private physician services. NUCC maintains a crosswalk between the 837P and CMS-1500 explaining the specific data elements, which can be found on their website at www.nucc.org (CMS, 2013; NUCC, 2016).

It is important to recognize that the UB-04 and the CMS-1500 and their electronic counterparts incorporate standardized data sets. Regardless of a health care organization's location or a patient's insurance coverage, the same data elements are collected. In many states UB-04 data and CMS-1500 data must be reported to a central state agency responsible for aggregating and analyzing the state's health data. At the federal level CMS aggregates the data from these claims forms for analyzing national health care reimbursement and clinical and population trends. Having uniform data sets means that data can be compared not only within organizations but also within states and across the country.

Diagnostic and Procedural Codes

Diagnostic and procedural codes are captured during the patient encounter, not only to track clinical progress but also for billing, reimbursement, and other administrative purposes. This diagnostic and procedural information is initially captured in narrative form through physicians' and other health care providers' documentation in the patient record. This documentation is subsequently translated into numerical codes. Coding facilitates the classification of diagnoses and procedures for reimbursement purposes, clinical research, and comparative studies.

Two major coding systems are employed by health care providers today:

ICD-10 (International Classification of Diseases)
CPT (Current Procedural Terminology), published by the American Medical Association

Use of these systems is required by the federal government for reimbursement, and they are recognized by health care agencies nationally and internationally. The UB-04 and CMS-1500 have very specific coding requirements for claim submission, which include use of these coding sets.

ICD-10-CM

The ICD-10 classification system used to code diseases and other health statuses in the United States is derived from the International Classification of Diseases, Tenth Revision, which was developed by the World Health Organization (WHO) (CDC, 2016) to capture disease data. The precursors to the current ICD system were developed to enable comparison of morbidity (illness) and mortality (death) statistics across nations. Over the years this basic purpose has evolved, and today ICD-10-CM (Clinical Modification) coding plays a major role in reimbursement to hospitals and other health care institutions. ICD-10-CM codes are used for determining the diagnosis related group (DRG) into which a patient is assigned. DRGs are in turn the basis for determining appropriate inpatient reimbursements for Medicare, Medicaid, and many other health care insurance beneficiaries. Accurate ICD coding has, as a consequence, become vital to accurate institutional reimbursement.

The National Center for Health Statistics (NCHS) is the federal agency responsible for publishing ICD-10-CM (Clinical Modification) in the United States. Procedure information is similarly coded using the ICD-10-PCS (Procedural Coding System). ICD-10-PCS was developed by CMS for US inpatient hospital settings only. The ICD-10-CM and ICD-10-PCS publications are considered federal government documents whose contents may be used freely by others. However, multiple companies republish these government documents in easier-to-use, annotated, formally copyrighted versions. In general, the ICD-10-CM and ICD-10-PCS are updated on an annual basis (CMS, 2015, 2016b).

Exhibit 2.1 Excerpt from ICD-10-CM 2016

Malignant neoplasms (C00-C96)

Malignant neoplasms, stated or presumed to be primary (of specified sites), and certain specified histologies, except neuroendocrine, and of lymphoid, hematopoietic, and related tissue (C00-C75)

Malignant neoplasms of lip, oral cavity, and pharynx (C00-C14)

C00 Malignant neoplasm of lip
Use additional code to identify:
alcohol abuse and dependence (F10.-)
history of tobacco use (Z87.891)
tobacco dependence (F17.-)
tobacco use (Z72.0)
Excludes 1:
malignant melanoma of lip (C43.0)
Merkel cell carcinoma of lip (C4A.0)
other and unspecified malignant neoplasm of skin of lip (C44.0-)

C00.0 Malignant neoplasm of external upper lip
Malignant neoplasm of lipstick area of upper lip
Malignant neoplasm of upper lip NOS
Malignant neoplasm of vermilion border of upper lip

C00.1 Malignant neoplasm of external lower lip
Malignant neoplasm of lower lip NOS
Malignant neoplasm of lipstick area of lower lip
Malignant neoplasm of vermilion border of lower lip

C00.2 Malignant neoplasm of external lip, unspecified
Malignant neoplasm of vermilion border of lip NOS

C00.3 Malignant neoplasm of upper lip, inner aspect
Malignant neoplasm of buccal aspect of upper lip
Malignant neoplasm of frenulum of upper lip
Malignant neoplasm of mucosa of upper lip
Malignant neoplasm of oral aspect of upper lip

C00.4 Malignant neoplasm of lower lip, inner aspect
Malignant neoplasm of buccal aspect of lower lip
Malignant neoplasm of frenulum of lower lip
Malignant neoplasm of mucosa of lower lip
Malignant neoplasm of oral aspect of lower lip

C00.5 Malignant neoplasm of lip, unspecified, inner aspect
Malignant neoplasm of buccal aspect of lip, unspecified
Malignant neoplasm of frenulum of lip, unspecified
Malignant neoplasm of mucosa of lip, unspecified
Malignant neoplasm of oral aspect of lip, unspecified

C00.6 Malignant neoplasm of commissure of lip, unspecified

C00.7 Malignant neoplasm of overlapping sites of lip

C00.8 Malignant neoplasm of lip, unspecified

Source: CMS (2016b).

Exhibits 2.1 and 2.2 are excerpts from the ICD-10-CM and ICD-10-PCS classification systems. They show the system in its text form, but large health care organizations generally use encoders, computer applications that facilitate accurate coding. Whether a book or text file or encoder is used, the classification system follows the same structure.
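The hierarchical structure visible in Exhibit 2.1 (blocks such as C00-C14 nested inside C00-C96) can be illustrated with a small sketch. It assumes that the first three characters of a code form its category and that categories can be compared as plain strings; both are simplifications adopted for the example, and the function names are hypothetical. An actual encoder works from the full published code tables.

```python
def category(code: str) -> str:
    """Return the three-character category of an ICD-10-CM code, e.g. 'C00.3' -> 'C00'."""
    return code.strip().upper()[:3]


def in_block(code: str, start: str, end: str) -> bool:
    """True if the code's category falls within a block such as C00-C14.

    Assumption: plain string comparison of three-character categories
    is adequate for this illustration.
    """
    return start <= category(code) <= end


# Block boundaries taken from the headings in Exhibit 2.1.
LIP_ORAL_PHARYNX = ("C00", "C14")
MALIGNANT_NEOPLASMS = ("C00", "C96")
```

Under these assumptions, C00.3 falls inside both blocks, whereas C43.0 (listed under Excludes 1 in the exhibit) falls inside C00-C96 but outside C00-C14.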

CPT and HCPCS

The American Medical Association (AMA) publishes an updated CPT each year. Unlike ICD-10-CM, CPT is copyrighted, with all rights to publication and distribution held by the AMA. CPT was first developed and published in 1966. The stated purpose for developing CPT was to provide a uniform language for describing medical and surgical services. In 1983, however, the government adopted CPT, in its entirety, as the major component (known as Level 1) of the Healthcare Common Procedure Coding System (HCPCS). Since then CPT has become the standard for physician's office, outpatient, and ambulatory care coding for reimbursement purposes. Exhibit 2.3 is a simplified example of a patient encounter form with HCPCS/CPT codes.

Exhibit 2.2 Excerpt from ICD-10-PCS 2017

0CW
Section: 0 Medical and Surgical
Body System: C Mouth and Throat
Operation: W Revision: Correcting, to the extent possible, a portion of a malfunctioning device or the position of a displaced device

Body Part: A Salivary Gland
Approach: 0 Open; 3 Percutaneous; X External
Device: 0 Drainage Device; C Extraluminal Device
Qualifier: Z No Qualifier

Body Part: S Larynx
Approach: 0 Open; 3 Percutaneous; 7 Via Natural or Artificial Opening; 8 Via Natural or Artificial Opening Endoscopic; X External
Device: 0 Drainage Device; 7 Autologous Tissue Substitute; D Intraluminal Device; J Synthetic Substitute; K Nonautologous Tissue Substitute
Qualifier: Z No Qualifier

Body Part: Y Mouth and Throat
Approach: 0 Open; 3 Percutaneous; 7 Via Natural or Artificial Opening; 8 Via Natural or Artificial Opening Endoscopic; X External
Device: 0 Drainage Device; 1 Radioactive Element; 7 Autologous Tissue Substitute; D Intraluminal Device; J Synthetic Substitute; K Nonautologous Tissue Substitute
Qualifier: Z No Qualifier

Source: CMS (2016c).
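To show how a table like Exhibit 2.2 is read, the sketch below assembles a seven-character procedure code from one value per column. It is a simplified illustration, not an encoder: the dictionary `TABLE_0CW` and the function `build_pcs_code` are hypothetical names, the values are transcribed from the exhibit as printed, and the sketch assumes every combination within a row is allowed, which the official tables may restrict further.

```python
# Values transcribed from Exhibit 2.2; keys are body-part characters.
TABLE_0CW = {
    "A": {"approach": set("03X"), "device": set("0C"), "qualifier": {"Z"}},
    "S": {"approach": set("0378X"), "device": set("07DJK"), "qualifier": {"Z"}},
    "Y": {"approach": set("0378X"), "device": set("017DJK"), "qualifier": {"Z"}},
}


def build_pcs_code(body_part: str, approach: str, device: str, qualifier: str = "Z") -> str:
    """Assemble section + body system + operation + the four row values."""
    row = TABLE_0CW.get(body_part)
    if row is None:
        raise ValueError(f"body part {body_part!r} is not in this excerpt")
    if approach not in row["approach"]:
        raise ValueError(f"approach {approach!r} not listed for body part {body_part!r}")
    if device not in row["device"]:
        raise ValueError(f"device {device!r} not listed for body part {body_part!r}")
    if qualifier not in row["qualifier"]:
        raise ValueError(f"qualifier {qualifier!r} not listed for body part {body_part!r}")
    # "0" = Medical and Surgical, "C" = Mouth and Throat, "W" = Revision.
    return "0CW" + body_part + approach + device + qualifier
```

Reading the Larynx row with an Open approach and an Intraluminal Device, `build_pcs_code("S", "0", "D")` yields `0CWS0DZ`.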

Exhibit 2.3 Patient encounter form coding standards

Pediatric Associates P.A.
123 Children's Avenue, Anytown, USA

Office Visits
99211 Estab Pt, minimal
99212 Estab Pt, focused
99213 Estab Pt, expanded
99214 Estab Pt, detailed
99215 Estab Pt, high complexity
99201 New Pt, problem focused
99202 New Pt, expanded
99203 New Pt, detailed
99204 New Pt, moderate complexity
99205 New Pt, high complexity
99050 After Hours
99052 After Hours, after 10 pm
99054 After Hours, Sundays and Holidays

99070 10 Arm Sling
99070 11 Sterile Dressing
99070 45 Cervical Cap

Preventive Medicine: New
99381 Prev Med 0–1 years
99382 Prev Med 1–4 years
99383 Prev Med 5–11 years
99384 Prev Med 12–17 years
99385 Prev Med 18–39 years

Preventive Medicine: Established
99391 Prev Med 0–1 years
99392 Prev Med 1–4 years
99393 Prev Med 5–11 years
99394 Prev Med 12–17 years
99395 Prev Med 18–39 years

Outpatient Consult
99241
99242
99243
99244
99245

Immunizations, Injections, and Office Laboratory Services
90471 Adm of Vaccine 1
90472 Adm of Vaccine > 1
90648 HIB
90658 Influenza
90669 Prevnar
90701 DTP
90702 DT
90707 MMR
90713 Polio Injection
90720 DTP/HIB
90700 DTaP
90730 Hepatitis A
90733 Meningococcal
90744 Hepatitis B 0–11
90746 Hepatitis B 18+ years

81000 Urinalysis w/ micro
81002 Urinalysis w/o micro
82270 Hemoccult Stool
82948 Dextrostix
83655 Lead Level
84030 PKU
85018 Hemoglobin
87086 Urine Culture
87081 Throat Culture
87205 Gram Stain
87208 Ova Smear (pin worm)
87210 Wet Prep
87880 Rapid Strep

Diagnosis
Patient Name
No.
Date
Time
Address
DOB
Name of Insured
ID
Insurance Company
Return Appointment ___________________________________________________

As coding has become intimately linked to reimbursement, directly determining the amount of money a health care organization can receive for a claim from insurers, the government has increased its scrutiny of coding practices. There are official guidelines for accurate coding, and health care facilities that do not adhere to these guidelines are liable to charges of fraudulent coding practices. In addition, the Office of Inspector General of the Department of Health and Human Services (HHS OIG) publishes compliance guidelines to facilitate health care organizations' adherence to ethical and legal coding practices. The OIG is responsible for (among other duties) investigating fraud involving government health insurance programs. More specific information about compliance guidelines can be found on the OIG website (www.oig.hhs.gov) and will be more thoroughly discussed in Chapter Nine.

Health Care Data Uses

The previous sections of this chapter examine how health care data is captured in patient records and billing claims. Even with this brief overview you can begin to see what a rich source of health care data these records could be. However, before health care data can be used, it must be stored and retrieved. How do we retrieve that data so that the information can be aggregated, manipulated, or analyzed for health care organizations to improve patient care and business operations? How do we combine this patient care data created and stored internally with other pertinent data from external sources?

As we discussed previously in the chapter, data need to be processed to become information. We also noted that data and information may be considered along a continuum; one person's data may be another person's information, depending on the level of processing required. In this section of the chapter we will focus on the use of data analysis to transform data into information. There is a lot of discussion about the current and future impact of so-called big data on the health care community. We will start the discussion of data analysis by looking at the basic elements required to perform effective health care data analysis, followed by a comparison of “small” data analysis examples to the emerging big data.

Regardless of the scope of the data or the tools used, health care data analysis requires basic elements. First, there must be a source of data, for example, the EHR, claims data, laboratory data, and so on. Second, these data must be stored in a retrievable manner, for example, in a database or data warehouse. Next, an analytical tool, such as mathematical statistics, probability models, predictive models, and so on, must be applied to the stored data. Finally, to be meaningful, the analyzed data must be reported in a usable manner.

Databases and Data Warehouses

A database generally refers to any structured, accessible set of data stored electronically; it can be large or small. The back ends of EHR and claims systems are examples of large databases. A data warehouse differs from a database in its structure and function. In health care, data warehouses that are derived from health care information systems may be referred to as clinical data repositories. The data in a data warehouse come from a variety of sources, such as the EHR, claims data, and ancillary health care information systems (laboratory, radiology, etc.). The data from the sources are extracted, “cleaned,” and stored in a structure that enables the data to be accessed along multiple dimensions, such as time (e.g., day, month, year); location; or diagnosis. Data warehouses help organizations transform large quantities of data from separate transactional files or other applications into a single decision-support database. The important concept to understand is that the database or data warehouse provides organized storage for data so that they can be retrieved and analyzed. Before useful information can be obtained, the data must be analyzed. In the most straightforward uses, the data from the data stores are aggregated and reported using simple reporting or statistical methods.
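The idea of accessing stored data "along multiple dimensions" can be made concrete with a very small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a warehouse; the table name `encounter`, its columns, and the three sample rows are invented for illustration and are not drawn from any real system.

```python
import sqlite3

# Invented sample rows: (month, location, diagnosis code, charges).
ROWS = [
    ("2016-01", "Main", "I50.9", 1200.0),
    ("2016-01", "Main", "I10", 300.0),
    ("2016-02", "North", "I10", 250.0),
]


def build_store(rows):
    """Load cleaned rows into an in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE encounter (month TEXT, location TEXT, diagnosis TEXT, charges REAL)"
    )
    con.executemany("INSERT INTO encounter VALUES (?, ?, ?, ?)", rows)
    return con


def totals_by(con, dimension):
    """Aggregate encounter counts and charges along one dimension."""
    if dimension not in {"month", "location", "diagnosis"}:
        raise ValueError(f"unsupported dimension: {dimension!r}")
    # The dimension name is checked against a fixed list before being placed in the SQL.
    sql = (
        f"SELECT {dimension}, COUNT(*), SUM(charges) "
        f"FROM encounter GROUP BY {dimension} ORDER BY {dimension}"
    )
    return con.execute(sql).fetchall()
```

The same stored rows answer different questions depending on the dimension chosen: `totals_by(con, "month")` summarizes by time, while `totals_by(con, "diagnosis")` summarizes by diagnosis code.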

Small versus Big Data

Data stores and data analytics are not new to health care. However, the scope and speed with which we are now capable of analyzing data and discovering new information have increased tremendously. Big data is not a data store (warehouse or database), nor is it a specific analytical tool, but rather it refers to a combination of the two. Experts describe big data as characterized by three Vs (a fourth V, veracity or accuracy, is sometimes added). These characteristics are present in big but not small data:

Very large volume of data
A variety (e.g., images, text, discrete) of types and sources (EHR, wearable fitness technology, social media, etc.) of data
The velocity at which the data is accumulated and processed (Glaser, 2014; Macadamian, n.d.)

Harris and Schneider (2015) describe a useful metaphor for explaining the difference between big data and traditional data storage and analysis systems. They tell us to consider “even enormous databases, such as the Medicare claims database as ‘filing cabinets,’ while big data is more like a ‘conveyor belt.’ The filing cabinet no matter how large, is static, while the conveyor belt is constantly moving and presenting new data points and even data sources” (p. 53). They further provide the following examples of questions answered by big versus small data in health care:

What are the effects of our immunization programs? versus Is my child growing as expected?
What are some of the healthiest regions? versus Is this medication improving my (or my patients') blood pressure?

Small Data Examples

Disease and Procedure Indexes

Health care management often wants to know summary information about a particular disease or treatment. Examples of questions that might be asked are: What is the most common diagnosis among patients treated in the facility? What percentage of patients with diabetes is African American? What is the most common procedure performed on patients admitted with gastritis (or heart attack or any other diagnosis)? Traditionally, such questions have been answered by looking in disease and procedure indexes. Prior to EHRs and their resulting databases, disease and procedure indexes were large card catalogues or books that kept track of the numbers of diseases treated and procedures occurring in a facility by disease and procedure codes. Now that repositories of health care data are common, the disease and procedure index function is generally handled as a component of the EHR. The retrieval of information related to diseases and procedures is still based on ICD and CPT codes, but the range of possible queries is virtually unlimited. Users can search the disease and procedure database for general frequency statistics for any number of combinations of data. Figure 2.6 is an example of a screen resulting from a query for a specific patient, Iris Hale, who has been identified as a member of both the Heart Failure and Hypertension registries.

Many other types of aggregate clinical reports are used by health care providers and executives. Ad hoc reporting capability applied to clinical databases gives providers and executives access to any number of summary reports based on the data elements from patient health and claims records.
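A disease and procedure index query of the kind described above reduces to counting coded encounters. The sketch below shows two such counts. The record layout (dictionaries with `diagnosis` and `procedure` keys) and the placeholder codes are assumptions made for the example; a production index would query the EHR's own database using actual ICD and CPT codes.

```python
from collections import Counter

# Placeholder codes only; these are not real ICD or CPT values.
ENCOUNTERS = [
    {"diagnosis": "DX-A", "procedure": "PROC-1"},
    {"diagnosis": "DX-A", "procedure": "PROC-1"},
    {"diagnosis": "DX-A", "procedure": "PROC-2"},
    {"diagnosis": "DX-B", "procedure": "PROC-2"},
]


def most_common_diagnosis(encounters):
    """Return (code, count) for the most frequent diagnosis, or None if empty."""
    counts = Counter(e["diagnosis"] for e in encounters)
    return counts.most_common(1)[0] if counts else None


def most_common_procedure_for(encounters, diagnosis):
    """Return (code, count) for the most frequent procedure among one diagnosis."""
    counts = Counter(e["procedure"] for e in encounters if e["diagnosis"] == diagnosis)
    return counts.most_common(1)[0] if counts else None
```

With the sample records, the first function answers "What is the most common diagnosis?" and the second answers "What is the most common procedure for patients with a given diagnosis?"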

Health Care Statistics

Utilization and performance statistics are routinely gathered for health care executives. This information is needed for facility and health care provision planning and improvement. Statistical reports can provide managers and executives a snapshot of their organization's performance.

Two categories of statistics directly related to inpatient stays are routinely captured and reported. Many variations of these reports, and others that drill down to a more granular level of data, also exist.

Census statistics. These data reveal the number of patients present at any one time in a facility. Several commonly computed rates are based on these census data, including the average daily census and bed occupancy rates.
Discharge statistics. This group of statistics is calculated from data accumulated when patients are discharged. Some commonly computed rates based on discharge statistics are average length of stay, death rates, autopsy rates, infection rates, and consultation rates.

Outpatient facilities and group practices, specialty providers, and so on also routinely collect utilization statistics. Some of the more common statistics are average patient visits per month (or year) and percentage of patients achieving a health status goal, such as immunizations or smoking cessation. The number of descriptive health care statistics that can be produced is limitless. Health care organizations also track a wide variety of financial performance, patient satisfaction, and employee satisfaction data. Patient and employee data generally come from surveys that are routinely administered. The body of data collected and analyzed is driven by the mission of the organization, along with reporting requirements from state, federal, and accrediting organizations.

Figure 2.6 Sample heart failure and hypertension query screen

Source: Cerner Corporation (2016). Used with permission.
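The census and discharge rates named above are simple ratios. The sketch below uses commonly taught definitions (inpatient service days divided by days in the period, and so on); these formulas are not spelled out in this chapter, so treat them as assumptions, and note that the input figures are invented to reproduce the 79 percent January occupancy example from the start of the chapter.

```python
def average_daily_census(service_days: float, days_in_period: int) -> float:
    """Average number of inpatients per day over the period."""
    return service_days / days_in_period


def bed_occupancy_rate(service_days: float, beds: int, days_in_period: int) -> float:
    """Percentage of available bed days that were actually used."""
    return 100.0 * service_days / (beds * days_in_period)


def average_length_of_stay(total_discharge_days: float, discharges: int) -> float:
    """Mean number of days stayed by patients discharged in the period."""
    return total_discharge_days / discharges


# Invented figures: a 100-bed hospital in January (31 days).
JANUARY_SERVICE_DAYS = 2449
```

With 2,449 inpatient service days, a 100-bed facility has an average daily census of 79 patients in January, which corresponds to a bed occupancy rate of 79 percent.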

Health care organizations also look to data to guide improved performance and patient satisfaction. Performance data are essential to health care leaders; however, because they are generally managed within a quality or performance improvement department and are not derived from health care data, per se, they will not be discussed in depth in this chapter. A few significant external agencies that report performance data, however, will be discussed in Chapter Nine.

Although each organization will determine which daily, monthly, and yearly statistics they need to track based on their individual service missions, Rachel Fields (2010) in an article published by Becker's Hospital Review provides a list of ten common measures identified by a panel of five hospital leaders, as shown in Table 2.1.

Big Data Examples

Health care organizations today contend with data from EHRs, internal databases, and data warehouses, as well as data from a growing volume of other health-related sources, such as diagnostic imaging equipment, aggregated pharmaceutical research, social media, and personal devices such as Fitbits and other wearable technologies. No longer is the data needed to support health care decisions located within the organization or in any single data source. As we begin to manage populations and care continuums we have to bring together data from hospitals, physician practices, long-term care facilities, the patient, and so on. These data needs are bigger than the data needs we had (and still have) when we focused primarily on inpatient care.

Big data is a practice that is applied to a wide range of uses across a wide range of industries and efforts, including health care. There is no single big data product, application, or technology, but big data is broadening the range of data that may be important in caring for patients. For instance, in the case of Alzheimer's and other chronic diseases such as diabetes and cancer, online social sites not only provide a support community for like-minded patients but also contain knowledge that can be mined for public health research, medication use monitoring, and other health-related activities. Moreover, popular social networks can be used to engage the public and monitor public perception and response during flu epidemics and other public health threats (Glaser, 2014).

Table 2.1 Ten common hospital statistical measures

Source: Fields (2010).

Daily Monthly Yearly

1. Quality measures, such as
Infection rates
Patient falls
Overall mortality
2. Patient census statistics
By physician
By service line
3. Discharged but not final billed
4. Point-of-service cash collections
5. Percentage of charity care
6. Percentage of budget spent for each department
7. Door-to-discharge time
8. Patient satisfaction scores
9. Colleague satisfaction scores
10. Market share and service line development

As important and perhaps more important than the data themselves are the novel analytics that are being developed to analyze these data. In health care we see an impressive range of analytics:

Post-market surveillance of medication and device safety
Comparative effectiveness research (CER)
Assignment of risk, for example, readmissions
Novel diagnostic and therapeutic algorithms in areas such as oncology
Real-time status and process surveillance to determine, for example, abnormal test follow-up performance and patient compliance with treatment regimens
Determination of structure including intent, for example, identifying treatment patterns using a range of structured and unstructured and EHR and non-EHR data
Machine correction of data-quality problems

The potential impact of applying data analytics to big data is huge. McKinsey & Company (Kayyil, Knott, & Van Kuiken, 2013) estimates that big data initiatives could account for $300 to $450 billion in reduced health care spending, or 12 to 17 percent of the $2.6 trillion baseline in US health care costs. There are several early examples of possibly profound impact. For example, an analysis of the cumulative sum of monthly hospitalizations because of myocardial infarction, among other clinical and cost data, led to the discovery of arthritis drug Vioxx's adverse effects and its subsequent withdrawal from the market in 2004. A Deloitte (2011) analysis identified five areas of analysis that will be crucial in the emerging era of providers being held more accountable for the care delivered to a patient and a population:

Population management analytics. Producing a variety of clinical indicator and quality measure dashboards and reports to help improve the health of a whole community, as well as help identify and manage at-risk populations
Provider profiling/physician performance analytics. Normalizing (severity and case mix–adjusted profiling), evaluating, and reporting the performance of individual providers (PCPs and specialists) compared to established measures and goals
Point of care (POC) health gap analytics. Identifying patient-specific health care gaps and issuing a specific set of actionable recommendations and notifications either to physicians at the point of care or to patients via a patient portal or PHR
Disease management. Defining best practice care protocols over multiple care settings, enhancing the coordination of care, and monitoring and improving adherence to best practice care protocols
Cost modeling/performance risk management/comparative effectiveness. Managing aggregated costs and performance risk and integrating clinical information and clinical quality measures

Health Care Data Quality

Up to this point, this chapter has examined health care data and information with a focus on their origins and uses. Changes to the health care delivery system and payment reform are changing the ways in which we use health care information. Traditionally, patient clinical and claims records were used primarily to document episodic care or, at best, the care received by an individual across the continuum, as long as that care was provided through a single organization. In today's environment, care providers, care coordinators, analysts, and researchers are all looking to EHRs and electronic claims records as a source of data beyond the episodic scope. Any discussion of health care data analytics and big data includes the EHR as a key data source. This expanded use of electronic records and the push for bigger and better data analytics have raised the bar for ensuring the quality of health care data. Quality health care data has always been important, but the criteria for what constitutes high-quality data have shifted.

There are many operational definitions for quality. Two of the best known were developed by the well-known quality “gurus,” Philip B. Crosby and Joseph M. Juran. Crosby (1979) defines quality as “conformance to requirements” or conformance to standards. Juran (Juran & Gryna, 1988) defines quality as “fitness for use”: products or services must be free of deficiencies. What these definitions have in common is that the criteria against which quality is measured will change depending on the product, service, or use. Herein lies the problem with adopting a single standard for health care data quality: it depends on the use of the data. EHRs evolved from patient medical records, whose central purpose was to document and communicate episodes of patient care. Today EHRs are being evaluated as source data for complex data analytics and clinical research. Before an organization can measure the quality of the information it produces and uses, it must establish data standards. And before it can establish data standards it must identify all endorsed uses of the EHR.

Consider this scenario. EHRs contain two basic types of data: structured data that is quantifiable or predefined and unstructured data that is narrative. Within a health care organization, the clinicians using the EHR for patient care prefer unstructured data, because it is easier to dictate a note than to follow a lengthy point-and-click pathway to create a structured note. The clinicians feel that the validation screens cost time that is too valuable for them to waste. The researchers within the organization, however, want as much of the data in the record as possible to be structured to avoid missing data and data entry errors. What should the organization adopt as its standard? Structured or unstructured data? Who will decide and based on what criteria? This debate between the primary use of EHR data and its secondary use, or reuse, is likely to continue. However, effectively using EHR data to create new knowledge, through either analytics or research, will require HIT leaders to adopt the more stringent data quality criteria posed by these uses. Wells, Nowacki, Chagin, and Kattan (2013) identify missing data as particularly problematic when using the EHR for research purposes. They further identify two main sources of missing EHR data:

Data were not collected. A patient was never asked about a condition. This is most likely directly related to the clinician's lack of interest in what would be considered irrelevant to the current episode of care. Few clinicians will take a full history, for example, at every encounter.
Documentation was not complete. The patient was asked, but it was not noted in the record. This is common in the EHR when clinicians only note positive values and leave negative values blank. For example, if a patient states that he or she does not have a history of cancer, no note will be made, either positive or negative. For a researcher this creates issues. Is this missing data or a negative value?
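The ambiguity between "missing" and "negative" can be made explicit in how a value is stored. The sketch below is one possible representation, not a description of how any particular EHR works: a history item is recorded as `True`, `False`, or `None`, and the field names and helper functions are hypothetical.

```python
from typing import Dict, List, Optional

# A history record maps a condition name to True (positive),
# False (explicitly negative), or None (asked status unknown).
History = Dict[str, Optional[bool]]


def history_status(record: History, condition: str) -> str:
    """Classify a condition as 'positive', 'negative', or 'missing'."""
    value = record.get(condition)  # an absent key is treated the same as None
    if value is None:
        return "missing"
    return "positive" if value else "negative"


def missing_rate(records: List[History], condition: str) -> float:
    """Percentage of records in which the condition is undocumented."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if history_status(r, condition) == "missing")
    return 100.0 * missing / len(records)
```

A record in which only positive findings are noted cannot distinguish the last two cases; storing an explicit `False` when the patient denies a history of cancer is what lets a researcher tell a negative value from missing data.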
Although there is no single common standard against which health care data quality can be measured, there are useful frameworks that organizations can use to evaluate health care data quality (once the purpose for the data is clearly determined).

The following section will examine two different frameworks for evaluating health care data quality. The first was developed by the American Health Information Management Association (AHIMA) (Davoudi et al., 2015), the second by Weiskopf and Weng (2013). The AHIMA framework is set in the context of managing health care data quality across the enterprise. The Weiskopf and Weng framework was delineated after in-depth research into the quality of data specifically found within an EHR, as currently used. Common health data quality issues will be examined using each framework.

AHIMA Data Quality Characteristics

AHIMA developed and published a set of health care data quality characteristics as a component of a comprehensive data quality management model. They define data quality management as “the business processes that ensure the integrity of an organization's data during collection, application (including aggregation), warehousing, and analysis” (Davoudi et al., 2015). These characteristics are to be measured for conformance during the entire data management process.

Data accuracy. Data that reflect correct, valid values are accurate. Typographical errors in discharge summaries and misspelled names are examples of inaccurate data. Data accessibility. Data that are not available to the decision makers needing them are of no value to those decision makers. Data comprehensiveness. All of the data required for a particular use must be present and available to the user. Even relevant data may not be useful when they are incomplete. Data consistency. Quality data are consistent. Use of an abbreviation that has two different meanings is a good example of how lack of consistency can lead to problems. For example, a nurse may use the abbreviation CPR to mean cardiopulmonary resuscitation at one time and computer-based patient record at another time, leading to confusion. Data currency. Many types of health care data become obsolete after a period of time. A patient's admitting diagnosis is often not the same as the diagnosis recorded on discharge. If a health care executive needs a report on the diagnoses treated during a particular time frame, which of these two diagnoses should be included? Data definition. Clear definitions of data elements must be provided so that current and future data users will understand what the data mean. This issue is exacerbated in today's health care environment of collaboration across organizations. Data granularity. Data granularity is sometimes referred to as data atomicity. That is, individual data elements are “atomic” in the sense that they cannot be further subdivided. For example, a typical patient's name should generally be stored as three data elements (last name, first name, middle name—“Smith” and “John” and “Allen”), not as a single data element (“John Allen Smith”). Again, granularity is related to the purpose for which the data are collected. Although it is possible to subdivide a person's birth date into separate fields for the month, the date, and the year, this is usually not desirable. 
The birth date is at its lowest practical level of granularity when used as a patient identifier. Values for data should be defined at the correct level for their use. Data precision. Precision often relates to numerical data. Precision denotes how close to an actual size, weight, or other standard a particular measurement is. Some health care data must be very precise. For example, in figuring a drug dosage it is not all right to round up to the nearest gram when the drug is to be dosed in milligrams.

Data relevancy. Data must be relevant to the purpose for which they are collected. We could collect very accurate, timely data about a patient's color preferences or choice of hairdresser, but are these matters relevant to the care of the patient?

Data timeliness. Timeliness is a critical dimension in the quality of many types of health care data. For example, critical lab values must be available to the health care provider in a timely manner. Producing accurate results after the patient has been discharged may be of little or no value to the patient's care.

Table 2.2 Terms used in the literature to describe the five common dimensions of data quality

Source: Weiskopf and Weng (2013). Reproduced with permission of Oxford University Press.

| Completeness | Correctness | Concordance | Plausibility | Currency |
| --- | --- | --- | --- | --- |
| Accessibility | Accuracy | Agreement | Accuracy | Recency |
| Accuracy | Corrections made | Consistency | Believability | Timeliness |
| Availability | Errors | Reliability | Trustworthiness | |
| Missingness | Misleading | Variation | Validity | |
| Omission | Positive predictive value | | | |
| Presence | Quality | | | |
| Quality | Validity | | | |
| Rate of recording | | | | |
| Sensitivity | | | | |
| Validity | | | | |

Weiskopf and Weng Data Quality Dimensions

Weiskopf and Weng (2013) published a review article in the Journal of the American Medical Informatics Association that identified five dimensions of EHR data quality. They based their findings on a pool of ninety-five articles that examined EHR data quality. Their context was using the EHR for research, that is, “reusing” the EHR data. Although different terms were used in the articles, the authors were able to map the terms to one of the five dimensions (see Table 2.2):

Completeness: Is the truth about a patient present?
Correctness: Is an element that is in the EHR true?
Concordance: Is there agreement between elements in the EHR or between the EHR and another data source?
Plausibility: Does an element in the EHR make sense in light of other knowledge about what that element is measuring?
Currency: Is an element in the EHR a relevant representation of the patient state at a given point in time?

Perspective

Problems with Reusing EHR Data: Examples from the Literature

Botsis, T., Hartvigsen, G., Chen, F., & Weng, C. (2010). Secondary use of EHR: Data quality issues and informatics opportunities. Summit on Translational Bioinformatics, 2010, 1–5.

The authors report on data quality issues they encountered when attempting to use data that originated in an EHR to conduct survival analysis of pancreatic cancer patients treated at a large medical center in New York City. They found that of 3,068 patients within the clinical data warehouse, only 1,589 had appropriate disease documentation within a pathology report. The sample size was further reduced to 522 when the researchers discovered incompleteness of key study variables. Other instances of incompleteness and inaccuracies were found within the remaining 522 subjects' documentation, causing the researchers to make inferences regarding some of the non-key study variables.
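The attrition described above is easier to appreciate as percentages of the starting cohort. The following is a minimal sketch that uses only the three counts reported in the passage; the step labels and the `attrition` helper are illustrative names, not part of the original study.

```python
# Cohort sizes reported by Botsis et al. (2010) at each filtering step.
# The labels are paraphrases added for illustration.
STEPS = [
    ("patients in the clinical data warehouse", 3068),
    ("with disease documentation in a pathology report", 1589),
    ("with complete key study variables", 522),
]


def attrition(steps):
    """Return (label, count, percent of the starting cohort retained) per step."""
    start = steps[0][1]
    return [(label, n, round(100 * n / start, 1)) for label, n in steps]


if __name__ == "__main__":
    for label, n, pct in attrition(STEPS):
        print(f"{n:>5}  {pct:>5}%  {label}")
```

Under these counts, roughly half of the original patients survive the first filter and about one in six remain in the final sample.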

Bayley, K. B., Belnap, T., Savitz, L., Masica, A. L., Shah, N., & Fleming, N. S. (2013). Challenges in using electronic health record data for CER. Medical Care, 51(8 Suppl 3), S80–S86. doi:10.1097/mlr.0b013e31829b1d48

The authors conducted research to determine the “strengths and challenges” of using EHRs for comparative effectiveness research (CER) across four major health care systems with mature EHR systems. They focused on comparing the effectiveness of antihypertensive medications on blood pressure control for a population of patients with hypertension who were being followed by primary care providers within the health systems. Data quality problems that were identified included the following:

Missing data
Erroneous data
Uninterpretable data
Inconsistent data
Text notes and noncoded data

The authors concluded that the potential for EHRs as a source of longitudinal data for comparative effectiveness studies in populations is high, but they note that “improving data quality within the EHR in order to facilitate research will remain a challenge as long as research is seen as a separate activity from clinical care.”

Weiskopf and Weng (2013) further identify completeness, correctness, and currency as “fundamental,” stating that concordance and plausibility “appear to be proxies for the fundamental dimensions when it is not possible to assess them directly.”

Strategies for Minimizing Data Quality Issues

As a beginning point, health care data standardization requires clear, consistent definitions. One essential tool for identifying standard data definitions and ensuring their use is a data dictionary. AHIMA defines a data dictionary as “a descriptive list of names (also called ‘representations’ or ‘displays’), definitions, and attributes of data elements to be collected in an information system or database” (Dooling, Goyal, Hyde, Kadles, & White, 2014, p. 7) (see Table 2.3).
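To make the idea of a data dictionary concrete, the sketch below encodes a few rows from Table 2.3 as structured entries and uses them to flag fields that have no agreed definition. The `DataElement` class and the `lookup` and `undefined_fields` helpers are hypothetical names chosen for illustration; they do not come from AHIMA or AHRQ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One data dictionary entry: where the element lives and what it means."""
    table: str
    field: str
    datatype: str
    description: str


# A few entries copied from Table 2.3; a real dictionary would list every field.
DATA_DICTIONARY = [
    DataElement("PATIENT", "DOB", "Date", "The birthdate for the patient"),
    DataElement("PATIENT", "PATIENT_ID", "Integer", "A unique ID for the patient"),
    DataElement("ENCOUNTER", "ENCOUNTER_ID", "Integer", "A unique ID for the visit"),
]


def lookup(table, field):
    """Return the dictionary entry for table.field, or None if it is undefined."""
    for element in DATA_DICTIONARY:
        if element.table == table and element.field == field:
            return element
    return None


def undefined_fields(table, record):
    """List field names in a record that the dictionary does not define."""
    defined = {e.field for e in DATA_DICTIONARY if e.table == table}
    return sorted(set(record) - defined)


if __name__ == "__main__":
    print(lookup("PATIENT", "DOB"))
    print(undefined_fields("PATIENT", {"DOB": "1970-01-01", "NICKNAME": "JJ"}))
```

Keeping definitions in one machine-readable place means the same list can drive documentation, data entry screens, and checks like the one above, which is one way to keep users working from a shared definition.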

Regardless of how well data are defined, however, errors in entry will occur. These errors can be discussed in terms of two types of underlying cause: systematic errors and random errors. Systematic errors are errors that can be attributed to a flaw or discrepancy in the system or in adherence to standard operating procedures or systems. Random errors, however, are caused by carelessness, human error, or simply making a mistake.

Consider these scenarios:

A nurse is required to document vital signs into each patient's EHR at the beginning of each visit. However, the data entry screen is cumbersome and often the nurse must wait until the end of day and go back to update the vital signs. On occasion the EHR locks up and does not allow the nurse to update the information. This is an example of a systematic error.

A physician uses the structured history and physical module of the EHR within her practice. However, to save time she cuts and pastes information from one visit to another. During cutting and pasting, she fails to reread her note and leaves in the wrong encounter date. Although there are some elements of systematic error in this situation (not following protocol), the error is primarily a random error.

Effective systems are needed to ensure preventable errors are minimized and errors that are not preventable are easily detected and corrected. Clearly, there are multiple points during data collection and processing when the system design can reduce data errors.

The Markle Foundation (2006, p. 4) argues that comprehensive data quality programs are needed by health care organizations to prevent “dirty data” and subsequently improve the quality of patient care. They propose that a data quality program include “automated and human strategies”:

Standardizing data entry fields and processes for entering data
Instituting real-time quality checking, including the use of validation and feedback loops
Designing data elements to avoid errors (e.g., using check digits, algorithms, and well-designed user interfaces)
Developing and adhering to guidelines for documenting the care that was provided
Building human capacity, including training, awareness-building, and organizational change

Health care data quality problems are exacerbated by inter-facility collaborations and health information exchange. Imagine standardizing processes and definitions across multiple organizations.
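One of the design strategies listed above is the use of check digits. The source does not name a particular scheme, so the sketch below assumes the widely used Luhn algorithm purely as an illustration. The sample number is a common test value, not a real identifier, and the function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def is_valid_identifier(identifier: str) -> bool:
    """True when the last digit matches the check digit of the preceding digits."""
    if not identifier.isdigit() or len(identifier) < 2:
        return False
    return luhn_check_digit(identifier[:-1]) == int(identifier[-1])


if __name__ == "__main__":
    print(is_valid_identifier("79927398713"))  # well-formed test value
    print(is_valid_identifier("79927398731"))  # last two digits transposed
```

A check like this catches many single-digit typing errors and adjacent transpositions at the moment of entry, before the bad value is stored.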

Certainly, information technology has tremendous potential as a tool for improving health care data quality. Through the use of electronic data entry, users can be required to complete certain fields, prompted to add information, or warned when a value is out of prescribed range. When health care providers respond to a series of prompts, rather than dictating a free-form narrative, they are reminded to include all necessary elements of a health record entry. Data quality is improved when these systems also incorporate error checking. Structured data entry, drop-down lists, and templates can be incorporated to promote accuracy, consistency, and completeness (Wells et al., 2013). To date some of this potential for technology-enhanced improvements has been realized, but many opportunities remain. As noted in the Perspective, many of the data in existing EHR systems are recorded in an unstructured format, rather than in data fields designated to contain specific pieces of information, which can lead to poor health care data quality. Natural language processing (NLP) is a promising, evolving technology that will enable efficient data extraction from the unstructured components of the EHR, but it is not yet commonplace within health care systems. A clear example of data quality improvement achieved through information technology is the result seen from incorporating medication administration systems designed to prevent medication error. With structured data input and sophisticated error prevention, these systems can significantly reduce medication errors. The challenge for the foreseeable future is to balance the need for structured data with the associated costs (time and money). Further in the future, new challenges will appear as the breadth of data contained in patient records is likely to increase. Genomic and proteomic data, along with enhanced behavioral and social data, are likely to be captured (IOM, 2014). These added data will introduce new quality issues to be resolved.
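The entry-time checks described above (required fields and out-of-range warnings) can be sketched in a few lines. Everything in this example is an assumption made for illustration: the field names, the `validate_entry` function, and especially the numeric ranges, which are placeholders rather than clinical guidance.

```python
# Hypothetical required fields for a vital-signs entry screen.
REQUIRED_FIELDS = ("patient_id", "systolic_bp", "heart_rate")

# Hypothetical plausibility ranges, for illustration only (not clinical guidance).
RANGES = {
    "systolic_bp": (50, 260),  # mmHg
    "heart_rate": (20, 250),   # beats per minute
}


def validate_entry(entry):
    """Return (errors, warnings) for one data entry record.

    Errors block saving (a required field is missing); warnings prompt
    the user to confirm a value that falls outside the prescribed range.
    """
    errors = [
        f"{name} is required"
        for name in REQUIRED_FIELDS
        if entry.get(name) in (None, "")
    ]
    warnings = []
    for name, (low, high) in RANGES.items():
        value = entry.get(name)
        if value in (None, ""):
            continue  # already reported as missing, or optional
        if not low <= value <= high:
            warnings.append(f"{name}={value} is outside {low}-{high}")
    return errors, warnings


if __name__ == "__main__":
    print(validate_entry({"patient_id": 1, "systolic_bp": 1200, "heart_rate": None}))
```

Separating blocking errors from confirmable warnings reflects the balance the text describes: structure improves quality, but every hard stop adds time and friction for the person entering data.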

Table 2.3 Excerpt from data dictionary used by AHRQ surgical site infection risk stratification/outcome detection

Source: Agency for Healthcare Research and Quality (2012).

| Table | Field | Datatype | Description |
| --- | --- | --- | --- |
| PATIENT | | | Include patients who had surgery that meet inclusion CPT, SNOMED, or ICD-9 criteria between 1/1/2007 and 1/30/2009. |
| PATIENT | DOB | Date | The birthdate for the patient |
| PATIENT | PATIENT_ID | Integer | A unique ID for the patient |
| PATIENT | DATA_SOURCE_ID | Varchar(10) | An identifier for the source of the patient record data (UU, IHC, DH for example) |
| DIAGNOSIS | | | Include ICD-9 CM discharge codes within one month of surgery. A list of included codes is in table 2 of Stevenson et al. AJIC vol 36 (3) 155–164. |
| DIAGNOSIS | DIAGNOSIS_ID | Integer | A unique ID for the diagnosis |
| DIAGNOSIS | DIAGNOSIS_CODE | Varchar(64) | The code for the patient's diagnosis |
| DIAGNOSIS | DIAGNOSIS_CODE_SOURCE | Varchar(64) | The nomenclature that the diagnosis code is taken from (ICD9, etc.) |
| DIAGNOSIS | CLINICAL_DTM | Date | The date and time of the diagnosis's onset or exacerbation |
| MICROBIOLOGY | | | Include all Microbiology specimens taken within one month before or after a surgery. (For risk, this might be expanded to one year or more.) |
| MICROBIOLOGY | MICRO_ID | Integer | A unique ID for the procedure |
| MICROBIOLOGY | SPECIMEN_CODE | Varchar(64) | The site that the specimen was collected from |
| MICROBIOLOGY | SPECIMEN_CODE_SOURCE | Varchar(64) | The nomenclature that the specimen code is taken from (SNOMED, LOINC, etc.) |
| MICROBIOLOGY | PATHOGEN_CODE | Varchar(64) | The code of the pathogen cultured from the collected specimen |
| MICROBIOLOGY | PATHOGEN_CODE_SOURCE | Varchar(64) | The nomenclature that the pathogen code is taken from (SNOMED, LOINC, etc.) |
| MICROBIOLOGY | COLLECT_DTM | Date | The date and time the specimen was collected |
| ENCOUNTER | | | Include all Encounters within one month before or after surgery. |
| ENCOUNTER | ENCOUNTER_ID | Integer | A unique ID for the visit. This will serve to tie all of the different data tables together via foreign key relationship. |
| ENCOUNTER | ADMIT_DTM | Date | The admission date and time for a patient's visit |
| ENCOUNTER | DISCH_DTM | Date | The discharge date and time for a patient's visit |
| ENCOUNTER | ENCOUNTER_TYPE | Varchar(64) | The type of patient encounter such as inpatient, outpatient, observation, etc. |

Summary

Without health care data and information, there would be no need for health care information systems. Health care data and information are valuable assets in health care organizations, and they must be managed like other assets. To that end, health care executives need to understand the sources of health care data and information and to recognize the importance of ensuring the quality of health data and information. In this chapter, after defining health care data and information, we examined patient record and claims content as sources for health care data. We looked at disease and procedure indexes and health care statistics as examples of basic uses of health care data. The emerging use of data analytics and big data was introduced, and the chapter concluded with a discussion of two frameworks for examining health care data quality and a discussion of how information technology, in general, and the EHR, in particular, can be leveraged to improve the quality of health care data.