Managerial Epidemiology
Chapter 5
Sources of Data for Use in
Epidemiology
Learning Objectives
• Discuss criteria for assessing the quality
and utility of epidemiologic data
• Indicate privacy and confidentiality issues
that pertain to epidemiologic data
• Discuss the uses, strengths, and
weaknesses of various epidemiologic data
sources
Criteria for the Quality and
Utility of Epidemiologic Data
• Nature of the data
• Availability of the data
• Completeness of population
coverage
– Representativeness
– Generalizability (external validity)
– Thoroughness
• Strengths versus limitations
Nature of the Data
• Refers to the source of data, e.g.,
vital statistics, case registries,
physicians’ records, surveys of the
general population, or hospital and
clinic cases.
• Will affect the types of statistical
analyses and inferences that are
possible.
Availability of the Data
• Refers to investigator’s access to
data.
• For example, medical records and
other data with personal identifiers
may not be used without patients’
consent.
Completeness of Population
Coverage
• Representativeness—the degree to which
a sample resembles a parent population.
• Generalizability (external validity)— ability
to apply findings to a population that did
not participate in the study.
• Thoroughness—the care taken to identify
all cases of a given disease.
Strengths versus Limitations
• The utility of the data for various
types of epidemiologic research.
• Factors inherent in the data may limit
their usefulness.
– Incomplete diagnostic information.
– Case duplication.
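Illustration (case duplication): a minimal sketch of how duplicate case reports might be collapsed before counting. The field names (`name`, `birth_date`, `diagnosis`) and the exact-match rule are assumptions made for this example; real surveillance systems typically use more elaborate matching.

```python
def deduplicate_cases(records):
    """Keep the first report seen for each (name, birth date, diagnosis) key.

    The key fields are hypothetical; they stand in for whatever identifiers
    a given data source actually records.
    """
    seen = set()
    unique = []
    for rec in records:
        # Normalize spacing and case so trivial variations do not hide a duplicate.
        key = (
            rec["name"].strip().lower(),
            rec["birth_date"],
            rec["diagnosis"].strip().lower(),
        )
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up reports: the first two describe the same person and diagnosis.
reports = [
    {"name": "Ann Lee", "birth_date": "1950-02-01", "diagnosis": "Measles"},
    {"name": "ann lee ", "birth_date": "1950-02-01", "diagnosis": "measles"},
    {"name": "Bo Cruz", "birth_date": "1971-09-12", "diagnosis": "Measles"},
]
unique_reports = deduplicate_cases(reports)
print(len(reports), "reports ->", len(unique_reports), "unique cases")
```

Counting the raw reports would overstate the number of cases by one; the deduplicated list is the one to use in a numerator.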
Online Sources of Epidemiologic
Data
• Online bibliographic databases include
MEDLINE, TOXLINE, and commercial
databases.
• National Library of Medicine’s PubMed®
– MEDLINE is the main part of PubMed®
– Premier source of health-related literature
• TOXLINE—keyed to toxicology and includes
information on drugs and chemicals
Selected Internet Addresses
• American Public Health Association—
http://www.apha.org
• Centers for Disease Control and
Prevention—http://www.cdc.gov
• PubMed®—
http://www.ncbi.nlm.nih.gov/sites/entrez
Confidentiality
• Privacy Act of 1974
– Prohibits the release of confidential data without the consent of the individual
• Freedom of Information Act
– Mandates the release of government information to the public, except for personal and medical files
• The Public Health Service Act
– Protects confidentiality of information collected by some federal agencies, e.g., NCHS
The HIPAA Privacy Rule
• Refers to the Health Insurance Portability and
Accountability Act of 1996
• Sections of HIPAA “…require the Secretary of
HHS to publicize standards for the electronic
exchange, privacy and security of health
information…”
• Categories of protected health information
pertain to individually identifiable data regarding:
– The individual’s physical and mental health
– Provision of health care to the individual
– Payment for provision of health care
Data Sharing
• Refers to the voluntary release of
information by one investigator or
institution to another for the purpose of
scientific research.
• Can enhance data quality and increase
knowledge from research.
• Key issue is the primary investigator’s
potential loss of control over information.
Record Linkage
• Joining data from two or more
sources, e.g., employment records
and mortality data.
• Applications include genetic research,
planning of health services, and chronic disease tracking.
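Illustration (record linkage): a minimal sketch of linking employment records to mortality data on a shared identifier. The identifier `worker_id`, the other field names, and all values are hypothetical; in practice a common identifier is often unavailable and linkage relies on matching names, birth dates, and similar fields.

```python
def link_records(employment, mortality, key="worker_id"):
    """Attach mortality information to each employment record that matches on `key`."""
    # Index the mortality file once so each lookup is fast.
    deaths_by_key = {rec[key]: rec for rec in mortality}
    linked = []
    for emp in employment:
        merged = dict(emp)  # copy so the input records are left unchanged
        death = deaths_by_key.get(emp[key])
        merged["deceased"] = death is not None
        if death is not None:
            # Add the mortality fields, skipping the shared key.
            merged.update({k: v for k, v in death.items() if k != key})
        linked.append(merged)
    return linked


# Made-up example data.
employment = [
    {"worker_id": "W001", "job": "welder", "years_employed": 12},
    {"worker_id": "W002", "job": "painter", "years_employed": 7},
]
mortality = [
    {"worker_id": "W002", "cause_of_death": "lung cancer", "year_of_death": 1998},
]

linked = link_records(employment, mortality)
for row in linked:
    print(row)
```

Every employee is kept in the output, with or without a death record, so the linked file can serve as a cohort for follow-up rather than only a list of deaths.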
Statistics Derived from the
Vital Registration System
• Mortality statistics
• Birth statistics: certificates of birth
and fetal death.
Mortality Statistics
• Mortality data are nearly complete: in the U.S. and other developed countries, few deaths go unreported.
• Death certificates include demographic information about the deceased and cause of death (immediate cause and contributing factors).
Limitations of Mortality Data
• Certification of cause of death.
– For example, in an elderly person with
chronic illness, exact cause of death may be
unclear.
• Lack of standardization of diagnostic criteria.
• Stigma associated with certain diseases, e.g., AIDS, may lead to inaccurate reporting.
Limitations of Mortality Data
(cont’d)
• Errors in coding by nosologist
• Changes in coding
– Revisions of the International
Classification of Diseases (ICD).
– Sudden increases or decreases in a
particular cause of death may be due to changes in coding.
Birth Statistics: Certificates of Birth
and of Fetal Death
• Birth certificate includes information that may affect the neonate, such as congenital malformations, birth weight, and length of gestation.
• Sources of unreliability:
– Mothers’ recall of events during pregnancy
may be inaccurate.
– Conditions that affect neonate may not be present at birth.
Birth Statistics (cont’d)
• Varying state requirements for fetal death
certificates.
• Both types of certificates have been used
in studies of environmental influences
upon congenital malformations.
• Both provide nearly complete data.
Reportable Disease Statistics
• Federal and state statutes require health care
providers to report those cases of diseases
classified as reportable and notifiable.
– Include infectious and communicable
diseases that endanger a population, e.g.,
STDs, measles, foodborne illness.
Limitations of Reportable
Disease Statistics
• Possible incompleteness of population
coverage.
– For example, asymptomatic persons
would not seek treatment.
• Failure of physician to fill out required
forms.
• Unwillingness to report cases that carry a social stigma.
Screening Surveys
• Conducted on an ad hoc basis to identify
individuals who may have infectious or
chronic diseases. Examples: breast
cancer screenings, health fairs.
• Clientele are highly selected.
– Individuals who participate are concerned
about the particular health issue.
Multiphasic Screening
• Administration of 2 or more screening
tests during a single screening program
• Ongoing screening programs often are
carried out at worksites.
• Potential biases from worker attrition
• Data can be useful for research on
occupational health problems.
• Data may not contain etiologic information.
Disease Registries
• Registry--a centralized database for collection
of data about a disease
• Coding algorithms are used to maintain patient confidentiality.
• Applications of registries:
– Patient tracking
– Identification of trends in rates of disease
– Case-control studies
• Example: SEER program
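Illustration (coding for confidentiality): one way a registry could replace a patient identifier with a code is a keyed hash, sketched below with Python's standard `hmac` and `hashlib` modules. This is an assumed technique chosen for the example, not a description of how SEER or any particular registry codes its records; the key and the identifier format are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the registry; never stored with the data.
SECRET_KEY = b"replace-with-a-registry-held-secret"


def pseudonymize(patient_identifier, key=SECRET_KEY):
    """Return a stable code for an identifier without exposing the identifier itself."""
    # Normalize so the same patient always maps to the same code.
    normalized = patient_identifier.strip().upper().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


code_a = pseudonymize("MRN-12345")
code_b = pseudonymize(" mrn-12345 ")  # same patient, entered differently
code_c = pseudonymize("MRN-67890")    # a different patient

print(code_a == code_b)  # same patient yields the same code
print(code_a == code_c)  # different patients yield different codes
```

Because the same identifier always yields the same code, records for one patient can still be tracked over time, which supports the registry applications listed above.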
Surveillance, Epidemiology, and
End Results (SEER) Program
• Conducted by the National Cancer
Institute (NCI)
• Collects cancer data from different cancer
registries across the U.S.
• Provides information about trends in
cancer incidence, mortality, and survival
Morbidity Surveys of the
General Population
• Morbidity surveys collect data on the
health status of a population group.
• Obtain more comprehensive information
than would be available from routinely
collected data
• Example: National Health Interview
Survey
National Health Survey
• Authorized under the National Health Survey Act of 1956 to obtain information about the health of the U.S. population.
• Refers generically to a group of surveys and not a single survey.
• In response to the Act, the National Center for Health Statistics (NCHS) conducts three separate and distinct programs.
NCHS Survey Programs
• National Health Interview Survey (NHIS)
• Health Examination Survey (HES)
• Various surveys of health resources
– National Hospital Discharge Survey
– National Ambulatory Medical Care Survey
National Health Interview
Survey (NHIS)
• General household health survey of the
U.S. civilian noninstitutionalized
population
• Studies a comprehensive range of
conditions such as diseases, injuries,
disabilities, and impairments
Health Examination Survey
(HES)
• Provides direct information about morbidity through examinations, measurements, and clinical tests
– Identifies conditions previously unreported or
undiagnosed
– Provides information not previously available
for a defined population
• Now known as the National Health and Nutrition Examination Survey (NHANES)
Behavioral Risk Factor
Surveillance System (BRFSS)
• Collects data on behaviorally related
phenomena
– Behavioral risks for chronic diseases
– Preventive activities
– Healthcare utilization
• The largest telephone survey in the world
California Health Interview
Survey (CHIS)
• Provides information on the health and demographic characteristics of California residents
• Uses telephone survey methods
• Topics include
– Physical and mental health
– Health behaviors
– Health insurance coverage and utilization
• Conducted on a continuing basis
Insurance Data
• Sources include:
– Social Security--provides data on disability
benefits and Medicare.
– Health insurance--provides data on those
who receive care through a prepaid medical
program.
– Life insurance--provides information on
causes of mortality; also provides results of
physical examinations.
Limitations of Insurance Data
• Data may not be representative of entire population, as the uninsured are excluded.
Clinical Data Sources
• Hospital data
• Diseases treated in special clinics
and hospitals
• Data from physicians’ practices
Hospital Data
• Consists of both inpatient and outpatient data
• Deficiencies of data:
– Not representative of any specific population
– Different information collected on each patient
– Settings may differ according to social class of patients; e.g., specialized clinics, emergency rooms
Diseases Treated in Special
Clinics and Hospitals
• Data cannot be generalized because
patients are a highly selected group.
• Case-control studies can be done with
unusual and rare diseases.
– However, it is not possible to
determine incidence and prevalence
rates without knowing the size of the
denominator.
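Illustration (why the denominator matters): a minimal sketch of a rate calculation. The function name and all numbers are made up for this example; the point is only that a case count from a clinic cannot be turned into a rate unless the size of the source population is known.

```python
def rate_per_100k(cases, population_at_risk):
    """Cases per 100,000 population; refuses to run without a usable denominator."""
    if population_at_risk is None or population_at_risk <= 0:
        raise ValueError("A rate needs a known, positive population denominator")
    return cases * 100_000 / population_at_risk


clinic_cases = 40  # hypothetical count of cases seen at a specialty clinic

# If the clinic's source population were known to be 200,000 people:
rate_known = rate_per_100k(clinic_cases, 200_000)
print(rate_known)

# For a referral clinic the source population is usually unknown,
# so the same 40 cases yield no rate at all.
try:
    rate_per_100k(clinic_cases, None)
except ValueError as err:
    print("Cannot compute:", err)
```

The same 40 cases would imply a very different rate if the source population were 50,000 or 2,000,000, which is why counts from special clinics support case-control studies but not incidence or prevalence estimates.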
Data from Physicians’
Practices
• Limited application due to:
– Confidentiality of patient data
– Highly selected group of patients
– Lack of standardization of information
collected
• Useful for the purposes of:
– Verification of self-reports
– Source of exposure data
Absenteeism Data
• Records of absenteeism from work or school
• Possible deficiencies:
– Data omit people who neither work nor
attend school.
– Not all people who are ill take time off.
– Those absent are not necessarily ill.
• Useful for the study of rapidly spreading conditions
School Health Programs
• Provide information about
immunizations, physical exams, and self-
reports of illness
• Have been used in studies of
intelligence, mental retardation, and
disease etiology
• Paffenbarger et al. used information
from health records of college students
to track causes of chronic diseases.
Morbidity Data from the Armed Forces
• Reports from physicals, hospitalizations, and selective service examinations
• Data have been used for:
– Studies of disease etiology.
• Study of twins serving in Korean War or WWII to
determine influence of “nature and nurture” on
cause of disease.
– Studies investigating genetic factors in obesity
Other Data Sources Relevant
to Epidemiologic Studies
• U.S. Bureau of the Census publications:
– Statistical Abstract of the United States
– County and City Data Book
– Decennial Censuses of Population and
Housing
– Historical Statistics of the United States,
Colonial Times to 1970
U.S. Bureau of the Census
• Provides information on the general, social, and
economic characteristics of the U.S. population
• U.S. Census is administered every 10 years.
– Attempts to account for every person and his
or her residence
– Characterizes population according to sex,
age, family relationships, and other
demographic variables