Managerial Epidemiology: Week 6
Chapter 14
Molecular and Genetic
Epidemiology
Learning Objectives
• Differentiate between molecular and
genetic epidemiology
• Describe principles of inheritance and
sources of genetic variation
• Define epidemiologic approaches for
the identification of genetic
components of disease
Peeking into the “Black Box”
• Many risk factors can be quantified
through questionnaires, records, and
easily measured attributes (such as
blood pressure and anthropometrics).
• The biological mechanism(s) through
which these factors influence disease
is not always apparent (i.e., a “black
box”).
Value of Mechanistic Insight
• Biologic plausibility is a criterion for
causality.
• Linking lifestyle risk factors with
measures of biologic effect
strengthens interpretations of
causality.
• This linkage, in turn, provides
stronger support for interventions.
Why Distinguish Between
Molecular and Genetic
Epidemiology?
• The basic tenets and principles of
molecular and genetic epidemiology are
the same.
• However, there are specific features
regarding design, analysis and
interpretation inherent in the latter.
Definition of Genetic
Epidemiology
• A discipline that seeks to unravel
the role of genetic factors and their
interactions with environmental
factors in the etiology of diseases,
using family and population study
approaches.
Key Aspects of This Definition
• Inherited susceptibility does not mean
inherited disease: environment
matters!
• When families are studied, the
observations (study subjects) are no
longer independent.
• This dependence requires special
considerations for the analysis of
data.
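One rough way to see what dependence costs is the Kish design effect, 1 + (m - 1) * rho. It comes from cluster sampling, not from the chapter. Here m is family size and rho is the correlation of the outcome among relatives. The sketch assumes equal-sized families and a single rho, and the numbers are hypothetical:

```python
def design_effect(family_size, icc):
    """Kish design effect for clusters (families) of equal size m:
    DEFF = 1 + (m - 1) * rho, where rho is the intraclass correlation
    of the outcome among members of the same family."""
    return 1 + (family_size - 1) * icc


def effective_sample_size(n_people, family_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    return n_people / design_effect(family_size, icc)


# Hypothetical study: 1,000 relatives in families of 4, within-family correlation 0.3.
print(round(effective_sample_size(1000, 4, 0.3)))  # 526
```

Treating those 1,000 relatives as independent would overstate the information in the data and understate standard errors.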
Genetic Epidemiology is a
Method to Answer:
• Does a disease cluster in families?
• If so, is that clustering likely a result of shared
non-genetic risk factors?
• If the clustering is not accounted for by shared
lifestyle or common environment, is the pattern
of disease consistent with inherited effects?
• If so, where is the putative gene?
What Diseases or Risk Factors
Cluster in Families?
• Heart disease
• Various cancers
• Alcoholism
• Others
Epidemiologic Assessment of
Clustering
• Case-control study
• Comparison of the frequency of a
positive family history between cases
and controls
• Expectation under genetic influence:
cases report a positive family history
more often than controls
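This comparison reduces to a 2 × 2 table of case/control status by family history. The sketch below uses hypothetical counts. The Woolf (log-scale) confidence interval is one common choice and not necessarily the one the chapter uses:

```python
import math


def family_history_or(a, b, c, d, z=1.96):
    """Odds ratio (cases vs. controls) for a positive family history.

    a = cases with an affected relative,    b = cases without
    c = controls with an affected relative, d = controls without
    Returns (OR, lower, upper) with a Woolf (log-scale) confidence interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval undefined")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical: 60 of 200 cases and 30 of 200 controls report an affected relative.
or_fh, lower, upper = family_history_or(60, 140, 30, 170)
print(f"OR = {or_fh:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio above 1 is what a genetic influence would predict. By itself, though, it cannot separate genes from shared environment.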
Clustering of “Non-Genetic”
Exposures in Families
• Employment (e.g., several family
members with medical degrees)
• Radon from soil
• Religious preferences
• Lead in paint
• Others?
Major Point of This Section
• You cannot tell easily whether
clustering of a risk factor or disease
within a family is due to genetics,
culture, or shared environment
(including social or political factors).
• Clustering within a family will also
occur simply due to bad luck!
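How much clustering does chance alone produce? Suppose each of n relatives is affected independently with the same probability p. This is a deliberately simple model with no familial effect, and under it the number of affected relatives is binomial. The values below are hypothetical:

```python
from math import comb


def prob_at_least_k_affected(n, p, k):
    """P(at least k of n relatives affected) when each relative is affected
    independently with probability p, i.e., no familial effect at all."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical: 10% lifetime risk, families with 6 relatives.
print(f"{prob_at_least_k_affected(6, 0.10, 2):.3f}")  # 0.114
```

Roughly one such family in nine would contain two or more affected relatives with no familial cause at all.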
Other Correlates of Family
History
• Large family size
• Age of relatives (for an age-related
disease)
• Sex distribution of relatives (consider
sex-specific conditions such as testicular
cancer, prostate disease, ovarian cysts)
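The family-size point follows from the same independence assumption. The chance of at least one affected relative is 1 - (1 - p)^n, which rises with n even when the per-relative risk p is identical. A sketch with a hypothetical p:

```python
def prob_positive_family_history(n_relatives, p):
    """P(at least one affected relative) with independent per-relative risk p."""
    return 1 - (1 - p) ** n_relatives


# Same hypothetical 5% risk per relative; only family size differs.
for n in (2, 5, 10):
    print(n, f"{prob_positive_family_history(n, 0.05):.2f}")
```

Going from 2 to 10 relatives roughly quadruples the probability of a positive family history (about 0.10 versus 0.40). This is why cases and controls must be compared on family size.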
Analysis Approach
• Model Y (case/control status) as a
function of established risk factors.
• Add a family history variable to denote
“genetic” influence (i.e., the participant
shares genes with a relative who has the
outcome of interest).
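This model is usually fitted as a logistic regression. Python's standard library has no such routine, so the sketch below fits one by Newton-Raphson in plain Python; in practice a statistics package would normally be used. Smoking stands in for an "established risk factor," and the counts and function names are hypothetical:

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        score = [0.0] * k                      # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]   # observed information X'WX
        for r, yi in zip(rows, y):
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                score[i] += (yi - mu) * r[i]
                for j in range(k):
                    info[i][j] += mu * (1 - mu) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve_linear(info, score))]
    return beta


# Hypothetical grouped data: (smoker, family_history, case) -> number of people
counts = {
    (0, 0, 1): 40, (0, 0, 0): 120, (0, 1, 1): 20, (0, 1, 0): 20,
    (1, 0, 1): 60, (1, 0, 0): 50, (1, 1, 1): 30, (1, 1, 0): 10,
}
X, y = [], []
for (smoker, fam_hx, case), n in counts.items():
    X += [[smoker, fam_hx]] * n
    y += [case] * n

_, b_smoker, b_fam_hx = fit_logistic(X, y)
print(f"Family-history OR adjusted for smoking: {math.exp(b_fam_hx):.2f}")
```

Fitting family history alone would reproduce the crude odds ratio from the 2 × 2 table exactly. Adding the established risk factor yields the adjusted estimate this approach calls for.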
Analysis Issues
• Try to compare (and control if necessary)
differences between cases and controls
with regard to size of family.
• Not easy to adjust for age of family
members or their risk factors.
• What types of data can you ask your
cases and controls to provide about their
relatives?
Motivation for Case-Control
Family Studies
• To rule out influence of shared
environment, family size differences, and
age on differences in the frequency of
family history between cases and
controls
• Need to enumerate the relatives of cases
and controls, and determine the disease
status and risk factor profile for each
relative
Conduct of Family Studies
• Ascertain “probands” (index cases).
• Define family (siblings? children?
parents? grandparents?)
• Invite family members to participate
• Collect data (and, typically, biological
samples)
How to Select Control Families
• Must decide how to identify controls
– From spouse’s side of proband’s family?
– Or select a random sample from the
population?
• Will controls be motivated to
participate?
• Must take HIPAA rules into account
Analysis Issues
• Exclude the index cases and controls
themselves; analyze their relatives
• Model disease (or behavior) of
interest based on age, sex, known
risk factors
• Evaluate evidence for genetic effect
through statistical significance of
variable(s) that indicate “relationship
to index case”
Analysis Issues (cont’d)
• Simplest “genetic” variable (1 if
relative of case, 0 if relative of control)
• Can also construct indicator variables
to designate type of relative (parent,
sibling, more distant relative)
• If not significant after including other
risk factors, then no evidence for
genetic influence
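Below is one possible coding of the "relationship to index case" variables, assuming three hypothetical relative types. Type indicators are included for everyone. That way, for example, siblings of cases are compared with siblings of controls rather than with all control relatives:

```python
RELATIVE_TYPES = ("parent", "sibling", "distant")  # hypothetical; "parent" = reference type


def relative_covariates(relative_of_case, relative_type):
    """Covariates for one relative in a case-control family study.

    - type indicators (sibling, distant; parent = reference) absorb
      differences between kinds of relatives that apply to both family sets
    - case-family x type indicators: 1 only for that type of relative of a
      case, so each coefficient compares, e.g., siblings of cases with
      siblings of controls
    """
    if relative_type not in RELATIVE_TYPES:
        raise ValueError(f"unknown relative type: {relative_type!r}")
    type_dummies = [int(relative_type == t) for t in RELATIVE_TYPES[1:]]
    case_by_type = [int(relative_of_case and relative_type == t) for t in RELATIVE_TYPES]
    return type_dummies + case_by_type


print(relative_covariates(True, "sibling"))   # [1, 0, 0, 1, 0]
print(relative_covariates(False, "sibling"))  # [1, 0, 0, 0, 0]
```

Each case-by-type coefficient then measures excess risk in that kind of relative of a case, after the other risk factors in the model.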
Evidence of Genetic Influence,
so far….
• Cases are more likely to have a family
history of disease than controls.
• The excess risk to relatives is not
accounted for by age, sex, and other risk
factors.
• What does that tell us about the
underlying genetic influence? (nothing)
Other Approaches to Identify
Genetic Influences
• Twin studies
• Segregation analysis
• Linkage analysis
Twin Studies
• A “natural experiment” of sorts
• Monozygotic (MZ) twins are genetically
identical.
• Dizygotic (DZ) twins share, on average,
half of their segregating alleles, the
same proportion as ordinary siblings.
• Greater concordance (for dichotomous
traits) or correlation (for continuous traits)
for MZ than DZ twins is evidence of a
genetic influence.
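Concordance for a dichotomous trait can be summarized pairwise or probandwise. For a continuous trait, Falconer's formula, h² = 2(r_MZ - r_DZ), is a classical and rough way to turn MZ/DZ correlations into a heritability estimate. Falconer's formula is not from the slides. It assumes additive genetic effects and that MZ and DZ pairs share their environments to the same degree. Counts and correlations below are hypothetical:

```python
def pairwise_concordance(concordant, discordant):
    """Share of twin pairs with at least one affected member in which both are affected."""
    return concordant / (concordant + discordant)


def probandwise_concordance(concordant, discordant):
    """Probability a co-twin is affected given one twin is affected
    (assumes complete ascertainment, so both members of a concordant pair count)."""
    return 2 * concordant / (2 * concordant + discordant)


def falconer_heritability(r_mz, r_dz):
    """Falconer's rough estimate for a continuous trait: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2 * (r_mz - r_dz)


# Hypothetical twin data for a dichotomous trait
print("MZ", probandwise_concordance(30, 40))  # MZ 0.6
print("DZ", probandwise_concordance(10, 60))  # DZ 0.25
# Hypothetical twin correlations for a continuous trait
print("h2 ~", round(falconer_heritability(0.60, 0.35), 2))
```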
Linkage Analysis
• One way to distinguish cultural inheritance
from genetic inheritance is to track a
region of our DNA that is transmitted from
parents to offspring in the same manner
as the disease/outcome of interest.
• This procedure works well for diseases
that follow simple rules of inheritance
(e.g., autosomal dominant or recessive).
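Classical (parametric) linkage analysis summarizes the evidence as a LOD score. This is the log10 ratio of the likelihood at recombination fraction θ to the likelihood under free recombination (θ = 0.5). The sketch covers only the simplest setting: phase-known meioses, each scored as recombinant or not. Real pedigrees need dedicated software, and the counts here are hypothetical:

```python
import math


def lod_score(theta, recombinants, nonrecombinants):
    """LOD score for phase-known meioses: log10 of the likelihood at
    recombination fraction theta (0 <= theta <= 0.5) over that at theta = 0.5."""
    if recombinants > 0 and theta == 0:
        return float("-inf")  # any recombinant rules out complete linkage
    n = recombinants + nonrecombinants
    log_lik = nonrecombinants * math.log10(1 - theta)
    if recombinants > 0:
        log_lik += recombinants * math.log10(theta)
    return log_lik - n * math.log10(0.5)


# Hypothetical: 2 recombinants among 20 informative meioses.
grid = [i / 100 for i in range(51)]
best_theta = max(grid, key=lambda t: lod_score(t, 2, 18))
print(best_theta, round(lod_score(best_theta, 2, 18), 2))  # 0.1 3.2
```

For Mendelian traits, the conventional threshold is a maximum LOD of 3 or more, which corresponds to odds of about 1,000:1 in favor of linkage.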
Segregation Analysis
• Historically, linkage analysis required
knowledge of the mode of
transmission of the putative gene
[dominant versus recessive, allele
frequency, lifetime or age-specific risk
(penetrance)].
• Segregation analysis has been used
to estimate these parameters.
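Segregation analysis compares the proportion of affected offspring observed in families with the proportions expected under candidate transmission models. The sketch below shows the expected side for a single biallelic locus, using hypothetical allele labels D/d and a penetrance tuple. Estimating these parameters from real pedigrees requires specialized likelihood software:

```python
from itertools import product


def offspring_genotype_probs(parent1, parent2):
    """Mendelian offspring genotype distribution at one biallelic locus.

    Parents are two-letter genotypes such as "Dd" ("D" = disease allele).
    Each parent transmits either allele with probability 1/2.
    Returns {number of D alleles: probability}.
    """
    probs = {0: 0.0, 1: 0.0, 2: 0.0}
    for a1, a2 in product(parent1, parent2):
        probs[(a1 == "D") + (a2 == "D")] += 0.25
    return probs


def expected_affected_fraction(parent1, parent2, penetrance):
    """penetrance = (P(affected | dd), P(affected | Dd), P(affected | DD))."""
    probs = offspring_genotype_probs(parent1, parent2)
    return sum(probs[g] * penetrance[g] for g in (0, 1, 2))


dominant = (0.0, 1.0, 1.0)
recessive = (0.0, 0.0, 1.0)
print(expected_affected_fraction("Dd", "dd", dominant))   # 0.5
print(expected_affected_fraction("Dd", "Dd", recessive))  # 0.25
```

Reduced penetrance, e.g. (0, 0.8, 0.8), lowers the expected dominant ratio from 0.5 to 0.4. This is one reason observed ratios rarely match textbook Mendelian values.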
Genetic Epidemiology of
Complex Diseases
• “Complex diseases” are ones for which
the genetic influence may be modest and
environmental factors contribute to
disease risk.
• Segregation analysis is not typically done
for “complex diseases.”
• Modern approaches do not require a
specified model of inheritance
(non-parametric, or “model-free,” methods).
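The affected sib-pair approach is a standard example of a model-free method. Near a disease gene, affected siblings should share more alleles identical by descent (IBD) than chance predicts. The chance split is 1/4, 1/2, 1/4 for sharing 0, 1, or 2 alleles, and no dominance or penetrance needs to be specified. The sketch below implements the "mean test," assuming IBD counts have already been estimated from marker data. The values are hypothetical:

```python
import math


def asp_mean_test(ibd_counts):
    """Affected sib-pair 'mean test' for excess allele sharing at a marker.

    ibd_counts: alleles shared identical by descent (0, 1 or 2) by each pair
    of affected siblings. With no linkage, IBD = 0/1/2 with probabilities
    1/4, 1/2, 1/4, so the proportion shared (IBD / 2) has mean 0.5 and
    variance 0.125 per pair. No mode of inheritance is specified.
    Returns (mean proportion shared, z); only large positive z supports linkage.
    """
    if any(c not in (0, 1, 2) for c in ibd_counts):
        raise ValueError("IBD counts must be 0, 1 or 2")
    n = len(ibd_counts)
    mean_share = sum(c / 2 for c in ibd_counts) / n
    z = (mean_share - 0.5) / math.sqrt(0.125 / n)
    return mean_share, z


# Hypothetical IBD counts for 100 affected sib pairs
pairs = [0] * 15 + [1] * 45 + [2] * 40
print(asp_mean_test(pairs))
```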
Use of Epidemiology to
Understand Genetic Variation
• The methods of genetic epidemiology
have been applied historically to
identify genes.
• Typically, epidemiologists are not
interested in mapping genes, but
rather in figuring out how genes
interact with environment to influence
disease risk and outcome.
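In a case-control study, gene-environment interaction is often laid out as a 2 × 4 table of genotype by exposure. Each joint category is compared with the group that has neither factor. The sketch below uses hypothetical counts and names. It reports interaction on two scales: the multiplicative scale (ratio of odds ratios) and the additive scale (relative excess risk due to interaction, RERI). Here RERI is computed from odds ratios, which approximates the risk-based version when disease is rare. The slides do not specify a scale:

```python
def gxe_summary(table):
    """table[(g, e)] = (cases, controls) for genotype g and exposure e (0/1).

    Each joint category is compared with (g=0, e=0), then interaction is
    summarized on the multiplicative and additive (RERI) scales.
    """
    ref_cases, ref_controls = table[(0, 0)]

    def or_vs_reference(key):
        cases, controls = table[key]
        return (cases * ref_controls) / (controls * ref_cases)

    or_g, or_e, or_ge = (or_vs_reference(k) for k in ((1, 0), (0, 1), (1, 1)))
    return {
        "OR_G": or_g,
        "OR_E": or_e,
        "OR_GE": or_ge,
        "multiplicative": or_ge / (or_g * or_e),  # 1 = no multiplicative interaction
        "RERI": or_ge - or_g - or_e + 1,          # 0 = no additive interaction
    }


# Hypothetical counts: (genotype carrier, exposed) -> (cases, controls)
table = {(0, 0): (50, 200), (1, 0): (25, 50), (0, 1): (50, 100), (1, 1): (60, 40)}
print(gxe_summary(table))
```

Here each factor alone doubles the odds. The combination multiplies them by 6 rather than the 4 expected under a purely multiplicative model, so both summaries indicate positive interaction.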
Molecular Epidemiology
• Related individuals are not necessarily required
for studies of the association of genetic
variation with risk of disease.
• Both cohort and case-control designs can be
used.
• Because the germline DNA sequence is
fixed at conception, one can readily
employ retrospective designs.
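Because unrelated individuals suffice, the simplest analysis of a case-control genotype study compares allele counts between cases and controls. The sketch below uses hypothetical genotype counts. It treats each person's two alleles as independent, which is reasonable under Hardy-Weinberg equilibrium. The 1-df p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)):

```python
import math


def allelic_test(case_genotypes, control_genotypes):
    """Allelic case-control test for one biallelic SNP (alleles A and a).

    Genotype counts are (AA, Aa, aa). Each person contributes two alleles,
    treated as independent observations (reasonable under Hardy-Weinberg
    equilibrium). Returns (OR for allele a, chi-square, 1-df p-value).
    """
    def allele_counts(genotypes):
        n_AA, n_Aa, n_aa = genotypes
        return 2 * n_aa + n_Aa, 2 * n_AA + n_Aa  # (a alleles, A alleles)

    a_case, A_case = allele_counts(case_genotypes)
    a_ctrl, A_ctrl = allele_counts(control_genotypes)
    odds_ratio = (a_case * A_ctrl) / (A_case * a_ctrl)
    n = a_case + A_case + a_ctrl + A_ctrl
    chi2 = n * (a_case * A_ctrl - A_case * a_ctrl) ** 2 / (
        (a_case + A_case) * (a_ctrl + A_ctrl) * (a_case + a_ctrl) * (A_case + A_ctrl)
    )
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return odds_ratio, chi2, p_value


# Hypothetical genotype counts (AA, Aa, aa) in 1,000 cases and 1,000 controls
or_a, chi2, p = allelic_test((300, 500, 200), (400, 450, 150))
print(f"allelic OR = {or_a:.2f}, chi-square = {chi2:.1f}, p = {p:.1e}")
```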
Common Strategies for Genetic
Marker Selection
• Genome-wide approach with anonymous
DNA markers (1,000,000 SNPs on a chip)
• SNPs or simple tandem repeat markers in
“candidate” genes based on a priori
knowledge about presumed function
• SNPs in candidate genes with known
functional effect on level or activity of
protein product
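The marker-selection strategy also sets the multiple-testing burden. The sketch below assumes a simple Bonferroni correction. With 1,000,000 markers, the per-SNP threshold is 0.05 / 10^6 = 5 × 10^-8, the conventional genome-wide significance level. A hypothetical panel of 20 candidate SNPs needs only 0.0025:

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level keeping the family-wise error rate <= alpha."""
    return alpha / n_tests


def critical_z(per_test_alpha):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)


for label, n_tests in (("genome-wide chip", 1_000_000), ("candidate SNPs", 20)):
    a = bonferroni_alpha(0.05, n_tests)
    print(f"{label}: p < {a:.1e}, |z| > {critical_z(a):.2f}")
```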
Primer on Single Nucleotide
Polymorphisms (SNPs)
• Because our genetic code is degenerate
(redundant), some SNPs will not alter the
encoded amino acid (e.g., GGA, GGG, GGT
and GGC all encode glycine).
• SNPs that change an amino acid may not
necessarily lead to a change in function
of the encoded protein.
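The redundancy point can be checked directly against the codon table. The sketch below uses only a partial table: coding-strand DNA codons for a few amino acids, so codons outside it raise KeyError. A hypothetical helper labels a single-base change as synonymous or missense:

```python
# Partial standard codon table (coding-strand DNA); enough for the examples only.
CODON_TABLE = {
    **dict.fromkeys(["GGA", "GGC", "GGG", "GGT"], "Gly"),
    **dict.fromkeys(["GCA", "GCC", "GCG", "GCT"], "Ala"),
    **dict.fromkeys(["GTA", "GTC", "GTG", "GTT"], "Val"),
    **dict.fromkeys(["GAA", "GAG"], "Glu"),
    **dict.fromkeys(["GAC", "GAT"], "Asp"),
    **dict.fromkeys(["CCA", "CCC", "CCG", "CCT"], "Pro"),
}


def classify_snp(ref_codon, position, alt_base):
    """Label a single-base change (0-based position within the codon)."""
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "missense"
    return f"{ref_codon}->{alt_codon}: {ref_aa}->{alt_aa} ({kind})"


print(classify_snp("GGA", 2, "G"))  # GGA->GGG: Gly->Gly (synonymous)
print(classify_snp("GGA", 1, "C"))  # GGA->GCA: Gly->Ala (missense)
```

A change at the third position of GGA leaves glycine unchanged. A change at the second position substitutes alanine.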
More on SNPs
• SNPs that don’t change an amino acid
may still lead to alternate splicing of the
transcript (and therefore be functionally
important).
• SNPs in the promoter region may influence
the level of protein product rather than
its activity (and therefore be
biologically significant).
• SNPs in non-coding regions may still have
functional effect.
Caveats About SNP Studies
• If you’re interested in gene x environment
interactions, it is best to focus on SNPs
with known functional effect.
• Human biology is complex: are alterations in
one component of a pathway compensated for
by another?
• Most SNPs are likely to be modest risk factors,
requiring large sample sizes to detect a
statistically significant association.
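To see why modest effects demand large studies, apply a standard normal-approximation sample-size formula for comparing two proportions to carrier frequencies. The sketch rests on stated assumptions: 30% carrier frequency in controls (hypothetical), equal numbers of cases and controls, two-sided α = 0.05, 80% power, and no correction for multiple testing. A correction would push the numbers higher:

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(p0, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (= number of controls) needed to detect
    a given odds ratio for carrying a variant.

    p0: carrier frequency among controls.
    Uses the normal approximation for comparing two proportions.
    """
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # carrier frequency in cases
    p_bar = (p0 + p1) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p0 * (1 - p0))
    return ceil(num**2 / (p1 - p0) ** 2)


for or_ in (3.0, 1.5, 1.2):
    print(or_, cases_needed(p0=0.30, odds_ratio=or_))
```

Under these assumptions, an odds ratio of 1.2 needs more than 2,000 cases (and as many controls). An odds ratio of 1.5 needs a few hundred.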
Realistic Expectations
• Most proteins are modified after translation
(e.g., glycosylation, acetylation,
methylation).
• Thus, the correspondence between DNA sequence
and protein function is far from perfect.
• Most GWAS “hits” fall in non-coding regions,
some in “gene deserts” far from any known gene.
• May be necessary to examine multiple SNPs
within a gene and several genes within a
pathway.
Molecular Epidemiology –
Beyond Genetics
• Biomarkers of exposure and disease
extend beyond DNA.
• Viral or bacterial load
• Morphometric analysis of tissues/cells
• Hormone or lipid levels in blood or
urine
• Other examples?
Conclusion
• Molecular and genetic epidemiology represent
specialty areas of expertise.
• These specialty areas apply advances in
molecular biology and molecular genetics
of disease to:
– Unravel disease etiology.
– Enable novel approaches for early detection.
– Inform more effective interventions by targeting
those at greatest risk.