Molecular Genetics
MCB162: Human Genetics
Goals of class:
1. Explain central concepts of modern human genetics
2. Illustrate how molecular and genomic approaches are changing human genetics
3. Discuss professional practice of human genetics in the era of personal genomics
4. Explore scientific literature and databases
Genetics Timeline
Part I: The Pioneers' Years
1865 Gregor Mendel's paper – Classical Genetics is born
1869 Miescher isolates DNA (“nuclein”) from white blood cells and other cells
1902 Sutton describes the chromosome theory of heredity
1910 Morgan shows that genes reside on chromosomes
1927 Physical changes in genes are called mutations
1931 Crossing over is the cause of recombination
1941 Tatum and Beadle show that genes code for proteins (“one gene, one protein”)
1944 Avery, MacLeod and McCarty isolate DNA as the genetic material
1952 The Hershey-Chase experiment proves the genetic information of phages to be DNA
1953 DNA structure is resolved to be a double helix by Watson and Crick, with critical assistance from Rosalind Franklin
1959 Lejeune identifies trisomy 21 as the cause for Down syndrome – first evidence that chromosome abnormalities can underlie human disease
1961 Brenner, Jacob and Meselson show that mRNA ferries genetic information from DNA to the protein synthesis machinery: central dogma
1961 First systematic screen for a metabolic defect in newborns: phenylketonuria
1966 The genetic code is cracked
Part II: The Molecular Genetics Revolution
1972 Cohen and Boyer create the first recombinant DNA
1973 First animal gene (frog) cloned in bacteria
1976 First biotechnology company founded: Genentech
1977 DNA is sequenced for the first time by Fred Sanger
1978 Riggs and Itakura produce human recombinant insulin and license it to Genentech
1983 Mullis discovers the polymerase chain reaction (PCR)
1984 First disease gene cloned: Huntington’s disease
1984 Jeffreys introduces technique for DNA fingerprinting to identify individuals genetically
1989 Collins and Tsui sequence the first human gene (CFTR)
1990 First gene therapy trial on a four-year-old girl with an immune deficiency
1991 The genetically modified FLAVR SAVR tomato is FDA approved (Calgene)
1995 The genome of Haemophilus influenzae is sequenced
1996 Saccharomyces cerevisiae is the first eukaryote genome sequence to be released
1997 Wilmut clones Dolly the sheep from the cell of an adult ewe
1998 The first genome sequence for a multicellular eukaryote, C. elegans, is released
1999 The complete sequence of the first human chromosome (22) is released
Part III: The Genomics Revolution
2003 Successful completion of Human Genome Project
2004/05 Rat and chicken genomes sequenced. Dog and chimpanzee genomes sequenced; HapMap project completed; Cancer Genome Atlas project initiated
2006 23andMe, Navigenics are founded; DeCODE Genetics offers personal genetics test
2007 Shinya Yamanaka reprograms adult human cells to embryonic stem cells. The first two individual human genomes are released. The ENCODE project analyzes the function of 1% of the human genome
2009 Mouse, cow, and corn genomes are published. Representative Indian and Korean genomes are released.
2010 First data release from the 1000 Genomes Project. A representative Japanese genome is published. First African genomes are released. The Neanderthal genome is sequenced.
2011 Exome sequencing is applied routinely to many rare human diseases.
2012 The tomato genome is sequenced. Multiple breast, lung, melanoma, leukemia, kidney cancer genomes are released. ENCODE releases whole human epigenomes.
2014 The RoadMap Epigenomics project will catalogue epigenetic information across tissues and developmental time.
2016 NIH announces 1-million people Precision Health initiative. AstraZeneca announces it will sequence 2 million genomes. Earth Microbiome and Earth BioGenome projects.
The era of personal genetics?
Example genes: TAS2R38, OCA2, ABCC11, LCT
There are genetic variants in these genes
Each variant is associated with changes in a trait
• How do you identify genes?
• How do you associate genes with traits?
• How are genetic variants identified and their effect measured?
• How are genetic variants distributed among populations?
• How much of a trait is genetically determined?
• What is the heritability of a trait?
The era of personal genetics?
In most cases:
•Genetic analysis leads to the calculation of an “odds ratio”
•The heritability of a trait is not 100% due to environmental influence
• In most cases, traits are polygenic
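As a rough illustration of the “odds ratio” mentioned above, here is a minimal sketch that computes it from a 2x2 table of variant carriers among cases and controls. The function name and the counts are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio from a 2x2 table: odds of carrying the variant in cases
    divided by the odds of carrying it in controls."""
    odds_cases = case_carriers / case_noncarriers
    odds_controls = control_carriers / control_noncarriers
    return odds_cases / odds_controls


# Hypothetical counts (illustration only): 30 of 100 cases and
# 20 of 100 controls carry the variant.
example_or = odds_ratio(30, 70, 20, 80)
print(round(example_or, 2))
```

An odds ratio of 1 means the variant is equally common in cases and controls; values above 1 mean it is more common in cases. Because most traits are polygenic, a single variant typically shifts the odds only modestly.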
Normal human karyotype after G-banding of mitotic chromosomes
The human genome: 24 chromosomes of varying sizes and shapes
Centromere: Functions in chromosome segregation during mitosis
Telomeres: chromosome ends
3.2 billion base-pairs!
Condensation = 10,000 fold
Fundamental importance of chromatin in DNA compaction and organization
Passing the genome on: from cell to cell, person to person
Mitosis and meiosis are the two basic events underlying genetic inheritance
PROPAGATION OF DNA
MITOSIS: REPLICATE DNA → MANY CELL TYPES, SAME DNA → ONE PERSON
MEIOSIS: REPLICATE AND RECOMBINE DNA → HIGHLY SPECIALIZED CELL TYPE, NOVEL DNA COMBINATIONS → NEW PEOPLE!
Mitosis ensures the faithful distribution of DNA at each cell division
G0 (quiescent)
Most cells in the adult body are quiescent except:
• bone marrow • epithelium (intestine) • germ cells (males) • adult stem cells (wound healing, tissue regeneration, maintenance)
Mitosis and meiosis are the two basic events underlying genetic inheritance
Figure labels: meiosis; mitosis and differentiation
Mitosis versus Meiosis
No change in ploidy
haploid gametes from diploid cells
KEY terms: homologues vs. sisters
Meiosis: the essence of heredity
Figure: germline → meiosis → haploid oocyte and sperm → fertilization → diploid zygote → mitosis and differentiation
Credit: Dr. Neil Hunter
MEIOSIS: Generating haploid gametes from diploid cells
! Crossing-over occurs almost exclusively between homologs (Mom and Dad chromosomes), not between sister chromatids
! Cross over sites are not distributed randomly along chromosomes
! Each chromosome engages in 1-2 cross overs
Figure labels: end of meiosis I (homologs segregated); possible gametes, including recombinants
4 haploid gametes from 1 diploid precursor cell
A matter of interpretation…
- Meiotic DNA recombination likely did not evolve to generate genetic diversity by mixing genomes through crossovers.
- Meiotic DNA recombination, instead, ensures the proper segregation of homologues at meiosis I, thereby permitting the formation of haploid gametes.
unclear
sister-chromatid cohesion
Recombination Promotes Chromosome Pairing
Credit: Dr. Neil Hunter
Parental chromosomes = homologues
(shown here post replication)
-
Recombination Promotes Chromosome Pairing
homolog pairing
Recombination Promotes Chromosome Pairing
synaptonemal complex
( chromatid loops
Crossovers Direct Homolog Disjunction
crossing over
Crossovers Direct Homolog Disjunction
mono-oriented sister kinetochores + sister cohesion ensures segregation of homologues
Crossovers Direct Homolog Disjunction
protection of centromere cohesion
cleavage of cohesion
Accurate Segregation Produces Haploid Gametes
euploid gametes
But doesn’t meiosis also produce genetic diversity?
Independent assortment of maternal and paternal homologs at meiosis I produces the first level of genetic diversity
Possible combinations = 2^23, or about 8.4 million possible arrangements
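The count above can be checked with a couple of lines. This is a minimal sketch that assumes each of the 23 homolog pairs orients independently at meiosis I and ignores crossovers; the function name is hypothetical.

```python
def gamete_combinations(homolog_pairs):
    """Number of maternal/paternal homolog combinations in a gamete,
    assuming each pair assorts independently (two choices per pair)."""
    return 2 ** homolog_pairs


per_gamete = gamete_combinations(23)
print(per_gamete)  # 8388608, i.e. about 8.4 million

# Combining one such gamete from each parent squares the count.
per_zygote = per_gamete ** 2
print(f"{per_zygote:.2e}")
```

Crossovers add further diversity on top of this number, which is why it is only the first level of genetic diversity.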
Real life example of how crossover site position and site number can vary
Parent Child
Recombination at crossovers adds another level of genetic diversity
Figure: timeline of female meiosis (in utero development, birth, puberty, throughout life). Oocytes are blocked in MI (diplotene / diakinesis) until ovulation.
Male and female meiosis are very different!
- Females: the diploid germ cell population multiplies in utero, giving a fixed number of oocytes at birth. Oocytes stay in meiosis I (post-replication stage) until the oocyte is ovulated.
- Males: meiosis is not begun until puberty.
- The timing of meiosis I and II is different in males and females.
Figure: oocyte/follicle number (0 to 6×10^6) versus age (0 to 50 yrs): ~6 million (18-22 weeks gestation), ~1 million (birth), ~300,000 (menarche), ~1,000 (menopause). Phases: attrition, post-natal atresia, ongoing atresia.
Oocyte Numbers Decline Dramatically With Age due to Quality Control Processes
- Quality control process: most oocytes are destroyed
- Menarche: the age when ovulation begins
Male and female meiosis are very different!
¾ of female meiotic products are discarded: polar bodies
Meiosis II only happens after fertilization
Primary oocytes are blocked during meiosis I
Male versus female meiosis
Diploid germ cell population rapidly multiplies during early development
Cells enter meiosis during fetal development
Developing oocytes pause in Meiosis I for decades!
Oocytes undergo continuous QC and their numbers plummet
Oocytes complete Meiosis I only upon ovulation
Oocytes complete meiosis II only if fertilized
One oocyte is produced from one diploid germ cell precursor (and three polar bodies)
Cells keep proliferating and create a population of diploid germ cells
Meiosis begins at puberty, continues throughout life (no pause)
A very large number of spermatocytes are produced
Four sperm are produced from one diploid germ cell precursor
Aneuploidy in gametes:
2% Sperm
15-70% Oocyte (increases with maternal age)
Incidence and Impact of Aneuploidy in Humans
≥25% of all conceptions are miscarried (≥1 million/yr in the U.S.)
Impact of zygotic aneuploidy:
~20% of pre-implantation embryos
35% of spontaneous abortions
4% of stillbirth
0.3% of livebirths
chromosomes 13, 18, 21, X or Y ~12,000/yr in the U.S.
- Aneuploidy: incorrect (abnormal) number of chromosomes
- Pre-implantation: before the embryo is implanted in the uterus (around day 2-3)
- Lethality: with the wrong number of chromosomes 1, 2, 3, 4, ... the embryo can't survive, except for chromosomes 13, 18, 21, X or Y
- These are the most gene-poor chromosomes of all: they carry the fewest genes
Figure: frequency of livebirths* (0 to 0.12) versus maternal age (20 to 49 yrs), showing Down syndrome risk and all trisomies risk.
* Trisomic for chromosomes 21, 13, 18, X or Y - all other trisomies cause miscarriage
Altered Maternal Crossing-Over Is Associated With Infertility, Pregnancy Miscarriage And Chromosomal Diseases
§ Aneuploidy is associated with defective crossing over (in utero) and progressive loss of cohesion.
§ Crossover rates measured in children born to older mothers are higher than in children born to young mothers, implying that advancing maternal age selects for oocytes with more crossovers.
§ Extra crossovers may act to buffer the maternal age effect.
§ Variation in human crossover rate is heritable.
§ Women who inherit higher crossover rates have slightly more children.
- Cohesion is progressively lost in oocytes that are blocked in meiosis I; when cohesion is lost, chromosomes can be pulled the wrong way by the spindle fibers
- More crossovers act as insurance (mechanism): they keep homologs more cohesive, so the connection is less likely to dissolve
Trisomy 21 diagnosis correlates with maternal age
How is diagnostic performed at 16 weeks?
Why is 16 week diagnostic different from live births?
Why focus on trisomy 21 and not other trisomies?
1. Non-invasively with ultrasound; by collecting DNA from the fetus through sampling of amniotic fluid with a large needle (strong risk of miscarriage); also from the mother's blood.
2. Some parents may discontinue the pregnancy, or the fetus dies (miscarriage).
3. Parents have experience with Down syndrome, so they get tested earlier. It represents 97% of trisomy in humans and is the most tolerated trisomy.
Key terms (slide set 1):
• Chromosome structure: telomere, centromere, metacentric, acrocentric
• Mitosis
• DNA replication
• Sister chromatids
• Homologues
• Meiosis
• Crossovers
• Chromosome segregation
• Independent assortment
• Haploid versus diploid
• Euploid gametes
• Oocytes, polar bodies
• Aneuploidy
• Non-disjunction
• Trisomy
Slide set 2: Organization of the human genome
The Human Genome Project (HGP)
1988: Human Genome Organization is founded to coordinate international efforts aimed at mapping and sequencing the human genome
1990: HGP is presented to Congress as a 15-year plan
1992: Low-resolution genetic linkage map of entire human genome published
1994: Completion of second-generation DNA clone libraries representing each human chromosome by LLNL and LBNL. Genetic-mapping achieved 1 year ahead of schedule
1997: Physical maps – high-throughput DNA sequencing
1998: Celera Genomics formed to sequence much of human genome in 3 years using HGP-generated resources
1999: Chromosome 22 is sequenced. 1 billion base-pair mark
2000: Chromosomes 5, 16, 19 and 21 are sequenced
President Clinton announces the completion of a working draft of the human genome by the NIH and Celera Genomics
President Clinton signs executive order prohibiting federal departments and agencies from using genetic information in hiring or promoting workers
2001: Publication of Initial Working Draft Sequence (February 12, 2001)
2003-4: “Final” draft of the human genome.
→ 13-14 years.
→ Genetic non-discrimination.
Nature (2001) vol 409
Science (2001) vol 291
Nature – Sept 2004
Still missing: ~200 million bp of highly repetitive peri-centromeric and centromeric DNA (heterochromatin)
Sequence of the human genome in book format at the National Human Genome Research Institute
BTW: whose genome was it?
- Still missing: about 200 million base pairs of peri-centromeric and centromeric DNA. We don't know if there is any variation there.
- Still trying to figure out: where is the variation, and why does it matter?
- Public → anonymous
- Private → Craig Venter (owner) and his dog
Genome of DNA Discoverer Is Deciphered (NY Times) By NICHOLAS WADE Published: June 1, 2007 The full genome of James D. Watson, who jointly discovered the structure of DNA in 1953, has been deciphered, marking what some scientists believe is the gateway to an impending era of personalized genomic medicine. http://jimwatsonsequence.cshl.edu
- He hid his Alzheimer's gene to protect his children.
Human Genome Organization: Genes, “junk”, and repeats…
(circa 2010)
The human genome carries ~20,000 genes
Class discussion: • how would you discover & annotate genes? • why is this even a question?
(exons)
- How much of our genome codes for proteins? 1.1% (exons)
- 4% codes for RNA genes: tRNA, rRNA, siRNA, snRNA
- 45% is transposons: selfish, parasitic elements that want to make more copies of themselves
- 6% is heterochromatin
How would you discover & annotate genes?
1. Experimental
- Knockout → phenotype
- RNA sequencing (transcriptomics): extract RNA, convert it back to DNA, sequence it, and map it back to the human genome. This shows which parts of the human genome are actually transcribed into RNA; the idea is that a region that is transcribed into RNA is likely to be a gene.
- RNA sequencing will only catch genes expressed in the specific tissue sampled (housekeeping genes are the easy case): expression is cell type specific and there are issues with sensitivity
- ChIP-seq
2. Computational
- Find promoters and coding regions
- Homology searches: look for conservation in other species' genomes. Important genes have evolved over a long period of time, so they will be largely conserved.
- Orthologues: sequences in the human genome that are homologous to (conserved with) genes in other species
- Look for evidence of protein coding there
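One simple way to "look for evidence of protein coding" computationally is to scan for open reading frames (ORFs). The sketch below is a deliberately simplified illustration: it only scans the forward strand, treats the sequence as intron-free, and uses a hypothetical function name and toy sequence. Real human gene annotation has to deal with introns, both strands, and supporting evidence such as conservation and RNA-seq.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end) index pairs of ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` is the
    minimum number of codons before the stop (counting the ATG).
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and keep scanning
    return orfs


# Toy sequence (illustration only): ATG AAA TTT GGG TAA in frame 2.
toy_sequence = "CCATGAAATTTGGGTAACC"
print(find_orfs(toy_sequence))  # [(2, 17)]
```

The minimum-length cutoff matters in practice: short ORFs occur by chance all over a 3.2 billion bp genome, which is part of why gene annotation "is even a question".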
From Nature (2001), 409:860-921
“Gene” (i.e. protein-coding) density varies among human chromosomes
“Gene-rich”
“Gene-poor”
- “Gene” (protein-coding) density varies among human chromosomes, e.g. chromosome 19 is gene-rich while chromosome 18 is gene-poor
- From the first publication: gene density is NOT uniform
- Gene-poor chromosomes have the fewest genes and the lowest density
Gene density varies along each human chromosome together with GC content and CpG density
• GC percent correlates with gene-rich regions
• G-banding (from Giemsa staining) correlates with AT-rich, gene-poor regions
Chr17
CpG islands are GC-rich, CpG dense regions (~ 1 kilobase) that are highly enriched at the 5’-end of genes
• CpG islands serve as promoters for 60% of human genes, particularly “housekeeping” genes
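To make "GC-rich, CpG dense" concrete, the sketch below computes GC content and the observed/expected CpG ratio for a sequence. The function names and toy sequences are hypothetical. The thresholds mentioned afterwards are a commonly used rule of thumb and are not taken from the slides.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


# Toy sequences (illustration only; real islands are ~1 kilobase long).
island_like = "CGCGGCGCCGCGATCGCG"
at_rich = "ATATTTAACATGAATTTA"
for name, seq in [("island-like", island_like), ("AT-rich", at_rich)]:
    print(name, round(gc_fraction(seq), 2), round(cpg_obs_exp(seq), 2))
```

A widely cited criterion calls a region a CpG island when it is at least 200 bp long, has GC content above 50%, and has an observed/expected CpG ratio above 0.6; genome browsers may use somewhat different cutoffs.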
Human genome browser: http://genome.ucsc.edu
- G-banding: light → GC-rich; dark → AT-rich
- Chromosomes are stained by a DNA-binding dye that likes to bind AT-rich regions
- CpG islands are the #1 class of promoters in the human genome (60%), particularly for housekeeping genes
Human genome assembly (version)
Chromosome coordinates
Position of gene
Filled boxes = EXONS (protein-coding)
Last exon and 3’UTR
Typical structure of a human gene
BRCA2
First exon and 5’UTR
CpG island promoter
Thin line (no box) = INTRONS Orange signal above gene is from mRNA-seq. Introns are spliced out.
BRCA2 protein-coding sequence space: 10.2 kilobase
BRCA2 genomic sequence space: 84.2 kilobase
- BRCA2: mutations are found in breast and ovarian cancer; the gene is essential for life and important in DNA repair
- The CpG island overlaps the first exon
- Orange signal marks the exons (from mRNA-seq)
- Introns are transcribed, THEN spliced out
- What are introns? About 70 of the 84 kb are introns → transposons: repeating units
- The gene is on the right (+) strand
“Average” structure of human protein-coding gene:
• 27 kb of genomic sequence
• 9 exons (122 bp average)
• Encodes a 2.6 kb mRNA
• Including 5’-UTR (250 bp) and 3’-UTR (750 bp)
• Average human protein (530 AA)
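The "average gene" numbers above are internally consistent, which a short calculation can show. This is a minimal sketch using only the figures quoted on the slide; the variable names are hypothetical.

```python
GENOMIC_BP = 27_000   # genomic span of the average gene
MRNA_BP = 2_600       # mature mRNA length
UTR5_BP = 250         # 5'-UTR
UTR3_BP = 750         # 3'-UTR
PROTEIN_AA = 530      # average protein length

# Coding sequence = mRNA minus both untranslated regions.
coding_bp = MRNA_BP - UTR5_BP - UTR3_BP
# Three bases per codon; the count includes the stop codon.
codons = coding_bp / 3
# Fraction of the genomic span that ends up in the mature mRNA.
mrna_fraction = MRNA_BP / GENOMIC_BP

print(coding_bp)                # 1600
print(round(codons))            # 533, close to the 530 AA average
print(round(mrna_fraction, 3))  # 0.096: the rest is mostly intron
```

In other words, roughly 90% of an average gene's genomic span is transcribed and then spliced out as introns.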
Transcription / Splicing Capping / Polyadenylation
CAP AAAAAAAA
Huge variation in gene organization and size
• Gene size: 0.4 – 2,400 kb
• Exon number: 1 – 363
• Exon size: <10 bp – 7.6 kb
• Intron size: 20 bp – 100s of kb
• Polypeptide: 10 – >38,000 AA (Titin)
- “Average” human protein-coding gene: the structure is 5' UTR → exons → 3' UTR; after transcription/splicing the mRNA also has a cap and a poly-A tail
- Usually 27 kb of genomic sequence, 9 exons, encodes a 2.6 kb mRNA, average human protein of 530 AA
- Where an exon box changes between thin and thick, the thin part is not protein-coding (e.g. the 3' untranslated region)
- Where does transcription happen? Inside the nucleus
- Where does splicing occur? It happens during transcription, so also in the nucleus
- Mature mRNA leaves the nucleus
- Where does BRCA2 live? It is part of DNA repair, so in the nucleus
Nuclear vs. mitochondrial genomes
Nuclear genome: 3.2 billion bp
Mitochondrial genome: ~16,500 bp
Exons ONLY!
Structure of the mitochondrial genome
Mutations in mtDNA can lead to several human diseases
Leber’s Hereditary Optic Neuropathy (LHON)
- presents in midlife as acute or subacute central vision loss leading to central scotoma and blindness
- disease results from a defect in the respiratory chain
- Maternal inheritance in familial cases
Why so few genes in the mitochondrial genome?
Recap: RNA and protein transport in eukaryotes.
Who codes for mitochondrial genes?
→ The nuclear genome. Most of the genes that code for mitochondrial proteins are located in the nucleus.
- LHON: a mitochondrial mutation; vision loss comes from degeneration of the optic nerve
- Where does transcription of these genes happen? Inside the nucleus
- Where does splicing happen? Also in the nucleus, because it happens during transcription
- The proteins are made by ribosomes and then need to be imported back into the mitochondria
WHY?
1. Mitochondria are a hostile environment because they generate a lot of reactive oxygen species, which are toxic for DNA
Ribosomal RNA genes
- Ribosomal RNA is a central component of the ribosome
- Plays both a structural and a catalytic role: the peptidyl-transferase activity responsible for adding an AA to a growing polypeptide chain is catalyzed by rRNA!!
- ~2 million ribosomes/cell
- 75% of transcriptional output!
- rRNA are transcribed by a specialized RNA polymerase: RNA Pol I
- rRNA transcription happens in a specialized nuclear sub-compartment: the nucleolus
Protein-coding genes are “easy” to identify (~20,000 in total)
Non-coding RNA genes are much more difficult to identify
The human genome encodes at least 5-6,000 RNA genes
Red: RNA Blue: Protein
Large ribosomal subunit from H. marismortui (from Moore and Steitz. Annu Rev Biochem. 2003;72:813-850)
Staining of HeLa cells with Anti-nucleolin, the most abundant protein in the nucleolus
- Nucleolus: no membrane
Telomeres: chromosome ends rDNA repeats
• rDNA repeats localize to p arm of acrocentric chromosomes (13p12, 14p12, 15p12, 21p12, 22p12)
• Each chromosome carries hundreds of copies of tandemly repeated rRNA genes
- The p-arms are very short
- Why so many copies? Because we make so much rRNA
Acrocentric chromosomes are sometimes involved in Robertsonian translocations
Translocation: a chromosome translocation is a chromosome abnormality caused by rearrangement of parts between nonhomologous chromosomes
Carriers are phenotypically normal
- Acrocentric: when there is a short arm and a long arm
NO
shredded ( centromere
Carriers of Robertsonian translocations often produce unbalanced gametes
Higher risks for Down syndrome & reproductive issues
Depending on HOW homologs pair .
Problems begin
PSEUDOGENES
• Initially thought to represent non-functional (dead) copies of gene parts
Why do we care?
• Pseudogenes occupy a significant portion of our genome
• Pseudogenes represent interesting events of genome evolution and variation
• Pseudogenes can be functional in some (rare) instances (undead genes?)
• Some pseudogenes are transcribed
• Some pseudogenes code for proteins
Pei et al., (2012) Genome Biology
- Why do we care? Pseudogenes occupy a significant portion of our genome, represent interesting events of genome evolution and variation, and can be functional in some rare instances
- But we know they can code for proteins
Examples of clustered non-processed pseudogenes
Non-processed pseudogenes - Result from the duplication of (parts of) the genomic sequence of a gene
(includes both exons and introns)
Two forms of pseudogenes reflect the mechanisms by which they formed
Adult hemoglobin: α2β2 (>97%)
Adult hemoglobin: α2δ2 (<3%)
Fetal hemoglobin: α2γ2
Fetal hemoglobin (yolk sac): ζ2ε2
- 1. Non-processed: duplication occurs at the level of DNA, which is copied somewhere else in the genome, e.g. the alpha-globin cluster
- Pseudogenes arise from copies that got changed slightly, e.g. the NF1 gene
- Mutations in NF1 can cause Neurofibromatosis type 1
- Fetal hemoglobin has a higher affinity for oxygen, which allows the fetus to grab O2 from the mother's blood
Examples of dispersed non-processed pseudogenes
Mutations in NF1 can cause Neurofibromatosis Type 1 (among other diseases):
- Autosomal dominant disorder characterized by cafe-au-lait spots, Lisch nodules in the eye, and fibromatous tumors of the skin.
- Individuals with the disorder have increased susceptibility to the development of benign and malignant tumors.
- The worldwide incidence of NF1 is 1 in 2,500 to 1 in 3,000 individuals
- Pseudogenes arise from copies that changed slightly, e.g. the NF1 gene
Processed pseudogenes - Result from reverse-transcription of mRNAs
(only carry exons – no introns)
- 2. Processed: result from reverse-transcription (RT) of abundant mRNAs; only have exons, NO introns
Processed pseudogenes often originate from highly transcribed genes - Result from reverse-transcription of abundant mRNAs
Pei et al., (2012) Genome Biology
- Highly expressed = most likely to end up as a processed pseudogene
KEY terms (slide set 2):
• Gene structure
• GC content / gene density
• CpG island
• Heterochromatin
• Mitochondrial genome
• Ribosomal DNA array
• Nucleolus
• Translocation (incl. balanced translocation)
• Un-processed pseudogenes
• Duplication
• Processed pseudogenes
• Reverse-transcription
Human Genome Organization: of “junk” and repeats…
10113120
Tandem repeats
Satellite repeats: repeated millions of times in tandem over peri-centromeric regions (5 – 170 bp in length per repeat)
Mini-satellite repeats: 10-100 bp, dispersed through genome, hyper- variable between individuals. Used for genetic testing (forensics, paternity). Includes telomeric repeats.
Micro-satellite: arrays of simple repeats, dispersed throughout. Most frequent: (CA)n, (AT)n, An
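Micro-satellites such as (CA)n are easy to locate in a sequence with a regular expression. The sketch below is a minimal illustration with a hypothetical function name and toy sequence; the minimum of four repeats is an arbitrary assumption, not a standard cutoff.

```python
import re


def find_simple_repeats(seq, unit="CA", min_repeats=4):
    """Return (start, repeat_count) for each tandem array of `unit`
    that is at least `min_repeats` copies long."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(seq.upper())
    ]


# Toy sequence (illustration only) with a (CA)5 array starting at index 3.
toy_sequence = "GGTCACACACACATTG"
print(find_simple_repeats(toy_sequence))  # [(3, 5)]
```

The repeat count n at a given locus often differs between individuals, which is the property that repeat-based paternity and forensic tests exploit.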
A. Paternity test (Mother, Child, Father 1, Father 2) B. Forensics
motheredgamers -
- Tandem repeats: in a head-to-tail arrangement, with arrays of these repeats; the classes are distinguished by their lengths and positions on chromosomes
- Satellites: the largest; correspond to DNA sequence either at the centromere or around that region
- Mini-satellites: similar but smaller; not localized, can be anywhere on any chromosome; commonly used for genetic testing
- Main distinguishing point = size
- Satellites take up 6.5% of our genome because there are SO MANY repeats
Tandemly repeated DNA at centromeres
Note: rDNA array (most actively transcribed region of the genome) is sandwiched between silent, heterochromatic satellite repeats Nucleolus
DAPI-dense heterochromatin
5 week mouse liver cell Blue = DAPI = DNA Green = euchromatin
- In acrocentric chromosomes: rDNA is the most heavily transcribed region of the genome
- Dark blue regions have a lot of satellite repeats
LINE-1 RETROTRANSPOSITION MECHANISM (not known for sure)
A LINE is somewhere in the genome:
1. It has its own promoter and generates a transcript (an RNA copy of itself)
2. The transcript leaves the nucleus and is translated into proteins (ORF1 and ORF2)
3. ORF1 catches its own mRNA
4. It makes an RNP together with ORF2
5. The RNP goes back into the nucleus
6. ORF2 finds a region with TTTTT and nicks it
7. The AAAAs of the poly-A tail pair with it
8. RT starts copying the RNA, making a DNA copy
- Our cells fight against this; often the RT is blocked before the end
- Truncated copies reflect attempts to go back into the genome
- Can create somatic mosaicism (different cells carry different insertions)
- Typically thought to occur in the parental germline
- New data suggests that somatic tissues are targeted instead
- Which tissues are LINEs most active in? Germ cells!!
INTERSPERSED REPEATED ELEMENTS: LINEs and SINEs
- Parasitic transposable elements
- Make up half of our genome
1. LINE (Long Interspersed Nuclear Element): autonomous; codes for proteins; LINE-1 is the most successful; typical form has its own promoter and 2 genes, and makes a transcript that is protein coding
2. SINE (Short Interspersed Nuclear Element): non-autonomous; does not code for anything; the Alu family is the most successful; needs the RT from a LINE
3. Retrovirus-like (similar to HIV): originate from a virus; autonomous; sometimes have mutations that make them non-autonomous
4. DNA transposon fossils: work by cut and paste; don't need RT; were active millions of years ago and are no longer active in our genome
Interspersed repeated elements: LINEs and SINEs (not to scale)
6 kb
280 bp
- These repeats are NOT tandem: 45% of the genome, spread out in the genome
- LINE (big): long interspersed nuclear element; 1 mRNA with 2 open reading frames (ORFs)
- Promoter region: can initiate transcription
- Enzymatic activity: reverse transcriptase (RNA → DNA)
- SINEs are much smaller; Alu is a type of SINE
- Alu: a weird case of a non-coding RNA that was captured by reverse transcriptase and became a dimer
- Poly-A tail: processed pseudogenes are made from mRNA
- Transposons: DNA elements that have the ability to make many copies of themselves and invade the genome of the host ("selfish DNA"); genomic parasites
Dispersed DNA repeats (% genome)
LINEs: 21%
SINEs: 13%
Retrovirus-like elements: 8%
DNA transposon fossils: 3%
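As a quick sanity check, the four percentages add up to the 45% quoted elsewhere in these notes for transposons. The sketch below assumes the four values correspond, in order, to LINEs, SINEs, retrovirus-like elements and DNA transposon fossils (the order in which the classes are listed in the notes), and uses the 3.2 billion bp genome size from slide set 1.

```python
GENOME_BP = 3_200_000_000

# Assumed mapping of slide percentages to repeat classes.
REPEAT_PERCENT = {
    "LINEs": 21,
    "SINEs": 13,
    "Retrovirus-like elements": 8,
    "DNA transposon fossils": 3,
}

total_percent = sum(REPEAT_PERCENT.values())
approx_bp = {name: GENOME_BP * pct // 100 for name, pct in REPEAT_PERCENT.items()}

print(total_percent)  # 45
for name, bp in approx_bp.items():
    print(f"{name}: ~{bp / 1e6:.0f} million bp")
```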
- LINEs: this element can transpose (make more copies of itself) all on its own because it carries the key gene, pol, which codes for a reverse transcriptase
- SINEs: can't make copies of themselves by themselves; they need help. They are non-coding. The key activity needed is reverse transcriptase, so SINEs depend on LINEs
- Retrovirus-like elements: the envelope gene has been lost compared to a retrovirus, which makes capsids. Most likely they come from a retrovirus infection that lost the ability to make a capsid, which means they became endogenous genomic parasites: they stay inside our cells and don't make viral particles
- Non-autonomous retrovirus-like elements: a different version of the element on the left that went through deletion and can no longer make copies of itself
- DNA transposon fossils: called fossils because they invaded our genome years and years ago, but they can still be recognized in our genome
- Just like the retroviruses, there are deleted versions of these transposons. They are no longer able to make copies of themselves because they are fossils: the transposase gene that was originally there has completely decayed and no longer functions properly
- Most LINEs (599,900) are mutated and have decayed to the point that they can no longer make copies. Only about 100 are considered "hot" LINEs that can move around (hop), make copies, and have full coding potential
- Long terminal repeats (LTRs): act like a promoter
LINE-1 retrotransposition mechanism
Beck et al., 2011 Ann Rev Genomics Human Genet
- ORF1p: an RNA-binding protein that likes to bind its own RNA (highest affinity for its own RNA). It recruits ORF2p to the end of the LINE-1 RNA.
- ORF2p endonuclease: looks for a part of the genome with a run of A's and T's and cleaves it, so that the poly-A tail of the RNA can pair with the exposed poly-T stretch of the DNA. This creates a hook between the RNA and one of the DNA strands.
- The cleavage creates a 3' hydroxyl, which can be extended by the polymerase.
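The target-site step can be pictured as a scan for T-rich stretches that the poly-A tail could pair with. The sketch below is a toy simplification based on the notes ("a region with TTTTT"): it only reports runs of five or more T's on one strand, and the function name, cutoff and sequence are hypothetical. It does not model the real sequence preference of the endonuclease.

```python
import re


def candidate_nick_sites(dna, min_t_run=5):
    """Return (start, end) index pairs of runs of at least `min_t_run` T's.

    `end` is exclusive. Such runs could base-pair with an RNA poly-A tail.
    """
    pattern = re.compile(f"T{{{min_t_run},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(dna.upper())]


# Toy sequence (illustration only) with a run of six T's at indices 3-8.
toy_dna = "ACGTTTTTTAGC"
print(candidate_nick_sites(toy_dna))  # [(3, 9)]
```

Because short T-rich stretches are very common in the genome, this kind of loose requirement is consistent with new insertions landing on many different chromosomes.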
insertion event of a new LINE 1 Into a different chromosome
How do these repeats hop?
- Starts here: one of the ~100 hot LINE-1s is transcribed; the RNA, with its poly-A tail, is exported and translated by the ribosome
- LINE-1 copying itself = target-primed reverse transcription
- LINE-1 copying something else: ORF1p could make a mistake and bind to another RNA
- For a SINE, an Alu element or a variable number tandem repeat (VNTR): ORF1p can occasionally make a mistake and bind to, for example, an Alu RNA. ORF1p brings ORF2p with it, which reverse transcribes the Alu RNA; the Alu copy then integrates somewhere in the genome by the same mechanism.
- If ORF1p binds to a cellular mRNA, it will make a processed pseudogene: the proteins from the LINE-1 element jump onto one of our own mRNAs and reverse transcribe it back into DNA.
Beck et al., 2011 Ann Rev Genomics Human Genet
Effects of LINE-1-mediated retrotransposition on the human genome
Ex: phosphorylase kinase deficiency, Alport syndrome, and Ellis–van Creveld syndrome
Ex: Hemophilia – Factor VIII insertion
Impact on human diseases
Examples of how hopping can be BAD for us:
- Inserting into a gene can effectively break it: it causes a mutation in the gene, most likely making it non-functional (e.g. hemophilia: the blood doesn't coagulate easily)
- Even if LINE-1 integrates into an intron, it can alter the splicing pattern of the gene
- Our genome is good at recognizing homology and at triggering DNA recombination between homologous repeats. A LINE-1 on chromosome X can improperly base pair and engage in homologous recombination (crossing over) with another LINE-1 on a different chromosome. This could create a translocation, leading to inversions and huge structural rearrangements.
Effects on transcription:
1. Could start expressing a gene from the middle of the gene. The effects are detrimental and could create proteins that have a dominant negative effect on the original protein's function
2. Could cause transcription to end prematurely, so the full protein cannot be produced
3. The region where LINE-1 inserted becomes silenced; if the silencing spreads, it can lead to inappropriate silencing of the human gene
LINE-1 retrotransposition can create somatic mosaicism
LINE-1 retrotransposition was typically thought to occur in parental germline.
New data suggests that somatic tissues are targeted instead, especially neurons.
Impact on cancer is known.
Where do these transposons hop and make more copies?
- Each event of hopping creates cells that are no longer genetically identical
- Can target developing tissues or even adult tissues; the LINE-1 promoter is very active in adult neurons
- Historically thought to be MOST active in germ cells and in the zygote right after fertilization; these insertions will be inherited by every single cell
- BUT hopping can also occur in somatic tissue → creating mosaicism
- Defense mechanisms EXIST!
L1 transposition can mediate exon shuffling
• 21% of L1 surveyed contained non-L1 DNA • Length of carried over DNA varied from 30-970 bp • Up to 1% of our genome might have been generated in this way
Also mediates pseudogene formation, sometimes retrogene formation
Are repeats just “junk” or do they play a biological role? Junk can be good: it is different from trash.
Exon transduction: take an exon from one gene and insert it in another.
1% of the genome is about 30 million base pairs.
This could lead to the evolution of NEW functions in the human genome.
Gene B gained an exon (E3) from gene A. Once E3 is shuffled into gene B, gene B will make a new protein that is either bad or develops a new function (it could acquire the new skill of the exon).
Are all repeated DNA sequences “junk”?
Alu’s might boost gene expression in conditions of stress
In general: repeats are strong drivers of evolution: • Promote genomic rearrangements, exon shuffling, retrogene formation • Contribute new genes: RAG1 and 2, telomerase • Contribute regulatory signals (promoters, transcription terminators, binding sites)
Schmidt et al. (2012) Cell.
CTCF, a key DNA binding Zinc finger protein involved in organizing mammalian genome in well-defined chromatin loops shows both conservation in mammals and evidence for lineage-specific expansions.
Vestiges of repeated elements (SINEs) can often be found next to these expanded CTCF binding sites suggesting that these repeats are involved in “spreading” new CTCF sites.
Notes: processed pseudogene or retrogene. RAG1 and 2: adaptive immune system (we would die without it). CTCF: conserved, with transposons present.
Transposable Elements: An Abundant and Natural Source of Regulatory Sequences for Host Genes
Lynch (2016) Science.
Gradual accumulation of rare mutations that create new binding sites
Rapid spreading of pre-existing binding sites through transposable mobile elements
[Figure: Gene 1, Gene 2, and Gene 3 shown under each of the two models]
HOW do you evolve transcription networks?
• Evolved binding sites: mutation frequency is low, so new sites appear slowly.
• Transposons have binding sites. A transposon hops everywhere and makes copies of itself, including some that land around the promoter region (open regions). Now the transcription factors only need to acquire a few mutations to learn how to recognize that the binding sites are present.
• In this way transposons can spread DNA binding sites.
Science (2016)
Two examples of the impact of transposons on the evolution of gene networks involved in major evolutionary innovations
Cell Reports (2015)
Remote control action by TEs
Judd and Feschotte. eLife 2018
Induction or silencing of LTR5HS elements leads to changes of expression of thousands of genes, including reciprocal changes in hundreds of them
Breaking news: October 2020
New exon 2
• New (Cryptic) exon 2 codes for an extended protein
• New protein is more stable and is the one responsible for male sex determination
• New exon is derived from retrotransposon!
④ Insertion of transposons: NOT functional.
The X and the Y
X is larger than Y
Structure of the human X and Y chromosomes
heterochromatic
euchromatic
Sex-specific portion of X
X chromosome:
• 1,000 genes (gene-poor)
• Enriched for germ cell-specific genes and neurodevelopmental genes
• Depleted in CpG island promoters and housekeeping genes
• Highly enriched for LINE-1 retrotransposons
Y chromosome:
• 38 protein coding genes
• Numerous pseudogenes
X-linked traits:
• Are more likely to present in males
• Are often more severe in males
• Are often inherited from asymptomatic mother
Neurodevelopmental genes are on X. Autism is about 3 times more likely in males than in females.
Depletion in housekeeping genes.
LINE-1 density on X is 2-3 times higher than on an autosome.
Y: small number of coding genes, lots of pseudogenes! Heterochromatin is silenced DNA that doesn't code for anything (50% of the Y chromosome).
X-linked diseases:
• Red-green colorblindness
• Hemophilia (Queen Victoria's family)
• Duchenne muscular dystrophy (little or no protein)
• Becker muscular dystrophy (mutation leaving 30-50% of the protein; non-lethal)
Rett syndrome: MECP2 mutation. Mental, developmental, and verbal impairment.
Only observed in females; in males it is lethal early in development. Why? Males only have one copy of X, so they will show the phenotype if they carry the mutation. They are hemizygous.
In females, the other X can balance the mutant phenotype. One X chromosome is silenced in each cell, and because the choice is random the mutant X is the active one in only about 50% of cells. Half the cells will be normal and half will be mutant, so the phenotype will be much less severe.
X is gene poor.
X-linked traits show up in males b/c males only have 1 copy of X-linked genes.
X-inactivation IS RANDOM: 50/50 chance of a cell expressing the mutation → females; 100% → males.
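The 50/50 versus 100% contrast in the notes can be illustrated with a small simulation. This is only a sketch: the function names, the cell count, and the assumption that every cell chooses its active X independently with probability 0.5 are illustrative simplifications, not something stated on the slides.

```python
import random


def simulate_x_inactivation(n_cells, seed=0):
    """Fraction of cells expressing the mutant X in a heterozygous female.

    Assumption: each cell independently keeps the mutant or the normal X
    active with equal probability.
    """
    rng = random.Random(seed)
    mutant_active = sum(rng.random() < 0.5 for _ in range(n_cells))
    return mutant_active / n_cells


def fraction_mutant_cells(sex, n_cells=10_000, seed=0):
    """Fraction of cells expressing an X-linked mutation in a carrier."""
    if sex == "male":
        # Hemizygous: the single X carries the mutation in every cell.
        return 1.0
    if sex == "female":
        # Heterozygous: random X-inactivation makes a mosaic.
        return simulate_x_inactivation(n_cells, seed)
    raise ValueError("sex must be 'male' or 'female'")


if __name__ == "__main__":
    print("male:", fraction_mutant_cells("male"))
    print("female:", fraction_mutant_cells("female"))
```

With many cells the female fraction lands close to one half, which is why a heterozygous female usually shows a milder phenotype than a hemizygous male.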
PAR
Pseudo-autosomal regions (PAR) are sites of obligatory crossovers in male meiosis
Dumont, Genetics (2017)
Sex chromosome synapsis in spermatocyte cell spreads. (A) Schematic of the mature Synaptonemal Complex (SC) composed of both lateral elements and transverse filaments. The SC serves as a scaffold for the attachment and organization of chromatin loops in meiosis I. (B) Pachytene spermatocyte immunostained for SYCP3, a component of the lateral elements of the SC, and kinetochore-associated proteins visualized by CREST antibodies. (C) Synapsis of the sex chromosomes (circled) is restricted to the PAR.
PAR: green regions that behave as autosomes. Identical on X and Y (have homology).
During male meiosis, X and Y pair within the PAR because the sequences are identical. There is an obligate crossover in the PAR1 region.
SYCP3 holds homologs together like glue.
There are 2 regions that are identical in the X and Y, b/c in meiosis the 2 chromosomes need to pair. The PAR is the only region where both can synapse.
Organization of the major pseudo-autosomal region (PAR1)
PAR region has many genes, homology on X and Y.
Boundary: X and Y are different
SHOX: DNA binding protein. HOX= homeobox which is a DNA binding domain, a Transcription Factor
SRY gene is next to boundary on Y chromosome. Testis determining factor.
PAR genes are expressed from BOTH chromosomes: an exception to X-inactivation in females.
Henes 11
male
gender
SRY encodes for the sole Testis Determining Factor:
• XX individuals with a portion of Y corresponding to SRY translocated on autosome are males (sterile) • XY females have mutations in SRY • Transfer of human SRY into XX mouse embryos leads to male (sterile) phenotype
• SRY encodes for an HMG-box transcription factor • SRY upregulates the expression of SOX9, another transcription factor which leads to the differentiation of uncommitted support cells to Sertoli cells, not to granulosa cells.
SRY binds to promoters and upregulates SOX9.
Uncommitted support cell precursors: SRY activates SOX9.
If SRY is present, the cells become Sertoli cells. If there is no SRY, they become granulosa cells (female support cells).
Sex-reversed phenotype.
The precursor cells are not committed to either sex.
Aneuploidy of the X and Y lead to human disorders
XO: Turner syndrome
• Phenotypically female but sterile, affects 1 in 2,000-5,000 live births
• Short stature (SHOX deficiency), lymphedema, broad chest, heart defects, low set ears
• Caused by non-disjunction in the father in majority of cases (can also be caused by Xp deletions)
• While viable, XO karyotypes are found in 15% of stillbirths, often due to congenital heart defects
XXY: Klinefelter syndrome
• Most common sex aneuploidy in males (1 in ~1,500 live births)
• Reduced testosterone levels and associated physical changes
• Hypogonadism and reduced fertility
• Increased risks of autoimmune disorders, breast cancer, osteoporosis
• Caused by non-disjunction in either the male or female meiosis (XY + X or Y + XX)
XO: if X and Y fail to disjoin after the PAR crossover (usually in the father), the sperm makes no sex chromosome contribution (otherwise most non-disjunction comes from the mom's side). The individual only has 1 dose of the PAR genes.
XXY: increased incidence of diseases more typical of females, since they carry 2 X chromosomes. One of the 2 X is inactive, and X-inactivation happens at random.
Evolution of X and Y chromosomes
The SRY gene evolved prior to the divergence between marsupials and placental mammals
1- Acquisition of a Testis Determining Factor (TDF)
2- Suppression of recombination around TDF. Acquisition of additional male-specific genes. Extension of “no recombination” zone
3- Harmful mutations on the Y chromosome are fixed. Inactive genes are gradually lost, leading to “degeneration”. According to the rate of decay (~4.6 genes per million years), there might be no more genes on the Y in 10 million years
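The "10 million years" figure is a straight-line extrapolation: remaining genes divided by genes lost per million years. A minimal sketch, assuming a constant loss rate (a strong simplification) and using a hypothetical function name:

```python
def million_years_until_no_genes(genes_remaining, genes_lost_per_million_years=4.6):
    """Linear extrapolation of Y chromosome gene loss.

    Assumption: genes keep being lost at a constant rate, which is the
    premise of the estimate in the notes, not an established fact.
    """
    if genes_lost_per_million_years <= 0:
        raise ValueError("loss rate must be positive")
    return genes_remaining / genes_lost_per_million_years


if __name__ == "__main__":
    # 38 is the protein-coding gene count given for the Y in these notes.
    print(round(million_years_until_no_genes(38), 1), "million years")
```

With the 38 protein-coding genes listed for the Y in these notes, this gives a bit over 8 million years, the same order of magnitude as the 10 million quoted on the slide.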
The transcaucasian mole vole has lost its Y chromosome
Identical autosomes. TDF Evolved on one of the autosomes. Suppression of crossover allowed evolution
Cannot exchange info so they evolved independently. On X genes can be repaired by homology. On Y there is no homology so genes are lost.
Autosome
chromosome
Surprise: the lack of interhomolog X-Y recombination has been replaced by intrachromosomal recombination between repeated genes located on large palindromes
Palindrome: sequence of DNA that is mirror inverted
Conventional pathway: recombination between homologs
Y-specific pathway: recombination between sisters mediated by inverted repeats
The Y chromosome engages in unusual recombination events
This puts the Y chromosome at risk for rearrangements
2009
Gene conversion occurs in X Which can repair broken genes
Iso dicentric chromosome
acentric will be lost during meiosis
palindromes are inverted repeats specific genes are on the arm
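[Figure annotations] A crossover between the palindrome arms happens ONLY in Y chromosomes, because the Y never has a homolog. It produces abnormal chromosomes: one with 2 centromeres (isodicentric) and one with no centromere (acentric). These are unstable and lead to inviable meiosis, death of all sperm cells, and more issues.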
The recombination lifestyle of the Y chromosome also has implications for its evolution
98% similarity between human and chimp genomes
Crosses mark the conserved palindromes. The Y is constantly renewing itself.
No diagonal → very divergent
98% identical, with inversions
Human Genetic Variation
What is the extent of individual genetic variation?
• What are the types of individual genetic variation?
• Single Nucleotide Polymorphisms (SNPs): a single nucleotide variant observed to vary among unrelated individuals at an “appreciable” frequency.
• Structural variants (SVs): variation that arises via deletion, insertion, or rearrangement of the DNA. Types of SVs include indels, inversions, translocations. Size of SVs varies from >1 bp up to >100s of kb.
• What is the distribution of these variants?
• What do these variants tell us about human traits, disease risk factors, human demography?
SNPs are the most common type of genetic difference
Image courtesy of Biosciences for Farming in Africa
• DNA found in nucleus of all cells
• Human genome: 3.2 billion nucleotides (A,C,T,G)
[Figure: genome 1 and genome 2 compared base by base, with single nucleotide differences highlighted]
Slide courtesy of Dr. Megan Dennis
Complex structural variants are less frequent but can affect large regions
“Reference” genomic region
Deletion
Duplication >1 kbp termed “segmental duplication”
Inversion
Slide courtesy of Dr. Megan Dennis
Nonprocessed pseudogenes.
1 SNP for every 1,000 base pairs.
Between 2 average genomes: 99.9% identical, which means there is 1 SNP per kbp (1,000 bp).
Genome of 3 billion bp → 3 million SNPs.
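The arithmetic behind the 3 million figure can be written out as below. This is a sketch with hypothetical helper names, and the round 3 billion bp genome size is taken from the note itself.

```python
def expected_snp_count(genome_size_bp, bp_per_snp=1000):
    """Expected number of SNP differences between two genomes."""
    return genome_size_bp // bp_per_snp


def percent_identity(bp_per_snp=1000):
    """Percent of positions that are identical, given one SNP every bp_per_snp bases."""
    return 100 * (1 - 1 / bp_per_snp)


if __name__ == "__main__":
    print(expected_snp_count(3_000_000_000))  # 3 billion bp genome
    print(percent_identity())                 # about 99.9 percent
```

With the slide's 3.2 billion nucleotide figure, the same calculation gives 3.2 million SNPs.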
The HapMap project: cataloging human genetic diversity
Objective: to genotype at least one common SNP every 5 kilobases (kb) across the euchromatic portion of the genome
Population: 270 individuals from four geographically diverse populations
• 30 mother-father-adult child trios from the Yoruba in Ibadan, Nigeria
• 30 trios of northern and western European ancestry living in Utah
• 45 unrelated Han Chinese individuals in Beijing, China
• 45 unrelated Japanese individuals in Tokyo, Japan
Phase III: 1,115 samples with expanded geographical coverage (incl. populations from Italy, Kenya (Maasai), Mexico, India (Gujarati) and of African-American descent), with a focus on 1.6 million SNPs.
[Figure: SNPs genotyped: 1.1 million in 2005, 3.1 million in 2007]
Dramatic reduction of sequencing costs post HGP powers genomics for all revolution
[Figure: cost per genome by year, falling from $3 billion to $1,000 per genome]
“Next-gen” sequencing
Illumina “Next Gen” sequencing:
• Genomic DNA is fragmented to small pieces
• Fragment nucleotides are sequenced, producing 150 bp “reads”
Slide courtesy of Dr. Fereydoun Hormozdiari
Newest sequencing technologies can generate up to 10 billion reads per run (or over 100 human genomes) in only a few days
We now have sequenced over a million human genomes…
http://biologiaevolutiva.org/tmarques/
...and hundreds of ape genomes...
Slide courtesy of Dr. Megan Dennis
Variants are identified by mapping reads to a “reference” genome
Identify single nucleotide variants
[Figure: sequencing reads mapped to the reference genome; bases that consistently differ from the reference mark single nucleotide variants]
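A toy version of "map reads, then look for mismatches" might look like the sketch below. It assumes the reads are already aligned without gaps and are given as hypothetical (start, sequence) pairs; the thresholds are illustrative, and real variant callers use far more careful statistics.

```python
from collections import Counter


def call_snvs(reference, reads, min_coverage=3, min_alt_fraction=0.2):
    """Toy SNV caller over a gap-free pileup.

    reads: list of (start, sequence) tuples, with a 0-based start position
    on the reference. Returns (position, ref_base, alt_base, alt_count, depth).
    """
    # Count the bases observed at each reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())  # coverage at this position
        if depth < min_coverage:
            continue  # too few reads to trust a call
        for base, n in sorted(counts.items()):
            if base != reference[pos] and n / depth >= min_alt_fraction:
                calls.append((pos, reference[pos], base, n, depth))
    return calls


if __name__ == "__main__":
    reference = "CTGAGTCA"
    reads = [(0, "CTGAA"), (1, "TGAAT"), (2, "GAATC"), (3, "AATCA"), (0, "CTGAG")]
    print(call_snvs(reference, reads))
```

In the usage example, four of the five reads covering position 4 carry an A where the reference has a G, so that position is reported as a variant.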
Coverage: how many times was a base sequenced?
Can also identify SVs with these read mappings
[Figure: sequencing reads mapped across a deletion]
• The more coverage the better.
• Inversions and duplications are a lot more difficult to identify, but it can still be done.
• With > 30x coverage you will definitely be able to identify SNPs with high confidence.
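Mean coverage can be estimated as (number of reads × read length) / genome size. The sketch below uses the 150 bp read length and 3.2 billion bp genome size quoted in these notes; the function names are hypothetical, and single-end reads with uniform mapping are assumed.

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is sequenced."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target mean coverage."""
    total_bases = target_coverage * genome_size_bp
    # Integer ceiling division avoids floating point rounding.
    return (total_bases + read_length_bp - 1) // read_length_bp


if __name__ == "__main__":
    n = reads_needed(30, 150, 3_200_000_000)
    print(n, "reads for 30x")
    print(mean_coverage(n, 150, 3_200_000_000), "x mean coverage")
```

Under these assumptions, 30x coverage of one human genome takes 640 million reads of 150 bp.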
The 1000 Genomes Project
Goal I
Figure 1:
• Most known SNPs common to all three populations
• Most novel SNPs unique to specific populations
• Novel SNPs mostly correspond to low frequency or rare variants
• African populations show 2-3 times more diversity
SNP analysis: what we learn
1. All populations share a certain number of SNPs: likely old events of genetic variation, shared by all.
2. Unique SNPs reflect population history (unique to each population).
[Venn diagram: European, African, and Chinese + Japanese populations. Known SNPs are COMMON to all 3; new SNPs are unique to specific populations; some are shared, some unique.]
Figure 1: Abundance of structural variants.
• Only 50% of short insertions/deletions and long structural variants are novel
• Vast majority of mid-size variants (0.1-10 kb) are new (note Alu and LINE)
SNPs are only a small part of the genetic diversity pie: SNPs are one type of variation.
• The bigger the event, the lower the frequency.
• The larger the structural variant, the more likely it is to interrupt critical genes.
While not as frequent as SNPs, SVs have strong impact
Campbell and Eichler, October 2013
SNV: Single Nucleotide Variant Indel: short insertion / deletion MEI: Mobile Element Insertion (Alu / LINE) CNV: Copy Number Variant
Structural variants
Structural variants do not just change 1 base pair; they change MANY bp, which makes a big difference in genetic diversity.
[Figure annotations: the frequency of SNPs is the highest, but their effect is not the strongest; at the other extreme is the gain or loss of entire chromosomes.]
We are all mutants!
• Each individual shows on average:
➢ 10-11,000 non-synonymous SNPs in protein-coding genes
➢ 200 in-frame insertions/deletions
➢ 90 premature stops
➢ 40 splice site disruptive variants
➢ 200-250 deletions accompanied by frameshifts
• Collectively, each person carries approx 250-300 loss of function variants in known genes, including 50-100 in disease-causing genes
• 30-50 true de novo germline mutations (SNPs) in family trios
• Mutation rate ~ 10^-8 per base-pair per generation
• Alu hopping constitutes most Mobile Element Insertion events, with a rate of 2-4.6 x 10^-2 per genome per generation or approximately 1 in 20 births
• L1 insertions are rarer, occurring at 3-4 x 10^-3 per genome per generation (1 per approximately 100-150 births)
Notes:
• Premature stops (stop codons) typically break the protein.
• Being diploid helps buffer the mutations: the extra copy prevents the loss of function.
• New mutations that occur per generation: 1 mutation per hundred million bp, no different from other species.
• Some babies are born with an Alu that made a new insertion.
SNPs are mostly contributed from the paternal lineage!
➢ 76% of SNPs the parental origin of which could be determined were from the paternal germline
➢ Rate of de novo mutations is increasing with paternal age although magnitude of effect varies with studies
➢ Evidence that some mutations might favor the growth of spermatogonial stem cells.
Ex: FGFR3 mutations that give rise to achondroplasia are almost exclusively paternal and rise strongly with father’s age
➢ Large structural variants (CNV>150 kb) also appear to be biased towards paternal germline. Segurel et al., (2014)
New = novel.
Slope is positive, but the magnitude differs between studies: a positive trend.
Achondroplasia: a form of dwarfism, almost exclusively paternal in origin.
Male germ cells are produced throughout life, so many copies of the genome are made and it is much more likely for mutations to arise.
2012 Update from 1000 Genomes Project
Accomplished the goal of reaching 1,000 genomes.
Most variants most likely won't impact an important gene → lower effect on the fitness of the embryo.
Where? Brain function, neurocognition > mitochondrial function genes.
2015 Update from 1000 Genomes Project
SVA: transposon NUMT: Nuclear Mitochondrial DNA
Big picture:
- SNPs dominate
- Most are private to specific populations
- 3x more bp are changed by structural variants than by SNPs
Why else do we care about mapping SNPs?
❑ SNPs are modern day genetic markers along the genome
• The segregation of these markers can be easily followed in relation to any measurable trait (for instance disease, or any other property you might be interested in). This enables gene mapping.
❑ SNPs inform us about the structure of the human genome and about recombination rates
(We'll come back to this later…)
Good marker:
1. Dense
2. Tend to be spread out evenly on the genome, fairly randomly
3. Can identify ancestry, and specifically whether a variant came from mom's genes or dad's genes
Can measure the structure of recombination in the human genome pretty cheaply.
Implications of genetic variation for human population history
Bustamante et al., Nature (July 2011)
Cataloguing human genetic diversity is a top priority
9 years ago
Implications of SNP mapping #1: African populations carry the majority of genetic diversity
First African genomes Nature February 2010
African populations reflect the ancestral population (older & more diverse). All other populations evolved from the African one (the others only carry a fraction of the diversity).
Main findings from Bantu and Khoisan genomes
Difference in muscle fibers and oxygen consumption between BC and DA groups
Hunter-gatherers use poison arrows, can survive a long time without food, and are endurance runners. In other regions people are sprinters and have muscle fibers that are different from other populations.
Two Bushmen are more different from each other than a European and an Asian person.
Kalahari desert region: BC = hunter-gatherer, DA = Bantu
Muscle physiology differs according to different hunting methods.
More different than Europe vs. Asia: diversity ↑
1000 Genomes Project - 2015 genetic diversity update
Low variation: Eurasian populations (out-of-Africa populations).
Populations in the Americas and on the islands have high levels of genetic diversity. Many islanders are descendants of slave trade populations, which is why their diversity matches the diversity of Africa. There is a lot of admixture.
[Figure labels: slave trade; African, Native American, and European ancestry; European & Asian]
Scale indicates increased levels of genetic diversity in populations
Genetic diversity is greatest in southern Africa, so maybe human populations originated from this region, not the Great Lakes region. The southern African populations are the ones who left Africa. There was a lot of migration within Africa.
Great Lakes region: thought to be the origin of humans, who then migrated south.
Breaking News update: September 2019
“Out of Africa” model for the spreading of Homo sapiens
Implications of SNP mapping #2: SNPs provide us information about the history of human populations
Update - Nature Genetics Sept 2011
Southern Africa populations diverged from other populations around 108-157 thousand years ago
Eurasians diverged from ancestral African populations 38-64 thousand years ago
Effective population size of the ancestors of all modern humans was ~9,000
Bottleneck
Bottleneck
Most SNPs in our genomes have no effect. Look for the oldest SNPs: this data can be used to date a population.
East Asians populated North America.
Africans have more genetic diversity because they have not experienced the bottleneck effect that the rest of the world has.
In Africa, this (~9,000) is the number needed to generate all the diversity we have now.
Only a few left Africa; most diversity remained.
Glacial age.
Islands were reached later because boats were needed.
More went south, so diversity was maintained as high.
Nature January (2013)
• Correlates with the explosion of human populations and with invention of agriculture
• Different sub-populations carry different mutational loads
Implications of SNP mapping #3: Most deleterious SNPs are of recent origin
3/4 of protein-coding SNVs and 86% of all deleterious SNVs arose within the last 5,000 to 10,000 years.
5,000 to 10,000 years ago, there was an explosion of written language, tools, and agriculture. Many variants arose from the increase in population during this time.
The age of deleterious SNPs differs among molecular pathways as well. The Krebs cycle, for example, carries fewer mutations than less important pathways, since mutations in those are less lethal.
Europeans have more deleterious variants in essential and Mendelian disease genes compared to African Americans, due to weaker purifying selection following the Out-of-Africa dispersal.
exome = protein coding
Drivers: cities, language, technology (farming), population explosion
Archaic human genomes!
Dr. Svante Pääbo, Max Planck Institute (Germany)
Dr. Richard Green UCSC
Science May 2010
From 1-4% of the genome of modern Eurasians is derived from Neanderthal sequences
The ancestors of Eurasians were coming north from Africa and met (interbred with) Neanderthals. The populations then diverged (Europe and Asia), and these people carried the product of the encounter: DNA blocks from Neanderthal populations. That is why these blocks are found in Eurasian populations but not in African populations.
These bones were extracted to give archaic human genomes. The sequences had to be sorted and pieced back together, separating them from the bacterial genomes that were residing on the bones.
Intermixing of human populations (Neanderthals vs. Homo sapiens): there was breeding between the two populations. Eurasians carry segments of Neanderthal genomes.
Hybridization of populations: ~30% of the Neanderthal genome exists collectively in present-day humans.
Nature December 2010
The Denisovans: a new group of (sequenced) archaic humans
Neanderthals
Well-preserved bones in the cave. Found that there was another group of archaic humans (Denisovans), closely related to Neanderthals. This population was not involved in the gene flow from Neanderthals into Eurasians. It contributed 4-6% of the genome of present-day Melanesians (islands near Indonesia).
Denisovans are more closely related to Neanderthals.
“Out of Africa” model for the spreading of Homo sapiens
New additions to the history of human populations
Bottleneck Neanderthals
Denisovans
Populations that met the Denisovans migrated down to Australia and the islands. Populations in Europe/Eurasia mixed with Neanderthals but did not mix with the Denisovans. African descendants do not carry the Neanderthal genomic sequences. Neanderthals, Denisovans, and Homo sapiens all migrated out of Africa.
Recent Milestones in human evolutionary genomics From Nielsen et al. (2017)
Two REALLY good books on the topic
Africa’s first ancient genome Science – Oct 8, 2015
Gene backflow Into Africa ~3,000 yrs ago
Mota died before the early Neolithic farmer population came back (backflow). This helped clarify historic events that are not present in the historical documents.
The climate in the rest of the world is colder, so DNA was better preserved. Africa is so hot that DNA is mostly degraded.
Artifacts were about 4500 years old. Sequences look like Gracian DNA.
Hypothesis: people from Eurasia migrated back to Africa. However, Mota died before the backflow into Africa, so he did not have the DNA.
Most African genomes have Eurasian ancestry because of the backflow from Eurasian populations: around 6-7% Eurasian ancestry. The backflow was from populations related to those of the Sardinia/Tuscany regions back into Africa.
Mota was the first genome sequence from Africa that had a purely African genome with no Eurasian sequences. Mota lived before the backflow to Africa occurred.
Breaking News: Science – September 2017
Sequence individuals from South Africa. Allow researchers to measure divergence from other populations
Demographic model of African history and estimated divergences. From Schlebusch et al. Science 2017
Estimated divergence for
modern humans
Gene flow back from European populations
Migration of West African populations back to Southern Africa
Modern humans skull approximately 250,000 years ago
All groups of people with in Africa that have split from original ancestor
Eurasian & East African people leaving Africa
Long gray line represents backflow into Africa. Green line represents migration back into South Africa
Out of Africa migration
thickness of the branch show relative
genome size
Nature (Jan 2017)
Genetics suggest ghost population which are pops that must have existed according to genetics. But we have no historical record or archaeological record
Unknown hominin: found segments of DNA that cannot be Denisovan
Hominins diverged into Neanderthals and Denisovans
Neanderthals and Denisovans died out but crossed with humans before they did
Hybridization with Neanderthals
Event of hybridization with Denisovans
How about intra-individual genetic variation? Classic examples of cells within our bodies that have different genomes:
- Immune system - Germ cells
Does DNA within an individual vary at all?
Immune cells: B and T cells break and reassemble DNA to create new antibodies and receptors. Germ cells: meiosis leads to crossovers, which introduce variation.
Analysis of CNVs in diverse tissues. Cancers are notorious for CNVs.
Looked at tissue samples across individuals and looked at CNVs. 79% of genomic change events affect genes.
Look at various tissues.
Chromosomes are on the outside. The concentric circles inside display the positions of variable regions. Focused on copy number variants, e.g. duplications or deletions.
Could recognize copy number variation within the body of individuals (within-individual variation).
11 tissues in 6 individuals
New mutations can lead to mosaic individuals
(all women are mosaic for which X chromosome is expressed)
Mosaic vs Chimera
Mosaic: one genome which has acquired a genetic change. Women are mosaic for which X is expressed in different tissues.
Chimera: 2 fertilization events, and the embryos' cells fuse or exchange. (Not a result of mutation.)
These somatic mutations WILL NOT pass on to offspring.
Reminder: LINE-1 retrotransposition can create somatic mosaicism
LINE-1 retrotransposition was typically thought to occur in parental germline.
New data suggests that somatic tissues are targeted instead.
L1 and other retrotransposons can jump within cells to create somatic mosaicism. Transposon hopping sites are the colored regions.
L1 elements are good at hopping within the brain. Neurons can store things in different places and remember things.
Can be affected by epigenetic and hormonal effects on chromatin structure.
Can have an impact on health and adaptation of individuals.
L1 transposition is linked to epigenetics, as is the storage of information and experiences. Brain chemistry can be changed through epigenetics.
Cells that are more sick vs. cells that are more strong and healthy...
Mosaicism is much more common than previously thought, esp. in women
Lupski, Science, July 2013
• 56% of women who had sons have a Y chromosome in breast tissues
• 63% of women who had sons have Y chromosomes in their brains
• 74% of individuals who received bone marrow transplantation have mixed genomes
Red cell is lethal; the cell will be eliminated (catastrophic mistake).
Why are there small chunks of Y chromosome in breast tissue? DNA from the fetus can leak from the placenta into the bloodstream of the mother and end up transforming mom's cells with foreign DNA. (The Y chromosome is the obvious thing to detect in females.) As the embryo develops during pregnancy it sheds a lot of cells. The cells die and release their contents into the embryonic sac; this material is exchanged across the placenta and makes its way into the bloodstream. (Mom's bloodstream is fairly high in cell-free DNA from the fetus.)
Up to 13% of cell-free DNA floating in mother’s plasma is of fetal origin!!
Science Translational Medicine
Snyder et al., Prenatal Diagnosis 2013
Maternal plasma contains fetal DNA, and it is possible to sequence the entire genome of the fetus from it. How to tell the difference between mom's DNA and fetal DNA? Sequence both maternal and paternal DNA taken from tissues, then compare them with the cell-free DNA.
Companies are cornering this market for the detection of fetal trisomies
Can diagnose trisomy 21: sequence the DNA leaked into the mother's bloodstream and count the number of copies of chromosome 21. If the fetal DNA shows it 3 times for every 2 copies of an autosome, there is trisomy 21.
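Because most cell-free DNA in the plasma is maternal, the chromosome 21 excess seen in practice is much smaller than 3:2. A minimal sketch of the counting logic, assuming a euploid mother; the function names and the reference mean and standard deviation in the usage example are hypothetical values, not clinical thresholds:

```python
def expected_chr21_dosage(fetal_fraction, fetal_copies=3):
    """Relative chr21 dosage in maternal plasma compared with a euploid pregnancy.

    Assumption: the mother contributes 2 copies, and a fraction
    `fetal_fraction` of the cell-free DNA comes from the fetus.
    """
    maternal = (1 - fetal_fraction) * 2
    fetal = fetal_fraction * fetal_copies
    return (maternal + fetal) / 2


def chr21_z_score(chr21_reads, total_reads, euploid_mean, euploid_sd):
    """How far the observed chr21 read fraction lies from euploid reference samples."""
    observed = chr21_reads / total_reads
    return (observed - euploid_mean) / euploid_sd


if __name__ == "__main__":
    # With the 13% fetal fraction from the notes, trisomy 21 raises
    # the chr21 signal by only about 6.5%.
    print(expected_chr21_dosage(0.13))
    # Hypothetical reference values, for illustration only.
    print(chr21_z_score(1385, 100_000, euploid_mean=0.013, euploid_sd=0.0001))
```

This is why the test relies on counting very large numbers of reads: the signal is a shift of a few percent, not a jump from 2 to 3 copies.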
Diagnosing aneuploidy
Newest development: using cfDNA to diagnose cancer!
(2016)
Sampling chromatin fragments in the blood can pinpoint abnormal DNA rearrangements and which type of tumor they came from.
Nucleosomes: DNA wrapped around histones. Every cell type has a unique nucleosome positioning pattern that reflects its gene expression.