Molecular Genetics


MCB162: Human Genetics

Goals of class:
1. Explain central concepts of modern human genetics
2. Illustrate how molecular and genomic approaches are changing human genetics
3. Discuss professional practice of human genetics in the era of personal genomics
4. Explore scientific literature and databases

Genetics Timeline

Part I: The Pioneers' Years

1865 Gregor Mendel's paper: Classical Genetics is born
1869 Miescher isolates DNA ("nuclein") from white blood cells and other cells
1902 Sutton describes the chromosome theory of heredity
1910 Morgan shows that genes reside on chromosomes
1927 Physical changes in genes are called mutations
1931 Crossing over is the cause of recombination
1941 Beadle and Tatum show that genes code for enzymes ("one gene, one enzyme")
1944 Avery, MacLeod and McCarty identify DNA as the genetic material
1952 The Hershey-Chase experiment proves the genetic information of phages to be DNA
1953 DNA structure is resolved to be a double helix by Watson and Crick, with critical assistance from Rosalind Franklin
1959 Lejeune identifies trisomy 21 as the cause of Down syndrome: first evidence that chromosome abnormalities can underlie human disease
1961 Brenner, Jacob and Meselson show that mRNA ferries genetic information from DNA to the protein synthesis machinery: central dogma
1961 First systematic screen for a metabolic defect in newborns: phenylketonuria
1966 The genetic code is cracked

Part II: The Molecular Genetics Revolution

1972 Cohen and Boyer create the first recombinant DNA
1973 First animal gene (frog) cloned in bacteria
1976 First biotechnology company founded: Genentech
1977 DNA is sequenced for the first time by Fred Sanger
1978 Riggs and Itakura produce human recombinant insulin and license it to Genentech
1983 Mullis discovers the polymerase chain reaction (PCR)
1983 The Huntington's disease gene is mapped to chromosome 4 (the gene itself was cloned in 1993)
1984 Jeffreys introduces a technique for DNA fingerprinting to identify individuals genetically
1989 Collins and Tsui identify the cystic fibrosis gene (CFTR) by positional cloning
1990 First gene therapy trial, on a four-year-old girl with an immune deficiency
1994 The genetically modified FLAVR SAVR tomato is FDA approved (Calgene)
1995 The genome of Haemophilus influenzae is sequenced
1996 Saccharomyces cerevisiae is the first eukaryote genome sequence to be released
1997 Wilmut clones Dolly the sheep from the cell of an adult ewe
1998 The first genome sequence for a multicellular eukaryote, C. elegans, is released
1999 The complete sequence of the first human chromosome (22) is released

Part III: The Genomics Revolution

2003 Successful completion of the Human Genome Project
2004/05 Rat and chicken genomes sequenced. Dog and chimpanzee genomes sequenced; HapMap project completed; Cancer Genome Atlas project initiated
2006 23andMe and Navigenics are founded; deCODE Genetics offers a personal genetics test
2007 Shinya Yamanaka reprograms adult human cells into induced pluripotent (embryonic-like) stem cells. The first two individual human genomes are released. The ENCODE project analyzes the function of 1% of the human genome
2009 Mouse, cow, and corn genomes are published. Representative Indian and Korean genomes are released.
2010 First data release from the 1000 Genomes Project. A representative Japanese genome is published. First African genomes are released. The Neanderthal genome is sequenced.
2011 Exome sequencing is applied routinely to many rare human diseases.
2012 The tomato genome is sequenced. Multiple breast, lung, melanoma, leukemia, and kidney cancer genomes are released. ENCODE releases whole human epigenomes.
2014 The Roadmap Epigenomics project catalogues epigenetic information across tissues and developmental time.
2016 NIH announces the 1-million-person Precision Medicine Initiative. AstraZeneca announces it will sequence 2 million genomes. Earth Microbiome and Earth BioGenome projects.

The era of personal genetics?


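TAS2R38, OCA2, ABCC11, LCT

There are genetic variants in these genes. Each variant is associated with changes in a trait (TAS2R38: bitter taste perception; OCA2: eye color; ABCC11: earwax type; LCT: lactase persistence).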

• How do you identify genes?
• How do you associate genes with traits?
• How are genetic variants identified and their effect measured?
• How are genetic variants distributed among populations?
• How much of a trait is genetically determined? What is the heritability of a trait?


In most cases:
• Genetic analysis leads to the calculation of an "odds ratio"
• The heritability of a trait is below 100%, because the environment also influences it
• Traits are polygenic
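To make the "odds ratio" concrete: it compares the odds of carrying a variant among people with a trait (cases) with the same odds among people without it (controls). The sketch below assumes a simple 2×2 table of carrier counts; the counts are invented for illustration and are not course data, and the confidence interval uses the standard Woolf (log odds ratio) approximation.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: cases carrying the variant      b: cases without the variant
    c: controls carrying the variant   d: controls without the variant
    """
    return (a / b) / (c / d)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval (Woolf method, on the log scale)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical study: 300 of 1,000 cases and 200 of 1,000 controls carry the variant
a, b, c, d = 300, 700, 200, 800
print(f"OR = {odds_ratio(a, b, c, d):.2f}")  # OR = 1.71
low, high = odds_ratio_ci(a, b, c, d)
print(f"95% CI = {low:.2f} to {high:.2f}")
```

An OR above 1 means carriers have higher odds of the trait. A modest value like this one is the kind of effect expected when each of many variants contributes only a little to a polygenic trait.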

Normal human karyotype after G-banding of mitotic chromosomes

The human genome: 24 distinct chromosomes (22 autosomes plus X and Y) of varying sizes and shapes; 3.2 billion base-pairs!
• Centromere: functions in chromosome segregation during mitosis
• Telomeres: chromosome ends

Condensation = 10,000-fold

Fundamental importance of chromatin in DNA compaction and organization
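A back-of-the-envelope calculation shows why compaction is needed. This sketch assumes the textbook B-DNA rise of ~0.34 nm per base pair and a diploid nucleus holding 2 × 3.2 billion bp; the 10 µm nucleus diameter is an illustrative assumption, and the result is the raw length of the DNA, not the 10,000-fold mitotic condensation figure itself.

```python
HAPLOID_BP = 3.2e9       # haploid human genome size (bp), from the slide
RISE_NM_PER_BP = 0.34    # B-DNA: ~0.34 nm between adjacent base pairs
NUCLEUS_UM = 10          # assumed nucleus diameter in micrometres (illustrative)


def dna_length_m(n_bp):
    """Contour length of naked double-stranded DNA, in metres."""
    return n_bp * RISE_NM_PER_BP * 1e-9


diploid_m = dna_length_m(2 * HAPLOID_BP)
fold = diploid_m / (NUCLEUS_UM * 1e-6)
print(f"Diploid DNA length: {diploid_m:.2f} m")   # about 2.18 m
print(f"About {fold:,.0f} times the nucleus diameter")
```

Roughly two metres of DNA must fit into a nucleus a few micrometres across, which is why chromatin packaging is so fundamental.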

Passing the genome on: from cell to cell, person to person

Mitosis and meiosis are the two basic events underlying genetic inheritance
• MITOSIS: replicate DNA → propagation of DNA → many cell types, same DNA → one person
• MEIOSIS: replicate and recombine DNA → highly specialized cell type → novel DNA combinations → new people!

Mitosis ensures the faithful distribution of DNA at each cell division

G0 (quiescent): most cells in the adult body are quiescent, except:
• bone marrow
• epithelium (intestine)
• germ cells (males)
• adult stem cells (wound healing, tissue regeneration, maintenance)


Mitosis versus Meiosis
• Mitosis: no change in ploidy
• Meiosis: haploid gametes from diploid cells

KEY terms: homologues vs. sisters

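Meiosis: the essence of heredity

[Figure: life cycle. Diploid germline cells undergo meiosis to produce haploid oocytes and sperm; fertilization produces a diploid zygote, which gives rise to the body through mitosis and differentiation.]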

MEIOSIS: generating haploid gametes from diploid cells (credit: Dr. Neil Hunter)

• Crossing-over occurs almost exclusively between homologs (Mom and Dad chromosomes), not between sister chromatids
• Crossover sites are not distributed randomly along chromosomes
• Each chromosome pair typically undergoes 1-2 crossovers

End of meiosis I: homologs segregated

Possible gametes: the chromatids involved in the crossover give recombinant gametes; the others remain parental

4 haploid gametes from 1 diploid precursor cell

A matter of interpretation…

- Meiotic DNA recombination likely did not evolve to generate genetic diversity by mixing genomes through crossovers.

- Meiotic DNA recombination, instead, ensures the proper segregation of homologues at meiosis I (together with sister-chromatid cohesion), thereby permitting the formation of haploid gametes.

- Its evolutionary origin remains unclear.

Recombination Promotes Chromosome Pairing (credit: Dr. Neil Hunter)
• Parental chromosomes = homologues (shown here post replication)
• Homolog pairing
• Synaptonemal complex between paired homologs (chromatid loops)

Crossovers Direct Homolog Disjunction
• Crossing over links the homologs
• Mono-oriented sister kinetochores + sister cohesion ensure segregation of homologues
• Cohesion is protected at the centromeres, while cohesin along the chromosome arms is cleaved

Accurate Segregation Produces Haploid Gametes
• Euploid gametes

But doesn't meiosis also produce genetic diversity?

Independent assortment of maternal and paternal homologs at meiosis I produces the first level of genetic diversity.

Possible combinations = 2^23, or ~8.4 million possible arrangements
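The 2^23 figure follows because each of the 23 homolog pairs lines up independently, giving two choices per pair. A minimal sketch, ignoring crossovers and labelling each homolog only as maternal ("M") or paternal ("P"):

```python
import random

N_PAIRS = 23  # homolog pairs in a human diploid cell


def n_gamete_arrangements(n_pairs=N_PAIRS):
    """Two independent choices (maternal or paternal homolog) per pair."""
    return 2 ** n_pairs


def random_gamete(n_pairs=N_PAIRS, rng=random):
    """Pick the maternal ('M') or paternal ('P') homolog for every pair."""
    return "".join(rng.choice("MP") for _ in range(n_pairs))


print(n_gamete_arrangements())        # 8388608, i.e. ~8.4 million
print(n_gamete_arrangements() ** 2)   # combinations for a zygote from two parents
print(random_gamete(rng=random.Random(1)))
```

Squaring the count gives about 7 × 10^13 possible chromosome combinations for a child, before crossovers add any further diversity.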

Real life example of how crossover site position and site number can vary between parent and child

Recombination at crossovers adds another level of genetic diversity
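A toy model of what a single crossover does to the chromatids involved. It assumes alleles at five hypothetical loci and exactly one crossover between two of them; it does not model the non-random distribution of crossover sites noted earlier.

```python
import random


def crossover(maternal, paternal, position):
    """Exchange the segments of two homologous chromatids after `position`.

    Returns the two recombinant chromatids; chromatids not involved in the
    crossover stay parental.
    """
    if not 0 < position < len(maternal):
        raise ValueError("the crossover must fall between two loci")
    recombinant_1 = maternal[:position] + paternal[position:]
    recombinant_2 = paternal[:position] + maternal[position:]
    return recombinant_1, recombinant_2


# Hypothetical alleles at five loci along one chromosome
mom = ["A", "B", "C", "D", "E"]
dad = ["a", "b", "c", "d", "e"]

print(crossover(mom, dad, 2))
# (['A', 'B', 'c', 'd', 'e'], ['a', 'b', 'C', 'D', 'E'])

rng = random.Random(42)
site = rng.randint(1, len(mom) - 1)  # one crossover in a random interval
print(site, crossover(mom, dad, site))
```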

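Male and female meiosis are very different!

[Figure: timing of meiosis through life (in utero development, birth, puberty, throughout life). Female: meiosis starts in utero, and oocytes are blocked in MI (diplotene/diakinesis) until ovulation. Male: meiosis starts at puberty and continues throughout life.]

Notes: In females, a fixed number of oocytes is present at birth; they have already replicated their DNA and stay blocked in meiosis I until the oocyte is ovulated. In males, meiosis has not begun at birth; the diploid germ cells keep multiplying until puberty. The timing of meiosis I and II is therefore different in males and females.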

[Figure: oocyte/follicle number vs. age (yrs): ~6 million (18-22 weeks gestation), ~1 million (birth), ~300,000 (menarche), ~1,000 (menopause). Losses come from attrition before birth, post-natal atresia, and ongoing atresia.]

Oocyte Numbers Decline Dramatically With Age due to Quality Control Processes

Note: most oocytes are destroyed by quality control, both before and after the age when ovulation begins.
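The numbers read off the figure can be turned into the fraction lost at each stage. A quick sketch using only the approximate values shown in the figure:

```python
# Approximate oocyte/follicle numbers from the figure
stages = [
    ("18-22 weeks gestation", 6_000_000),
    ("birth", 1_000_000),
    ("menarche", 300_000),
    ("menopause", 1_000),
]


def percent_lost(before, after):
    """Percentage of the starting pool lost between two stages."""
    return 100 * (before - after) / before


for (stage_a, n_a), (stage_b, n_b) in zip(stages, stages[1:]):
    print(f"{stage_a} -> {stage_b}: {percent_lost(n_a, n_b):.1f}% lost")

peak, final = stages[0][1], stages[-1][1]
print(f"Overall: {peak // final}-fold decline ({percent_lost(peak, final):.2f}% lost)")
```

About 83% of the peak pool is already gone by birth, and only about 1 in 6,000 oocytes remains at menopause.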


• ¾ of female meiotic products are discarded: polar bodies
• Primary oocytes are blocked during meiosis I
• Meiosis II is only completed after fertilization

Male versus female meiosis

Female:
• Diploid germ cell population rapidly multiplies during early development
• Cells enter meiosis during fetal development
• Developing oocytes pause in meiosis I for decades!
• Oocytes undergo continuous QC and their numbers plummet
• Oocytes complete meiosis I only upon ovulation
• Oocytes complete meiosis II only if fertilized
• One oocyte is produced from one diploid germ cell precursor (and three polar bodies)

Male:
• Cells keep proliferating and create a population of diploid germ cells
• Meiosis begins at puberty and continues throughout life (no pause)
• A very large number of sperm are produced
• Four sperm cells (spermatids) are produced from one diploid germ cell precursor

Aneuploidy in gametes:
• Sperm: 2%
• Oocytes: 15-70% (increases with maternal age)

Incidence and Impact of Aneuploidy in Humans

≥25% of all conceptions are miscarried (≥1 million/yr in the U.S.)

Impact of zygotic aneuploidy (incorrect number of chromosomes):
• ~20% of pre-implantation embryos (before the embryo implants in the uterus)
• 35% of spontaneous abortions
• 4% of stillbirths
• 0.3% of livebirths: chromosomes 13, 18, 21, X or Y, ~12,000/yr in the U.S.

Notes: typically, an embryo with the wrong number of chromosome 1, 2, 3, 4, ... cannot survive. The autosomal exceptions (13, 18, 21) are the most gene-poor chromosomes of all, carrying the fewest genes.

[Figure: frequency of livebirths* with trisomy vs. maternal age (20-49 yrs); y-axis 0 to 0.12. Series: Down syndrome risk and all-trisomies risk.]

* Trisomic for chromosomes 21, 13, 18, X or Y; all other trisomies cause miscarriage

Altered Maternal Crossing-Over Is Associated With Infertility, Pregnancy Miscarriage And Chromosomal Diseases

§ Aneuploidy is associated with defective crossing over (in utero) and progressive loss of cohesion.

§ Crossover rates measured in children born to older mothers are higher than in children born to young mothers, implying that advancing maternal age selects for oocytes with more crossovers.

§ Extra crossovers may act to buffer the maternal age effect.

§ Variation in human crossover rate is heritable.

§ Women who inherit higher crossover rates have slightly more children.

Notes: in oocytes blocked in meiosis I, cohesion is gradually lost, and homologs can then be pulled the wrong way by the spindle fibers. More crossovers act as insurance (mechanism): homologs linked by extra crossovers are less likely to come apart.

Trisomy 21 diagnosis correlates with maternal age

1. How is diagnosis performed at 16 weeks?
Non-invasively with ultrasound; by collecting fetal DNA from the amniotic fluid with a large needle (this carries a risk of miscarriage); also from the mother's blood.

2. Why is 16-week diagnosis different from live births?
Some parents may discontinue the pregnancy, and some affected fetuses die (miscarriage).

3. Why focus on trisomy 21 and not other trisomies?
It is the most common and the most tolerated trisomy in humans.

Note: parents who have experience with Down syndrome get tested earlier.

Key terms (slide set 1):
• Chromosome structure: telomere, centromere, metacentric, acrocentric
• Mitosis, DNA replication, sister chromatids, homologues
• Meiosis, crossovers, chromosome segregation, independent assortment
• Haploid versus diploid, euploid gametes
• Oocytes, polar bodies
• Aneuploidy, non-disjunction, trisomy

Slide set 2: Organization of the human genome

The Human Genome Project (HGP)

1988: Human Genome Organization is founded to coordinate international efforts aimed at mapping and sequencing the human genome
1990: HGP is presented to Congress as a 15-year plan
1992: Low-resolution genetic linkage map of entire human genome published
1994: Completion of second-generation DNA clone libraries representing each human chromosome by LLNL and LBNL. Genetic mapping achieved 1 year ahead of schedule
1997: Physical maps; high-throughput DNA sequencing
1998: Celera Genomics formed to sequence much of the human genome in 3 years using HGP-generated resources
1999: Chromosome 22 is sequenced. 1 billion base-pair mark
2000: Chromosomes 5, 16, 19 and 21 are sequenced
2000: President Clinton announces the completion of a working draft of the human genome by the NIH and Celera Genomics
2000: President Clinton signs executive order prohibiting federal departments and agencies from using genetic information in hiring or promoting workers (genetic non-discrimination)
2001: Publication of Initial Working Draft Sequence, February 12, 2001
2003-4: "Final" draft of the human genome (13-14 years in total)

Nature (2001) vol 409

Science (2001) vol 291

Nature – Sept 2004

Still missing: ~200 million bp of highly repetitive peri-centromeric and centromeric DNA

Heterochromatin

Sequence of the human genome in book format at the National Human Genome Research Institute

Note: the repetitive peri-centromeric and centromeric DNA (~200 million bp) is still missing. Researchers are still working out how much it varies between people, where that variation lies, and why it matters.

BTW: whose genome was it?
• Public project → anonymous donors
• Private project (Celera) → Craig Venter (head of Celera) & his dog

Genome of DNA Discoverer Is Deciphered (NY Times) By NICHOLAS WADE Published: June 1, 2007 The full genome of James D. Watson, who jointly discovered the structure of DNA in 1953, has been deciphered, marking what some scientists believe is the gateway to an impending era of personalized genomic medicine. http://jimwatsonsequence.cshl.edu

Note: Watson asked that his Alzheimer's-risk gene (APOE) be withheld, to protect his children.

Human Genome Organization: Genes, “junk”, and repeats…

(circa 2010)

The human genome carries ~20,000 genes

Class discussion: • how would you discover & annotate genes? • why is this even a question?

How much of our genome codes for proteins?
• ~1.1% (protein-coding exons)
• ~4% codes for RNA genes: tRNA, rRNA, siRNA, snRNA
• ~45% are transposons: selfish, parasitic DNA elements that make more copies of themselves
• ~6% is heterochromatin
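Converting these percentages into base pairs makes the contrast clearer. A quick sketch, assuming the 3.2 billion bp haploid size from slide set 1 and the approximate percentages listed here:

```python
GENOME_BP = 3.2e9  # haploid human genome (bp)

fractions = {
    "protein-coding": 0.011,
    "RNA genes": 0.04,
    "transposons": 0.45,
    "heterochromatin": 0.06,
}

for category, frac in fractions.items():
    megabases = frac * GENOME_BP / 1e6
    print(f"{category:16s} {frac:5.1%}  ~{megabases:,.0f} Mb")
```

Transposon-derived DNA (~1,440 Mb) outweighs protein-coding sequence (~35 Mb) roughly 40-fold.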

How would you discover & annotate genes?
• Find promoters and coding regions
• Homology searches: important genes have evolved over a long period of time, so they will be largely conserved
• Orthologues: genes in different species that descend from the same ancestral gene, so their sequences are conserved
• RNA-seq (transcriptomics): extract RNA, convert it back to DNA, sequence it, and map it back to the human genome. This shows which parts of the genome are actually transcribed into RNA; transcribed regions are likely to be genes. Limitations: it only catches genes expressed in the specific tissue sampled (it is cell-type specific), and there are issues with sensitivity.

① Experimental: knockout → phenotype; RNA-seq (expression, e.g. of housekeeping genes); ChIP-seq
② Computational: look for conservation in other species' genomes; look for evidence of protein coding
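"Coding regions" are often first flagged as open reading frames (ORFs): a start codon followed, in the same frame, by a stop codon. A minimal sketch that scans only the forward strand of a hypothetical sequence; the 30-codon minimum is an illustrative threshold, and real gene finders also scan the reverse strand and model splicing, since human exons are interrupted by introns.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon (stop included);
    `end` is exclusive. ORFs shorter than `min_codons` codons are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START:
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, frame))
                        i = j  # continue scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop: nothing more to find in this frame
            i += 3
    return orfs


# Hypothetical sequence: ATG + 5 lysine codons + TAG stop, in frame 2
demo = "CC" + "ATG" + "AAA" * 5 + "TAG" + "GG"
print(find_orfs(demo, min_codons=1))  # [(2, 23, 2)]
```

This is why RNA-seq and conservation are needed alongside ORF scanning: on genomic DNA, introns break coding sequence into many short pieces.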

From Nature (2001), 409:860-921

"Gene" (i.e. protein-coding) density varies among human chromosomes
• "Gene-rich": e.g. chromosome 19
• "Gene-poor" (fewest genes, lowest density): e.g. chromosome 18

Shown in the first publication of the genome: gene density is NOT uniform.

Gene density varies along each human chromosome together with GC content and CpG density
• GC content correlates with gene-rich regions
• G-banding (from Giemsa staining) correlates with AT-rich, gene-poor regions: the DNA-binding dye preferentially binds AT-rich regions, so light bands are GC-rich and dark bands are AT-rich

[Figure: Chr17]

CpG islands are GC-rich, CpG-dense regions (~1 kilobase) that are highly enriched at the 5'-end of genes
• CpG islands serve as promoters for 60% of human genes, particularly "housekeeping" genes; they are the #1 class of promoters in humans

Human genome browser: http://genome.ucsc.edu
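CpG islands stand out computationally because CpG dinucleotides are depleted in most of the genome. A minimal sketch of a classic test; the thresholds (length ≥ 200 bp, GC > 50%, observed/expected CpG > 0.6) follow the widely used Gardiner-Garden and Frommer criteria rather than anything on the slide, and the test sequences are synthetic.

```python
def gc_content(seq):
    """Fraction of G + C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed / expected CpG = (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # "CG" occurrences cannot overlap each other
    return cpg * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


island_like = "CG" * 150   # 300 bp, GC-rich and CpG-dense
at_rich = "ATGCAT" * 50    # 300 bp, AT-rich, no CpG
print(looks_like_cpg_island(island_like), looks_like_cpg_island(at_rich))
# True False
```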

Typical structure of a human gene: BRCA2

[Figure: human genome browser view of BRCA2. Labels: human genome assembly (version); chromosome coordinates; position of gene; CpG island promoter; first exon and 5'UTR; last exon and 3'UTR. Filled boxes = EXONS (protein-coding); thin line (no box) = INTRONS. The orange signal above the gene is from mRNA-seq: introns are spliced out.]

BRCA2 protein-coding sequence space: 10.2 kilobase
BRCA2 genomic sequence space: 84.2 kilobase

Notes:
• BRCA2 is mutated in breast & ovarian cancer; it is important in DNA repair and essential for life
• The CpG island overlaps the first exon
• Orange signals mark the exons (from mRNA-seq): introns are transcribed THEN spliced
• What are introns? They make up most of the gene (~70 of its 84 kb) and contain transposons (repeated units)
• The gene reads left to right, on the + strand

"Average" structure of human protein-coding gene:
• 27 kb of genomic sequence
• 9 exons (122 bp average)
• Encodes a 2.6 kb mRNA
• Including 5'-UTR (250 bp) and 3'-UTR (750 bp)
• Average human protein (530 AA)

Transcription / splicing, capping / polyadenylation → mature mRNA: CAP ... AAAAAAAA

Huge variation in gene organization and size:
• Gene size: 0.4 – 2,400 kb
• Exon number: 1 – 363
• Exon size: <10 bp – 7.6 kb
• Intron size: 20 bp – 100s of kb
• Polypeptide: 10 – >38,000 AA (Titin)

Notes: after transcription and splicing, the mRNA carries a cap and a poly-A tail. In the browser view, a thin box following the thick exon boxes marks the 3' untranslated region: it is transcribed but not translated.

Q: Where does transcription happen? A: Inside the nucleus.

Q: Where does splicing occur? A: During transcription, so also in the nucleus; the mature mRNA then leaves the nucleus.

Q: Where does BRCA2 live? A: It is part of DNA repair, so in the nucleus.
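The processing steps described here can be mimicked on a toy transcript. This sketch assumes a hypothetical three-exon gene written with exons in uppercase and introns in lowercase (each intron starting with GT and ending with AG, the usual splice-site dinucleotides); "[CAP]" merely marks the 5' cap.

```python
import re


def exon_intervals(pre_mrna):
    """Locate exons, written in uppercase, as (start, end) intervals."""
    return [(m.start(), m.end()) for m in re.finditer(r"[ACGT]+", pre_mrna)]


def splice(pre_mrna, exons):
    """Keep exon intervals in order and discard the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def mature_mrna(pre_mrna, poly_a_len=8):
    """Toy processing: splicing, then a 5' cap marker and a poly-A tail."""
    return "[CAP]" + splice(pre_mrna, exon_intervals(pre_mrna)) + "A" * poly_a_len


# Hypothetical gene: exon 1 + intron 1 + exon 2 + intron 2 + exon 3
pre = "ATGGCC" + "gtaagtcctttcag" + "GAATTC" + "gtgagcatttag" + "TGGTAA"
print(exon_intervals(pre))   # [(0, 6), (20, 26), (38, 44)]
print(mature_mrna(pre))
```

The spliced product (18 nt) is much shorter than the pre-mRNA (44 nt), the same pattern as BRCA2's 10.2 kb of coding sequence within 84.2 kb of genomic sequence.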


Nuclear vs. mitochondrial genomes
• Nuclear: 3.2 billion bp
• Mitochondrial: ~16,500 bp, exons ONLY!

Structure of the mitochondrial genome

Mutations in mtDNA can lead to several human diseases

Leber’s Hereditary Optic Neuropathy (LHON)

- presents in midlife as acute or subacute central vision loss leading to central scotoma and blindness

- disease results from a defect in the respiratory chain

- Maternal inheritance in familial cases

Notes: LHON is caused by a mitochondrial mutation; vision loss results from degeneration of the optic nerve.

Why so few genes in the mitochondrial genome?

Recap: RNA and protein transport in eukaryotes.

Who codes for mitochondrial genes? The nuclear genome: most genes that code for mitochondrial proteins are located in the nucleus. They are transcribed (and spliced, during transcription) in the nucleus; the mRNA is exported and translated by ribosomes; the protein is then imported back into the mitochondria (cristae).

WHY? 1. Mitochondria are a hostile environment: they generate a lot of reactive oxygen species, which are toxic for DNA.

Ribosomal RNA genes
- Ribosomal RNA is a central component of the ribosome
- Plays both a structural and a catalytic role: the peptidyl-transferase activity responsible for adding an AA to a growing polypeptide chain is catalyzed by rRNA!!
- ~2 million ribosomes/cell
- 75% of transcriptional output!
- rRNA are transcribed by a specialized RNA polymerase: RNA Pol I
- rRNA transcription happens in a specialized nuclear sub-compartment: the nucleolus (which has no membrane)

• Protein-coding genes are "easy" to identify (~20,000 in total)
• Non-coding RNA genes are much more difficult to identify
• The human genome encodes at least 5-6,000 RNA genes

[Figure: large ribosomal subunit from H. marismortui; red: RNA, blue: protein (from Moore and Steitz. Annu Rev Biochem. 2003;72:813-850)]

[Figure: staining of HeLa cells with anti-nucleolin, the most abundant protein in the nucleolus]

[Figure labels: telomeres (chromosome ends); rDNA repeats]

• rDNA repeats localize to the p arm of acrocentric chromosomes (13p12, 14p12, 15p12, 21p12, 22p12); these p-arms are very short
• These chromosomes carry tandemly repeated rRNA genes, hundreds of copies in total
• Why so many? Because we make so much rRNA

Acrocentric chromosomes are sometimes involved in Robertsonian translocations

Translocation: a chromosome translocation is a chromosome abnormality caused by rearrangement of parts between nonhomologous chromosomes
• In a Robertsonian translocation, the long arms of two acrocentric chromosomes fuse near the centromere and the short arms are lost
• Carriers are phenotypically normal (the lost short arms carry rDNA repeats that are also present on the other acrocentric chromosomes)
• Carriers of Robertsonian translocations often produce unbalanced gametes, depending on HOW the homologs pair and segregate at meiosis: this is where problems begin
• Higher risks for Down syndrome & reproductive issues
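Why carriers produce unbalanced gametes can be seen by listing what a carrier of a 14;21 Robertsonian translocation can pass on. This sketch assumes the three chromosomes involved (normal 14, normal 21, and the fused 14q21q) separate 2:1 at meiosis I, and counts only long-arm material; it lists the possible gamete types, not their real frequencies.

```python
from collections import Counter
from itertools import combinations

# Long-arm material carried by each chromosome in a t(14;21) carrier
CHROMOSOMES = {
    "14": Counter({"14q": 1}),
    "21": Counter({"21q": 1}),
    "t(14;21)": Counter({"14q": 1, "21q": 1}),  # fused long arms
}


def gametes_2_1():
    """Every gamete from a 2:1 split of the three chromosomes."""
    names = list(CHROMOSOMES)
    return [combo for k in (1, 2) for combo in combinations(names, k)]


def classify(gamete):
    """Balanced if the gamete carries exactly one copy of 14q and of 21q."""
    dose = sum((CHROMOSOMES[name] for name in gamete), Counter())
    if dose["14q"] == 1 and dose["21q"] == 1:
        return "balanced"
    return f"unbalanced (14q x{dose['14q']}, 21q x{dose['21q']})"


for gamete in gametes_2_1():
    print(" + ".join(gamete), "->", classify(gamete))
```

Only two of the six gamete types are balanced (the normal 14 + 21 pair, or the translocation alone). The "14q x1, 21q x2" gamete, fertilized by a normal gamete, gives translocation Down syndrome.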

PSEUDOGENES

• Initially thought to represent non-functional (dead) copies of gene parts

Why do we care?

• Pseudogenes occupy a significant portion of our genome

• Pseudogenes represent interesting events of genome evolution and variation

• Pseudogenes can be functional in some (rare) instances (undead genes?)
• Some pseudogenes are transcribed
• Some pseudogenes code for proteins

Pei et al., (2012) Genome Biology


Two forms of pseudogenes reflect the mechanisms by which they formed

1. Non-processed pseudogenes: result from the duplication of (parts of) the genomic sequence of a gene (includes both exons and introns)
• Duplication occurs at the level of DNA, which is copied somewhere else in the genome; pseudogenes arise from copies that then changed slightly

Examples of clustered non-processed pseudogenes (e.g. the alpha-globin cluster):
• Adult hemoglobin: α2β2 (>97%); α2δ2 (<3%)
• Fetal hemoglobin: α2γ2; higher affinity for oxygen, which allows the fetus to grab O2 from the mother's hemoglobin
• Embryonic hemoglobin (yolk sac): ζ2ε2

Examples of dispersed non-processed pseudogenes: NF1
Mutations in NF1 can cause Neurofibromatosis Type 1 (among other diseases):
- autosomal dominant disorder characterized by café-au-lait spots, Lisch nodules in the eye, and fibromatous tumors of the skin.
- Individuals with the disorder have increased susceptibility to the development of benign and malignant tumors.
- The worldwide incidence of NF1 is 1 in 2,500 to 1 in 3,000 individuals

2. Processed pseudogenes: result from reverse-transcription of mRNAs (only carry exons, no introns)

Processed pseudogenes often originate from highly transcribed genes: they result from reverse-transcription of abundant mRNAs. Highly expressed = most likely to end up as a processed pseudogene.

Pei et al., (2012) Genome Biology

KEY terms (slide set 2):
• Gene structure
• GC content / gene density
• CpG island
• Heterochromatin
• Mitochondrial genome
• Ribosomal DNA array
• Nucleolus
• Translocation (incl. balanced translocation)
• Un-processed pseudogenes
• Duplication
• Processed pseudogenes
• Reverse-transcription

Human Genome Organization: of "junk" and repeats…

Tandem repeats: arranged head to tail in arrays; the classes are distinguished by their lengths & positions on chromosomes
• Satellite repeats: repeated millions of times in tandem over peri-centromeric regions (5 – 170 bp in length per repeat). The largest class; corresponds to DNA sequence at or around the centromere
• Mini-satellite repeats: 10-100 bp, dispersed through the genome, hyper-variable between individuals. Used for genetic testing (forensics, paternity). Includes telomeric repeats. Similar to satellites but smaller; not localized, they can be found anywhere on any chromosome
• Micro-satellite: arrays of simple repeats, dispersed throughout. Most frequent: (CA)n, (AT)n, An

The main distinguishing point between the classes = size. Satellites take up ~6.5% of our genome because there are SO MANY repeats.

[Figure: A. Paternity test (Mother, Child, Father 1, Father 2); B. Forensics]
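The logic of a repeat-length paternity test: at every locus, the child inherits one allele (a repeat count) from each parent. A minimal sketch with hypothetical genotypes at three hypothetical loci; a real test uses many more loci plus statistics, and a single mismatch can also reflect a new mutation.

```python
# Hypothetical genotypes: two allele repeat counts per locus
mother = {"locus1": (12, 15), "locus2": (8, 9), "locus3": (20, 22)}
child = {"locus1": (12, 14), "locus2": (9, 11), "locus3": (22, 22)}
fathers = {
    "Father 1": {"locus1": (14, 16), "locus2": (10, 11), "locus3": (21, 22)},
    "Father 2": {"locus1": (13, 16), "locus2": (11, 12), "locus3": (19, 20)},
}


def paternal_alleles(child_alleles, mother_alleles):
    """Child alleles that could have come from the father at one locus."""
    a, b = child_alleles
    candidates = set()
    if a in mother_alleles:
        candidates.add(b)
    if b in mother_alleles:
        candidates.add(a)
    return candidates


def mismatched_loci(child, mother, alleged_father):
    """Loci where the alleged father carries none of the possible paternal alleles."""
    return [locus for locus in child
            if not paternal_alleles(child[locus], mother[locus]) & set(alleged_father[locus])]


for name, genotype in fathers.items():
    bad = mismatched_loci(child, mother, genotype)
    print(name, "excluded at" if bad else "not excluded", bad)
```

Father 1 is compatible at every locus, while Father 2 is excluded at two loci.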

Tandemly repeated DNA at centromeres

Note: the rDNA array (most actively transcribed region of the genome) is sandwiched between silent, heterochromatic satellite repeats on the acrocentric chromosomes

[Figure: 5 week mouse liver cell; blue = DAPI = DNA, green = euchromatin. Labels: nucleolus; DAPI-dense heterochromatin. The dark blue regions contain many satellite repeats.]

LINE-1 retrotransposition mechanism (details not fully known):
1. A LINE-1 somewhere in the genome has its own promoter and generates a transcript (an RNA copy of the element)
2. The transcript leaves the nucleus and is translated into proteins (ORF1p and ORF2p)
3. ORF1p binds its own mRNA
4. It forms an RNP with ORF2p
5. The RNP goes back into the nucleus
6. ORF2p finds a T-rich region (TTTTT) in the genome and nicks it
7. The AAAAs of the RNA's poly-A tail pair with the exposed T's
8. The reverse transcriptase (RT) uses the RNA as template to make a DNA copy
• Our cells fight against this: the RT is often blocked before it reaches the end, leaving truncated copies
• Can create somatic mosaicism (genetically different cells)
• Typically thought to occur in the parental germline; new data suggests that somatic tissues are targeted instead

Which tissue is LINE-1 most active in? Germ cells!!

Interspersed repeated elements: LINEs and SINEs
• Parasitic transposable elements; together they make up about half of our genome
1. LINE (Long Interspersed Nuclear Element): autonomous; encodes its own proteins. LINE-1 is the most successful. Typical form: has its own promoter and 2 genes, and makes a protein-coding transcript
2. SINE (Short Interspersed Nuclear Element): non-autonomous; does not code for anything; the Alu family is the most successful; needs the RT from LINEs
3. Retrovirus-like elements: originate from a virus (compare HIV, a retrovirus); autonomous, but sometimes carry mutations that make them non-autonomous
4. DNA transposon fossils: work by cut and paste; don't need RT; they moved millions of years ago and are no longer active in our genome

Interspersed repeated elements: LINEs and SINEs (not to scale)
• Repeats are NOT tandem: together they make up ~45% of the genome, spread out across it
• LINE (big, ~6 kb): long interspersed nuclear element with 2 open reading frames (ORFs). Its promoter region can initiate transcription into an mRNA, and its ORFs provide enzymatic activity
• SINEs (much smaller, ~280 bp): Alu is a type of SINE. A weird case: it carries a poly-A tail and is a processed, non-coding RNA pseudogene, made when an RNA was captured by reverse transcriptase; it became a dimer

Transposons: DNA elements that have the ability to make many copies of themselves & invade the genome of the host ("selfish DNA"): genomic parasites

Dispersed DNA repeats (% of genome):
• LINEs: 21%
• SINEs: 13%
• Retrovirus-like elements: 8%
• DNA transposons: 3%

Notes by class:
• LINEs: can transpose (make more copies of themselves) all on their own, because they carry the key gene pol, which encodes a reverse transcriptase
• SINEs: non-coding; they can't make copies of themselves by themselves and need help. The key activity needed = reverse transcriptase: SINEs depend on LINEs
• Retrovirus-like elements: flanked by long terminal repeats (LTRs) that act as promoters. Most likely from a retrovirus infection; compared to a retrovirus, they lost the envelope gene, so they can no longer make viral particles. They became endogenous genomic parasites that stay inside our cells. Some versions went through deletions and can no longer make copies of themselves
• DNA transposon fossils: called fossils because they invaded our genome a very long time ago; they can still be recognized in our genome. Just like the retroviruses, there are deleted versions. The transposase gene that was originally there has completely decayed, so they can no longer make copies of themselves
• Most LINE-1 copies (599,900) are mutated & have decayed to the point that they can no longer make copies. Only about 100 are considered "hot" L1s that can move around (hop), make copies, and have full coding potential

LINE-1 retrotransposition mechanism (Beck et al., 2011 Ann Rev Genomics Human Genet)

How do these repeats hop? By target-primed reverse transcription, starting from one of the ~100 "hot" L1s copying itself:
1. The L1 is transcribed; the poly-A-tailed RNA is exported and translated by ribosomes.
2. ORF1p is an RNA-binding protein that likes to bind its OWN RNA (highest affinity for its own RNA); it recruits ORF2p onto the L1 RNA.
3. The ORF2p endonuclease nicks genomic DNA at a T-rich site, exposing a poly-T stretch on one DNA strand that base-pairs with the poly-A tail of the RNA. This creates a hook between the RNA and one of the DNA strands.
4. The cleavage creates a 3' hydroxyl, which can be extended by the polymerase (reverse transcriptase), copying the RNA into DNA.
5. Result: an insertion event of a new LINE-1, for example into a different chromosome.

Mistakes: ORF1p occasionally binds other RNAs. If it binds the RNA of a SINE (such as an Alu element) or of an element containing a variable number tandem repeat (VNTR), ORF1p brings ORF2p with it, which reverse transcribes that RNA; the copy then integrates somewhere in the genome by the same mechanism. If the LINE-1 proteins bind one of our own cellular mRNAs and reverse transcribe it back into DNA, the result is a processed pseudogene.
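The nicking step (step 3) can be illustrated by scanning a sequence for candidate sites. This is a minimal sketch and an assumption beyond the slide: it uses the commonly cited L1 endonuclease consensus 5'-TTTT/AA-3' (the nick falls after the T's) and looks only for exact matches on both strands of a hypothetical sequence, whereas real sites are degenerate.

```python
MOTIF = "TTTTAA"   # assumed consensus; the nick falls between TTTT and AA
NICK_OFFSET = 4    # the nick lies after the fourth base of the motif

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq, motif):
    """Start indices of every (possibly overlapping) exact match."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def nick_sites(top):
    """Return (strand, position) pairs in top-strand coordinates.

    `position` is the index just after the nick: the break lies between
    top[position - 1] and top[position].
    """
    top = top.upper()
    sites = [("+", i + NICK_OFFSET) for i in find_all(top, MOTIF)]
    bottom = reverse_complement(top)
    # A boundary before index p on the bottom strand (read 5'->3') is the
    # boundary before index len(top) - p on the top strand.
    sites += [("-", len(top) - (i + NICK_OFFSET)) for i in find_all(bottom, MOTIF)]
    return sites


print(nick_sites("GCGCTTTTAAGCGC"))  # [('+', 8)]
print(nick_sites("GGTTAAAACC"))      # [('-', 4)]
```

On the nicked strand, the exposed T-rich end carries the 3' hydroxyl; it pairs with the L1 RNA's poly-A tail and primes reverse transcription.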

Beck et al., 2011 Ann Rev Genomics Human Genet

Effects of LINE-1-mediated retrotransposition on the human genome

Impact on human diseases. Examples of how hopping can be BAD for us:
1. Insertion into a gene can effectively break it, most likely making it nonfunctional. Ex: Hemophilia, caused by an L1 insertion in Factor VIII (the blood does not clot easily).
2. Even if LINE-1 integrates into an intron, it can alter the splicing pattern of the gene.
3. The L1 promoter could start expressing the gene from its middle. Effects are detrimental: this could create proteins with a dominant-negative effect on the original protein's function.
4. The insertion could cause transcription to end prematurely, so the full protein is not produced.
5. The region where LINE-1 was inserted can become silenced, and the silencing can spread, leading to inappropriate silencing of the human gene.
6. Our genome is good at recognizing homology and at triggering DNA recombination between homologous repeats. A LINE-1 on chromosome X can improperly base-pair and engage in homologous recombination with another LINE-1 on a different chromosome, creating a translocation; crossing over between L1 copies can also lead to inversions & huge structural rearrangements.

Other examples of L1-associated disease: phosphorylase kinase deficiency, Alport syndrome, and Ellis–van Creveld syndrome.

LINE-1 retrotransposition can create somatic mosaicism

LINE-1 retrotransposition was typically thought to occur in parental germline.

New data suggests that somatic tissues are targeted instead, especially neurons.

Impact on cancer is known.

Notes: Where do these transposons hop & make more copies? Each hopping event creates cells that are no longer genetically identical.
• MOST active in germ cells & right after fertilization: an insertion at that stage will be inherited by every single cell descending from the zygote
• BUT hopping can also occur in somatic tissue, creating mosaicism; it can target developing tissues or even adult tissues, and the LINE-1 promoter is very active in adult neurons
• Defense mechanisms EXIST!!

L1 transposition can mediate exon shuffling

• 21% of L1 surveyed contained non-L1 DNA • Length of carried over DNA varied from 30-970 bp • Up to 1% of our genome might have been generated in this way

Also mediates pseudogene formation, sometimes retrogene formation

Are repeats just “junk” or do they play a biological role? (“Junk” is different from trash.)

Exon transduction: an exon is taken from one gene and inserted into another. Up to 1% of the genome is about 30 million base pairs, and this could lead to the evolution of new functions in the human genome.

Gene B gained an exon (E3) from gene A. The exon shuffled into gene B will make a new protein that is either harmful or develops a new function (gene B could acquire a new skill from the exon).

Are all repeated DNA sequences “junk”?

Alu’s might boost gene expression in conditions of stress

In general: repeats are strong drivers of evolution: • Promote genomic rearrangements, exon shuffling, retrogene formation • Contribute new genes: RAG1 and 2, telomerase • Contribute regulatory signals (promoters, transcription terminators, binding sites)

Schmidt et al. (2012) Cell.

CTCF, a key DNA-binding zinc finger protein involved in organizing the mammalian genome into well-defined chromatin loops, shows both conservation in mammals and evidence for lineage-specific expansions.

Vestiges of repeated elements (SINEs) can often be found next to these expanded CTCF binding sites suggesting that these repeats are involved in “spreading” new CTCF sites.

(Annotations: processed pseudogene; adaptive immune system (RAG1 and 2); conserved; transposons present.)

Transposable Elements: An Abundant and Natural Source of Regulatory Sequences for Host Genes

Lynch (2016) Science.

Gradual accumulation of rare mutations that create new binding sites

Rapid spreading of pre-existing binding sites through transposable mobile elements

[Figure: Gene 1, Gene 2, and Gene 3 acquiring shared binding sites under each model.]

How do you evolve transcription networks?

Evolving binding sites by mutation alone is slow because the mutation frequency is low. Transposons carry binding sites: they hop everywhere and make copies of themselves, including some that land around promoter regions (open regions). Now the transcription factor only needs to acquire a few mutations to learn to recognize the binding sites that are already present. Transposons can spread DNA binding sites.

Science (2016)

Two examples of the impact of transposons on the evolution of gene networks involved in major evolutionary innovations

Cell Reports (2015)

Remote control action by TEs

Judd and Feschotte. eLife 2018

Induction or silencing of LTR5HS elements leads to changes of expression of thousands of genes, including reciprocal changes in hundreds of them

Breaking news: October 2020

New exon 2

• New (Cryptic) exon 2 codes for an extended protein

• New protein is more stable and is the one responsible for male sex determination

• New exon is derived from retrotransposon!

(Annotations: insertion of transposons; not functional.)

The X and the Y: X is larger than Y

Structure of the human X and Y chromosomes

[Figure labels: heterochromatic; euchromatic; sex-specific portion of X.]

X chromosome: • 1,000 genes (gene-poor) • Enriched for germ cell-specific genes and neurodevelopmental genes • Depleted in CpG island promoters and housekeeping genes • Highly enriched for LINE-1 retrotransposons

Y chromosome: • 38 protein coding genes • Numerous pseudogenes

X-linked traits: • Are more likely to present in males • Are often more severe in males • Are often inherited from an asymptomatic mother

Neurodevelopmental genes are on the X. Autism is about 3 times more likely in males than in females.

Depletion of housekeeping genes.

LINE-1 transposon density on the X is 2-3 times higher than on an autosome.

Y: small number of coding genes and lots of pseudogenes! Heterochromatin (50% of the Y chromosome) is silenced DNA that doesn't code for anything.

X-linked diseases: red-green colorblindness; hemophilia (Queen Victoria's family); Duchenne muscular dystrophy (little or no dystrophin protein); Becker muscular dystrophy (mutation leaving 30-50% of the protein; non-lethal).

Rett syndrome: MECP2 mutation; affects mental, developmental, and verbal abilities.

Observed almost only in females; in males it is usually lethal early in development. Why? Males have only one X, so they show the phenotype if they carry the mutation (they are hemizygous for the X).

In females, the other X can balance the mutant phenotype. One X is silenced in each cell, so the mutant X is active in only about 50% of cells: roughly half of the cells will be normal and half will be mutant, and the phenotype will be much less severe.

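(Annotations: the X is gene-poor. Because males have only one X, they have only one copy of X-linked genes. X-inactivation is random: in females, each cell has a 50/50 chance of expressing the mutation; in males, 100% of cells express it.)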
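The 50/50 logic of random X-inactivation can be made concrete with a small simulation. This is a minimal sketch, not a model from the lecture: it assumes each cell independently silences one X with probability 0.5, and it ignores skewed inactivation and the fact that inactivation happens early in the embryo and is inherited clonally (which produces patches of cells rather than a salt-and-pepper mix). The function names are hypothetical.

```python
import random


def fraction_cells_expressing_mutant_female(n_cells: int, seed: int = 0) -> float:
    """Simulate random X-inactivation in a female heterozygous for an X-linked mutation.

    Simplifying assumption: each cell independently silences one of its two X
    chromosomes with probability 0.5 (no skewing, no clonal patches).
    """
    rng = random.Random(seed)
    # rng.random() < 0.5 means the normal X was silenced, leaving the mutant X active
    mutant_active = sum(rng.random() < 0.5 for _ in range(n_cells))
    return mutant_active / n_cells


def fraction_cells_expressing_mutant_male() -> float:
    """A male is hemizygous: his single X is active in every cell."""
    return 1.0


female = fraction_cells_expressing_mutant_female(10_000)
male = fraction_cells_expressing_mutant_male()
print(f"female: {female:.1%} of cells express the mutant X")
print(f"male:   {male:.0%} of cells express the mutant X")
```

With 10,000 simulated cells the female fraction lands close to 50%, while the male value is 100% by construction, which is why X-linked phenotypes are usually milder in carrier females.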

PAR

Pseudo-autosomal regions (PAR) are sites of obligatory crossovers in male meiosis

Dumont, Genetics (2017)

Sex chromosome synapsis in spermatocyte cell spreads. (A) Schematic of the mature Synaptonemal Complex (SC) composed of both lateral elements and transverse filaments. The SC serves as a scaffold for the attachment and organization of chromatin loops in meiosis I. (B) Pachytene spermatocyte immunostained for SYCP3, a component of the lateral elements of the SC, and kinetochore-associated proteins visualized by CREST antibodies. (C) Synapsis of the sex chromosomes (circled) is restricted to the PAR.

PAR: green regions that behave as autosomes; identical on X and Y (they have homology). During male meiosis, X and Y pair within the PAR because it is identical on both. There is an obligate crossover in the PAR1 region.

SYCP3 holds homologs together like glue.

The PARs are 2 regions that are identical in the X and Y because in meiosis the 2 chromosomes need to pair. The PAR is the only region where both can synapse.

Organization of the major pseudo-autosomal region (PAR1)

The PAR has many genes, with homology on X and Y.

Boundary: beyond it, X and Y are different.

SHOX: DNA-binding protein. HOX = homeobox, a DNA-binding domain; SHOX is a transcription factor.

The SRY gene (testis-determining factor) lies next to the boundary on the Y chromosome.

(Annotations: PAR genes are expressed from both chromosomes, an exception to X-inactivation in females. SRY determines male sex.)

SRY encodes for the sole Testis Determining Factor:

• XX individuals with a portion of Y corresponding to SRY translocated onto an autosome are males (sterile) • XY females have mutations in SRY • Transfer of human SRY into XX mouse embryos leads to male (sterile) phenotype

• SRY encodes an HMG-box transcription factor • SRY upregulates the expression of SOX9, another transcription factor, which leads to the differentiation of uncommitted support cells into Sertoli cells rather than granulosa cells.

SRY binds to promoters and upregulates SOX9.

In the supporting cells of the bipotential gonad, SRY activates SOX9.

If SRY is present, the cells become Sertoli cells; if there is no SRY, they become granulosa cells (female).

(Annotations: SRY mutations give a sex-reversed phenotype; the supporting cells are not yet committed to either sex.)

Aneuploidy of the X and Y lead to human disorders

XO: Turner syndrome • Phenotypically female but sterile; affects 1 in 2,000-5,000 live births • Short stature (SHOX deficiency), lymphedema, broad chest, heart defects, low-set ears • Caused by non-disjunction in the father in the majority of cases (can also be caused by Xp deletions) • While viable, XO karyotypes are found in 15% of stillbirths, often due to congenital heart defects

XXY: Klinefelter syndrome • Most common sex aneuploidy in males (1 in ~1,500 live births) • Reduced testosterone levels and associated physical changes • Hypogonadism and reduced fertility • Increased risks of autoimmune disorders, breast cancer, osteoporosis • Caused by non-disjunction in either the male or female meiosis (XY + X or Y + XX)


Turner: if the X and Y fail to disjoin after crossing over in the PAR, usually in the father, the sperm makes no sex-chromosome contribution (otherwise, most non-disjunction comes from the mother's side). Missing the second PAR region leaves only 1 dose of PAR genes.

Klinefelter: increased risk of diseases more common in females because they carry 2 X chromosomes; one of the 2 X's is inactive, undergoing X-inactivation at random.

Evolution of X and Y chromosomes

The SRY gene evolved prior to the divergence between marsupials and placental mammals

1- Acquisition of a Testis Determining Factor 2- Suppression of recombination around TDF. Acquisition of additional male-specific genes. Extension of “no recombination” zone 3- Harmful mutations on the Y chromosome are fixed. Inactive genes are gradually lost, leading to “degeneration”. According to the rate of decay (~4.6 genes per million years), there might be no more genes on the Y in 10 million years

The transcaucasian mole vole has lost its Y chromosome

X and Y started as identical autosomes. A TDF evolved on one of them, and suppression of crossing over allowed them to diverge.

Because they cannot exchange information, they evolved independently. On the X, genes can be repaired by recombination with the homologous X (in females); on the Y there is no homolog, so genes are lost.

Surprise: the lack of interhomolog X-Y recombination has been replaced by intrachromosomal recombination between repeated genes located on large palindromes (a palindrome is a sequence of DNA that is a mirror-inverted repeat).

Conventional pathway: recombination between homologs

Y-specific pathway: recombination between sisters mediated by inverted repeats
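To make “mirror-inverted” concrete: the two arms of a palindrome read the same on opposite strands, so one arm is the reverse complement of the other, and that sequence match is what lets the arms pair for recombination. The sketch below assumes short toy sequences and an ungapped comparison; real Y palindrome arms are far longer and would be compared by alignment. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left_arm: str, right_arm: str) -> bool:
    """True if the right arm is exactly the reverse complement of the left arm,
    i.e. the two arms read identically on opposite strands."""
    return right_arm.upper() == reverse_complement(left_arm)


def arm_identity(left_arm: str, right_arm: str) -> float:
    """Fraction of positions where the right arm matches the reverse complement
    of the left arm (ungapped, equal-length comparison)."""
    rc = reverse_complement(left_arm)
    if len(rc) != len(right_arm):
        raise ValueError("arms must have equal length for an ungapped comparison")
    matches = sum(a == b for a, b in zip(rc, right_arm.upper()))
    return matches / len(rc)


left = "GATTACAGG"
right = reverse_complement(left)            # "CCTGTAATC", a perfect inverted copy
print(is_inverted_repeat(left, right))      # True
print(arm_identity(left, "CCTGTAATG"))      # one mismatch out of 9 positions
```

A single mismatch lowers identity without destroying the pairing, which is the situation in which copying one arm onto the other can repair a broken gene.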

The Y chromosome engages in unusual recombination events

This puts the Y chromosome at risk for rearrangements

2009

Gene conversion between the arms of a palindrome can repair broken genes, as recombination does on the X.

Isodicentric chromosome

The acentric fragment will be lost during meiosis.

Palindromes are inverted repeats; specific genes are on the arms.

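(Figure annotations: crossing over between the arms of palindromes on sister chromatids produces one chromosome with 2 centromeres and one with no centromere. The Y never has a homolog; this happens only on Y chromosomes.)

This produces abnormal chromosomes that are unstable and inviable in meiosis, which can lead to the death of all sperm cells and to other issues.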

The recombination lifestyle of the Y chromosome also has implications for its evolution

Human and chimp genomes are 98% similar, yet their Y chromosomes are very divergent. In the comparison plot, crosses mark the conserved palindromes, and regions with no diagonal are very divergent. The Y is constantly renewing itself.

Human Genetic Variation

What is the extent of individual genetic variation?

• What are the types of individual genetic variation? • Single Nucleotide Polymorphisms (SNPs): a single nucleotide variant observed to vary among unrelated individuals at an “appreciable” frequency.

• Structural variants (SVs): variation that arises via deletion, insertion, or rearrangement of the DNA. Types of SVs include indels, inversions, translocations. Size of SVs varies from 1 bp up to >100s of kb.

• What is the distribution of these variants?

• What do these variants tell us about human traits, disease risk factors, human demography?
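As a rough illustration of how variant type follows from size, the toy classifier below compares the reference and alternate allele lengths, the way a variant file might list them. Assumptions: the 50 bp cutoff between a small indel and a larger SV is a common convention, not something stated in the slides (which count indels among SVs), and balanced events such as inversions or translocations cannot be recognized from allele lengths at all. The function name is hypothetical.

```python
def classify_variant(ref: str, alt: str, sv_min_size: int = 50) -> str:
    """Toy size-based classification of a variant from its ref/alt alleles.

    sv_min_size is an assumed convention (50 bp) separating indels from larger SVs.
    """
    if ref == alt:
        raise ValueError("ref and alt are identical: not a variant")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP/SNV"
    size_change = abs(len(ref) - len(alt))
    if size_change == 0:
        # Same length but different bases: e.g. several adjacent substitutions
        return "complex (same-length change)"
    return "indel" if size_change < sv_min_size else "large SV"


print(classify_variant("A", "G"))               # SNP/SNV
print(classify_variant("ACGT", "A"))            # indel (3 bp deleted)
print(classify_variant("A", "A" + "T" * 100))   # large SV (100 bp inserted)
```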

SNPs are the most common type of genetic difference

Image courtesy of Biosciences for Farming in Africa

• DNA found in nucleus of all cells

• Human genome: 3.2 billion nucleotides (A,C,T,G)

[Figure: aligned sequences from genome 1 and genome 2 differing at single nucleotides (SNPs).]

Slide courtesy of Dr. Megan Dennis

Complex structural variants are less frequent but can affect large regions

[Figure: a “reference” genomic region compared with a deletion, a duplication (>1 kbp, termed a “segmental duplication”), and an inversion.]

Slide courtesy of Dr. Megan Dennis

(Annotation: duplications can give rise to nonprocessed pseudogenes.)

Between 2 average genomes: 99.9% identical, which means there is about 1 SNP per kbp (1,000 bp).

Genome of 3 billion bp → about 3 million SNPs.
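The SNP arithmetic above, spelled out with the notes' round figures (3 billion bp, 99.9% identity between two genomes):

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3 billion bp, as in the notes
IDENTITY = 0.999                 # two genomes are ~99.9% identical

diff_fraction = 1 - IDENTITY                    # ~0.001 of positions differ
bp_per_snp = 1 / diff_fraction                  # ~1 SNP every 1,000 bp
expected_snps = GENOME_SIZE_BP * diff_fraction  # ~3 million SNPs

print(f"about 1 SNP every {bp_per_snp:,.0f} bp")          # about 1 SNP every 1,000 bp
print(f"about {expected_snps:,.0f} SNPs between genomes")  # about 3,000,000 SNPs between genomes
```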

The Hap Map project: cataloging human genetic diversity

Objective: to genotype at least one common SNP every 5 kilobases (kb) across the euchromatic portion of the genome. Population: 270 individuals from four geographically diverse populations

• 30 mother–father–adult child trios from the Yoruba in Ibadan, Nigeria • 30 trios of northern and western European ancestry living in Utah • 45 unrelated Han Chinese individuals in Beijing, China • 45 unrelated Japanese individuals in Tokyo, Japan

Phase III: 1,115 samples with expanded geographical coverage (incl. populations from Italy, Kenya (Masai), Mexico, India (Gujarati) and African-American descent) – focus on 1.6 million SNPs.

Phase I (2005): 1.1 million SNPs; Phase II (2007): 3.1 million SNPs

The dramatic reduction of sequencing costs after the HGP powers the “genomics for all” revolution

[Figure: cost per genome by year, falling from $3 billion to $1,000 per genome.]

“Next-gen” sequencing

DNA is fragmented into small pieces.

The fragments are sequenced, producing 150 bp “reads”.

Illumina “Next Gen” sequencing

Slide courtesy of Dr. Fereydoun Hormozdiari

Newest sequencing technologies can generate up to 10 billion reads per run (or over 100 human genomes) in only a few days

We now have sequenced over a million human genomes…

http://biologiaevolutiva.org/tmarques/

...and hundreds of ape genomes...

Slide courtesy of Dr. Megan Dennis

Variants are identified by mapping reads to a “reference” genome

Identify single nucleotide variants

[Figure: sequencing reads mapped against a reference genome; positions where reads consistently show a different base from the reference identify single nucleotide variants.]

Coverage: how many times was a base sequenced?
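The read-mapping idea, including what “coverage” means, can be sketched as a tiny pileup: count the bases that mapped reads place at each reference position, then flag positions where enough reads disagree with the reference. This is a minimal sketch under stated assumptions: reads are already mapped (given as hypothetical start offsets), alignments are ungapped, and the thresholds `min_depth` and `min_alt_fraction` are illustrative. Real variant callers also model base quality, mapping quality, and diploid genotypes.

```python
from __future__ import annotations

from collections import Counter


def pileup(reference: str, reads: list[tuple[int, str]]) -> list[Counter]:
    """Per-position base counts from reads already mapped to the reference.
    Each read is (start_offset, sequence); ungapped alignment is assumed."""
    columns = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1
    return columns


def call_snvs(reference: str, reads: list[tuple[int, str]],
              min_depth: int = 4, min_alt_fraction: float = 0.2):
    """Report (pos, ref_base, alt_base, depth, alt_count) where reads disagree."""
    calls = []
    for pos, counts in enumerate(pileup(reference, reads)):
        depth = sum(counts.values())  # coverage: times this base was sequenced
        if depth < min_depth:
            continue
        alts = [(base, n) for base, n in counts.items() if base != reference[pos]]
        if not alts:
            continue
        alt, alt_count = max(alts, key=lambda item: item[1])
        if alt_count / depth >= min_alt_fraction:
            calls.append((pos, reference[pos], alt, depth, alt_count))
    return calls


reference = "CTGAGTCA"
reads = [(0, "CTGAG"), (0, "CTGAT"), (1, "TGAGT"), (1, "TGATT"),
         (2, "GAGTC"), (2, "GATTC"), (3, "AGTCA"), (3, "ATTCA")]
print(call_snvs(reference, reads))  # [(4, 'G', 'T', 8, 4)]
```

Here half of the 8 reads covering position 4 carry T instead of the reference G, the pattern expected for a heterozygous SNV.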

Can also identify SVs with these read mappings

[Figure: sequencing reads mapped across a deletion.] The more coverage, the better.
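One way reads reveal a deletion is a drop in read depth: no reads (or about half as many, if only one copy is deleted) map to the missing segment. The sketch below shows only this read-depth signal, with toy reads and a hypothetical threshold (`ratio=0.25` of the region's mean depth); it is one approach among several, alongside split-read and paired-end signals, and it assumes otherwise uniform coverage.

```python
from __future__ import annotations

from statistics import mean


def per_base_depth(read_starts: list[int], read_len: int, region_len: int) -> list[int]:
    """Count how many mapped reads cover each base of a region."""
    depth = [0] * region_len
    for start in read_starts:
        for pos in range(start, min(start + read_len, region_len)):
            depth[pos] += 1
    return depth


def low_depth_windows(depth: list[int], window: int = 10, ratio: float = 0.25):
    """Flag windows whose mean depth is below `ratio` x the region's mean depth."""
    overall = mean(depth)
    flagged = []
    for start in range(0, len(depth) - window + 1, window):
        window_mean = mean(depth[start:start + window])
        if window_mean < ratio * overall:
            flagged.append((start, start + window, window_mean))
    return flagged


# Toy data: 10 bp reads starting every 2 bp, except that no reads map to
# bases 40-59 (as if that segment were deleted in the sequenced genome)
starts = [s for s in range(0, 91, 2) if not 32 <= s < 60]
depth = per_base_depth(starts, read_len=10, region_len=100)
print(low_depth_windows(depth))
```

The two flagged windows span bases 40-59, the simulated deletion; a heterozygous deletion would need a threshold nearer half the normal depth.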

Inversions and duplications are a lot more difficult to identify, but it can still be done.

With >30x read coverage, SNPs can be identified with high confidence.
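The >30x rule of thumb follows from simple coverage arithmetic. The sketch uses the Lander-Waterman average coverage C = N × L / G and, as a simplifying assumption, treats per-base depth as Poisson-distributed (real data are lumpier because of GC bias and hard-to-map regions). The 3.2-billion-base genome and 150 bp reads come from the notes; the depth threshold of 10 reads is an arbitrary illustration.

```python
import math


def mean_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average coverage C = N * L / G."""
    return n_reads * read_len / genome_len


def reads_needed(target_coverage: float, read_len: int, genome_len: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_len / read_len)


def prob_depth_below(coverage: float, k: int) -> float:
    """P(a base is covered by fewer than k reads), assuming per-base depth
    is Poisson(coverage), i.e. reads land uniformly at random."""
    return sum(math.exp(-coverage) * coverage**i / math.factorial(i) for i in range(k))


n = reads_needed(30, read_len=150, genome_len=3_200_000_000)
print(f"{n:,} reads of 150 bp for 30x")  # 640,000,000 reads of 150 bp for 30x
for c in (10, 30):
    print(f"{c}x: P(depth < 10) = {prob_depth_below(c, 10):.2e}")
```

Under this model, at 10x almost half of the bases have fewer than 10 reads, while at 30x such poorly covered bases become rare, which is why higher coverage gives more confident SNP calls.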

The 1000 Genomes Project

Goal I

Figure 1: • Most known SNPs common to all three populations • Most novel SNPs unique to specific populations • Novel SNPs mostly correspond to low frequency or rare variants • African populations show 2-3 times more diversity

SNP analysis

What we learn: 1. All populations share a certain number of SNPs; these are likely old variation events shared by all. 2. Other SNPs reflect population history (unique to each population).

(Figure annotations: Venn diagram of European, African, and Chinese + Japanese populations; the center is common to all 3; new SNPs are unique to specific populations; some are shared, some unique.)
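The Venn-diagram reasoning (shared by all vs. unique to one population) is plain set logic. The SNP IDs below are hypothetical stand-ins, not data from the 1000 Genomes figure, and the helper name is illustrative.

```python
from __future__ import annotations


def partition_snps(populations: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split SNP sets into those shared by every population and those unique to one."""
    shared_by_all = set.intersection(*populations.values())
    unique = {}
    for name, snps in populations.items():
        # Union of every other population's SNPs
        others = set().union(*(s for other, s in populations.items() if other != name))
        unique[name] = snps - others
    return shared_by_all, unique


# Hypothetical SNP IDs standing in for the three panels in the figure
populations = {
    "European": {"snp1", "snp2", "snp3", "eur1"},
    "African": {"snp1", "snp2", "snp3", "afr1", "afr2"},
    "Chinese+Japanese": {"snp1", "snp2", "asn1"},
}
shared, unique = partition_snps(populations)
print(sorted(shared))              # ['snp1', 'snp2']
print(sorted(unique["African"]))   # ['afr1', 'afr2']
```

SNPs such as `snp3`, present in two populations but not the third, fall in neither output set, matching the “some shared, some unique” overlap regions.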

Figure 1: Abundance of structural variants. • Only 50% of short insertions/deletions and long structural variants are novel • Vast majority of mid-size variants (0.1-10 kb) are new (note Alu and LINE)

SNPs are only a small part of the genetic diversity pie; they are just one type of variation.

(Annotation: the bigger the event, the more DNA is affected but the lower the frequency.) The larger the structural variant, the more likely it is to interrupt critical genes.

While not as frequent as SNPs, SVs have strong impact

Campbell and Eichler, October 2013

SNV: Single Nucleotide Variant Indel: short insertion / deletion MEI: Mobile Element Insertion (Alu / LINE) CNV: Copy Number Variant

Structural variants

Structural variants change not just 1 base pair but many base pairs. Taking everything else into account makes a big difference in genetic diversity: genomes are less identical than the 99.9% estimated from SNPs alone.

[Figure: variant frequency versus effect. SNPs have the highest frequency, but their effect is not the strongest; gain or loss of entire chromosomes has the strongest effect.]

We are all mutants!

• Each individual shows on average: ➢ 10-11,000 non-synonymous SNPs in protein-coding genes ➢ 200 in-frame insertions/deletions ➢ 90 premature stops ➢ 40 splice site disruptive variants ➢ 200-250 deletions accompanied by frameshifts

• Collectively, each person carries approx 250-300 loss of function variants in known genes, including 50-100 in disease-causing genes

• 30-50 true de novo germline mutations (SNPs) in family trios • Mutation rate ~10^-8 per base pair per generation

• Alu hopping constitutes most Mobile Element Insertion events, with a rate of 2–4.6 × 10^-2 per genome per generation, or approximately 1 in 20 births • L1 insertions are rarer, occurring at 3–4 × 10^-3 per genome per generation (1 per approximately 100–150 births)

(Annotations: premature stop codons typically break the protein. Being diploid helps buffer the mutations: the extra copy prevents the loss of function. De novo mutations are the new mutations that occur per generation, about 1 mutation per hundred million bp, no different from other species. About 1 in 20 people is born with an Alu that had a new insertion.)

SNPs are mostly contributed from the paternal lineage!

➢ 76% of SNPs the parental origin of which could be determined were from the paternal germline

➢ Rate of de novo mutations is increasing with paternal age although magnitude of effect varies with studies

➢ Evidence that some mutations might favor the growth of spermatogonial stem cells.

Ex: FGFR3 mutations that give rise to achondroplasia are almost exclusively paternal and rise strongly with father’s age

➢ Large structural variants (CNV>150 kb) also appear to be biased towards paternal germline. Segurel et al., (2014)

(Annotations: the slope is positive, but its magnitude differs between studies. Achondroplasia is a form of dwarfism caused almost exclusively by paternal mutations. Male germ cells are produced throughout life, so many copies of the genome are made, and mutations are much more likely to arise.)

2012 Update from 1000 Genomes Project

Accomplished the goal of reaching 1,000 genomes.

Most variants likely won't impact important genes, so they have a lower effect on the fitness of the embryo.

Where? Brain function and neurocognition genes > mitochondrial function genes.

2015 Update from 1000 Genomes Project

SVA: transposon NUMT: Nuclear Mitochondrial DNA

Big picture: SNPs dominate, and most are private to specific populations.

About 3x more base pairs are changed by structural variants than by SNPs.

Why else do we care about mapping SNPs?

❑ SNPs are modern day genetic markers along the genome

• The segregation of these markers can be easily followed in relation to any measurable trait (for instance disease, or any other property you might be interested in). This enables gene mapping.

❑ SNPs inform us about the structure of the human genome and about recombination rates

(We'll come back to this later…)
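Following the segregation of a SNP marker “in relation to any measurable trait” can be as simple as comparing allele counts between people with and without a disease. The sketch below is one standard way to do that, an allelic 2×2 chi-square test written out by hand; it is not a method taken from the slides, and the counts are hypothetical. In a genome-wide scan, a p-value would typically need to pass a stringent threshold such as 5 × 10^-8 because so many SNPs are tested.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Allelic 2x2 chi-square test (1 degree of freedom) for one SNP.

    Counts are alleles (2 per person), not people.
    Returns (chi2 statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 500 cases and 500 controls (1,000 alleles each)
chi2, p = allelic_chi_square(case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```

The alternate allele is at 30% in cases versus 20% in controls, so the marker is associated with the trait; identical frequencies in both groups give chi2 = 0 and p = 1.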

What makes SNPs good markers: 1. They are dense. 2. They tend to be spread out evenly (fairly randomly) on the genome. 3. They can identify ancestry, and specifically whether a gene actually came from mom or dad. They can also reveal the structure of recombination in the human genome, and measuring them is pretty cheap.

Implications of genetic variation for human population history

Bustamante et al., Nature (July 2011)

Cataloguing human genetic diversity is a top priority

9 years ago

Implications of SNP mapping #1: African populations carry the majority of genetic diversity

First African genomes Nature February 2010

African populations reflect the ancestral population (older and more diverse). All other populations evolved from African populations, so they carry only part of the diversity.

Main findings from Bantu and Khoisan genomes

Difference in muscle fibers and oxygen consumption between BC and DA groups

Hunter-gatherers use poison arrows, can survive a long time without food, and are endurance runners. In other regions, people are sprinters and have muscle fibers that are different from other populations.

Bushmen are more different from one another within their population than a European and an Asian person are from each other.

Kalahari Desert region; BC: hunter-gatherers; DA: Bantu.

Muscle physiology differs according to different hunting methods.

1000 Genomes Project - 2015 genetic diversity update

Low variation: Eurasian populations (out-of-Africa populations).

Americans and island populations have high levels of genetic diversity: many islanders are descendants of slave-trade populations, which is why their diversity matches the diversity of Africa. There is a lot of admixture.

(Figure annotations: slave trade; African, Native American, European, and European & Asian ancestry.)

Scale indicates increased levels of genetic diversity in populations

Genetic diversity is greatest in southern Africa, so modern humans may have originated in this region rather than in the Great Lakes region, which was thought to be the origin of humans (with migration south). There was a lot of migration within Africa.

Breaking News update: September 2019


“Out of Africa” model for the spreading of Homo sapiens

Implications of SNP mapping #2: SNPs provide us information about the history of human populations

Update - Nature Genetics Sept 2011

Southern Africa populations diverged from other populations around 108-157 thousand years ago

Eurasians diverged from ancestral African populations 38-64 thousand years ago

Effective population size of the ancestors of all modern humans was ~9,000

Bottleneck

Bottleneck

Most SNPs in our genomes have no effect. We can look for the oldest SNPs and use these data to trace the history of a population.

East Asians populating North America

Africans have more genetic diversity because they have not experienced the bottleneck that the rest of the world has.

In Africa, this effective population size (~9,000) was enough to generate all the diversity we have now.

Only a few people left Africa, so most diversity remained there. Glacial age: islands were populated later because people needed boats. (Annotation: most went south, so diversity was maintained as high.)

Nature January (2013)

• Correlates with the explosion of human populations and with invention of agriculture

• Different sub-populations carry different mutational loads

Implications of SNP mapping #3: Most deleterious SNPs are of recent origin

3/4 of protein-coding SNVs and 86% of all deleterious SNVs arose within the past 5,000 to 10,000 years.

5,000 to 10,000 years ago, there was an explosion of written language, tools, and agriculture. Many variants arose from the increase in population during this time.

The age of deleterious SNPs also differs among molecular pathways. For example, essential pathways such as the Krebs cycle carry fewer deleterious mutations than less important pathways, where mutations are less lethal.

European Americans have more deleterious variants in essential and Mendelian disease genes compared with African Americans, due to weaker purifying selection after the out-of-Africa dispersal.

Exome = protein-coding part of the genome.

(Annotations: cities, language, technology (farming), population explosion.)

Archaic human genomes!

Dr. Svante Pääbo, Max Planck (Germany)

Dr. Richard Green UCSC

Science May 2010

From 1-4% of the genome of modern Eurasians is derived from Neanderthal sequences

Ancestors of Eurasians were coming north from Africa and met (interbred) with Neanderthals. Then the populations diverged (Europe and Asia), and these people carried the products of that encounter: DNA blocks from Neanderthal populations. That's why these blocks are in Eurasian populations but not in African populations: the interbreeding happened during the evolution and intermixing of populations.

DNA was extracted from these bones to give archaic human genomes. The genomes had to be pieced back together and separated from the genomes of bacteria that were residing on the bones.

Intermixing of human populations (Neanderthal vs. Homo sapiens): there was breeding between the two populations, so Eurasians carry segments of Neanderthal genomes.

Hybridization of populations: ~30% of the Neanderthal genome exists collectively in modern humans.

Nature December 2010

The Denisovans: a new group of (sequenced) archaic humans

Neanderthals

Well-preserved bones from Denisova Cave showed that there was another human group (Denisovans) closely related to Neanderthals. This population was not involved in the gene flow from Neanderthals into Eurasians. Denisovans contributed 4-6% of the genomes of present-day Melanesians (e.g., Papua New Guinea).

More closely related to Neanderthals

“Out of Africa” model for the spreading of Homo sapiens

New additions to the history of human populations

[Figure labels: bottleneck; Neanderthals; Denisovans.]

Populations that met both Neanderthals and Denisovans migrated down to Australia and the islands. Populations in Europe/Eurasia mixed with Neanderthals but not with Denisovans. African descendants do not carry the Neanderthal genomic sequences. The ancestors of Neanderthals, Denisovans, and Homo sapiens all migrated out of Africa.

Recent Milestones in human evolutionary genomics From Nielsen et al. (2017)

Two REALLY good books on the topic

Africa’s first ancient genome Science – Oct 8, 2015

Gene backflow Into Africa ~3,000 yrs ago

Mota died before the early Neolithic farmer population came back (backflow). His genome helped clarify historic events that are not present in the historical documents.

The climate in the rest of the world is colder, so DNA was better preserved there; Africa is so hot that DNA is mostly degraded.

Artifacts were about 4500 years old. Sequences look like Gracian DNA.

Hypothesis: people from Eurasia migrated back to Africa. However, Mota died before the backflow into Africa, so he did not have that DNA.

Most African genomes have some Eurasian ancestry because of the backflow from Eurasian populations (around 6-7% Eurasian ancestry). The backflow was from the Sardinia/Tuscany region back into Africa.

Mota was the first genome sequenced from Africa; it was a purely African genome with no Eurasian sequences because Mota lived before the backflow to Africa occurred.


Breaking News: Science – September 2017

Sequenced individuals from South Africa, allowing researchers to measure divergence from other populations.

Demographic model of African history and estimated divergences. From Schlebusch et al. Science 2017

Estimated divergence for modern humans

Gene flow back from European populations

Migration of West African populations back to Southern Africa

Modern human skull approximately 250,000 years ago

All groups of people within Africa that have split from the original ancestor

Eurasian & East African people leaving Africa

Long gray line represents backflow into Africa. Green line represents migration back into South Africa

Out of Africa migration

Thickness of the branch shows relative population size

Nature (Jan 2017)

Genetics suggests ghost populations: populations that must have existed according to genetics, but for which we have no historical or archaeological record.

Unknown hominin: found segments of DNA that cannot be from Denisovans.

Hominin that diverged into Neanderthals and Denisovans.

Neanderthals and Denisovans went extinct, but crossed with humans before they did.

(Figure labels: hybridization with Neanderthals; hybridization event with Denisovans.)

How about intra-individual genetic variation? Classic examples of cells within our bodies that have different genomes:

- Immune system - Germ cells

Does DNA within an individual vary at all?

Immune cells: B and T cells break and reassemble DNA to create new antibodies and receptors. Germ cells: meiosis leads to crossing over, which introduces variation.

Analysis of CNVs in diverse tissues (cancers are notorious for CNVs).

Looked at CNVs in samples of various tissues across individuals: 79% of genomic change events affect genes.

[Figure: chromosomes are on the outside; the concentric circles inside display the positions of variable regions.] The study focused on copy number variation, e.g., duplications or deletions.

Could recognize copy number variation within the body of an individual, although the variation is low.

11 tissues in 6 individuals

New mutations can lead to mosaic individuals

(all women are mosaic for which X chromosome is expressed)

Mosaic vs Chimera

Mosaic: one genome in which some cells carry a genetic change. Women are mosaic for which X is expressed in different tissues.

Chimera: 2 fertilization events (2 embryos) whose cells fuse or exchange (not a result of mutation).

These somatic mutations will not be passed on to offspring.

Reminder: LINE-1 retrotransposition can create somatic mosaicism

LINE-1 retrotransposition was typically thought to occur in the parental germline.

New data suggests that somatic tissues are targeted instead.

L1 and other retrotransposons can jump within cells to create somatic mosaicism. Transposon hopping sites are the colored regions.

L1s are good at hopping within the brain. Neurons can store things in different places and remember things.

Can be affected by epigenetic and hormonal effects on chromatin structure.

Can have an impact on the health and adaptation of individuals.

L1 transposition is linked to epigenetics, as is the storage of information and experiences. Brain chemistry can be changed through epigenetics.

Some cells are sicker, while others are stronger and healthier...

Mosaicism is much more common than previously thought, esp. in women

Lupski, Science, July 2013

• 56% of women who had sons have a Y chromosome in breast tissues • 63% of women who had sons have Y chromosomes in their brains • 74% of individuals who received bone marrow transplantation have mixed genomes

Red cell: the change is lethal and the cell will be eliminated (catastrophic mistake).

Why are there small chunks of Y chromosome in breast tissue? Cells and DNA from the fetus can cross the placenta into the mother's bloodstream, and fetal cells can persist in the mother's tissues (the Y chromosome makes this easy to detect in females). As the embryo develops during pregnancy, it sheds a lot of cells; these cells die and release their contents into the amniotic sac, and this material is exchanged across the placenta and makes its way into the mother's bloodstream (the mother's bloodstream carries a fairly high level of cell-free DNA from the fetus).

Up to 13% of cell-free DNA floating in the mother's plasma is of fetal origin!!

Science Translational Medicine

Snyder et al., Prenatal Diagnostic 2013

Maternal plasma contains fetal DNA, so it is possible to sequence the entire genome of the fetus. How do you tell maternal DNA from fetal DNA? Take both maternal and paternal DNA from tissues, sequence the cell-free DNA, and compare.
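Comparing parental genotypes with the cell-free DNA also gives the fetal fraction. A minimal sketch, assuming informative SNPs where the mother is homozygous and the fetus is heterozygous for an allele inherited from the father: the fetus contributes that allele on half of its DNA, so the paternal allele's share of the reads is about f/2. This is one common estimation idea, not necessarily the method of the cited study, and the read counts are hypothetical.

```python
from __future__ import annotations


def fetal_fraction(sites: list[tuple[int, int]]) -> float:
    """Estimate the fetal fraction of maternal-plasma cfDNA.

    `sites` holds (maternal_allele_reads, paternal_allele_reads) at SNPs chosen
    from parental genotypes so that the mother is homozygous and the fetus is
    heterozygous for a paternally inherited allele. That allele's share of the
    reads is f/2, so f = 2 * paternal / total.
    """
    paternal = sum(p for _, p in sites)
    total = sum(m + p for m, p in sites)
    return 2 * paternal / total


# Hypothetical read counts at three informative SNPs
sites = [(935, 65), (940, 60), (930, 70)]
print(f"estimated fetal fraction: {fetal_fraction(sites):.1%}")  # 13.0%
```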

Companies are cornering this market for the detection of fetal trisomies

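Can diagnose trisomy 21 by sequencing the fetal DNA that leaked into the mother's bloodstream and counting the reads from chromosome 21. If the fetus has 3 copies of chromosome 21 instead of 2, reads from chromosome 21 are over-represented relative to the other autosomes (by an amount proportional to the fetal fraction), indicating trisomy 21.

Diagnosing aneuploidy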
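Because only the fetal share of the cfDNA carries the extra copy, the chromosome 21 signal is a small excess rather than a 3:2 ratio, so counting-based screening typically compares a sample's chr21 read fraction with euploid reference pregnancies via a z-score. The sketch below assumes hypothetical reference fractions, a hypothetical sample, and the often-used z > 3 cutoff; it ignores GC correction and the small change in the denominator caused by the extra chr21 reads.

```python
from __future__ import annotations

from statistics import mean, stdev


def chr21_fraction(chr21_reads: int, autosomal_reads: int) -> float:
    """Share of autosomal cfDNA reads that map to chromosome 21."""
    return chr21_reads / autosomal_reads


def z_score(sample_fraction: float, euploid_fractions: list[float]) -> float:
    """Standard deviations between the sample and euploid reference samples."""
    return (sample_fraction - mean(euploid_fractions)) / stdev(euploid_fractions)


def expected_trisomy_fraction(euploid_fraction: float, fetal_fraction: float) -> float:
    """Approximate chr21 share for a trisomic fetus: the maternal share (1 - f)
    carries 2 copies and the fetal share f carries 3."""
    relative_dose = ((1 - fetal_fraction) * 2 + fetal_fraction * 3) / 2
    return euploid_fraction * relative_dose


# Hypothetical chr21 fractions from pregnancies known to be euploid
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0131, 0.0129]

test_sample = chr21_fraction(13_650, 1_000_000)
print(f"expected with f=0.10: {expected_trisomy_fraction(0.0130, 0.10):.5f}")
print(f"z = {z_score(test_sample, reference):.1f}")
```

With a 10% fetal fraction the expected chr21 share rises by only 5% (1 + f/2), yet it sits several standard deviations above the euploid references.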

Newest development: using cfDNA to diagnose cancer!

(2016)

Sampling cell-free chromatin in the blood can pinpoint abnormal DNA rearrangements and which type of tumor they came from.

cfDNA fragments are protected by nucleosomes (DNA wrapped around histones). Every cell type has a unique nucleosome positioning that reflects its gene expression.