Discussion 9
Chapter 18: Bioinformatics
Robert E. Hoyt MD
William R. Hersh
Indra Neil Sarkar MD
Learning Objectives
After viewing the presentation, viewers should be able to:
Define bioinformatics, translational bioinformatics and other bioinformatics-related terms
State the importance of bioinformatics in future medical treatments and prevention
Describe the Human Genome Project and its important implications
List major private and governmental bioinformatics initiatives
List several bioinformatics projects that involve EHRs
Describe the application of bioinformatics in genetic profiling of individuals and large populations
Definitions
Bioinformatics = Computational Biology or the field of science in which biology, computer science and information technology merge to form a single discipline
Bioinformatics makes use of fundamental aspects of computer science (such as databases and artificial intelligence) to develop algorithms for facilitating the development and testing of biological hypotheses
Finding genes of various organisms
Predicting structure or function of newly developed proteins
Developing protein models and examining evolutionary relationships
Translational bioinformatics: simply put, the specialization of bioinformatics for human health
Genomics is the field that analyzes genetic material from a species
Proteomics is the study at the level of proteins (e.g., through gene expression)
Pharmacogenomics is the study of genetic material in relationship with drug targets
Metabolomics is the study of metabolites, the small molecules produced by cellular processes
Definitions
Biologists
collect molecular data:
DNA & Protein sequences,
gene expression, etc.
Computer scientists
(+Mathematicians, Statisticians, etc.)
Develop tools, software and algorithms to store and analyze the data.
Bioinformaticians
Study biological questions by analyzing molecular data
The field of science in which biology, computer science and information technology merge into a single discipline
Translational Bioinformatics
Metagenomics is the analysis of genetic material derived from complete microbial communities harvested from natural environments
A phenotype is the observable characteristic, structure, function and behavior of a living organism. Size and hair color could be examples. Phenotype is strongly guided by the genotype. The Phenome refers to total phenotypic traits
Genotype is based on the raw genetic information that is associated with a phenotype or regulation of biological function. The genome is the total of genotypic traits
Definitions
The human body has about 100 trillion cells and each one contains a complete set of genetic information (chromosomes) in the nucleus; exceptions are eggs, sperm and red blood cells
Humans have 23 pairs of chromosomes in each cell, including one pair of sex chromosomes: an X and a Y for males and two Xs for females
Offspring inherit one chromosome of each pair from each parent
The autosomes are numbered 1 to 22 approximately by size, with chromosome 1 being the largest; the 23rd pair is the sex chromosomes
Genomic Primer
Chromosomes consist of double helices of deoxyribonucleic acid (DNA)
DNA is composed of four sugar-based building blocks (“nucleotides”: adenine [A], thymine [T], cytosine [C], and guanine [G]) that are generally found in pairs (“Watson-Crick” pairing: A-T, C-G)
An organism’s DNA encodes its full complement of proteins essential for cellular function
Genes are regions on chromosomes that encode instructions, which may result in proteins that then enable biological functions
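The Watson-Crick pairing rule (A-T, C-G) means that one strand of DNA determines its partner strand. A minimal Python sketch of that rule follows; the function names and the example sequence are illustrative assumptions, not part of the presentation.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    example = "ATGCCG"  # made-up sequence, for illustration only
    print(complement(example))          # TACGGC
    print(reverse_complement(example))  # CGGCAT
```

Taking the reverse complement twice returns the original strand, which is a quick sanity check on the pairing table.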
Genomic Primer
The process of decoding genes involves transcribing the DNA into ribonucleic acid (RNA) and then translating the RNA into chains of amino acids, the building blocks of proteins
Collectively, the complete set of genes is referred to as a “genome” (combination of “gene” and “chromosome”)
It is estimated that humans have between 20,000 and 30,000 genes and genomes are about 99.9% the same between individuals
Single-base variations in genomes between individuals are known as single nucleotide polymorphisms (SNPs, pronounced “snips”)
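The two decoding steps described above (DNA to RNA, then RNA to amino acids) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the input is treated as the coding strand, so transcription is modeled as replacing T with U; the codon table holds only a handful of entries from the standard genetic code; and the example sequence is made up.

```python
# A small subset of the standard genetic code (RNA codons).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala",
    "AAA": "Lys", "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_dna):
    """Model transcription of the coding strand: T is replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        # Codons missing from the small table are shown as "???".
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "???")
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    dna = "ATGTTTGGCAAATAA"  # made-up sequence, for illustration only
    mrna = transcribe(dna)
    print(mrna)             # AUGUUUGGCAAAUAA
    print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Lys']
```

A full implementation would use the complete 64-codon table; libraries such as Biopython provide one.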
Genomic Primer
Genome-wide association studies (GWASs) compare two groups of participants: those with a disease of interest and those without it. Variations (SNPs) found more often in one group are said to be associated with the disease, but true cause and effect is often unclear
Similarly, phenome-wide association studies (PheWAS) look for associations between genes and diseases, most recently using the electronic health record as the source of phenotypic information
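The case-versus-control comparison behind a GWAS can be illustrated for a single SNP with a 2x2 table. The sketch below computes an odds ratio and a Pearson chi-square statistic using only standard Python. The counts are invented for illustration, and real studies repeat a test like this across many SNPs and apply much stricter significance thresholds.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds of carrying the variant among cases relative to controls."""
    return (case_with * control_without) / (case_without * control_with)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts: participants with / without the variant.
    cases = (60, 40)     # have the disease
    controls = (40, 60)  # do not have the disease
    print(odds_ratio(*cases, *controls))      # 2.25
    print(chi_square_2x2(*cases, *controls))  # 8.0
```

A statistic above 3.84 corresponds to p < 0.05 at one degree of freedom, but, as the slide notes, an association like this does not by itself establish cause and effect.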
Genomic Primer
Genes
Importance of Bioinformatics
Diagnosing hereditary diseases
Discovering future drug targets
Developing personalized drugs based on genetic profiles (personalized medicine)
Developing gene therapies to treat diseases with a strong genomic component (e.g. cancer)
Discover:
New indications for an old drug (drug repurposing)
New targets for existing drugs (e.g., treatment of tongue cancer using RET inhibitors)
Drugs that work better in certain patient groups (gender, age, race, ethnicity, etc.) with possible genetic variants
Which drugs to avoid because of a higher incidence of genetically modulated side effects
Improve clinical decision support for electronic health records
Importance of Pharmacogenomics
The Human Genome Project
International collaborative project started in 1990 and finished in 2003
3 million SNPs discovered
Ethical, legal and social issues also discussed
Huge relational databases are necessary to store and retrieve this massive information
New technologies such as DNA arrays (gene chips) speed up analysis
Significant drop in cost along the way
Other Projects
National Human Genome Research Institute (NHGRI)
Encyclopedia of DNA Elements (ENCODE) Project
Human Microbiome Project (HMP)
Humans have more bacteria on and in their bodies than human cells; collectively these microbes are the microbiome
The project will determine whether individuals share a core human microbiome and try to understand whether changes in the human microbiome can be correlated with changes in human health
Studies already suggest that intestinal bacteria function like a new organ system
Other Projects
Human Variome Project
The PhenX Project
1000 Genomes Project
Pediatric Cancer Genome Project
National Center for Biotechnology Information (NCBI)
Hosts thousands of databases associated with biomedicine, including the MEDLINE and GenBank databases
The NCBI provides access to sequences from over 285,000 organisms
Others noted in the textbook
Personal Genomics
The goal is to have “tailor made” medications and treatments that target the individual and not a group having little in common with the patient
Also to offer bio-surveillance for future outbreaks of infectious diseases
All of Us Project will collect biological data to further precision or personalized medicine
Cost of Human Genome Determination Decreasing
Personal Genetics Testing
Available commercially without a doctor’s order:
Often less than $100
DNA Direct
AncestryDNA
23andMe
Myriad™ specializes in cancer-genetic links, but it found it could not patent BRCA gene testing
Ethical Questions Related to Genetic Testing
Testing is not regulated, lacks external standards for accuracy, has not demonstrated economic viability or clinical benefits and has the potential to mislead customers, according to Varmus
Patients must be sure of accuracy before undergoing a procedure such as a prophylactic mastectomy
Patients will need genetic counseling as most physicians have not had this training
Genetic Information Nondiscrimination Act of 2008 protects patients against discrimination by employers and healthcare insurers based on genetic information
Genomic Information Integrated with Electronic Health Records
Genetic profiles will likely be part of many electronic health records in the future
Cost will become less of a factor, but adding genetic information will raise multiple other questions, and data storage must be increased to hold the large data files
The Electronic Medical Records and Genomics (eMERGE) Network is a consortium of nine healthcare organizations across the United States, each with significant investments in both EHRs and genomic analytics, that have already started the process
In order for EHRs to incorporate genomic data:
They must store data in structured format
Data must be standards based
Phenotypic information must also be stored as structured data
Data must be available for use by rules engines
EHRs must be able to display information needed by the clinician based on phenotypic and genotypic data
SNOMED CT will be modified to incorporate genomic data
Data standards and clinical decision support will need to be enhanced
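The requirements above (structured genotype and phenotype data that a rules engine can read and turn into clinician-facing messages) can be sketched as follows. Everything here is a hypothetical illustration: the record layout, the rule fields, and the codes "GENE_X", "variant_present" and "condition_y" are placeholders, not a real EHR schema, terminology or clinical rule.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical structured record; real EHRs use standard code systems."""
    patient_id: str
    genotypes: dict = field(default_factory=dict)  # gene -> coded result
    phenotypes: set = field(default_factory=set)   # coded findings


@dataclass
class Rule:
    """Fires when a coded genotype and a coded phenotype are both present."""
    gene: str
    result: str
    phenotype: str
    message: str

    def applies_to(self, record):
        genotype_match = record.genotypes.get(self.gene) == self.result
        phenotype_match = self.phenotype in record.phenotypes
        return genotype_match and phenotype_match


def run_rules(record, rules):
    """Return the messages that should be displayed to the clinician."""
    return [rule.message for rule in rules if rule.applies_to(record)]


if __name__ == "__main__":
    # Placeholder codes for illustration only.
    rules = [Rule("GENE_X", "variant_present", "condition_y",
                  "Review genetic risk for condition_y")]
    patient = PatientRecord("p001",
                            genotypes={"GENE_X": "variant_present"},
                            phenotypes={"condition_y"})
    print(run_rules(patient, rules))
    # ['Review genetic risk for condition_y']
```

The point of the sketch is that the rule can only fire because both kinds of data are stored as coded fields; the same information held as free text would be invisible to the rules engine.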
Genomic Information Integrated with Electronic Health Records
Digital family histories are now a reality with pervasive EHRs and meaningful use
It will likely be at least a decade before we can intelligently integrate genomic information into EHRs, so in the meantime we can expect some use of the family history to alert clinicians to genetic risk, for example of cancer
The US government is interested in better family history integration, which is why it created the web site My Family Health Portrait
Digital Family Histories
Translational bioinformatics will blend traditional bioinformatics with health informatics
We are experiencing huge advances in bioinformatics, but we are still some way off incorporating this information into the average medical practice
It is logical that eventually genomic information will be part of every EHR; in the meantime we will use family histories
Direct-to-consumer genomic testing is very interesting but not always evidence based
Conclusions