As next-generation technology ratchets the price of sequencing lower and lower, users from academic labs to Big Pharma are finding themselves drowning in data. What used to be gigabytes’ worth of information has become terabytes or petabytes. At the same time, the cost crunch brought on by the global recession has made researchers leery of unnecessary capital spending.
The result is more and more users moving their data management to the cloud or outsourcing it entirely.
Whereas large pharma companies may have the funds and infrastructure to maintain dedicated servers for storing and analyzing sequencing data, small companies, especially those that don’t sequence continuously, are leading the migration to the cloud, and service providers are springing up to meet the demand. Cost, security, and convenience top the list of concerns for researchers looking for a place to unload reams of data. Once that transition is made, however, features such as collaborative data sharing, access to third-party analysis apps, and patient privacy become more important.
Expression Analysis, a Quintiles Co., Durham, N.C., provides genomic services to the pharma and biotech industry, as well as to academic, government, and foundation laboratories doing research in molecular biology and genetics. It provides cloud computing services through a partnership with Golden Helix Inc., Bozeman, Mont. Its clients need computation-intensive services both for generating the initial RNA or DNA sequence and for cleaning up, aligning, and analyzing that sequence.
According to Expression Analysis, a typical sequencing project for 100 RNA samples would generate 300 to 400 GB of compressed data, or 700 GB to 1 TB of data in total, and that’s just for one experiment. Across multiple experiments, the amount of data quickly adds up to astronomical quantities.
Some applications, such as analyzing cancer samples, are even more data-intensive because they require greater depth of coverage and sampling of multiple cells in the tumor.
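To make the scale concrete, the Expression Analysis figures work out to roughly 7 to 10 GB of total data per RNA sample (700 GB to 1 TB across 100 samples). The minimal sketch below projects totals for several experiments. It assumes decimal units (1 TB = 1,000 GB) and that volume scales linearly with sample count; the function name and the ten-experiment example are hypothetical illustrations, not figures from the article.

```python
# Expression Analysis figures: 100 RNA samples -> 700 GB to 1 TB of data in total.
PROJECT_SAMPLES = 100
PROJECT_TOTAL_GB = (700, 1_000)  # assumption: decimal units, 1 TB = 1,000 GB


def projected_volume_tb(n_experiments: int,
                        samples_per_experiment: int = PROJECT_SAMPLES) -> tuple[float, float]:
    """Return a (low, high) estimate in TB, assuming volume scales linearly with samples."""
    low_gb_per_sample = PROJECT_TOTAL_GB[0] / PROJECT_SAMPLES   # 7 GB per sample
    high_gb_per_sample = PROJECT_TOTAL_GB[1] / PROJECT_SAMPLES  # 10 GB per sample
    samples = n_experiments * samples_per_experiment
    return (samples * low_gb_per_sample / 1_000,
            samples * high_gb_per_sample / 1_000)


if __name__ == "__main__":
    low, high = projected_volume_tb(10)  # hypothetical: ten 100-sample experiments
    print(f"{low:.1f} to {high:.1f} TB")  # 7.0 to 10.0 TB
```

Deep-coverage work such as tumor sequencing would need a larger per-sample figure than the one assumed here.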
“The cloud offers a full environment in order to do analysis on a large number of samples simultaneously,” said Wendell Jones, PhD, vice president of statistics and bioinformatics for Expression Analysis.
That computing power becomes a commodity for the customer, replacing expensive on-site server infrastructure. The data is instead accessed through a browser, so there is no need to upload or download huge files. “You can leave them on the cloud and access in a streaming fashion via the cloud,” Jones said.
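Jones’s point about analyzing a large number of samples simultaneously relies on per-sample steps being largely independent, so they can be fanned out across machines. The minimal sketch below shows that dispatch pattern. A local thread pool stands in for a fleet of cloud servers, and `align_sample` and `run_batch` are hypothetical placeholders that perform no real bioinformatics.

```python
from concurrent.futures import ThreadPoolExecutor


def align_sample(sample_id: str) -> tuple[str, str]:
    """Hypothetical stand-in for a per-sample cleanup/alignment/analysis job."""
    # A real pipeline would invoke an aligner here; this only tags the sample.
    return sample_id, f"{sample_id}.aligned"


def run_batch(sample_ids: list[str], max_workers: int = 8) -> dict[str, str]:
    """Dispatch independent per-sample jobs in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so keys follow sample_ids
        return dict(pool.map(align_sample, sample_ids))


if __name__ == "__main__":
    samples = [f"sample_{i:03d}" for i in range(15)]  # e.g. a 15-sample RNA-Seq project
    results = run_batch(samples)
    print(len(results), results["sample_000"])  # 15 sample_000.aligned
```

Threads keep the sketch self-contained; CPU-heavy work would more typically be spread across separate processes or separate machines.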
For small companies, the cloud-based service offers advantages beyond saving on hardware and real estate. Startup companies may not have the infrastructure in place to run Linux-based genome software applications. A cloud-based storage and analysis service lets those companies work from their own local Windows or Macintosh desktops.
There are some advantages to maintaining a physical server. “You have the option of having lower redundancy ... and faster data access times. You can choose to take your old data and unplug it. You don’t have to pay for power,” explained Jonathan Bingham, product manager for informatics and software at Menlo Park, Calif.-based Pacific Biosciences, a provider of genomics services through its SMRT platform technology and a hosted cloud-based storage and analysis service.
COVER STORY
Managing Data in the Cloud Age
Exploding sequencing data volumes push researchers to the cloud and into partnerships.
Catherine Shaffer, Contributing Editor
September/October 2012, www.dddmag.com
On the other hand, that means taking responsibility for managing the hardware, such as replacing failed drives, Bingham added. That burden of ownership and maintenance is not right for every company.
Jones explained that cloud computing is ideal for research groups with “bursty” computing needs, meaning that they generate and analyze sequence data only intermittently.
“The cloud in some sense is cheap, in the sense that it’s cheaper to rent a vacation home than buy it and only use it two or three weeks a year. If you’re constantly at your vacation home, it’s just better to buy it.”
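Jones’s vacation-home analogy is a break-even calculation: renting wins while utilization is low, and owning wins once usage is steady. The minimal sketch below compares pay-per-hour cloud compute with the fixed annual cost of owned servers. Every cost figure in it is a hypothetical placeholder, not a price quoted in the article, and the model ignores real-world factors such as data transfer and staffing.

```python
def annual_cloud_cost(hours_used: float, hourly_rate: float) -> float:
    """Pay-as-you-go: cost accrues only for the hours actually used."""
    return hours_used * hourly_rate


def break_even_hours(fixed_annual_cost: float, hourly_rate: float) -> float:
    """Hours per year at which renting costs exactly as much as owning."""
    return fixed_annual_cost / hourly_rate


def cheaper_option(hours_used: float, hourly_rate: float, fixed_annual_cost: float) -> str:
    """Return 'cloud' if renting is strictly cheaper, otherwise 'own' (ties go to owning)."""
    if annual_cloud_cost(hours_used, hourly_rate) < fixed_annual_cost:
        return "cloud"
    return "own"


if __name__ == "__main__":
    # Hypothetical figures: $20/hour to rent a cluster vs. $50,000/year to own,
    # power, and maintain a comparable on-site cluster.
    RATE, FIXED = 20.0, 50_000.0
    print(break_even_hours(FIXED, RATE))             # 2500.0
    print(cheaper_option(3 * 7 * 24, RATE, FIXED))   # three weeks a year: cloud
    print(cheaper_option(365 * 24, RATE, FIXED))     # running all year: own
```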
Cost is a major concern at Illumina (San Diego, Calif.) as well. A giant in the sequencing industry, Illumina controls 70% of the sequencing market. Illumina can sequence an entire human genome in a day, and it offers its cloud computing solution, BaseSpace, through Amazon Web Services (Seattle), the world’s largest cloud hosting service. Recently, Amazon announced a service providing reliable data storage starting at $0.01 per gigabyte per month.
Although that is a very economical rate for data storage by any standard, for long-term storage of hundreds or thousands of complete genomes, many experts agree it is better to archive the original tissue sample than the sequence data. In other words, if the raw data is needed again in the future, it is cheaper to regenerate the sequence from an archived sample.
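At the quoted starting rate of $0.01 per gigabyte per month, the storage arithmetic is straightforward. The minimal sketch below assumes decimal units, a flat rate, and no retrieval, request, or transfer fees (which a provider may charge on top). It reuses the article’s figure of about 1 TB for one 100-sample experiment; the 100-experiment, 10-year scenario is hypothetical.

```python
RATE_USD_PER_GB_MONTH = 0.01  # starting rate quoted in the article


def storage_cost_usd(gigabytes: float, months: int,
                     rate: float = RATE_USD_PER_GB_MONTH) -> float:
    """Flat-rate storage cost; ignores retrieval, request, and transfer fees."""
    return gigabytes * months * rate


if __name__ == "__main__":
    one_experiment_gb = 1_000  # ~1 TB for 100 RNA samples (decimal units assumed)
    print(round(storage_cost_usd(one_experiment_gb, 12), 2))         # 120.0 per year
    # Hypothetical archive: 100 such experiments kept for 10 years (120 months).
    print(round(storage_cost_usd(one_experiment_gb * 100, 120), 2))  # 120000.0
```

Whether regenerating the sequence from an archived sample is cheaper depends on the per-sample resequencing cost, which the article does not give.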
Illumina offers its customers an even better deal. “We’ve picked the ultimate pricing strategy, which is free,” said Alex Dickinson, senior vice president of cloud genomics for the company. “Customers get a free terabyte of data storage, enough for 10 years of typical usage of MiSeq. We do the secondary processing, alignment, and variant calling. We also do that for free,” Dickinson said.
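Dickinson’s claim implies that a typical MiSeq user stores roughly 100 GB of data a year. The minimal sketch below works through that back-of-the-envelope calculation, assuming 1 TB = 1,000 GB and steady usage; the heavier 250 GB/year user is a hypothetical comparison.

```python
FREE_QUOTA_GB = 1_000  # "a free terabyte", assuming decimal units
YEARS_COVERED = 10     # "enough for 10 years of typical usage of MiSeq"


def implied_annual_usage_gb(quota_gb: float = FREE_QUOTA_GB,
                            years: float = YEARS_COVERED) -> float:
    """Annual data volume that would exactly fill the quota over the stated period."""
    return quota_gb / years


def years_until_full(annual_usage_gb: float, quota_gb: float = FREE_QUOTA_GB) -> float:
    """How long the quota lasts at a given annual usage."""
    return quota_gb / annual_usage_gb


if __name__ == "__main__":
    print(implied_annual_usage_gb())  # 100.0 GB per year
    print(years_until_full(250.0))    # hypothetical heavier user: 4.0 years
```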
MiSeq is Illumina’s “personal sequencer,” a next-generation sequencing system suitable for applications such as multiplexed PCR amplicon sequencing, targeted resequencing, and small RNA sequencing.
Illumina’s choice to offer free service reflects how researchers weigh their options: they may be comparing the company’s offerings with infrastructure already in their own facility. In an absolute sense that infrastructure is never “free,” because it costs money to house it, but its use often doesn’t come out of an individual laboratory’s budget. “If you try to charge for basic service, they try to compare that to free,” Dickinson said.
Rather than charging customers directly, Illumina channels revenue through third-party service providers, who will offer genomic analysis apps within the sequencing environment. The application programming interface (API) for BaseSpace will be open to partner companies, whose applications will be available in an app store. An initial group of 14 companies has already signed up to offer those apps.
Although Amazon’s cloud services are an ideal solution for research, the rapidly emerging market for clinical sequencing comes with tougher regulatory requirements, chief among them compliance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA).
Amazon’s cloud services are not currently HIPAA compliant, and according to Richard Resnick, CEO of GenomeQuest Inc. (Westborough, Mass.), they are very unlikely to become compliant any time soon. Resnick said that the cloud comprises three components: application, platform, and hardware. Achieving HIPAA compliance requires control of all three. A service designed around coordinating many third-party providers such as Amazon would have a hard time ever validating full compliance across all of its applications, platform, and hardware.
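Resnick’s argument can be restated as a simple rule: end-to-end validation is only possible when one provider controls every layer. The minimal model below illustrates that reasoning; it is not a HIPAA compliance check, and control of all layers is at most a precondition. The layer names come from Resnick, while the data structure, the function names, and the vendor name “VendorX” are hypothetical.

```python
# Resnick's three components of a cloud service.
LAYERS = ("application", "platform", "hardware")


def uncontrolled_layers(stack: dict[str, str], provider: str) -> list[str]:
    """Layers of `stack` (layer -> operating party) that `provider` does not run itself."""
    return [layer for layer in LAYERS if stack.get(layer) != provider]


def controls_all_layers(stack: dict[str, str], provider: str) -> bool:
    """True only if `provider` operates every layer (necessary, not sufficient, for validation)."""
    return not uncontrolled_layers(stack, provider)


if __name__ == "__main__":
    # Hypothetical stacks for an imaginary vendor, "VendorX".
    on_third_party = {"application": "VendorX", "platform": "Amazon", "hardware": "Amazon"}
    fully_owned = {"application": "VendorX", "platform": "VendorX", "hardware": "VendorX"}
    print(uncontrolled_layers(on_third_party, "VendorX"))  # ['platform', 'hardware']
    print(controls_all_layers(fully_owned, "VendorX"))     # True
```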
“What we’re doing is thinking about how to connect different parts of the health care ecosystem through next-generation sequencing and cloud-based genomics,” said Resnick.
GenomeQuest offers a secure, HIPAA-compliant cloud designed for large-scale analysis of whole genomes and gene panel samples from clinical laboratories.
Resnick said that unlike research labs, clinical laboratories can’t tolerate problems like noise and false positives in their data. “You can’t do that because there’s a real patient at the end of the day.”
So in addition to meeting security and data privacy standards, cloud services for clinical sequencing applications must clear a higher bar for quality.
“There are still many uncertainties around the regulatory requirements for using cloud and hosted IT services in genomic medicine trials, so it was important for us to work with a company that really understands the healthcare IT space,” said Spyro Mousses, PhD, director of the Center for BioIntelligence at The Translational Genomics Research Institute (TGen) in Phoenix, Ariz.
BaseSpace enables users to perform interactive genetic analysis from any location using a web browser.
Sequence data analysis results depicted in the DNANexus genome browser.
In November 2011, TGen partnered with Dell to support the world’s first personalized medicine trial for pediatric cancer and to leverage cloud computing resources donated by Dell. The Dell Giving commitment includes multi-year grant funding to support the clinical trial, as well as major hardware, software, and services contributions.
Focusing initially on neuroblastoma, the trials will leverage high-performance computing to dramatically accelerate the path from sequencing patient tumors to predicting the optimal treatment for each patient. As would be required of any trial under U.S. Food and Drug Administration (FDA) regulations, the cloud solution will be compatible with both FDA and HIPAA compliance requirements.
The KIDS Cloud, as TGen terms it, “will provide a hybrid-cloud platform for securely storing and exchanging genomic data and clinical information across multiple collaborating organizations,” according to Mousses.
TGen is also participating in several other large personalized medicine trials and hopes that this kind of cloud-enabled computational infrastructure can serve as a national model for collaborative personalized medicine. “It takes a village to cure a kid with cancer,” Mousses said.
With the advent of next-generation sequencing technology, the emphasis has shifted from bringing down the cost of sequencing to addressing the cost of analysis. “The bottleneck now is being able to effectively analyze the data,” said Marc Olsen, president and COO of DNANexus (Mountain View, Calif.), a provider of cloud-based data management and analysis. Those challenges include not only the cost of storing and managing quantities of data that could fill thousands and thousands of PCs, but also questions of how to transfer data and how to share and collaborate while maintaining security and privacy. The industry is still seeking answers to these emerging problems, and in some cases is already moving toward some degree of standardization.
Catherine Shaffer is a freelance science writer specializing in biotechnology and related disciplines with a background in laboratory research in the pharmaceutical industry.
Expression Analysis’ cloud computing pipeline, powered by Golden Helix, can access virtually infinite storage capacity on the fly and distribute large jobs across hundreds of servers in parallel. Pictured: a dashboard showing computing progress for a 15-sample RNA-Seq project.
Copyright of Drug Discovery & Development is the property of Advantage Business Media and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.