The $1,000 Genome Project and New DNA Sequencing Technologies
An NIH View
Jeffery A. Schloss
National Human Genome Research Institute
National Institutes of Health
About the Lecture
The first human genome sequence was completed in 2003. Rather than completing a picture of human genetics, it spurred a desire to understand sequence variations in the human genome and their effects on health and disease. Although the cost of DNA sequencing was reduced more than 1,000-fold during the 13-year course of the Human Genome Project, it was still much too high to sequence the thousands of human genomes necessary to begin understanding complex relationships between DNA sequence variation and health and disease. The high cost of the existing methods led some innovators in government, academia and industry to begin thinking about and developing new and much more powerful sequencing technology. To spur the development of these methods, NIH through NHGRI in 2004 launched an innovative program with the goals of reducing sequencing costs 100-fold in five years and ultimately 10,000-fold. This would allow sequencing of complete human genomes for less than $1,000, make it possible to study rare variations and put within reach the use of whole genome sequencing as a practical tool of individualized medical care. Five years later, as a result of intensive research and development stimulated by NIH’s modest investment in this program and substantial investments by several companies, the initial goal of driving costs below $100,000 was achieved. Sequencing a human genome, which in 2003 required 100 machines operating continuously for three months, could be done in 2008 on one machine in a month. The community of academia, government and private institutions that achieved the initial goal is now on a path to bringing the cost down to less than $1,000 per genome, the ultimate goal of the program. Already, the cost of sequencing a human genome commercially is well below $10,000, and a new generation of sequencing technologies is entering the market that will further reduce costs.
This presentation will summarize the technologies that are used for high-throughput sequencing today, the staggering amounts of DNA sequence information they are providing, the novel biological insights scientists are obtaining from the data and the real clinical impact that DNA sequencing is having even now. Emerging and horizon technologies will be discussed that promise to provide DNA sequence information with the quality, rapidity, and cost required for optimal applications in research and medicine. And the presentation will highlight the key roles of all of the sectors in bringing these dramatic improvements about, particularly focusing on the role the NIH program plays in the functioning ecosystem of research, development and commerce that puts products in the marketplace and changes how we think about human health.
About the Speaker
JEFFERY A. SCHLOSS is Program Director at the National Human Genome Research Institute, National Institutes of Health. He earned a B.A. in Biology, cum laude, at Case Western Reserve University and a Ph.D. at Carnegie Mellon University. He did postdoctoral work at Yale University and was Assistant Professor at the University of Kentucky. He currently manages the grants program in DNA technology development at NIH, including the “$1,000 genome” program. He also coordinates the Centers of Excellence in Genomic Science, leads the technology development component of the NIH Common Fund Human Microbiome Project, and serves as co-chair of the NIH Nanomedicine Common Fund Initiative. He was previously NIH representative to the National Nanotechnology Initiative and served as chair of the NIH Bioengineering Consortium and co-chair of the Trans-NIH Nano Task Force. He was a finalist for the Service to America Medal in Science and Technology (2009) and has received 6 NIH Director’s Awards and 10 NHGRI merit awards.
President Robin Taylor called the 2,288th meeting to order at 8:21 pm on September 23, 2011, in the Powell Auditorium of the Cosmos Club. Ms. Taylor announced the order of business and introduced six new members of the Society.
The minutes of the 2,287th meeting were read and approved.
Ms. Taylor then introduced the speaker of the evening, Mr. Jeffery A. Schloss of the National Institutes of Health. Mr. Schloss, who is Program Director at the National Human Genome Research Institute (NHGRI), spoke on "The $1,000 Genome Project and New DNA Sequencing Technologies."
Mr. Schloss began by describing the strategic research transition that occurred as the sequencing of the human genome neared completion in 2003. The Human Genome Project was a technologically and conceptually bold plan that used multiple donors to create a mosaic sequence, but sequencing full individual human genomes, as well as the genomes of many other organisms, was not within easy reach. By sequencing individuals, especially those with diseases or genetic variations, we could better understand the possible genetic contributions to their conditions. A full sequence for many other species would enable comparative genomics to provide further insight into the structure and function of the human genome.
Mr. Schloss explained that early DNA sequencing technologies required that a genome, about three billion base pairs for humans, be shattered into many smaller fragments of DNA before processing. Each fragment was cloned in bacteria to make many identical copies, then isolated and biochemically processed to create a set of nested fragments of varying lengths. These fragments were separated by size using electrophoresis and the fragments that ended with each of the four nucleotide bases were labeled separately. This allowed you to "read off" the DNA sequence manually, he said.
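The read-off step described above can be illustrated with a toy sketch. It is not the laboratory protocol: it assumes an idealized reaction that yields exactly one fragment ending at every position of a short template, and the function names and the template string are hypothetical.

```python
import random


def nested_fragments(template):
    """Return one (length, terminal_base) pair per position of the template.

    Idealized assumption: the reaction produces exactly one fragment ending
    at each position, labeled by the base it ends with.
    """
    return [(i + 1, base) for i, base in enumerate(template)]


def read_off(fragments):
    """Order fragments by length, as electrophoresis does, and read the labels."""
    return "".join(base for _, base in sorted(fragments))


template = "GATTACA"              # hypothetical short template
fragments = nested_fragments(template)
random.shuffle(fragments)         # fragments are mixed together in the tube
recovered = read_off(fragments)
print(recovered)
```

Because every fragment length is unique, sorting by length alone restores the original order no matter how the fragments were mixed.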
Starting in the 1980s, fragments were labeled with fluorescent dyes that glowed under a scanning laser, allowing an optical sensor and attached computer to read off the sequence even faster. This enabled a robotic workflow where every fragment would be cloned in bacteria, moved to reaction plates where robots would set up the biochemical reactions, and then automatically loaded into a sequencing machine. This factory-inspired cycle was used for much of the Human Genome Project, with academic labs processing ten to twelve of these cycles per sequencing machine, per day.
Mr. Schloss emphasized that the cost per finished base of DNA was carefully tracked throughout the project, with a total cost of approximately $650 million to sequence the entire human genome. Due to technology and automation development during the project, if the genome were resequenced immediately upon completion in 2003 it would only cost $50 million, a reduction of the cost per base by a factor of one hundred over ten years.
The NHGRI published a strategic plan in 2003 that laid out the anticipated relation of genomics to biology, health, and society. For related technology developments, they listed "quantum leaps" that "seem so far off as to be fictional but . . . which would revolutionize biomedical research and clinical practice," which included sequencing a mammalian genome at high quality for less than $1,000 by 2014. This further cost reduction factor of ten thousand would be enabled by funding many new technology approaches, both conservative and speculative.
Mr. Schloss explained that an important early innovation replaced the robotic processing of fragments in many individual test tubes with a new system that captured the same information from hundreds of millions of DNA molecules in a single test tube. By modifying the DNA fragment ends with known sequences and capturing those ends on beads in a water-in-oil emulsion, each droplet became its own reaction tube. To sequence by synthesis, DNA polymerase is allowed to incorporate one correct fluorescent nucleotide modified with a blocking group. After this fluorescence is recorded for each droplet, the fluorescent dye and blocking group are removed and the cycle repeats, achieving very high throughput.
A series of commercial sequencing machines was produced beginning in 2005, many of which received support from NHGRI. This program support explicitly encourages both collaboration and competition. Sequencing throughput has steadily increased to the point that many labs cannot handle the amount of data produced, leading to new bottlenecks in computer storage and processing capacity. Resequencing one human genome in 2003 would have taken one hundred machines approximately three months; today it would take one machine about one week and cost about $10,000.
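The comparisons quoted in the talk can be checked with simple arithmetic. The sketch below is illustrative only: it assumes 30-day months, uses the $50 million (2003) and $10,000 (today) resequencing figures reported in these minutes, and its variable names are not from the talk.

```python
DAYS_PER_MONTH = 30  # assumption: a month is treated as 30 days

# Machine time to resequence one human genome, in machine-days.
machine_days_2003 = 100 * 3 * DAYS_PER_MONTH  # 100 machines for three months
machine_days_2011 = 1 * 7                     # one machine for one week
time_fold = machine_days_2003 / machine_days_2011

# Cost to resequence one human genome, in dollars.
cost_2003 = 50_000_000
cost_2011 = 10_000
cost_target = 1_000
cost_fold = cost_2003 / cost_2011
remaining_fold = cost_2011 / cost_target

print(f"Machine time reduced about {time_fold:.0f}-fold")
print(f"Cost reduced {cost_fold:.0f}-fold; {remaining_fold:.0f}-fold remains to reach $1,000")
```

Under these assumptions, machine time fell by roughly three orders of magnitude, and one further order of magnitude in cost separates the $10,000 figure from the $1,000 goal.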
Mr. Schloss stated that sequencing technology is both pushed to develop and rapidly pulled into new applications. The ability to digitally sample very large numbers of molecules allows new biological insights, such as examining the variations between individuals, human DNA variation based on geographic origin, reading methylation status directly from the genome, finding rare RNA that would be difficult to observe with analog methods, studying the human microbiome, and comparing tumor/normal pairs from individuals with cancer.
Mr. Schloss reported on several new sequencing technologies in development, such as a flow cell chip containing millions of pH meters that measure the release of hydrogen ions as each base is added during sequencing by synthesis. This method doesn't yet provide competitive throughput or read lengths, but the instrument costs ten times less than leading existing machines. Another approach involves watching individual DNA polymerase molecules in real time as they synthesize a new DNA molecule. A zero-mode waveguide and light source are used to create an energy field that only excites the fluorescent nucleotide at the polymerase molecule's active site. This method may be able to determine many kinds of methylation, making it a very powerful tool.
The technology that may have the most dramatic impact on cost is nanopore sequencing. A small protein channel is made in a membrane and a single DNA molecule is threaded through it. The passing nucleotide's disruption of the channel's ion transport allows measurements that can distinguish bases. This would allow fast, digital, non-destructive, direct sequencing without conversion or amplification, as well as very long read lengths. Current implementations have distinguished between nucleotides both in solution and in a DNA strand, observed a specific methylation, and controlled the speed of the strand to maximize the signal-to-noise ratio, which together cover the critical functionality needed for sequencing. Mr. Schloss hopes to see a laboratory demonstration of nanopore sequencing within the next year and anticipates that an array of one thousand nanopores should be able to sequence a human genome in less than a day, potentially using a portable, handheld device.
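A rough calculation suggests what the anticipated array would demand of each pore. The sketch below is a back-of-the-envelope estimate, not a figure from the talk: it assumes a single pass over three billion bases with no redundant coverage, all one thousand pores reading continuously, and a full 24 hours as the upper bound.

```python
GENOME_BASES = 3_000_000_000    # about three billion base pairs in a human genome
PORES = 1_000                   # the one-thousand-nanopore array
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: one pass over the genome, every pore busy for the whole day.
bases_per_pore = GENOME_BASES / PORES
bases_per_pore_per_second = bases_per_pore / SECONDS_PER_DAY

print(f"Each pore must read {bases_per_pore:,.0f} bases in a day")
print(f"That is about {bases_per_pore_per_second:.1f} bases per second per pore")
```

Any redundant coverage needed for accuracy would raise the required rate in proportion.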
Mr. Schloss emphasized that the NHGRI is committed to supporting technological development through research investment decisions, internal peer review, and an advisory structure that are all more supportive of funding speculative projects than those of other large organizations might be. This chain of discovery, development, and investment, paired with a growing research market, has led to many publications on applying whole genome sequencing to clinical situations, such as understanding the molecular flaw underlying a newborn's disease and selecting a personally optimized cancer therapy. The institute's new strategic plan is focused on improving healthcare effectiveness by moving technology rapidly towards the clinic. He explained that a milestone of sequencing a patient genome in thirty minutes for $50, at a quality sufficient for individual medical decisions, could enable exciting new applications such as pharmacogenomics: checking therapeutic drugs against the enzymes coded in a patient's genome for potentially ineffective or adverse reactions.
With that, he closed his talk and Ms. Taylor invited questions.
In response to a question regarding the potential dark side of new sequencing capabilities, Mr. Schloss agreed that, as with any new technology, concerns accompany the potential benefits. In particular, he noted that genetic discrimination will be a significant issue and stated the need for updated anti-discrimination legislation.
Someone wondered how noncoding DNA is handled during sequencing. Mr. Schloss explained that some techniques only sequence the coding regions because they're much easier to interpret. New sequencing tools allow closer examination outside coding regions, which may provide insight into complex genetic diseases.
One person was curious about gene therapy. Mr. Schloss clarified that gene therapy can be used to correct structural defects in a protein rather than simply correcting its regulation, but noted there are still many challenges to performing gene therapy.
A final question concerned the comparative quality and correlation of the different sequencing machines. Mr. Schloss offered the analogy that the genome coverage of the different sequencing technologies resembles a Venn diagram with a large region of shared, trustworthy agreement. Areas of less rigorous agreement can still be validated with specific, laborious biochemical tests.
After the question and answer period, Ms. Taylor thanked the speaker, made the usual housekeeping announcements, and invited guests to apply for membership. At 9:57 pm, she adjourned the 2,288th meeting to the social hour.
The weather: Cloudy
The temperature: 21°C