Microbial Genome Sequencing
A Window into Physiology and Evolution
Karen E. Nelson
Institute for Genomic Research, Carnegie Institution of Washington
About the Lecture
Whole genome sequencing has accelerated the rate at which new genes are being identified. Many of these genes have potential applications in the biotechnology and health care industries, as well as in addressing environmental issues. Complete genome sequences also allow a detailed understanding of the evolutionary history of an organism, since a complete genome reveals more than can be obtained from single-gene analyses. One significant example comes from the DNA sequence of Thermotoga maritima. T. maritima is thought to be one of the earliest branchings of the known bacteria, and as a thermophilic organism it is a potential source of thermostable, industrially relevant enzymes. Apart from interesting basic biological findings, this genome sequence presents significant evidence (based on protein sequence similarities and regions of atypical DNA composition) for extensive lateral gene transfer between Archaea and Bacteria. This finding is also supported by independent periodicity analysis of the genome sequence. Almost one-quarter of the T. maritima genome appears to have been acquired from the archaeal domain, highlighting the significance of natural exchange of genetic material in the environment and raising questions about how to define organisms that have mosaic-like genome sequences. The nature of the last universal common ancestor continues to be debated.
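One common way to flag "regions of atypical DNA composition" is to slide a window along the genome and compare each window's G+C content with the genome-wide distribution. The sketch below illustrates that idea only; it is not the method used in the Thermotoga study, and the function names, the window and step sizes, and the two-standard-deviation cutoff are all hypothetical choices.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def atypical_windows(genome: str, window: int = 1000, step: int = 500,
                     cutoff: float = 2.0) -> list[tuple[int, float]]:
    """Return (start, gc) for windows whose GC fraction differs from the mean
    over all windows by more than `cutoff` population standard deviations.
    window, step and cutoff are illustrative defaults, not published values."""
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    scored = [(i, gc_fraction(genome[i:i + window])) for i in starts]
    values = [gc for _, gc in scored]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Uniform composition: nothing stands out.
        return []
    return [(i, gc) for i, gc in scored if abs(gc - mu) > cutoff * sigma]
```

For example, an AT-rich sequence carrying a short GC-rich insert is flagged only in the windows that overlap the insert. Real analyses also use codon usage and oligonucleotide frequencies, since G+C content alone can miss transferred genes whose composition happens to match the host.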
About the Speaker
Karen Nelson received a Ph.D. in Microbiology from Cornell University. She is an Assistant Investigator at The Institute for Genomic Research in Rockville, Maryland, where she works in microbial genomics. She recently completed the whole genome sequencing effort for the bacterium Thermotoga maritima and is presently involved in the whole genome sequencing and annotation of Pseudomonas putida, Neisseria meningitidis, and Vibrio cholerae.
President Spargo called the 2113th meeting to order at 8:22 p.m. on January 28, 2000. The Recording Secretary read the minutes of the 2111th meeting, and they were approved.
The speaker for the 2113th meeting was Karen Nelson, of The Institute for Genomic Research. The title of her presentation was “Microbial Genome Sequencing: A Window into Physiology and Evolution”.
Whole genome sequencing, which is now greatly accelerating the pace at which new genes are being identified, may provide our first detailed understanding of the evolutionary history of an organism. To date 24 microbial genomes have been reported, and about 50 other projects are underway. The genomes of the economically important yeast Saccharomyces cerevisiae and of the simple worm Caenorhabditis elegans have been completed, and the DNA sequence of the Homo sapiens genome [but not its interpretation] may be finished within the year. From the speaker's experience in this work, 40-60% of the genes in each genome are unique to that species.
Why should we sequence these bacterial genomes? Because bacteria represent a broader physiological and biochemical diversity than most of the animals and plants of everyday experience. Some are selected for study because they are pathogenic, others because of their unique environmental adaptations or their potential economic utility. Certainly their genomes are less expensive to determine than those of eukaryotic parasites and fungi, which are considerably larger and more complex. The first genome sequenced at TIGR, in 1995, was that of Haemophilus influenzae, a major pathogen, and the most recent, at the end of 1999, was that of Deinococcus radiodurans, an extremely radiation-resistant organism. Helicobacter pylori and Vibrio cholerae are pathogenic organisms for which knowledge of the genome may enable us to design more effective antibiotics.
At TIGR, genomes are sequenced by breaking them into random fragments 2-3 kilobases long, cloned as “shotgun libraries”; the fragments are sequenced and then computationally reassembled by matching their overlapping segments.
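The reassembly step can be illustrated with a toy greedy assembler: repeatedly merge the two reads whose end-to-start overlap is longest until no overlaps remain. This is a minimal sketch under strong assumptions (error-free reads, one strand only, no repeats longer than the reads); it is not the assembler used at TIGR, and the names `overlap`, `greedy_assemble` and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b,
    provided it is at least min_len long; otherwise 0."""
    if len(b) < min_len:
        return 0
    start = 0
    while True:
        # Candidate positions in a where b's first min_len bases occur.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # leftmost hit gives the longest overlap
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of reads with the largest overlap; return contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

With reads `"ATTAGACCTG"`, `"CCTGCCGGAA"` and `"GCCGGAATAC"` in any order, the 7-base overlap is merged first and the 4-base overlap second, giving the single contig `"ATTAGACCTGCCGGAATAC"`. Greedy merging is simple but can misassemble repeats, which is one reason real assemblers add error models and paired-end constraints.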
The sequencing of the Thermotoga maritima genome is a significant example of how complete genome sequencing has increased our understanding of evolutionary history. Thermotoga was selected for sequencing because it may represent one of the earliest branchings of the bacteria from their last common ancestor with the archaebacteria. An additional consideration was that, as a thermophilic organism, Thermotoga is a potential source of industrially important, thermotolerant enzymes. The genome was found to consist of 1,860,725 base pairs with 1877 open reading frames (probable protein-coding genes), of which 54% were similar to proteins in other bacteria, 24% matched sequences found in archaebacteria, and 22% could not be reliably identified because they were not sufficiently similar to any other known sequence. The gene similarities often occurred in mosaic patterns: 81 of the archaeal-like sequences occurred in 15 clusters, often associated with 30-base-pair repeat structures. This finding suggests that some of the archaeal gene clusters may have arisen through horizontal gene transfer, whereby gene clusters are passed from one species to another through viroids or insertion sequences. Similar findings of mosaic assemblages in the nuclear, chloroplast and mitochondrial genomes of Arabidopsis thaliana have led to the same suggestion for this very different organism. For transferred genes to be retained, the transfer must confer some evolutionary advantage, such as a new metabolic opportunity, increased competitive toxicity, or antibiotic resistance. Horizontal transfer is assumed to require physical and genetic proximity of the donor and recipient, relative stability of DNA in their environment, and a still undetermined transmission vector. Whatever the mode and whatever the advantages, the process would be limited by genetic instability of the repeat structures, host restriction systems, and differences in translation and editing.
The selective pressures on foreign DNA thus act significantly both for and against genetic promiscuity.
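The 1877 "open reading frames" counted for Thermotoga are, in the simplest sense, stretches of DNA that run from a start codon to the next in-frame stop codon. The toy scanner below shows that definition on the forward strand only, using ATG as the sole start and TAA, TAG and TGA as stops; real bacterial gene finders also scan the reverse strand, accept alternative starts such as GTG and TTG, and use statistical models. The function name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return sorted (start, end) half-open coordinates of forward-strand ORFs.
    Each ORF runs from the first ATG in a frame to the next in-frame stop
    codon (stop included). min_codons is illustrative, not a published cutoff."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon
    return sorted(orfs)
```

For example, in `"CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"` the scanner finds one five-codon ORF at coordinates (2, 17) when `min_codons=5`, and none when the threshold is raised to six. The length threshold matters because short ORFs arise frequently by chance, which is one reason expression studies are needed to confirm predicted genes.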
With this growing evidence of horizontal gene transfer, the “tree of life” looks more and more like a “web of life”. There may have been a great deal of genetic self-engineering, which could falsify or obscure the evolutionary record preserved in the genome. In particular, the results for Thermotoga suggest that the tree constructed only a few years ago on the basis of ribosomal RNA sequences may not be completely reliable. Certainly the nature of the last common ancestor of Bacteria and Archaea continues to be debated. The results also suggest that more careful consideration must be given to the possibility that genes introduced by genetic engineering might be horizontally transferred to other, unintended species.
The availability of these genomes is giving us insights into the evolution of genes and species, the elucidation of metabolic pathways and regulatory mechanisms, the design of effective drugs, and the selection of vaccine targets. Indeed, completing an organism's genome does not bring research on that organism to a close but rather stimulates it, as measured by the rate of MEDLINE citations. The genome work so far suggests that we need to encourage further development of databases for annotating this sequence information and providing public access to it, so that we can make the most effective use of whatever sequence information we do have for organisms that cannot be cultured and whose genomes therefore may never be determined. Expression studies are also needed to confirm that the open reading frames have been interpreted correctly.
Dr. Nelson kindly answered questions from the floor. President Spargo thanked Dr. Nelson on behalf of the Society and welcomed her to its membership. The President announced the speaker for the next meeting, made the parking and beverage announcements, and adjourned the 2113th meeting to the social hour at 9:29 p.m.
John S. Garavelli