The Human Genome Project

Unravelling the Instructions for Creating Life

Robert J. Robbins

About the Lecture

At conception, information is passed from parent to progeny as digitally encoded instructions. Each human sperm and egg carries a linear string of 3.3 billion nucleotides of DNA packaged into 23 chromosomes, similar to gigabytes of instructions stored on 23 mass-storage devices in a computer system. The goal of the Human Genome Project (HGP) is to obtain and understand a copy of those instructions for humans and for several other species. The computer equivalent would be extracting the binary files from a 3.3-gigabyte hard drive, then reverse engineering the files all the way back to the design specifications. In fact, the Human Genome Project is nothing but the effort to create the most important database ever attempted: the database of instructions for building people.

The HGP is an international effort to characterize all the human genetic material by improving genetic maps, constructing physical maps of entire chromosomes, and determining the complete sequence of the DNA in the human genome. Parallel studies are being carried out on selected model organisms. The ultimate goal is to identify all of the more than 100,000 human genes and to render them accessible for further biological study. Information obtained as part of the HGP will dramatically change almost all biological and medical research. Comparative genomic data will provide the basis for the most definitive study of evolutionary relationships possible. In addition, both the methods and the data developed as part of the project will benefit investigations of many other genomes, including a large number of commercially important plants and animals.

About the Speaker

Mr. Robbins received an A.B. in oriental history from Stanford University. In 1970, he enrolled in Michigan State University, earning a B.S., M.S., and a Ph.D. in zoology before joining the faculty in 1975. In 1987, he went to the National Science Foundation to “facilitate the computerization of biology.” After two years as NSF’s program director for Database Activities in the Biological, Behavioral, and Social Sciences, he accepted a position in 1991 as Director of the Applied Research Laboratory in the William H. Welch Medical Library at Johns Hopkins, where he also held an appointment in the Computer Science Department and served as director of the informatics core of the Genome Data Base, the central repository for data generated by the HGP. Currently on leave from Hopkins, he serves as Program Director for Bioinformation Infrastructure in the Office of Health and Environmental Research of the U.S. Department of Energy.

Minutes

The President, Ms. Enig, called the 2,022nd meeting to order at 8:16 p.m. on January 21, 1994. The Recording Secretary read the minutes of the 2,020th meeting and they were approved. The President then read a portion of the minutes from the 414th meeting, January 20, 1894.

The speaker for the 2,022nd meeting was Robert J. Robbins, of the Applied Research Laboratory, The Johns Hopkins University, who talked on “The Human Genome Project: Unravelling the Instructions for Creating Life.”

Mr. Robbins began by discussing the relatively new science of genomics, the study of the genetic material of life: its organization, its function, and eventually its detailed structure. One goal of the Human Genome Project is the construction of a high-resolution human genetic map. A map is a representation of the relative positions of recognizable landmarks. Landmarks may be functional units (genes) or just handy unique sequences that can serve as arbitrary milestones. Arbitrary “landmarks” of genomes include sections of unique text, called Sequence Tagged Sites (STSs), found along the approximately 6 billion letters of human DNA sequence.

Manipulating and understanding genomes involves the manipulation of tremendous amounts of information. Genomics as a science depends on the computer as an instrument, in the same way that astronomy depends on the telescope and histology on the microscope.

In the last century Gregor Mendel observed that numerical patterns in the distribution of traits among progeny suggested that inheritance was controlled by discrete “particles” that were carried in pairs in adults but transmitted individually to progeny. After Mendel's work was “rediscovered” at the start of this century those individual inheritable particles were named genes. It was found that these genes tended to be inherited in linked groups. These groups of genes would occasionally reassociate in a way that suggested they were linearly arranged; nearby genes were more likely to remain associated than distant genes. This phenomenon enabled A. H. Sturtevant to produce the first genetic map for five genes of the fruit fly in 1913 [1]. Linked groups of genes in fruit flies and other organisms were later recognized to be associated with the chromosomes microscopically observed in cell nuclei. The human genome is believed to contain 50,000 to 100,000 genes arrayed in 23 chromosome pairs.
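The mapping idea described above, that genes which recombine less often lie closer together, can be illustrated with a small sketch. This is a simplification and not Sturtevant's actual procedure: the gene labels and recombination frequencies are hypothetical, and the function order_genes is an illustrative helper that picks the linear order whose neighbouring genes have the smallest summed recombination frequency.

```python
from itertools import permutations


def order_genes(recomb: dict[frozenset[str], float]) -> tuple[str, ...]:
    """Return the linear gene order that best fits pairwise recombination data.

    `recomb` maps each unordered pair of genes to the fraction of progeny in
    which the pair was separated (a stand-in for map distance).
    """
    genes = sorted(set().union(*recomb))

    def summed_neighbour_distance(order: tuple[str, ...]) -> float:
        # Distance along the chromosome if the genes sat in this order.
        return sum(recomb[frozenset(pair)] for pair in zip(order, order[1:]))

    best = min(permutations(genes), key=summed_neighbour_distance)
    # An order and its mirror image are equivalent; report one consistently.
    return best if best[0] < best[-1] else best[::-1]


# Hypothetical frequencies: a and c recombine most often, so they are outermost.
example = {
    frozenset("ab"): 0.10,
    frozenset("bc"): 0.05,
    frozenset("ac"): 0.14,
}
print(order_genes(example))
```

With these made-up numbers the sketch places gene b between a and c, because the a-c pair is separated most often. Trying every permutation is only practical for a handful of genes.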

In 1988 the National Academy of Sciences Committee on Mapping the Human Genome said in its report, “The ultimate, highest resolution map of the human genome is the nucleotide sequence, in which the identity and location of each of 3 billion nucleotide pairs is known.” [2] Unfortunately, this language ignores the fact that normal humans show significant variation in the size of their genomes. To deal with this normal variation, we should begin to think of genomic anatomies rather than maps. A good anatomy is approximately true of all members of a population, but exactly true of none.

Although the classic notion that “genes are arranged on chromosomes, like beads strung on a loose string” is no longer considered valid, remnants of this thinking still adversely affect our view of genome organization. The idea that the position of a particular gene could be represented as precise start and stop addresses, given in base pairs, ignores variations in genome size. In a computer analogy, genes are arranged in genomes like subsegments in a linked list. In a dynamic linked list, absolute addressing is impossible. Instead, objects can best be described in terms of offsets from recognizable landmarks. Addressing is done associatively (i.e., by content) rather than by location. Before the genome project is completed, we will need to develop better schemes for representing the location of genes in genomes.
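A minimal sketch of what addressing by content rather than by location might look like follows. It is an illustration only, not a scheme the speaker proposed: the class RelativeLocus, the function resolve, and the toy sequences are all hypothetical. A gene is recorded as an offset from a unique landmark sequence, and its absolute coordinates are worked out separately for each individual genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativeLocus:
    """A gene position stored relative to a landmark, not as an absolute address."""
    landmark: str  # unique sequence, located by its content
    offset: int    # bases between the end of the landmark and the gene start
    length: int    # gene length in bases


def resolve(genome: str, locus: RelativeLocus) -> tuple[int, int]:
    """Return the (start, stop) coordinates of the gene in one particular genome."""
    hits = genome.count(locus.landmark)
    if hits != 1:
        raise ValueError(f"landmark must occur exactly once, found {hits}")
    start = genome.find(locus.landmark) + len(locus.landmark) + locus.offset
    return start, start + locus.length


# Two toy genomes that differ only in the amount of sequence before the landmark.
genome_a = "TTTT" + "GATTACA" + "CC" + "ATGAAATAG" + "GG"
genome_b = "TTTTTTTTTT" + "GATTACA" + "CC" + "ATGAAATAG" + "GG"
locus = RelativeLocus(landmark="GATTACA", offset=2, length=9)

print(resolve(genome_a, locus))  # (13, 22)
print(resolve(genome_b, locus))  # (19, 28)
```

The absolute coordinates differ between the two toy genomes, but the same landmark-relative description recovers the same gene in both. That is the point of the linked-list analogy.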

Although biological information is passed from parent to progeny in digitally encoded form, we must be careful not to push analogies with computer science too far. The expression of encoded information in a computer is deterministic, in that we expect the same op-codes always to produce the same result. However, the “op-codes” of the genome (the instructions that affect the expression of genes) are probabilistic, not deterministic. Although we expect protein-coding regions to be normally decoded in a deterministic manner, promoters and other regulatory regions interact with the enzymes that recognize them in a probabilistic manner. As we strive to understand the genome by reverse engineering the billions of bytes of instructions encoded in DNA, we must always bear the probabilistic nature of gene expression in mind.
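The contrast drawn above can be sketched in a few lines. This is a toy model under stated assumptions, not a description of real regulatory biochemistry: the codon table is a three-entry subset of the standard genetic code, the binding probability p_bind is a made-up parameter, and the function names are hypothetical. Decoding a coding region is a fixed table lookup, whereas each encounter between a promoter and the enzyme that recognizes it is modelled as an independent random trial.

```python
import random

# Three entries from the standard genetic code: Met, Lys, and a stop codon.
CODON_TABLE = {"ATG": "M", "AAA": "K", "TAG": "*"}


def translate(coding: str) -> str:
    """Deterministic decoding: the same codons always give the same product."""
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, len(coding), 3))


def transcription_events(p_bind: float, trials: int, rng: random.Random) -> int:
    """Probabilistic regulation: count how many of `trials` encounters succeed."""
    return sum(rng.random() < p_bind for _ in range(trials))


print(translate("ATGAAATAG"))  # MK*

rng = random.Random(1994)  # seeded only so that a run can be repeated
for _ in range(3):
    # The count can differ from one batch of 100 encounters to the next.
    print(transcription_events(p_bind=0.3, trials=100, rng=rng))
```

The translated product is identical on every call, while the number of successful promoter encounters generally varies between batches even though the "instruction" is unchanged.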

When the human genome project is finished, many of the innovative laboratory methods involved in its successful conclusion will begin to fade from memory. What will remain as the project's enduring contribution is a vast amount of computerized knowledge. Seen in this light, the human genome project is nothing but the effort to create the most important database ever attempted — the database containing the instructions for creating life.

[1] A. H. Sturtevant, J. Exp. Zool. 14, pp. 43-59 (1913).
[2] Mapping and Sequencing the Human Genome by the Committee on Mapping and Sequencing the Human Genome of the National Research Council. National Academy Press, Washington, DC (1988), p. 20.

Mr. Robbins then kindly answered numerous questions from the audience.

The President thanked the speaker on behalf of the Society. There were no new members to be introduced. The President then announced that the speaker for the next meeting, on February 4, would be Mr. Fred Rothwarf on “Rare Earth Permanent Magnets and their Applications,” made the usual parking announcement, and adjourned the 2,022nd meeting at 9:46 p.m.

Attendance: 42
Temperature: -7.2°C
Weather: clear with ice and snow cover

Respectfully submitted,

John S. Garavelli
Recording Secretary

The 2,022nd Meeting of the Society

January 21, 1994 at 8:00 PM

Powell Auditorium at the Cosmos Club
