From Dark Data to a Global, Accessible Digital Resource Documenting Life on Earth
Program Director, Division of Biological Infrastructure
Directorate for Biological Sciences
National Science Foundation
About the Lecture
For more than 300 years, biologists have documented research by preserving samples, known as voucher specimens, in biological collections. These specimens are the direct evidence for recognition, description, and publication of the millions of species known to science. The basic data and information within collections worldwide underwrite our knowledge about biological diversity, the history of life on Earth, molecular and cellular biology, and organismic and ecological systems. Downstream applications in biomedical research, agriculture, and management of genetic and natural resources also directly use or indirectly benefit from collection knowledge bases. However, access to the physical specimens and the data associated with them has traditionally been limited to specialists, based on their credentials and academic background. Our national and international infrastructure of biological collections is a treasure trove of data, but these are dark data, much of which is still hidden away in the physical archives. This lecture will address how advances in technology are changing how collections are conceived, maintained, secured, and made accessible, increasing their relevance to science and society. Recent impetus for change began with the recognition that collections are not dark, hidden archives but rich, expansive, big data resources. It is estimated that 2.5 billion specimen objects are curated in biological collections. Every single object has, at a minimum, data about its identity, origin, and provenance. By making basic collections data more available through digitization initiatives, scientists will not only be able to better study and understand the collections and the items in them, but will also be better able to model how landscapes and environments have changed in the past, how they are changing now, and how they will change in the future.
Technology has enabled use of scientific collections in novel, unanticipated ways, driving further innovation and opening new opportunities for research using collections. For example, ancient DNA methods enable researchers to sequence extinct species from specimens in collections. Non-invasive CT scanning provides a means to visualize the brain cavity of a fossil animal specimen. With X-ray fluorescence spectroscopy, herbarium specimens can be scanned for hyperaccumulation of minerals, and these observations can be linked to environmental sensing data. Furthermore, the concept of biological collections discussed in this lecture goes beyond preserved items to include living stocks, cultures, and cryo-facilities. These repositories make living material, tissues, and genetic resources available for study of model and non-model organisms that are essential for research on far-ranging topics. The lecture will illustrate examples of how long-term investments in collections have paid off, along with the challenges for supporting and managing the infrastructure critical to their maintenance, growth, and effective utilization.
About the Speaker
Reed Beaman is a Program Director at the National Science Foundation (NSF) with primary responsibilities for the Collections in Support of Biological Research and Advancing Digitization of Biodiversity Collections programs. Previously at NSF he was responsible for a variety of programs in biology, including Next Generation Networks for Neuroscience, Advances in Biological Informatics, Dimensions of Biodiversity, and Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering. Reed’s research interests have focused on Southeast Asia, particularly on Mount Kinabalu, a biodiversity hotspot on the island of Borneo. His dissertation work involved the description of eight new plant species and landscape-level biogeographic analysis using remote sensing imagery and geographic information systems. More recently, he has engaged with researchers in Asia as the Biodiversity Expedition Lead for the Pacific Rim Application and Grid Middleware Assembly (PRAGMA) network, a community of practice that facilitates cyberinfrastructure experimentation on an international scale. Reed was a Postdoctoral Fellow in Biological Informatics sponsored by the Royal Botanic Gardens Sydney and University of Kansas, during which he developed software tools for automating the georeferencing of specimen data. He continued work on digitization methods while Associate Director for Informatics at the Yale Peabody Museum and as Curator of Informatics at the Florida Museum of Natural History prior to serving at the NSF. Reed earned a BS in Botany at the University of Michigan and a PhD in Botany at the University of Florida.
President Larry Millstein called the 2376th meeting of the Society to order at 8:11 p.m. He announced the order of business and welcomed new members. President Millstein reported on a presentation by Society members Alan Stern and Kirby Runyon to the International Astronomical Union proposing a revision to the definition of the term “planet”. Recording Secretary Preston Thomas presented a report to the Society on an ascent of Mt. Kilimanjaro and observations regarding the data collected on the acclimatization of the climbers. The minutes of the previous meeting were read and approved. President Millstein then introduced the speaker for the evening, Reed Beaman, a Program Director for the Division of Biological Infrastructure at the Directorate for Biological Sciences of the National Science Foundation. His lecture was titled “Biological Collections: From Dark Data to a Global, Accessible Digital Resource Documenting Life on Earth”.
Dr. Beaman began by explaining that biological collections are often misperceived as akin to stamp collections. In practice, however, biological collections, both preserved and live, provide an empirical basis for research as well as an invaluable historical record that can be studied years or decades in the future to further our understanding of the relationships between species. Preserving these treasure troves, and making their hidden insights available to a wide audience, is a critical project for the scientific community.
Dr. Beaman explained that the project of describing nature goes back to the Systema Naturae written by Carl Linnaeus in 1735. Given the expense, uncertainty, and danger, Dr. Beaman characterized the expeditions of James Cook, Charles Darwin, Alfred Russel Wallace, and Odoardo Beccari as the moonshots of the 18th and 19th centuries.
Current biological collections and their projects go well beyond the mere acquisition and classification that characterized the 18th- and 19th-century expeditions. Today, a major focus of collections is not exploring new lands, but racing to obtain samples and indigenous knowledge from areas that are threatened by habitat destruction or other permanent changes. Dr. Beaman explained that we may not be able to go back to some of these places and that, among other purposes, biological collections provide an irreplaceable window into the past.
Dr. Beaman explained that a “sample” in a biological collection consists of a specimen together with its metadata: a scientific name, where it was collected, when, and by whom. If a specimen does not have that information, it is not of scientific value. Specimens are no longer stored in formalin but in ethanol, which is generally safer but is still flammable. Importantly, DNA can still be sequenced from specimens preserved in ethanol, meaning that the plunging cost of genetic sequencing can be leveraged to retroactively investigate samples collected decades or sometimes centuries ago.
Dr. Beaman explained that in addition to preserved samples, slide collections, fossils, and skeletons, living and cryopreserved stocks such as algae, microbes, fungi, and rodents are a key part of modern biological collections. Through National Science Foundation support, samples of these live stocks can be ordered on a cost-recovery basis, which makes them readily available for research and promotes standardization. Dr. Beaman highlighted one such animal, a cute brown deer mouse, Peromyscus maniculatus, which is emerging as a potential supplemental model organism to the familiar white mouse Mus musculus. Other living stocks supported by NSF include the lemur center, which facilitates cognitive, perceptual, and behavioral experimentation.
Dr. Beaman emphasized that we may store collections for decades without knowing their usefulness, but if we don’t store them, we’ll never know. The advent of CRISPR-Cas9 demonstrates the importance of maintaining reliable, well-curated, and documented stocks. The initial paper exploring the topic was published in the 1970s, and nearly 50 years later, consistent living stocks were available to further that research.
Dr. Beaman concluded that, if we resurrect the woolly mammoth, it will be because we took the time and effort to grow, curate, and analyze our precious biological treasure troves.
After the conclusion of the talk, President Millstein invited questions from the audience.
One question asked about the risk of data loss in digital collections due to accident or obsolescence. Dr. Beaman acknowledged the risk and noted that there are efforts to replicate digital collections, but the effectiveness of these programs is not known. Future-proofing is even more challenging because of the NSF funding paradigm of 5 years plus a potential 5-year extension, meaning that the initiatives it funds necessarily focus on that time frame.
After the question and answer period, President Millstein thanked the speaker, made the usual housekeeping announcements, and invited guests to join the Society. At 10:08 p.m., President Millstein adjourned the 2376th meeting of the Society to the social hour.
Preston Thomas, External Communications Director