Deserts in the Deluge
Transformation of the earth’s social and ecological systems is occurring at a rate and magnitude unparalleled in human experience. The past five decades have seen the world’s population more than double, food and water consumption triple, and fossil-fuel use quadruple. Attendant benefits such as increased lifespans and economic growth are increasingly joined by negatives including growing socioeconomic inequality, environmental degradation, and climate change. These transformations are manifestations of complex human-environment systems that arise from complicated interactions among individuals, society, and the environment.
The National Research Council and many other scientific and policy bodies have called for more richly detailed data to support the research and informed decisions necessary to meet the challenges of rapid social and environmental change. There is particular interest in the ‘data deluge’ or ‘big data’, meaning research based on datasets that are vastly larger than those traditionally used in most fields and that in turn entail new forms of processing and analysis. Disciplines ranging from physics to economics are being redefined by big data gleaned from a host of new sensors, internet activities, and the merging of existing databases. Many of our daily experiences also reflect the rise of big data. The online retailer Amazon mines millions of customer transactions to craft individual book recommendations, while at the same time, governments trawl through vast private and public databases about many aspects of our lives to identify specific people and behaviors.
There are deserts in the data deluge, however. Gaps in human-environment big data include:
- Data. For all the excitement about big data, scholars wanting to untangle human-environment interactions face a dearth of spatially detailed multidecadal data. While some relevant data are available, such as climate observations or online opinions about global warming, there is surprisingly little detailed information about many social and natural features for most of the globe before the year 2000.
- Methods. There are shortfalls in our ability to store, manipulate, and analyze the big data of human-environment systems. Most big data research is done on fairly simple data, in the sense that they involve straightforward measures or a single research domain. In contrast, we face many unresolved challenges in representing social and biophysical entities and relationships that operate at multiple levels of organization, over space, and through time.
- Theory. A growing number of big data proponents argue that the data deluge augurs the ‘end of theory’ because this approach offers a powerful ‘black box’ that creates knowledge without needing domain experts or engagement with existing research, method, or theory. This proposition is profoundly at odds with the core conceptual precepts of many disciplines that seek to advance understanding of human-environment systems.
In response to these research needs, Dr. Steven Manson and colleagues have garnered $18 million in external funding over the past four years to advance the big data of human-environment research. He is the Principal Investigator (PI) for the National Spatiotemporal Population Research Infrastructure (NSPRI) and Co-Principal Investigator for the Integrated Public Use Microdata Series (IPUMS) and Terra Populus (TerraPop).
These projects create ‘gold standard’ research data. They also exemplify big data: TerraPop is currently in the prototype stage but will be the largest curated source for global human-environment data; NSPRI is the largest population-environment dataset on the United States at 265 billion data points; and IPUMS is the largest population database in the world, with records on over half a billion individuals described by 270 billion data points. These projects share data, approaches, and underlying research questions (e.g., TerraPop uses IPUMS data, NSPRI uses TerraPop methods).
Dr. Manson and his colleagues identify, acquire, and develop various data sources ranging from historic handwritten census forms to current satellite observations of the earth. They research new ways to standardize these data and make them comparable across space and through time. These data are then preserved and made internet-accessible so they can be readily used by thousands of scholars as well as many students, policy makers, and members of the public. Finally, the projects work with over a hundred national statistical agencies and other organizations around the world on the science of large, complex spatiotemporal datasets.
This research has important broader impacts. All three projects—TerraPop, NSPRI, and IPUMS—employ and train a diverse group of post-docs and graduate and undergraduate research assistants. The projects offer training workshops and online instructional modules for hundreds of researchers and policy makers who use these data. The projects partner with educational organizations that serve thousands of students via specialized curricula used in classrooms worldwide. Finally, these projects support the broader service mission of the University by providing data and research to staff in organizations ranging from regional planning councils to health policy organizations and community development agencies.