Philosophy of Science and the Crisis of Reproducibility
Is science reliable? Fears of a “reproducibility crisis,” a pervasive inability of scientists to replicate or reproduce published findings, have led many researchers and laypeople alike to wonder. But philosophy professor Alan Love thinks that such skepticism toward science is unwarranted.
In a recent opinion piece in the Proceedings of the National Academy of Sciences, provocatively titled “Reproducibility failures are essential to scientific inquiry,” Love and his colleagues A. David Redish (Neuroscience, University of Minnesota), Erich Kummerfeld (Health Informatics, University of Minnesota), and Rebecca Lea Morris (Philosophy, Stanford University) argue that much of the discussion surrounding the purported crisis fails to appreciate the centrality of reproducibility failures to the scientific process. In so arguing, they point to a number of examples of reproducibility failures in mathematics and computer science, contending that such cases both cast doubt on many of the proposed solutions to the crisis and illustrate the role that failures to replicate play in scientific inquiry.
Reproducibility Failures in Mathematics
As Love notes, commentators on the reproducibility crisis have largely focused on cancer biology and social psychology, the fields commonly thought to be most severely afflicted by failures to replicate previous findings. But Love and his colleagues are quick to point out that replication failures are not unique to these fields. In fact, such failures crop up even in non-empirical sciences like mathematics and computer science. While biologists and psychologists might seek to replicate an experiment, mathematicians might seek to reproduce a proof. In so doing, these mathematicians often find that the proof to be replicated isn’t a proof at all but contains errors or is incomplete in some way. Thus, just as biologists or psychologists might fail to replicate a previous experiment, a mathematician might fail to reproduce a purported proof.
As an example of such a replication failure in mathematics, Love and his colleagues point to the history of attempts to prove the Four-Color Theorem. The theorem states that any two-dimensional map divided into touching, non-overlapping regions (a map of the continental United States, for example) can be colored with no more than four colors so that no two regions sharing a border have the same color. The theorem was conjectured as early as 1852, but it wasn’t until twenty-seven years later that the mathematician Alfred Kempe claimed to have proved it. In 1890, however, Percy Heawood, in trying to replicate Kempe’s findings, discovered an error in the purported proof, thus failing to reproduce it. Over the next century, mathematicians continued to refine Kempe’s methods, culminating in Appel and Haken’s computer proof of the Four-Color Theorem in 1976. But again, as with Kempe’s purported proof nearly a century before, mathematicians continued in their attempts to replicate the proof, employing new methods and leading to a number of novel proofs of the theorem.
Failure as Essential
Love and his colleagues argue that considering such examples of reproducibility failures in mathematics and computer science helps clarify the role and scope of replication failures in the sciences. Though proposals to overcome the crisis have largely addressed questionable research practices, statistical problems, and issues of experimental design and execution, the authors point out that mathematics, as a non-empirical discipline, is immune to these kinds of ailments. “Although good experimental design and data management are obviously important parts of conducting good science,” they write, “these examples show that failures of reproducibility occur even in fields in which these specific problems do not arise.”
But replication failures in the non-empirical sciences also reveal the central role such failures play in the scientific process. Upon failing to replicate Kempe’s purported proof of the Four-Color Theorem, subsequent mathematicians continued to refine Kempe’s methods. “When faced with reproducibility failures in the form of an invalid proof and questions about the validity of particular methods,” Love and his colleagues write, “mathematicians sought to better understand these methods, results, and inferences over time, which led to new mathematical techniques and different ways to prove the Four-Color Theorem.” Thus, the failure of mathematicians to replicate Kempe’s purported proof was not a mere failure, but an opportunity to refine Kempe’s methods and acquire further mathematical insight. “In the identification of the failure, they developed new methods and learned from the failure,” Love adds, “and this is how researchers respond to replication failures not just in mathematics but across the sciences.”
Love argues that failures to replicate findings are an essential component of scientific inquiry, allowing for the eventual integration of conflicting results into coherent theories. “Often times what you find is that replication failures and conflicting results reflect different perspectives on complex phenomena,” says Love. Rather than taking replication failures as merely failures, Love argues that they provide the opportunity to decipher how different perspectives fit together into a complex whole. “Reproducibility failures are a normal part of science,” Love and his colleagues explain, “and do not necessarily indicate incompetence or fraud.” In place of “technical fixes” designed to curb replication failures by addressing statistical and experimental problems, Love instead emphasizes the need for strategies to metabolize these failures, incorporating conflicting outcomes into more encompassing scientific theories.
The Philosopher’s Many Roles
Love therefore contends that reproducibility failures should not be taken to undermine the reliability or trustworthiness of science: “We should be confident precisely because scientists sometimes get it wrong, since they know how to process their errors and take advantage of situations when they fail.”
“Part of the way scientific inquiry works is that you fail,” he elaborates, “and if you want to build the public’s confidence in scientific knowledge, you need to provide them with a realistic image of how that knowledge works.” Presenting such an image of how scientific knowledge actually works, and of how replication failures are an integral component of the process, is one task facing the philosopher of science: “Part of what philosophers are able to do quite well is think about problems from many different angles, and for that reason, philosophers are particularly suited to facilitating a broader and more realistic understanding of how the sciences work.”
But this is not to suggest that there is only one proper role for the philosopher of science in the conversations about reproducibility: “Philosophers can play many roles, and there is still much philosophical reflection to be done,” he says. For this reason, in Fall 2018, Love, along with fellow philosophy professor Samuel Fletcher and researchers in the Department of Psychology and School of Statistics, helped run the Reproducibility Working Group, an interdisciplinary collaboration that grew out of the Minnesota Center for Philosophy of Science discussion groups on reproducibility in Spring 2017 and 2018. Through such interdisciplinary discussion among philosophers and scientists alike, Love is confident that we can work toward sorting out the significance of different facets of the reproducibility crisis and thereby alleviate the fears to which it has given rise among researchers, funding agencies, and the general public.