Replication as Success and Unsuccessful Replication
Thomas Kuhn thought that scientific crises are also opportunities for philosophical reflection. Philosophy professor Samuel Fletcher agrees. “Philosophers have disciplinary training to find the weak spots of arguments and explore conceptual connections,” he explains, “and this makes them especially suited to sorting out some of the conceptual issues at stake in scientific crises.”
Over the past two years, Fletcher has been engaging with the philosophical relevance and implications of one such crisis facing the sciences: the replication crisis, the inability of researchers, particularly in fields such as cancer biology and social psychology, to replicate or reproduce certain published findings in subsequent studies.
In Fall 2018, Fletcher, along with fellow philosophy professor Alan Love and researchers in the Department of Psychology and School of Statistics, helped run the Reproducibility Working Group, an interdisciplinary collaboration that grew out of the Minnesota Center for Philosophy of Science discussion groups on reproducibility in Spring 2017 and 2018. But Fletcher’s philosophical reflections on replication and reproducibility have not been confined to the discussion table; in the June 2018 issue of The Reasoner, he devoted his column “What’s Hot in Mathematical Philosophy?” to clarifying a central conceptual issue relevant to the crisis: the notion of replication itself.
Two Notions of Replication
In his column, Fletcher draws a distinction between replication as a success term and replication as an experimental method. “As a methodological term, ‘to replicate’ just means to accumulate further evidence by employing the same method multiple times.” As an experimental method, then, to replicate a study or experiment amounts to fulfilling the imperative: Do it again!
As a success term, on the other hand, ‘replication’ is intended to be a criterion of scientific success and experimental reliability. “In this sense, ‘to replicate’ means to produce a study or an experiment that yields the same result as a previous study or experiment,” Fletcher explains. He calls it a “success term” because, for any study, one has either successfully replicated a previous study or not, and it’s the resulting ‘yes’ or ‘no’ answer that is supposed to bear on whether the original study was reliable. “Instead of trying to answer the question of whether a particular study was reliable by the accumulation and amalgamation of evidence,” Fletcher elaborates, “it’s answered through the application of this criterion of replication.”
Fletcher maintains that equivocation between these two senses of ‘replicate’ has only confounded discussion about the replication crisis. He emphasizes that one can be committed to replication as a method without upholding replication as a criterion of scientific success.
Is Replication-as-Success Successful?
In fact, Fletcher thinks there might be good reasons to abandon the conception of replication as a criterion of success: “Replication as a success term doesn’t have the logical properties one would want for a criterion of scientific success,” he says. For one thing, Fletcher points out, replication-as-success can be asymmetric: There are circumstances in which one study replicates another, but not vice versa. Thus, if we uphold replication as a criterion of scientific success, it seems that the total evidence for a particular hypothesis might depend on which study was conducted first. But in evaluating the evidence for a hypothesis, why should it matter in what order the studies were conducted?
Moreover, Fletcher thinks that replication is too “coarse-grained” a criterion to represent the total evidence available for a hypothesis. “Mainstream accounts of evidence and confirmation say that one should update her beliefs about the world on the basis of the total evidence available,” Fletcher explains, “but this total evidence can be as complex as the data itself.” One study may replicate another, finding the same effect as the first study, yet also suggest that the effect is smaller or larger than the original study indicated. A simple ‘yes’ or ‘no’ answer to the question ‘Was the study replicated?’ fails to capture this complexity and hence fails to represent the total evidence in favor of the hypothesis.
Alternatives to Replication
Proposals abound for how to deal with the replication crisis. Some promote making research more transparent through the standardization and pre-registration of scientists’ statistical and experimental methods. Others propose lowering the significance threshold for scientific studies. The most common significance threshold in the fields afflicted by the replication crisis is 0.05, meaning that the probability of erroneously rejecting a true null hypothesis (a false positive) must be less than 5%. Some propose that the standard should be reduced to 0.005, making the probability of producing a false positive result less than half a percent. But Fletcher is skeptical of many of these proposed technical fixes: “Changing the significance threshold will make it harder for there to be false positive results,” he explains, “but it won’t change the underlying methodological problems facing scientific research.”
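The link between the significance threshold and the false positive rate can be illustrated with a small simulation (this is a generic sketch, not an example from Fletcher’s column): when there is truly no effect, the fraction of studies that nonetheless cross the threshold tracks the threshold itself.

```python
import math
import random

def normal_p_two_sided(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def false_positive_rate(alpha, n_studies=20000, n=30, seed=0):
    """Simulate studies of a null (zero) effect with known unit variance
    and count how often a two-sided z-test crosses the threshold alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # z-statistic, sigma = 1
        if normal_p_two_sided(z) < alpha:
            hits += 1
    return hits / n_studies

print(false_positive_rate(0.05))   # empirically near 0.05
print(false_positive_rate(0.005))  # empirically near 0.005
```

Lowering the threshold from 0.05 to 0.005 cuts the simulated false positive rate roughly tenfold, which is exactly the effect the proposal targets, and, as Fletcher notes, all that it targets.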
Fletcher instead proposes abandoning the idea of replication as a criterion of success in favor of adopting meta-analytic methods, which he describes as “processes of combining the data analyses of many different studies in order to assess the total evidential support for a particular hypothesis.” Such an approach, argues Fletcher, has the advantage of providing a total assessment of the available evidence, rather than relying on a coarse-grained criterion of reliability and success like replication.
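One standard meta-analytic technique, offered here only as an illustration of the general idea rather than as Fletcher’s specific proposal, is fixed-effect inverse-variance pooling: each study’s effect estimate is weighted by its precision, yielding a single pooled estimate in place of a yes/no replication verdict. The effect sizes and standard errors below are hypothetical.

```python
import math

def pooled_effect(effects, std_errors):
    """Fixed-effect (inverse-variance) meta-analysis: combine per-study
    effect estimates into one pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]   # precision weights
    total_w = sum(weights)
    est = sum(w * e for w, e in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    return est, se

# Hypothetical results from three studies of the same hypothesis
effects = [0.42, 0.18, 0.30]
std_errors = [0.15, 0.10, 0.12]
est, se = pooled_effect(effects, std_errors)
print(f"pooled effect = {est:.3f} +/- {se:.3f}")
```

Note that the pooled output is a graded quantity (an estimate with uncertainty), not a binary verdict, which is what makes this kind of amalgamation finer-grained than asking whether any one study “replicated” another.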
Fletcher admits that this proposal is a controversial one. “The idea of replication as success is so ingrained in people’s minds that I think it’s hard to think about things in a different way,” he says. But Fletcher again emphasizes the role of the philosopher of science in thinking differently, in order to sort out what’s really at stake in the replication crisis. “Philosophers, in virtue of their training, are particularly suited to zooming out from the details and seeing how everything does and does not fit together,” he explains, “and this is essential to identify what’s at stake in the crisis and to figure out what it means for science and, just as importantly, what it doesn’t mean.”