Science Can Perpetuate Myths
In a new paper in Perspectives on Psychological Science (open access PDF), John P. A. Ioannidis of the Stanford Prevention Research Center explains "Why Science Is Not Necessarily Self-Correcting." His primary focus is on flaws in psychological research, but the principles he discusses apply to all scientific fields:
The ability to self-correct is considered a hallmark of science. However, self-correction does not always happen to scientific evidence by default. The trajectory of scientific credibility can fluctuate over time, both for defined scientific fields and for science at-large. History suggests that major catastrophes in scientific credibility are unfortunately possible and the argument that "it is obvious that progress is made" is weak. (Emphasis added.)

Self-correction is an ideal not always attained in the real world of science, for numerous reasons. Consider, for instance, the fate of a typical research paper. First of all, the paper's conclusions might be right or wrong. Second, there are three ways the paper might be received, leading to six possible outcomes:
1. If right, the paper might be replicated and confirmed. This is the ideal.
2. If right, a replication attempt might incorrectly render it false ("false nonreplication").
3. If right, it might not be replicated at all, leading to an "unconfirmed genuine discovery."
4. If wrong, a replication attempt might show it to be wrong. This is another ideal.
5. If wrong, a replication attempt might not show it to be wrong, leading to a "perpetuated fallacy."
6. If wrong, it might not be replicated at all, leading to an "unchallenged fallacy."
Only #1 and #4 are desirable outcomes, while #3 is tolerable. The other outcomes conflict with the ideal of self-correction. But how common are they? Ioannidis shows that the vast majority of papers are never replicated, meaning that some genuine discoveries are never independently confirmed, and some wrong research is never checked, leaving fallacies unchallenged.
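The six outcomes can be made concrete with a small simulation. The parameters below (the base rate of correct papers, the fraction of papers anyone tries to replicate, and the accuracy of a replication attempt) are purely illustrative assumptions, not figures from Ioannidis's paper; the point is only to show how the outcome mix shifts when replication is rare:

```python
import random

random.seed(0)

# Hypothetical parameters -- illustrative assumptions, not from the paper.
P_TRUE = 0.3         # fraction of published papers whose conclusion is right
P_REPLICATED = 0.1   # fraction of papers anyone attempts to replicate
P_CORRECT_REP = 0.8  # chance a replication attempt reaches the right verdict

counts = {
    "confirmed discovery": 0,    # right, replication confirms (ideal, #1)
    "false nonreplication": 0,   # right, replication wrongly rejects (#2)
    "unconfirmed discovery": 0,  # right, never replicated (#3)
    "corrected fallacy": 0,      # wrong, replication refutes (ideal, #4)
    "perpetuated fallacy": 0,    # wrong, replication fails to refute (#5)
    "unchallenged fallacy": 0,   # wrong, never replicated (#6)
}

N = 100_000
for _ in range(N):
    right = random.random() < P_TRUE
    attempted = random.random() < P_REPLICATED
    if not attempted:
        counts["unconfirmed discovery" if right else "unchallenged fallacy"] += 1
    elif random.random() < P_CORRECT_REP:
        counts["confirmed discovery" if right else "corrected fallacy"] += 1
    else:
        counts["false nonreplication" if right else "perpetuated fallacy"] += 1

for outcome, n in counts.items():
    print(f"{outcome:22s} {n / N:6.1%}")
```

With replication attempted on only 10% of papers, roughly 90% of the literature lands in the two never-replicated categories (#3 and #6), whatever its truth value.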
In the absence of replication efforts, one is left with unconfirmed (genuine) discoveries and unchallenged fallacies. In several fields of investigation, including many areas of psychological science, perpetuated and unchallenged fallacies may comprise the majority of the circulating evidence.

At any point in time, he argues, the credibility of research is fluctuating. It would be naïve to assume a trend toward self-correction over time. In addition, credibility may vary between scientific fields.
Even if we believe that properly conducted science will asymptotically trend towards perfect credibility, there is no guarantee that scientific credibility continuously improves and that there are no gap periods during which scientific credibility drops or sinks (slightly or dramatically). The credibility of new findings and the total evidence is in continuous flux. It may get better or worse....

Ioannidis provides examples from history, like phrenology and eugenics, that show "major drops in credibility in large fields or even across the scientific continuum are possible."
Moreover, even when we do witness an overall improvement, it is likely that there will be substantial heterogeneity across scientific fields: Credibility may increase in some fields, but may decrease in others. If fields with low credibility become prolific in their productivity and dominant (e.g., they attract many researchers, funding, and editorial and public attention), then they may decrease the overall credibility of the scientific corpus, even if major progress is made in many other fields.
How is current science faring? Obviously, we see tremendous achievements in technology, measurement ability, and in the amount and sometimes even the quality of data. However, the state of the evidence needs careful appraisal. It could be that the credibility of some disciplines is improving, whereas some others may be facing difficult times with decreased credibility. It is not possible to exclude even the possibility that massive drops in credibility could happen or are happening.

The burning of the Library of Alexandria shows that massive destruction of evidence is also possible. We breathe a sigh of relief that we no longer burn great libraries down, but our relief may be misplaced. Consider: "Is it possible that we are facing a situation where there is massive destruction of evidence?" he asks. "At first sight, this would sound weird, as current science is apparently so tolerant." Who could possibly believe that, these days, "a Library of Alexandria actually disappears every few minutes"?
Currently, there are petabytes of scientific information produced on a daily basis and millions of papers are being published annually. In most scientific fields, the vast majority of the collected data, protocols, and analyses are not available and/or disappear soon after or even before publication. If one tries to identify the raw data and protocols of papers published only 20 years ago, it is likely that very little is currently available. Even for papers published this week, readily available raw data, protocols, and analysis codes would be the exception rather than the rule. The large majority of currently published papers are mostly synoptic advertisements of the actual research. One cannot even try to reproduce the results based on what is available in the published word.

Now that we are all sufficiently alarmed, Ioannidis ups the ante:
Moreover, is it possible that we are currently facing a situation where there is massive production of wrong information or distortion of information? For example, could it be that the advent of research fields in which the prime motive and strongest focus is making new discoveries and chasing statistical significance at all cost has eroded the credibility of science and credibility is decreasing over time?

These are indeed major concerns, but to ground them beyond a "what if" possibility or a claim that the scientific sky is falling, Ioannidis needs to provide evidence that this is indeed happening. He begins with some prior studies:
Empirical evidence from diverse fields suggests that when efforts are made to repeat or reproduce published research, the repeatability and reproducibility is dismal (Begley & Ellis, 2012; Donoho, Maleki, Rahman, Shahram, & Stodden, 2009; Hothorn & Leisch, 2011; Ioannidis et al., 2009; Prinz, Schlange, & Asadullah, 2011). Not surprisingly, even hedge funds don't put much trust on published scientific results (Osherovich, 2011).

Ioannidis takes the reader on a mental excursion to an imaginary planet he dubs F345, where science is conducted by young researchers poring over huge databases, looking for instances of high significance according to an arbitrary standard. They are taught early on that what matters is "making new discoveries and finding statistically significant results at all cost." Whenever they find some "extraordinary enough" statistical result, they show the senior investigator, who permits only the most extravagant cases to move forward -- because that's all the journals, the funding agencies, and the university presidents care about. Replication? Boring. "Researchers advance if they make more extreme, extravagant claims and thus publish extravagant results, which get more funding even though almost all of them are wrong." Citizens of F345 are bombarded by the mass media on a daily basis with announcements about new discoveries, "although no serious discovery has been made in F345 for many years now."
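The arithmetic behind F345's predicament is easy to simulate. If true effects are rare and only "significant" results get published, false positives swamp true ones. The rates below (1% true hypotheses, 80% power, a 5% significance threshold) are illustrative assumptions, not values from the paper:

```python
import random

random.seed(1)

# Hypothetical setup -- all rates are illustrative assumptions.
N_HYPOTHESES = 10_000
P_REAL_EFFECT = 0.01  # only 1% of tested ideas are actually true
ALPHA = 0.05          # "extraordinary enough" significance threshold
POWER = 0.8           # chance a real effect is detected

published_true = published_false = 0
for _ in range(N_HYPOTHESES):
    real = random.random() < P_REAL_EFFECT
    # A real effect is detected with probability POWER; a null effect
    # still crosses the threshold with probability ALPHA.
    significant = random.random() < (POWER if real else ALPHA)
    if significant:  # only significant results get published
        if real:
            published_true += 1
        else:
            published_false += 1

total = published_true + published_false
print(f"published findings: {total}, false: {published_false / total:.0%}")
```

With these numbers, roughly 80 true effects and about 500 false positives clear the bar, so well over 80% of the "discoveries" are wrong -- without any fraud, just selection on significance.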
Such a "dreadful" situation can occur when science loses sight of its purpose: the pursuit of truth about the world. Pursuing truth requires free speech, critical thinking and democratic institutions. Moreover, the price of good science, like liberty, is eternal vigilance:
I don't even want to think that Earth in 2012 AD is a replica of F345 in year 3045268. However, there are some features where our current modes of performing, reporting, and replicating (or not replicating) scientific results could resemble this dreadful nightmare. More important, we may well evolve toward the F345 paradigm, unless we continuously safeguard scientific principles. Safeguarding scientific principles is not something to be done once and for all. It is a challenge that needs to be met successfully on a daily basis both by single scientists and the whole scientific establishment. Science may well be the noblest achievement of human civilization, but this achievement is not irreversible.

But doesn't the clear progress in cell phones, jet airplanes, and longer life expectancy indicate that science is healthy? A critic might object that "some progress is clearly made," so the situation can't be all bad. "This argument is very weak," Ioannidis counters. For one thing, it doesn't mean science couldn't be doing far better. "I guess that some high priests of the Egyptians may have equally claimed that their science was perfect and optimal because they had discovered fire and wheels -- what more could one hope to achieve?" For another, even if some right results "work" in some practical way, wrong results could proliferate faster than right ones. Finally, he says, we may attribute progress to the wrong thing. Longer life expectancy in developing countries, for instance, could be due more to clean-water and sanitation initiatives than to medicine.
At this point, Ioannidis looks at actual empirical evidence from psychological research, which shows that most of it is never replicated. Replication, in fact, is "extremely uncommon," meaning that the field accumulates both unconfirmed genuine discoveries and unchallenged fallacies. Perpetuated fallacies may be the most common outcome: "Replication efforts, rare as they are, are done primarily by the same investigators who propose the original discoveries." Confirmation bias may be the rule. Ioannidis lists a dozen impediments to valid replication, including misreporting, publication bias, and "voodoo correlations" -- relationships that meet a statistical standard of significance but are in fact bogus. In any case, "The claimed discoveries that have no published replication attempts apparently make up the vast majority of psychological science."
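The "voodoo correlation" problem can be demonstrated in a few lines: generate pure noise and test every pairwise correlation. The sample size, the number of variables, and the r ≈ 0.36 cutoff (roughly the two-tailed p < .05 critical value for n = 30) are illustrative choices, not figures from the paper:

```python
import math
import random

random.seed(2)

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)

# 40 variables of pure noise, 30 observations each:
# every true correlation is exactly zero.
n_obs, n_vars = 30, 40
data = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

# |r| > 0.361 is roughly the p < .05 critical value for n = 30.
THRESHOLD = 0.361
hits = sum(
    1
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
    if abs(pearson_r(data[i], data[j])) > THRESHOLD
)
pairs = n_vars * (n_vars - 1) // 2  # 780 pairwise tests
print(f"{hits} 'significant' correlations out of {pairs} tests of pure noise")
```

About 5% of the 780 tests -- on the order of forty "discoveries" -- come up significant even though nothing real is there, which is exactly why unscreened correlation-hunting perpetuates fallacies.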
Lest one think his criticisms target only the field of psychology, he says, "major drops in credibility in large fields or even across the scientific continuum are possible. Major drops in credibility can occur by massive destruction of evidence or massive production of wrong evidence or distortion of evidence." Who knows how much important data hides in the desk drawers of researchers, overlooked in their hurry to publish what they judge to be more interesting?
To end on a more positive note, Ioannidis gives advice for improving scientific credibility. In his conclusion, "Incentives for Replication and for Correcting Wrong Results," he lists practical steps, like challenging the focus on "high impact" research, improving metrics for replication, and separating the tasks of exploratory research and confirmation. Above all these pieces of advice, though, stands one overarching principle, without which no science can hope to succeed:
....science is about getting as close to the truth as possible and not about getting spectacular, but wrong, results. Some of the proposed steps may even harm the credibility of science unless the pursuit of truth is given the highest priority....
However, at the end of the day, no matter what changes are made, scientific credibility may not improve unless the pursuit of truth remains our main goal in our work as scientists. This is a most noble mission that needs to be continuously reasserted.