When Richard Feynman discussed in his 1974 commencement address at Caltech the problem of a research endeavor in which the critical hypotheses are not amenable to experimental testing, he described it as the “inapplicability of the scientific method to the subject.” The hypotheses generated in some disciplines simply cannot be tested in any rigorous and therefore meaningful way. Pieces of a hypothesis may be testable, but not the whole. These hypotheses can still be framed in such a way that they’re theoretically capable of being disproved — if we do this in an experiment or make this observation, we should expect to observe that. However, the reality of the science is such that they can’t be truly disproved, because the necessary experiments are either too expensive or they’re infeasible at any cost.
The implication, though, should be clear: If the scientific method can’t be applied to the subject, then the researchers who think of themselves as studying the subject are not doing functional science and cannot be expected to establish reliable knowledge. It may be that simple. It’s a brutal assessment, but it is the implication, nonetheless. As John Ioannidis said, “Claimed research findings may often be simply accurate measures of the prevailing bias,” even if the investigators “working in the field are likely to resist accepting that the whole field in which they have spent their careers is a ‘null field.’”
Research disciplines that have existed outside this Darwinian world of hypothesis and rigorous experimental test are disciplines in which reasonable hypotheses can never be unambiguously refuted. If researchers have never been publicly shamed by definitive experiments refuting their conclusions, if the discipline has no history of which to be embarrassed, no collective memory of quite how easy it is to be fooled, the researchers in it are likely (going back to Irving Langmuir) to have a “lack of understanding about what human beings can do to themselves in the way of being led astray by subjective effects, wishful thinking or threshold interactions.” The result would be a research culture that has never earned the understanding and institutionalized skepticism necessary to establish reliable knowledge, and one that passes on this pathological culture from generation to generation.
Consider modern research on nutrition and chronic disease. One primary goal of this research is to establish the dietary causes of these diseases and then ultimately to prevent them. But diseases like heart disease, diabetes, and obesity can take years to decades to develop. Hence, clinical trials testing viable hypotheses of prevention — change in diet, for instance — have to run long enough to establish a significant difference between a control diet and an intervention diet. Depending on the size of the effect, this could require tens of thousands of subjects adhering to the different diets for years or maybe thousands adhering to the diets for decades. The subjects have to be assigned to the diets at random to ensure that whatever effect is observed, if any, is likely to be from the diet rather than some other characteristic that associates with preferring one dietary pattern (low-fat vegetarian, say) to another (high-fat with animal products).
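To get a feel for why the numbers climb so quickly, here is a rough sketch of the textbook sample-size approximation for comparing event rates between two trial arms. The function name `subjects_per_arm` and every event rate below are hypothetical, chosen only for illustration; the calculation also assumes perfect adherence and no dropouts, which no real diet trial achieves.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_arm(p_control, p_intervention, alpha=0.05, power=0.80):
    """Approximate subjects needed in EACH arm of a two-arm trial.

    Uses the standard normal-approximation formula for a two-sided
    comparison of two proportions. All inputs are assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired chance of detecting the effect
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    effect = p_control - p_intervention            # absolute difference in event rates
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical scenario: 5% of control subjects develop the disease
# during the trial, and the diet lowers that to 4.5%.
modest_effect = subjects_per_arm(0.05, 0.045)

# Hypothetical scenario: the diet halves the event rate instead.
large_effect = subjects_per_arm(0.05, 0.025)

print(f"modest effect: about {modest_effect} subjects per arm")
print(f"large effect:  about {large_effect} subjects per arm")
```

Under these assumed rates, the modest effect calls for tens of thousands of subjects per arm, while the large effect needs only on the order of a thousand. The required number scales with the inverse square of the difference being sought, which is why small dietary effects are so expensive to test.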
The subjects have to remain on the diet for as long as the study proceeds (for years to decades), regardless of what else they might be reading in The New York Times or on Instagram about what they should or should not be eating. All of this could cost from tens of millions to hundreds of millions of dollars, and the subjects still won’t be blinded to the dietary intervention, implying that whatever the results, they have a high likelihood of being misinterpreted.
Even if correctly interpreted, the study would have to be replicated multiple times for the interpretation to be one on which we can rest our convictions — i.e., reliable knowledge. This could mean a 50- or 60-year research plan costing in the neighborhood of a few billion dollars that would still likely as not get an answer that provides us no helpful method of disease prevention. In other words, the hypothesis of prevention was wrong — a meaningful scientific conclusion — and has been refuted by experimental tests. Now what do we do?
Influential authorities in nutrition and chronic disease research — most notably former director of the Centers for Disease Control Thomas Frieden in the New England Journal of Medicine — have recently taken to arguing that some hypotheses of prevention are so important to implement on a nationwide or worldwide scale that these kinds of rigorous experimental tests should not be done. Some suggest they are more trouble than they are worth. But the counterargument is that anyone who thinks such tests are unnecessary believes the first principle of science is that we are unlikely to be fooled. Such individuals may not be the ones we want making scientific decisions about our health. “Trust us,” they’re saying, when they haven’t done anything like the hard work necessary to earn our trust.
In these kinds of research disciplines, pathological science is likely to be the norm rather than the exception. It’s possible for the field to stay healthy, but it requires that the researchers remain utterly honest about their conclusions, drawing them very tentatively, because empirical tests have not been done and won’t be. In modern science, such an approach is unlikely to garner further funding, let alone get an article published in a major journal, and so it is actively discouraged. And this could be the case despite all the best intentions and intellectual firepower of the researchers involved. A field in which every viable hypothesis stays in play (unless, of course, the researchers simply don’t like the implications), for whatever reason, is not a functioning science. A field in which the researchers rationalize away the effort of doing rigorous and possibly definitive experiments as unnecessary because they’re too difficult or expensive to do is not a functioning science.
What does good science look like?
If an entire discipline were pathological, how would we tell? Rampant irreproducibility (crisis or not) would be a very good sign, particularly the irreproducibility of conclusions that are not drawn tentatively. But the experiments capable of exposing the irreproducibility may not have been done. They may not be doable. What other clues could we look for that might give it away?
To answer these questions, it helps to have a better understanding of what a functioning science looks like. To understand when a process is unhealthy, in other words — i.e., pathological — we have to first understand what health looks like.
Feynman provided that in his commencement address by describing a simple rat experiment from the 1930s. The lesson from the experiment: Healthy science, what Feynman called an “A-Number-1” experiment, is not just about reporting what was done and observed with utter honesty and couching the results with suitable humility; it is the procedural manifestation of a relentless drive to assure that assumptions and conclusions are correct. It requires the continued interrogation of the evidence until all but one reasonable interpretation can be ruled out (although what constitutes “reasonable” will always be a judgment call). This again is why science is often referred to as institutionalized skepticism. Scientists are expected to remain skeptical of all evidence and interpretations of that evidence until they no longer have any choice but to believe it.
Here’s Feynman on the rat experiment, the one that lived up to Feynman’s expectations of good science, done by “a man named Young” in 1937:
He had a long corridor with doors all along one side where the rats came in, and doors along the other side where the food was. He wanted to see if he could train the rats to go in at the third door down from wherever he started them off. No. The rats went immediately to the door where the food had been the time before.
The question was, how did the rats know, because the corridor was so beautifully built and so uniform, that this was the same door as before? Obviously there was something about the door that was different from the other doors. So he painted the doors very carefully, arranging the textures on the faces of the doors exactly the same. Still the rats could tell. Then he thought maybe the rats were smelling the food, so he used chemicals to change the smell after each run. Still the rats could tell. Then he realized the rats might be able to tell by seeing the lights and the arrangement in the laboratory like any commonsense person. So he covered the corridor, and, still the rats could tell.
He finally found that they could tell by the way the floor sounded when they ran over it. And he could only fix that by putting his corridor in sand. So he covered one after another of all possible clues and finally was able to fool the rats so that they had to learn to go in the third door. If he relaxed any of his conditions, the rats could tell.
Now, from a scientific standpoint, that is an A-Number-1 experiment. That is the experiment that makes rat-running experiments sensible, because it uncovers the clues that the rat is really using — not what you think it’s using. And that is the experiment that tells exactly what conditions you have to use in order to be careful and control everything in an experiment with rat-running.
I looked into the subsequent history of this research. The subsequent experiment, and the one after that, never referred to Mr. Young. They never used any of his criteria of putting the corridor on sand, or being very careful. They just went right on running rats in the same old way, and paid no attention to the great discoveries of Mr. Young, and his papers are not referred to, because he didn’t discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats. But not paying attention to experiments like that is a characteristic of Cargo Cult Science.
I added the italics in the above paragraphs. What Young did is what he had to do to then learn something reliable about rats, covering “one after another of all possible clues” until he ran out of possibilities. He didn’t trust anyone else’s assessment of how and why his rats behaved as they did. He did what was necessary so he could trust his own assessment. Put simply, Young wanted to know if what he knew was really so. And by paying attention to the details, by relentlessly testing hypotheses — it’s the doors, no, the smell, no, the light, no, the sound, yes — Young was acting as a scientist should.
Feynman talked about this kind of process in a famous series of lectures given at Cornell University in 1964 and then published in book form as The Character of Physical Law. He talked about creating new laws by proposing hypotheses and then comparing the predictions of those hypotheses with experiment. (A “new law” in the context of what we’re discussing would be a causal relationship between some aspect of our diet/lifestyle and the disease or disorder it supposedly causes or exacerbates.) The typical takeaway from Feynman’s discussion is the simple idea that if the hypothesis disagrees with the experiment, it’s wrong. But what if the conclusion or result of the experiment is what’s wrong, as we’ve been discussing and is all too likely? Here’s Feynman:
In general we look for a new law by the following process. First we guess it. Then we compute the consequences of the guess to see what would be implied if this law that we guessed is right. Then we compare the result of the computation to nature, with experiment or experience, compare it directly with observation, to see if it works. If it disagrees with experiment it is wrong. In that simple statement is the key to science. It does not make any difference how beautiful your guess is. It does not make any difference how smart you are, who made the guess, or what his name is — if it disagrees with experiment it is wrong. That is all there is to it.
It is true that one has to check a little to make sure that it is wrong, because whoever did the experiment may have reported incorrectly, or there may have been some feature in the experiment that was not noticed, some dirt or something; or the man who computed the consequences, even though it may have been the one who made the guesses, could have made some mistake in the analysis. These are obvious remarks, so when I say if it disagrees with experiment it is wrong, I mean after the experiment has been checked, the calculations have been checked, and the thing has been rubbed back and forth a few times to make sure that the consequences are logical consequences from the guess, and that in fact it disagrees with a very carefully checked experiment. (My italics)
What Feynman described in that italicized paragraph as “obvious remarks” is another factor that’s missing in Cargo Cult Science and what I found in my reporting to be generally absent in nutrition, obesity, and chronic disease research. As I’ve written before, the guessing-a-new-law part of science is relatively easy. The hard part is correctly understanding the law’s consequences, comparing them to nature and “with experiment or experience,” and, specifically, doing the experiments that constitute a meaningful, rigorous test of the hypothesis to see if it survives.
While this process is another in which biases and preconceptions will influence decisions, it has to be done. This is where the rubbing “back and forth” to “make sure that the consequences are logical consequences from the guess” is critically important, as is fully understanding how those consequences should play out in any experiment.
Read the memoirs of accomplished scientists in any discipline, and you’ll often find them commenting that one necessary characteristic of making progress in science is a finely tuned intuition about which research and evidence need to be ignored. The only way they can do that is by first picking it all apart in the manner Feynman suggested. The process is not dissimilar to what police investigators would do when investigating a crime, and what the defense attorneys would redo, with their very different perspective, in any criminal proceedings that follow. It’s necessary to ascertain which evidence is believable and meaningful and which evidence is not.
A healthy scientific enterprise allows for no shortcuts. This process is required with each and every experiment and observation. It can lead to accusations of cherry-picking (and often does), because here’s where researchers with different conceptions about the relevant science will have different guesses and different interpretations of the consequences of those guesses and will assess experiments and observations differently. They will almost literally see different things in the experiments and their results — they’ll tend to see what they want to see — and this is why they will come to different conclusions about their validity, about what they can tell you reliably about the validity of the competing hypotheses. If the science is a healthy one, the research will continue until no ambiguity — no reasonable rationale for disagreement — still exists.
It’s time-consuming and expensive, but it is nonetheless what’s necessary to establish reliable knowledge. It has to be done.
Gary Taubes is co-founder of the Nutrition Science Initiative (NuSI) and an investigative science and health journalist. He is the author of The Case Against Sugar (2016), Why We Get Fat (2011), and Good Calories, Bad Calories (2007). Taubes was a contributing correspondent for the journal Science and a staff writer for Discover. As a freelancer, he has contributed articles to The Atlantic Monthly, The New York Times Magazine, Esquire, Slate, and many other publications. His work has been included in numerous “Best of” anthologies including The Best of the Best American Science Writing (2010). He is the first print journalist to be a three-time winner of the National Association of Science Writers Science-in-Society Journalism Award and the recipient of a Robert Wood Johnson Foundation Investigator Award in Health Policy Research. Taubes received his B.S. in physics from Harvard University, his M.S. in engineering from Stanford University, and his M.S. in journalism from Columbia University.