The Framingham Heart Study (FHS) is, without argument and by design, an observational study. A group of people were recruited by researchers, evaluated for a handful of parameters, and observed over time to see what transpired. Ergo, an observational study.
There are actually a host of different types of observational studies, including prospective cohorts, retrospective cohorts, convenience cohorts, retrospective case-control studies, and a handful of others. The names of these types of studies imply rigorous science, but it must be remembered that no matter how scientific the name might sound, none of these forms of study can establish causality (7). They can only serve to generate hypotheses. And the same is true for Framingham. Let’s take a closer look at it.
Framingham, a Study in Bias
When the federal government gave a grant of US$500,000 (about $5 million in today’s dollars) as seed money to start the FHS, it did so because in the mid-20th century between one-third and one-half of Americans were dying from some form of cardiovascular disease, a term encompassing coronary artery disease, stroke, high blood pressure, and congestive heart failure. And strange as it seems today, no one at the time knew how to treat cardiovascular disease or what caused it. The early founders of the FHS decided to recruit citizens into the study from Framingham, Massachusetts, a former farming community that had become a factory town for products including General Motors cars. The citizens were mainly white, working-class people thought to be representative of the majority of the United States (3).
The idea was to enter subjects into the study, give them thorough examinations, then follow them over time, with follow-up exams every two years. Based on the original exam data and the biennial follow-up data, researchers hoped that as the patients aged, were stricken with disease, and began to die, the researchers could correlate the diseases the patients developed with the earlier findings on their exams and lab work and begin to get a sense of the cause. If strong patterns emerged from the data, all the better, as that would strengthen the notion of causality (2).
FHS staff recruited around 5,000 subjects who, together with about 300 volunteers, comprised the first cohort, and the initial exams began.
At first blush, this seems like a reasonable way to start. But was it?
In order for a study like this to generate accurate data, the subjects need to represent the overall group as closely as possible. In this case, mainly working-class people were recruited and volunteered on their own, which creates a handful of problems.
First, Framingham was at the time home to both wealthy and poor people. It’s well established that the wealthy tend to be healthier and live longer, while the poor tend to be sicker and die earlier. So the working-class people who made up the majority of the subjects didn’t really represent Framingham, much less the rest of Massachusetts or the rest of the country.
Second—believe it or not—people who are recruited into studies are different from those who refuse to participate. As physician and researcher Lars Werkö showed in a similar study in Sweden, “The mortality from cardiovascular disease and other usually not well defined causes for death is several times higher in a Swedish city population among those not answering an invitation for health examination than in those coming to the investigation.” (8)
This is a common finding in these kinds of studies. It makes sense that people who are more interested in their own health would respond to an offer of a free comprehensive medical examination.
Third, people who volunteer are even more interested in their health than those who are actively recruited, and thus skew the data even further.
Finally, not only is the study population non-representative, but it is also small. Tiny, in fact. Five thousand subjects might seem like a lot, but when compared to the more than 160,000 subjects recruited into the Women’s Health Initiative, a 15-year study launched in 1991 by the National Institutes of Health, it really isn’t. And remember that the FHS investigators continuously subdivided these 5,000 subjects into smaller and smaller subgroups for various phases of the study. The smaller the group, the more difficult it is to get decent, statistically significant data.
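The cost of subdividing a cohort can be made concrete with a quick simulation. The sketch below (Python, with made-up event rates chosen purely for illustration, not Framingham figures) estimates how often a standard two-proportion test detects a genuine difference in disease rates at a given subgroup size:

```python
import math
import random


def detection_rate(n_per_group, p_control=0.10, p_exposed=0.15,
                   trials=2000, seed=42):
    """Monte Carlo estimate of how often a two-proportion z-test
    detects a real difference (hypothetical 10% vs. 15% event rates)
    at the 0.05 significance level, for a given group size."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cases_exposed = sum(rng.random() < p_exposed for _ in range(n_per_group))
        cases_control = sum(rng.random() < p_control for _ in range(n_per_group))
        # Pooled standard error for the difference in proportions
        p_pool = (cases_exposed + cases_control) / (2 * n_per_group)
        se = math.sqrt(2 * p_pool * (1 - p_pool) / n_per_group)
        diff = abs(cases_exposed - cases_control) / n_per_group
        if se > 0 and diff / se > 1.96:  # significant at alpha = 0.05
            hits += 1
    return hits / trials
```

With these assumed rates, a subgroup of 100 per arm detects the (real) difference only a small fraction of the time, while 1,000 per arm detects it reliably; slicing 5,000 subjects into ever-smaller subgroups pushes every comparison toward the former situation.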
So the FHS launched with a small study group that wasn’t representative of even Framingham, let alone the rest of humanity—the group to which the FHS findings would be applied. And it gets worse.
As mentioned above, the researchers undertaking this study had no idea what was causing the cardiovascular disease responsible for a huge swath of annual U.S. deaths. The FHS was initiated to try to find a possible cause or causes. Assume you are one of the researchers setting up this study—what are you going to look for? Remember, you don’t have a clue as to what’s causing cardiovascular disease, yet you’re designing a study to generate data you hope will reveal a cause somewhere along the way. But you don’t know which data are going to come to light and elucidate some connection to the disease being studied. How do you determine potential risk factors? How do you decide what to look for? How do you decide what to measure?
At the start of the FHS, researchers decided to monitor blood pressure because it was easy to check and it seemed reasonable that it could be a driving force. Based on autopsy studies showing cholesterol to be a component of arterial plaque and decades-old studies demonstrating that rabbits fed cholesterol developed plaque, the investigators decided to look at cholesterol. They also looked at diet to see if there were any differences between those who developed cardiovascular disease and those who didn’t.
The FHS began with about 5,000 white, working-class subjects who were 28 to 62 years old and more interested in their own health than the average citizen. They were to be tested for factors that may or may not have any bearing on the development of cardiovascular disease.
Immediately, the physicians running the FHS found themselves in an ethical dilemma. They discovered patients who already had high blood pressure and/or elevated cholesterol. At this point, no one knew if high blood pressure or elevated cholesterol caused cardiovascular disease—that’s what the FHS was designed to determine. But the investigators felt both were likely risk factors, so they revealed the findings to the patients and suggested they talk to their own doctors about it. If the patients did, and the doctors made recommendations or treated their patients with drugs, the data become further confounded.
And all this is just at the start of the study.
Russell Smith, in his magisterial two-volume “Diet, Blood Cholesterol, and Coronary Heart Disease: A Critical Review of the Literature,” examines in great depth the four main problematic issues with the FHS. (9)
“Problems associated with the study may be categorized as 1. methodological, 2. analytical, 3. presentational, and 4. interpretation error. The first problem can generate error independent of the investigators, while the latter three problems can be or are definitely generated by the study investigators.”
We’ve already looked at some of the methodological errors in the selection of subjects who weren’t really representative of their own community or the world at large.
The decision to inform the patients’ physicians about any adverse health findings was the major methodological error. As Smith points out:
While such subjects and their physicians may not have acted on that information, the likelihood is that many, if not the majority, did act on it, particularly as Framingham investigators have established considerable credibility, rightly or wrongly. Although the disclosure of the subjects’ health status to their physicians can be considered to be an ethical necessity, it is a scientifically atrocious procedure. The entire 40-year (now 70-year) Framingham program was and is designed to produce confounded results, and Framingham investigators neither attempt to determine the extent of confounding or even address the problem. (9)
In studies such as the FHS, subjects are evaluated at the start (the so-called baseline) for known, or in this case potential, risk factors and are then followed for some period of time as per study design. After this time, statisticians calculate correlations to determine the strength of the relationship between the risk factors and the presence or absence of cardiovascular disease. For these calculations to be meaningful, the baseline measurements must remain stable for some period of time. For instance, if you find a total cholesterol of 220 mg/dL on the initial measurement, then 175 mg/dL on the first follow-up and 250 mg/dL on the second, you don’t really have a baseline. It has been known for some time that cholesterol measurements fluctuate from day to day, and if subjects are then treated by their own physicians—causing even more fluctuation—the data from the follow-up exams become worthless.
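The statistical consequence of an unstable baseline can be shown with a short simulation. This sketch (Python, with hypothetical cholesterol values and a hypothetical 25 mg/dL day-to-day swing, chosen only for illustration) compares how well a single noisy reading tracks a subject’s true long-run level versus an average of several readings:

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)


rng = random.Random(0)
# Hypothetical "true" long-run cholesterol levels for 5,000 subjects
true_chol = [rng.gauss(220, 30) for _ in range(5000)]
noise = 25  # assumed day-to-day fluctuation, mg/dL

# A single baseline reading vs. an average of four readings per subject
one_reading = [c + rng.gauss(0, noise) for c in true_chol]
avg_of_four = [c + sum(rng.gauss(0, noise) for _ in range(4)) / 4
               for c in true_chol]

r_one = pearson(one_reading, true_chol)
r_avg = pearson(avg_of_four, true_chol)
```

Under these assumptions, the averaged measurement correlates noticeably better with the true level than the single reading does. Every correlation built on the single reading is attenuated from the start, which is exactly the problem with correlating one baseline cholesterol value with events 10, 20 or 30 years later.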
The changing cholesterol levels proved to be a challenge, leading Harold Kahn and Thomas Dawber (two of the early investigators) to question the validity of even using cholesterol as a risk factor. From Smith:
Although admitting that blood cholesterol levels were changing after baseline, Kahn and Dawber drew a most dubious and unreasonable conclusion, i.e., “On the basis of the present report, it would be premature to judge whether a long-term prospective study of coronary disease benefits sufficiently from periodic cholesterol measurements in the study population to justify the efforts involved.” In effect, knowing that blood cholesterol levels had changed over a relatively short period of time, Framingham investigators nevertheless went on to correlate baseline measures with CHD events over periods of 10, 20 and even 30 years. There is absolutely no question that the changing cholesterol levels produced errors in the Framingham data, and the magnitude of the error undoubtedly increased with time. (9)
Smith offered another interesting observation after his critical read of the early Framingham literature. After discussing the fact that cholesterol measurements fluctuate from measurement to measurement and within individuals from day to day, he points out that:
Dawber (in one of these papers) offered a most curious and unreasonable observation, i.e., “Although the Framingham study utilized only one (blood cholesterol) determination at initial examination, and this measurement quite accurately classified the population, in retrospect it would have been better to have performed several tests before a classification was made.” In the first place, since several tests were not performed, Dawber could not logically draw the conclusion that the one measurement “quite accurately classified the population.” In the second place, if the single measurement did indeed “quite accurately classify the population,” why would it “have been better to have performed several tests before classification”(?) If one reads between the lines, Dawber was really saying that several tests should have been performed because we have all learned that single tests yield highly inaccurate results. But, in fact, Dawber had to say that their single test “quite accurately classified the population” because anything less would be an admission that decades of follow-up studies were based on inaccurate data at entry. (9)
As should be evident from these methodological errors, the FHS data are next to worthless. Yet each and every day, thousands of physicians worldwide use FHS data to determine risk and start patients on costly and sometimes perilous drugs to treat those purported risk factors that have been defined by this patently sloppy research.
If we look at the analytical errors, the problem worsens. Analytical errors involve an enormous amount of sophisticated statistics, so I’ll make this part mercifully short.
As the FHS went on, the statistical calculations became more and more complex, which, in my view, is a dead giveaway that the data didn’t really show what the investigators wanted it to. It’s a common saying among researchers that if you torture the data enough, it will confess. That maxim is on clear display in the FHS.
As Werkö observed about the publication of successive papers, “The tendency has been to use more and more elaborate statistical methods and less of the primary figures in consecutive publications.” (8)
Often, statistical techniques were mentioned but not defined. It is likely that the investigators tasked the statisticians with using whatever statistical methods they could come up with to generate a significant association between risk factors and the onset of cardiovascular disease.
As Smith commented, with reference to early FHS directors, “It is doubtful that either (Dr. William) Castelli, (Dr. William) Kannel or the typical reader had more than a cursory understanding of such equations, logs, regressions, etc.” (9)
It should be apparent that if you have shoddy data, running it through a series of statistical programs isn’t going to correct it. It’s simply cargo-cult statistics; i.e., the appearance of legitimate statistical analysis but in reality devoid of any meaning.
Good data showing a definite correlation between a risk factor and a disease outcome are easy to present graphically. Take smoking and deaths from lung cancer, for example. If you put the average number of cigarettes smoked per day on one axis of a graph and deaths from lung cancer on the other, you’ll get a clean, nearly straight line that defines the relationship between the two. If, on the other hand, you try that with dodgy data, you’ll end up with a line that goes all over the place. It will look something like a web spun by a spider on LSD, which should put you on notice that your data aren’t all that convincing.
If you start fudging and change the intervals on one axis or another or use some scale other than the true relationship between the two variables, you can create a cargo-cult graph that will appear to show what you want it to show, but it won’t be valid.
The Framingham investigators strove to avoid publishing the true relationship between, say, total cholesterol and cardiovascular disease. Instead they preferred relative risk scales or morbidity ratios, which serve to obscure the real relationship—one that often doesn’t exist.
The sad truth is that Framingham was from inception burdened with methodological errors that were compounded by analytical and presentational errors. We will conclude our interpretation of the Framingham study in Part 3.
Note: These references include those previously published in “The Framingham Heart Study, Part 1.”
1. Bruenn HG. Clinical notes on the illness and death of President Franklin D. Roosevelt. Annals of Internal Medicine 72(4): 579-591, 1970. Available here.
2. Dawber TR. The Framingham Study: The Epidemiology of Atherosclerotic Disease. Cambridge, Mass.: Harvard University Press, 1980.
3. Mahmood SS, Levy D, Vasan R et al. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet 383(9921): 999-1008, 2014. Available here.
4. Brody JE. Scientist at work: William Castelli; preaching the gospel of healthy hearts. The New York Times. Feb. 8, 1994. Available here.
5. Mann GV. Coronary Heart Disease: The Dietary Sense and Nonsense. Cambridge, England: Janus Publishing Company, 1993.
6. Feynman RP. Cargo cult science. Engineering and Science 37(7): 10-13, 1974. Available here.
7. Feinstein AR. Scientific standards in epidemiologic studies of the menace of daily life. Science 242(4883): 1257-1263, 1988. Available here.
8. Werkö L. Risk factors and coronary heart disease—facts or fancy? American Heart Journal 91(1): 87-98, 1976. Available here.
9. Smith RL. Diet, Blood Cholesterol, and Coronary Heart Disease: A Critical Review of the Literature. Sherman Oaks, Calif.: Vector Enterprises, 1991. Vol. 2 available here.
All links accessed Jan. 22, 2019.