The Framingham Heart Study, Part 2: The Framingham Observation

By Dr. Michael Eades, January 23, 2019

The Framingham Heart Study (FHS) is without argument and by design an observational study. Clearly, a group of people were recruited by researchers, evaluated for a handful of parameters, and observed over time to see what transpired. Ergo, an observational study.

There are actually a host of different types of observational studies, including prospective cohorts, retrospective cohorts, convenience cohorts, retrospective case-control studies, and a handful of others. The names of these types of studies imply rigorous science, but it must be remembered that no matter how scientific the name might sound, none of these forms of study can establish causality (7). They can only serve to generate hypotheses. And the same is true for Framingham. Let’s take a closer look at it.

Framingham, a Study in Bias

When the federal government gave a grant of US$500,000 (about $5 million in today’s dollars) as seed money to start the FHS, it did so because in the mid-20th century between one-third and one-half of Americans were dying from some form of cardiovascular disease, a term encompassing coronary artery disease, stroke, high blood pressure, and congestive heart failure. And strange as it seems today, no one at the time knew how to treat cardiovascular disease or what caused it. The early founders of the FHS decided to recruit citizens into the study from Framingham, Massachusetts, a former farming community that had become a factory town for products including General Motors cars. The citizens were mainly white, working-class people thought to be representative of the majority of the United States (3).

The idea was to enter subjects into the study, give them thorough examinations, then follow them over time, with follow-up exams every two years. Based on the original exam data and the biennial follow-up data, researchers hoped that as the patients aged, were stricken with disease, and began to die, the researchers could correlate the diseases the patients developed with the earlier findings on their exams and lab work and begin to get a sense of the cause. If strong patterns emerged from the data, all the better, as that would strengthen the notion of causality (2).

FHS staff recruited around 5,000 subjects, who comprised the first cohort along with about 300 volunteers, and the initial exams began.

At first blush, this seems like a reasonable way to start. But was it?

For a study like this to generate accurate data, the subjects need to represent the overall population as closely as possible. In this case, the subjects were mainly working-class people who were either recruited or volunteered on their own, which creates a handful of problems.

First, Framingham was at the time home to both wealthy and poor people. It’s well established that wealthier people tend to be healthier and live longer, while poorer people tend to be sicker and die earlier. So the working-class people who made up the majority of the subjects didn’t really represent Framingham, much less the rest of Massachusetts or the rest of the country.

Second—believe it or not—people who are recruited into studies are different from those who refuse to participate. As physician and researcher Lars Werkö showed in a similar study in Sweden, “The mortality from cardiovascular disease and other usually not well defined causes for death is several times higher in a Swedish city population among those not answering an invitation for health examination than in those coming to the investigation.” (8)

This is a common finding in these kinds of studies. It makes sense that people who are more interested in their own health would respond to an offer of a free comprehensive medical examination.
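To make the mechanism concrete, here is a minimal sketch of how differing response rates alone can make the people who show up look healthier than the community they came from. Every number in it (the group shares, response rates, and mortality figures) is invented for illustration and is not taken from Framingham or from Werkö’s Swedish data.

```python
# Hypothetical population: every number below is invented for illustration.
groups = {
    "health_conscious": {"share": 0.40, "response_rate": 0.70, "mortality": 0.05},
    "less_engaged": {"share": 0.60, "response_rate": 0.30, "mortality": 0.12},
}

def mortality_among(groups, responders: bool) -> float:
    """Average mortality among people who did (or did not) respond."""
    weighted_deaths = 0.0
    total_weight = 0.0
    for g in groups.values():
        # Probability that a member of this group lands in the chosen bucket.
        p = g["response_rate"] if responders else 1.0 - g["response_rate"]
        weight = g["share"] * p
        weighted_deaths += weight * g["mortality"]
        total_weight += weight
    return weighted_deaths / total_weight

mort_population = sum(g["share"] * g["mortality"] for g in groups.values())
mort_responders = mortality_among(groups, responders=True)
mort_non_responders = mortality_among(groups, responders=False)

print(f"whole population: {mort_population:.3f}")
print(f"responders:       {mort_responders:.3f}")
print(f"non-responders:   {mort_non_responders:.3f}")
```

Under these assumed inputs, the responders’ mortality falls below the population figure and the non-responders’ mortality rises above it, even though nothing about the exam itself changed anyone’s health.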

Third, volunteers are likely to be even more interested in their health than subjects who were actively recruited, so they skew the data even further.

Finally, not only is the study population unrepresentative, but it is also small. Tiny, in fact. Five thousand subjects might seem like a lot, but when compared to the more than 160,000 subjects recruited into the Women’s Health Initiative, a 15-year study launched in 1991 by the National Institutes of Health, it really isn’t. And remember that the FHS investigators continuously subdivided these 5,000 subjects into smaller and smaller subgroups for various phases of the study. The smaller the group, the more difficult it is to get decent, statistically significant data.
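A rough sketch of why subgroup size matters: the uncertainty around an observed event rate widens as the group shrinks. The 10% event rate and the subgroup sizes of 500 and 50 are assumptions chosen only to illustrate the arithmetic, and the interval uses the simple normal approximation rather than anything the FHS statisticians are known to have used.

```python
import math

def ci_half_width(event_rate: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(event_rate * (1.0 - event_rate) / n)

# Hypothetical 10% event rate; group sizes are for illustration only.
event_rate = 0.10
for n in (160_000, 5_000, 500, 50):
    half_width = ci_half_width(event_rate, n)
    print(f"n={n:>7}: 10% +/- {100 * half_width:.2f} percentage points")
```

Because the half-width scales with one over the square root of the group size, cutting a group to one-hundredth of its size makes the interval ten times wider.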

So the FHS launches with a small study group that isn’t representative of even Framingham, let alone the rest of humanity—the group to which the FHS findings will be applied. And it gets worse.

As mentioned above, the researchers undertaking this study had no idea what was causing the cardiovascular disease responsible for a huge swath of annual U.S. deaths. The FHS was initiated to try to find a possible cause or causes. Assume you are one of the researchers setting up this study—what are you going to look for? Remember, you don’t have a clue as to what’s causing cardiovascular disease, yet you’re designing a study to generate data you hope will reveal a cause somewhere along the way. But you don’t know which data are going to come to light and elucidate some connection to the disease being studied. How do you determine potential risk factors? How do you decide what to look for? How do you decide what to measure?

At the start of the FHS, researchers decided to monitor blood pressure because it was easy to check and it seemed reasonable that it could be a driving force. Based on autopsy studies showing cholesterol to be a component of arterial plaque and decades-old studies demonstrating that rabbits fed cholesterol developed plaque, the investigators decided to look at cholesterol. They also looked at diet to see if there were any differences between those who developed cardiovascular disease and those who didn’t.

The FHS began with about 5,000 white, working-class subjects who were 28 to 62 years old and more interested in their own health than the average citizen. They were to be tested for factors that may or may not have any bearing on the development of cardiovascular disease.

Immediately, the physicians running the FHS found themselves in an ethical dilemma. They discovered patients who already had high blood pressure and/or elevated cholesterol. At this point, no one knew if high blood pressure or elevated cholesterol caused cardiovascular disease; that’s what the FHS was designed to determine. But the investigators felt both were likely risk factors, so they revealed the findings to the patients and suggested they talk to their own doctors about it. If the patients did so, and their doctors made recommendations or treated them with drugs, the data became further confounded.

And all this is just at the start of the study.

Russell Smith, in his magisterial two-volume “Diet, Blood Cholesterol, and Coronary Heart Disease: A Critical Review of the Literature,” examines in great depth the four main problematic issues with the FHS (9).

“Problems associated with the study may be categorized as 1. methodological, 2. analytical, 3. presentational, and 4. interpretation error. The first problem can generate error independent of the investigators, while the latter three problems can be or are definitely generated by the study investigators.”

Methodological Flaws

We’ve already looked at some of the methodological errors in the selection of subjects who weren’t really representative of their own community or the world at large.

The decision to inform the patients’ physicians about any adverse health findings was the major methodological error. As Smith points out:

While such subjects and their physicians may not have acted on that information, the likelihood is that many, if not the majority, did act on it, particularly as Framingham investigators have established considerable credibility, rightly or wrongly. Although the disclosure of the subjects’ health status to their physicians can be considered to be an ethical necessity, it is a scientifically atrocious procedure. The entire 40-year (now 70-year) Framingham program was and is designed to produce confounded results, and Framingham investigators neither attempt to determine the extent of confounding or even address the problem. (9)

In studies such as the FHS, subjects are evaluated at the start—the so-called baseline in terms of known (or in this case potential) risk factors—and are followed for some period of time as per study design. After this time, statisticians calculate correlations to determine the strength of the relationship between the risk factors and the presence or absence of cardiovascular disease. For these calculations to be meaningful, the baseline measurements must remain stable for some period of time. For instance, if you find a total cholesterol of 220 mg/dL on the initial measurement, then find one of 175 mg/dL on the first follow-up and one of 250 mg/dL on the second follow-up, you don’t really have a baseline. It has been known for some time that cholesterol measurements fluctuate from day to day, and if subjects are then treated by their own physicians—causing even more fluctuation—the data from the follow-up exams becomes worthless.
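The effect of an unstable baseline can be sketched with a small simulation. It assumes each subject has a “usual” cholesterol level, that a single reading equals that level plus day-to-day noise, and that a made-up outcome score depends only on the usual level. All of the numbers (a mean of 220 mg/dL, a between-person spread of 30, within-person noise of 25, and the outcome formula) are hypothetical, and the sketch models only measurement noise, not treatment by outside physicians.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_subjects = 5000

# Hypothetical "usual" cholesterol level per subject (mg/dL).
usual = [rng.gauss(220, 30) for _ in range(n_subjects)]
# One baseline reading = usual level + assumed day-to-day fluctuation.
single_reading = [u + rng.gauss(0, 25) for u in usual]
# Invented outcome score that depends only on the usual level, plus noise.
outcome = [0.02 * u + rng.gauss(0, 1) for u in usual]

r_usual = pearson(usual, outcome)
r_single = pearson(single_reading, outcome)
print(f"correlation using usual level:    {r_usual:.2f}")
print(f"correlation using single reading: {r_single:.2f}")
```

Under these assumptions, the correlation computed from the single noisy reading comes out weaker than the one computed from the usual level, which is one way a fluctuating baseline can blur whatever relationship exists in the data.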

The changing cholesterol levels proved to be a challenge, leading Harold Kahn and Thomas Dawber (two of the early investigators) to question the validity of even using cholesterol as a risk factor. From Smith:

Although admitting that blood cholesterol levels were changing after baseline, Kahn and Dawber drew a most dubious and unreasonable conclusion, i.e., “On the basis of the present report, it would be premature to judge whether a long-term prospective study of coronary disease benefits sufficiently from periodic cholesterol measurements in the study population to justify the efforts involved.” In effect, knowing that blood cholesterol levels had changed over a relatively short period of time, Framingham investigators nevertheless went on to correlate baseline measures with CHD events over periods of 10, 20 and even 30 years. There is absolutely no question that the changing cholesterol levels produced errors in the Framingham data, and the magnitude of the error undoubtedly increased with time. (9)

Smith offered another interesting observation after his critical read of the early Framingham literature. After discussing the fact that cholesterol measurements fluctuate from measurement to measurement and within individuals from day to day, he points out that:

Dawber (in one of these papers) offered a most curious and unreasonable observation, i.e., “Although the Framingham study utilized only one (blood cholesterol) determination at initial examination, and this measurement quite accurately classified the population, in retrospect it would have been better to have performed several tests before a classification was made.” In the first place, since several tests were not performed, Dawber could not logically draw the conclusion that the one measurement “quite accurately classified the population.” In the second place, if the single measurement did indeed “quite accurately classify the population,” why would it “have been better to have performed several tests before classification”(?) If one reads between the lines, Dawber was really saying that several tests should have been performed because we have all learned that single tests yield highly inaccurate results. But, in fact, Dawber had to say that their single test “quite accurately classified the population” because anything less would be an admission that decades of follow-up studies were based on inaccurate data at entry. (9)

As should be evident from these methodological errors, the FHS data are next to worthless. Yet each and every day, thousands of physicians worldwide use FHS data to determine risk and start patients on costly and sometimes perilous drugs to treat those purported risk factors that have been defined by this patently sloppy research.

Analytical Flaws

If we look at the analytical errors, the problem worsens. Analytical errors involve an enormous amount of sophisticated statistics, so I’ll make this part mercifully short.

As the FHS went on, the statistical calculations became more and more complex, which, in my view, is a dead giveaway that the data didn’t really show what the investigators wanted it to. It’s a common saying among researchers that if you torture the data enough, it will confess. That maxim is on clear display in the FHS.

As Werkö observed about the publication of successive papers, “The tendency has been to use more and more elaborate statistical methods and less of the primary figures in consecutive publications.” (8)

Often, statistical techniques were mentioned but not defined. It is likely that the investigators tasked the statisticians with using whatever statistical methods they could come up with to generate a significant association between risk factors and the onset of cardiovascular disease.

As Smith commented, with reference to early FHS directors, “It is doubtful that either (Dr. William) Castelli, (Dr. William) Kannel or the typical reader had more than a cursory understanding of such equations, logs, regressions, etc.” (9)

It should be apparent that if you have shoddy data, running it through a series of statistical programs isn’t going to correct it. It’s simply cargo-cult statistics; i.e., the appearance of legitimate statistical analysis but in reality devoid of any meaning.

Good data showing a definite correlation between a risk factor and a disease outcome are easy to present graphically. Take smoking and deaths from lung cancer, for example. If you put the average number of cigarettes smoked per day on one axis of a graph and deaths from lung cancer on the other, you’ll have a beautiful, nearly straight line that defines the relationship between the two. If, on the other hand, you try that with dodgy data, you’ll end up with a line that goes all over the place. It will look something like a web spun by a spider on LSD, which should put you on notice that your data aren’t all that convincing.

If you start fudging and change the intervals on one axis or another or use some scale other than the true relationship between the two variables, you can create a cargo-cult graph that will appear to show what you want it to show, but it won’t be valid.

The Framingham investigators strive to avoid publishing the true relationship between, say, total cholesterol and cardiovascular disease. Instead they prefer to use relative risk scales or morbidity ratios, which serve to obfuscate the real relationship, which often doesn’t exist.
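The gap between a relative measure and the underlying numbers is easy to show with arithmetic. The sketch below uses invented counts (20 events among 1,000 people with a risk factor versus 10 events among 1,000 without it); they are not Framingham figures, and the function name is only illustrative.

```python
def risk_summary(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Return the relative risk alongside the absolute risks it is built from."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "absolute_difference": risk_exposed - risk_unexposed,
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
summary = risk_summary(20, 1000, 10, 1000)
print(f"relative risk:       {summary['relative_risk']:.1f}x")
print(f"absolute difference: {100 * summary['absolute_difference']:.1f} percentage points")
```

With these made-up counts, the same result can be reported as a doubling of risk or as a difference of one percentage point, which is why a relative scale on its own says little about the size of the underlying effect.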

The sad truth is that Framingham was from inception burdened with methodological errors that were compounded by analytical and presentational errors. We will conclude our interpretation of the Framingham study in Part 3.

References

Note: These references include those previously published in “The Framingham Heart Study, Part 1.”

1. Bruenn HG. Clinical notes on the illness and death of President Franklin D. Roosevelt. Annals of Internal Medicine 72(4): 579-591, 1970. Available here.

2. Dawber TR. The Framingham Study: The Epidemiology of Atherosclerotic Disease. Cambridge, Mass.: Harvard University Press, 1980.

3. Mahmood SS, Levy D, Vasan R et al. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet 383(9921): 999-1008, 2014. Available here.

4. Brody JE. Scientist at work: William Castelli; preaching the gospel of healthy hearts. The New York Times. Feb. 8, 1994. Available here.

5. Mann GV. Coronary Heart Disease: The Dietary Sense and Nonsense. Cambridge, England: Janus Publishing Company, 1993.

6. Feynman RP. Cargo cult science. Engineering and Science 37(7): 10-13, 1974. Available here.

7. Feinstein AR. Scientific standards in epidemiologic studies of the menace of daily life. Science 242(4883): 1257-1263, 1988. Available here.

8. Werkö L. Risk factors and coronary heart disease—facts or fancy? American Heart Journal 91(1): 87-98, 1976. Available here.

9. Smith RL. Diet, Blood Cholesterol, and Coronary Heart Disease: A Critical Review of the Literature. Sherman Oaks, Calif.: Vector Enterprises, 1991. Vol. 2 available here.

All links accessed Jan. 22, 2019.

Comments on The Framingham Heart Study, Part 2: The Framingham Observation

3 Comments

Matthieu Dubreucq

November 4th, 2019 at 5:54 am

I have a hard time believing that we still use Framingham to evaluate the risk factors for heart disease. It is as if we forgot to do our homework and validate the study. Unfortunately, the general population, me included, usually relies on doctors to do their homework and be critical of the science when need be. This makes me appreciate the number of MDL1 attendees even more.

Gary Taubes

January 25th, 2019 at 4:19 am

Reading Dr. Eades’s review/assessment of the Framingham study, two points come to mind. One is what Dr. Eades meant by “cargo cult,” which I’ll get to, and the other is the implication of the word “strong” in this statement:

“…the researchers could correlate the diseases the patients developed with the earlier findings on their exams and lab work and begin to get a sense of the cause. If strong patterns emerged from the data, all the better, as that would strengthen the notion of causality.”

This speaks to the previous discussion of the utility of observational (cohort) studies like Framingham in inferring causality. Framingham was launched concurrently with the case-control studies in the late 1940s that linked cigarette smoking to lung cancer (by Doll and Hill in the UK and Wynder in the U.S.). That link could be called a “strong pattern,” as lung cancer is a relatively rare disease in non-smokers and a common disease in smokers. One in ten get it. The more you smoke, and the longer you smoke, the greater the risk. When epidemiologists like the Framingham folks set about doing their cohort studies, at least from my journalistic perspective (I first wrote about problems with this kind of epidemiology for Science in the mid-90s), these are the kinds of associations they thought they might find. If they worried that this might not be possible with common diseases like heart disease, I could never find it in their early discussions. But what they have found in these cohort studies are not 10x or 20x increased risks of disease, but rather associations one to two orders of magnitude smaller. Very subtle increases along the lines of a doubling of a risk. These subtle increases can be explained by a whole host of factors, from the kind of methodological and analytic biases introduced by the research itself to the confounders that are often the unknown unknowns in this business. This is what Dr. Eades is setting up in this assessment. While Framingham is famous for linking high cholesterol and high blood pressure to heart disease, half a century later we finally accept that the link with LDL cholesterol may indeed be causal (with Mendelian randomization studies being maybe the final nail in that coffin, maybe) but there’s still no real understanding of the diet or lifestyle factors driving that.
And while high blood pressure certainly increases risk of cardiovascular and cerebrovascular disease, its role in metabolic syndrome and the diet/lifestyle trigger of high blood pressure have also remained unresolved. The catch with all these studies from Framingham onward is that these people were making up the methodologies as they went along. They didn’t have the benefit of hindsight that we have, and no one had ever done this stuff before. They assumed it was easy and when they found out it wasn’t, they were too far along to accept reality. Or at least that’s my take.

So now let’s get to Dr. Eades’s use of the term “cargo cult” to describe pseudoscientific statistical procedures. He doesn’t tell us its origin in this post, but it comes from this commencement address given by Richard Feynman, the Nobel Laureate physicist, at Caltech in 1974:
http://calteches.library.caltech.edu/51/2/CargoCult.htm

Everyone interested in science should read and maybe reread (and reread) that commencement address. Feynman understood as much as anyone alive what it took to do science right – i.e., to establish reliable knowledge – and this commencement address captures the essence. What’s specifically relevant here is his discussion of the experiments by “a man named Young” in which Young methodically (seems like it could have taken him a few years) ruled out all alternative explanations for what he was seeing in his initial experiment, just so he could establish what he had to do to do the experiment he wanted to do correctly. The kind of criticism of Framingham that Dr. Eades quotes from Russell Smith in this post can sound like nitpicking and the work of a researcher who’s trying to explain away a result he (or she) doesn’t like, but it’s precisely this kind of nitpicking that is a necessary ingredient of all the best science. It says, “here are all the ways (that I can think of) that we might be misinterpreting what we’re seeing and here’s how to redo the experiment (or observation) such that we have a higher likelihood of trusting the results.” Every new experiment or observation, though, is likely to bring new ideas about how you’re misinterpreting it and mistakes you’ve made. Like Young’s experiments, you just keep doing them again and again until neither you nor anyone else you know can think of how you might have f*cked up – an alternative explanation for what you think you’ve observed.

In public health science like Framingham and virtually all the nutrition studies (the subject I think I know best), this kind of critical assessment is absent because it is considered all too easy to do (not the studies, the nitpicking), and who has the money, after all, to repeat the observations, let alone the time (decades) to do it? So my take as a journalist is that the researchers themselves have convinced themselves not to be critical, and, as a result, they’re doing cargo cult science. As Feynman says, he could see a danger of this kind of delusion happening even in physics, and that was 1974. It certainly happened in nutrition and chronic disease and public health research. The question is to what extent can any studies be trusted, and what, if anything, can be done about it.

Clarke Read

January 24th, 2019 at 9:50 pm

"As mentioned above, the researchers undertaking this study had no idea what was causing the cardiovascular disease responsible for a huge swath of annual U.S. deaths. The FHS was initiated to try to find a possible cause or causes. Assume you are one of the researchers setting up this study–what are you going to look for? Remember, you don’t have a clue as to what’s causing cardiovascular disease, yet you’re designing a study to generate data you hope will reveal a cause somewhere along the way. But you don’t know which data are going to come to light and elucidate some connection to the disease being studied. How do you determine potential risk factors? How do you decide what to look for? How do you decide what to measure?"

This statement by itself points to both the theoretical value of FHS and where it has subsequently gone wrong. Beginning with what the FHS researchers knew when the study started, a large observational study is a solid recommendation - if we don't even know what to test, it allows us to observe a variety of variables that may be correlated with (and so subsequently could be shown to be causal in) a disease we care about. But these observations, unless they're exceptionally compelling, ought to be considered hypothesis-generating - ideas that subsequent studies directly testing each will confirm or refute. It is far from valueless, but it also isn't information we should be directly presenting to patients or clinicians, at least without secondary substantiation. The risk of an observed correlation not representing causation - whether due to methodological errors, confounders, idiosyncrasies in the population or chance - is significant. The desire to give patients the best treatment science can support is reasonable, but to intervene based on this sort of data would be like giving patients a pill without a clear understanding of either its probability of effectiveness or its side effects.

Many of the "atrocious" behaviors Smith & Eades point out would look less atrocious if the study were viewed in this narrow context. For example, notifying subjects' physicians of their health status is less problematic when any relationship between heart disease and any factors that may be affected by physician behavior (e.g., cholesterol) is merely highlighted for subsequent study, rather than being used to directly inform clinical goals.

Unfortunately, Framingham data has not been used in this way. And this interpretation and use of the data, more so than the study itself, may be the most important problem.
