One of the tactics most commonly used by Aubrey and Michael, and by Maintenance Phase fans in the MP subreddit, is to dismiss a scientific finding by saying “correlation =/= causation” or by claiming that all of the existing studies are just correlations. In fact, this saying has become so commonplace on the internet that Slate wrote a piece about it in 2012. Professor Fedra Negri perfectly captured this trend when she wrote: “Anyone who has attended a statistics class has heard the old adage ‘correlation does not imply causation,’ usually followed by a series of hilarious graphs showing spurious correlations. Even if we strongly agree with it, this reminder has been taken a little too far: it is repeated like a mantra to criticize every observational study as being unable to detect causation behind statistical association.” Indeed, this is how the phrase is used in the MP universe. It has been uncritically adopted as a blunt instrument to reject any finding from an observational study that Aubrey, Michael, and MP fans do not agree with. Almost every time the hosts or redditors use this phrase, they use it incorrectly. To illustrate, let’s take a step back and define correlation and causation.
Correlation is a specific statistical measure that expresses the strength of the relationship between two continuous variables. The most common version, Pearson’s correlation coefficient, measures linear relationships, though it is possible to compute nonlinear correlations (rank-based measures such as Spearman’s are one example).
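For anyone who wants to see the linear vs. nonlinear distinction with numbers, here is a minimal Python sketch. It is a toy illustration with made-up values, not data from any study, and the helper names (`pearson`, `ranks`, `spearman`) are hypothetical. It compares Pearson's linear correlation with a rank-based (Spearman-style) correlation on a relationship that always increases but is far from a straight line.

```python
import math

def pearson(xs, ys):
    """Pearson's r: strength of the LINEAR relationship between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def ranks(values):
    """Rank each value (1 = smallest). Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def spearman(xs, ys):
    """Rank-based correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up data: y always increases with x, but exponentially, not linearly.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [math.exp(x) for x in xs]

r_linear = pearson(xs, ys)   # noticeably below 1
r_rank = spearman(xs, ys)    # essentially 1, because the ordering is identical

print(f"Pearson: {r_linear:.3f}, rank-based: {r_rank:.3f}")
```

The point of the sketch is only that "correlation" is a specific calculation, and which calculation you pick determines what kind of relationship it can detect.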
Causation is, in the abstract, easy to conceptualize, though there is no single succinct epidemiologic definition. For simplicity, we can just think of causation as a relationship in which an exposure produces an effect.
There are so many graphs on the internet illustrating this “correlation =/= causation” phenomenon that people have accepted the slogan uncritically, without considering that, although correlation does not always imply causation, it sometimes does! Peder Misager, a professor at Oslo New University College, very eloquently described the issue in a blog post from 2023: “Seasoned scientists will often scold students, journalists and other fellow humans for jumping to such a conclusion. ‘Correlation does not equal causation!’ they warn. This might lead to the impression that all inferences about causation from correlation are somehow wrong or illogical. But that is too harsh. Using correlation data for causal inference is, in principle, a perfectly logical thing to do.” Moreover, causation can be present even in the absence of correlation!
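The claim that causation can exist without correlation can be shown with a tiny Python sketch. This is a hypothetical toy example with made-up numbers: the outcome is completely determined by the exposure (y is x squared), yet the linear correlation comes out to zero because the relationship is symmetric around zero. The function name `pearson` is just an illustrative helper.

```python
import math

def pearson(xs, ys):
    """Pearson's r: strength of the LINEAR relationship between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up "exposure" values, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]

# The "outcome" is fully determined by the exposure: y = x squared.
ys = [x ** 2 for x in xs]

r = pearson(xs, ys)
print(f"Pearson correlation between x and x^2: {r:.3f}")
```

Here x entirely drives y, but the positive and negative halves cancel out in the linear measure. A correlation near zero does not rule out a causal relationship; it only rules out a linear one.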
OK, to return to the original point: very few, if any, of the studies mentioned in MP episodes assess correlation. Almost all of these studies assess “associations” (statistical relationships*). Without getting too far into the statistical weeds, the measures of association most commonly used in epidemiology are ratios (e.g., relative risks and odds ratios). These ratios compare the observed outcome in the presence of an exposure with what we would expect the outcome to be in the absence of the exposure. For example, a study might compare the number of deaths in a population exposed to ivermectin with the number of deaths in a population not exposed to ivermectin (the outcomes in the “control”/unexposed population represent the “expected” outcomes). These are not correlations. Hopefully this explains why saying “correlation =/= causation” to try to discredit a study that didn’t even measure a correlation makes no sense.
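To make the ratio measures concrete, here is a minimal Python sketch of a relative risk and an odds ratio computed from a 2x2 table. The counts are invented purely for illustration and do not come from any real study (of ivermectin or anything else); the function and variable names are hypothetical.

```python
def relative_risk(exposed_events, exposed_no_events,
                  unexposed_events, unexposed_no_events):
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    risk_exposed = exposed_events / (exposed_events + exposed_no_events)
    risk_unexposed = unexposed_events / (unexposed_events + unexposed_no_events)
    return risk_exposed / risk_unexposed

def odds_ratio(exposed_events, exposed_no_events,
               unexposed_events, unexposed_no_events):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    return (exposed_events * unexposed_no_events) / (
        exposed_no_events * unexposed_events
    )

# Invented counts, for illustration only:
#                 outcome   no outcome
# exposed           20          80
# unexposed         10          90
rr = relative_risk(20, 80, 10, 90)  # (20/100) / (10/100)
or_ = odds_ratio(20, 80, 10, 90)    # (20*90) / (80*10)

print(f"Relative risk: {rr:.2f}, odds ratio: {or_:.2f}")
```

With these made-up counts, the outcome is twice as common in the exposed group as in the unexposed group (relative risk of 2), and the odds ratio is 2.25. Neither number is a correlation coefficient: both are comparisons of an observed outcome against the "expected" outcome in the unexposed group.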
*A statistical relationship simply means that the value of one variable provides information about the value of another variable. For example, if a study finds that people who eat more chocolate are happier, just knowing how much chocolate someone eats tells us something about how happy they are. As you all know, a statistical relationship does not mean a causal relationship. This is where the field of causal inference comes into play. There are ways to design an epidemiological study, including the use of specific statistical methods (“the methods used to estimate causal effects are not the same as those used to estimate associations”), that allow us to estimate causal effects. People often question study results based on the use of the word “association” (sometimes correctly, sometimes incorrectly). However, as Dr. Clarence Tam aptly stated, “Sensible epidemiologists would shy away from stating that a particular X causes a particular Y, because they know that, in purely statistical terms, there is always a possibility that they could be wrong.” This is why you will rarely see causal language in epidemiology studies, though there is substantial debate about this in the epidemiology community. (I highly recommend that anyone interested in causal inference look more into Miguel Hernán’s work. A great start is his (free!) book, "Causal Inference: What If".)
EDIT: In response to the two commenters who have econometrics training and have claimed that I do not know what I am talking about, here are some resources that elucidate the fundamental differences between econometrics and epidemiology/biostatistics (i.e., it’s not just me who experiences this):
https://robjhyndman.com/hyndsight/statistics-vs-econometrics/
https://link.springer.com/article/10.1007/s10742-022-00291-x
EDIT #2: I had to block the commenters because they were crossing the line to cyber-bullying. I am not opposed to engaging in critical discussion about the content I post! But I will not continue to engage with people who cannot be respectful in their disagreement. I frankly don’t know exactly what there is to disagree with here. The point of the post is to simply explain what “correlation =/= causation” really means and why it isn’t a blunt instrument to bat away research findings you don’t agree with. Thank you to everyone who reads and engages in a respectful way!
I love how I’m not just learning exactly what Maintenance Phase gets wrong in its statistical “analysis” but also I am learning so much about statistics through this little series. Thank you!!!!
Holy geez: I really hope that your actual professional life is centred around medical/scientific communication, bc boy do you ever have a gift for it.
Nothing to add (dated an epi prof for years, so have spent enough of my life teasing out the nuances of statistical methodology!), but just wanted to drop a note of thanks for taking the time to put this kind of info out into the ether in such a thorough yet easily digestible format.