I think the general point of this piece (there can be a tradeoff between small sample sizes in RCTs and large sample size observational data) is interesting and important, but I have serious concerns with the proposed solution (control for as much as possible). Controlling for covariates doesn't always make an estimate closer to the true causal effect -- it can often introduce new biases.
As a simple example, suppose I want to know whether smoking increases the chance of mortality. I might be worried that smokers differ systematically from non-smokers in some way, and control for whatever variables I can find to account for this. If I control for whether or not a person has lung cancer, however, my estimate of the treatment effect will likely get *worse*, because I'll control away a main pathway by which smoking leads to death! This kind of covariate (one influenced by treatment) is an example of a "collider", which shouldn't be controlled for in a statistical analysis. With large datasets I think it becomes easier to control for lots of things, but harder to figure out which ones are appropriate to control for and which might be colliders or otherwise not OK to control for.
Terminology correction: I changed my example midway through writing this and didn’t update the terminology to match. Lung cancer is a *mediator*. A collider is a different kind of covariate that also shouldn’t be controlled for: a classic example would be looking at the effect of height on basketball fundamentals while conditioning on NBA status by restricting to people who play in the NBA. This severely weakens or reverses the relationship, because a short person who plays in the NBA must be really good at basketball to overcome the height deficiency.
Apologies for the error!
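To make the mediator point concrete, here is a minimal simulation sketch. Every probability in it is made up for illustration (nothing here comes from real smoking data), and it assumes smoking affects death only through lung cancer. Under those assumptions the crude risk difference recovers the total effect of about 0.12, while stratifying on lung cancer pushes the estimate toward zero.

```python
import random

# Hypothetical data-generating process; all probabilities are invented.
rng = random.Random(0)
rows = []
for _ in range(100_000):
    smoker = rng.random() < 0.30
    cancer = rng.random() < (0.02 + 0.20 * smoker)   # smoking raises cancer risk
    death = rng.random() < (0.05 + 0.60 * cancer)    # death depends on cancer only
    rows.append((smoker, cancer, death))

def risk(rows, smoker, cancer=None):
    """Death rate among people with the given smoking (and optional cancer) status."""
    selected = [d for s, c, d in rows if s == smoker and (cancer is None or c == cancer)]
    return sum(selected) / len(selected)

# Crude comparison: smokers vs. non-smokers, no adjustment.
crude = risk(rows, True) - risk(rows, False)

# "Controlling for" lung cancer: compare within cancer strata, then take a
# size-weighted average of the within-stratum differences.
adjusted = 0.0
for cancer in (False, True):
    weight = sum(1 for _, c, _ in rows if c == cancer) / len(rows)
    adjusted += weight * (risk(rows, True, cancer) - risk(rows, False, cancer))

print(f"crude risk difference:    {crude:.3f}")
print(f"adjusted for lung cancer: {adjusted:.3f}")
```

The true total effect in this toy setup is 0.60 × 0.20 = 0.12, so the "more controlled" estimate is the one that is further from the truth.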
A great thing about critical thinking is that it allows us to have meaningful conversations about how different types of evidence may be collected to answer different questions, and to recognise that the final decision about which type of evidence is collected will be determined by a range of considerations, including ethical and practical ones. The Evidence-Based Medicine movement, which has been adopted more widely as Evidence-Based Practice, acknowledges the value and suitability of observational research. I've taught an undergraduate course for several years where we explicitly teach students that the "hierarchy of evidence" is not a rule, but an over-generalised heuristic. I still find it useful for introducing students to concepts like confounding and bias. I found this article to be weirdly antagonistic about the value of RCTs, and other triallist methodologies, in a way which is neither charitable nor a particularly novel position to take! Which is a shame, because I think it would be more productive for us to realise the value in technologies which allow for the collection of evidence to address research questions which have been (historically) difficult to address. That is to say, rather than trying to usurp RCTs at the top of the study design pyramid, wouldn't it be better to work together and see what novel methodologies may emerge?
Public Service Announcement: epidemiologists are commenting on this article outside of Substack because it has clear technical deficiencies and you should not believe everything you read. In general, we all agree that RCTs are not a one-size-fits-all solution and recognize the value of causal inference, but this article is glossing over technical issues found in observational design that cannot be solved by adding more computing power and controlling for as much as you can.
The general topic you would want to address (or Google) is **variable selection for causal inference**. In short, you have to mitigate the risk of controlling for variables you should *not* control for (colliders).
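For anyone who wants to see collider bias rather than take it on faith, here is a small sketch in the spirit of the height-and-basketball example. The selection rule (a cutoff on height plus skill) and the threshold of 2.5 are assumptions chosen purely for illustration. Height and skill are generated independently, yet restricting to the "selected" group produces a strong negative correlation between them.

```python
import math
import random

def pearson(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)

# Standardised height and skill, drawn independently: no true relationship.
people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]

# Hypothetical selection rule: you make the league only if height + skill is
# very high. Selection is a common effect of both variables (a collider).
selected = [(h, s) for h, s in people if h + s > 2.5]

corr_everyone = pearson(people)
corr_selected = pearson(selected)

print(f"correlation in the full population: {corr_everyone:+.3f}")
print(f"correlation among those selected:   {corr_selected:+.3f}")
```

No amount of extra data fixes this: a larger sample only estimates the distorted within-league relationship more precisely.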
Interesting and provocative. I’ve certainly seen some medical labs explore causal inference methods to get more bang for their buck on their observational data.
The placebo/confounding effects are non-trivial in certain intervention designs. One example is psychedelics, where it is impossible to create a placebo (and perhaps this is part of the therapeutic effect). Further, protocols like therapy are distinct from more deterministic agents like medications, which arguably necessitates a more dynamic and updating version of an RCT.
The classic criticism of adjusting for confounders, as an alternative to RCTs, is that if there's any meaningful confounder not captured by your data you're just SOL-- and the confounder you haven't thought of looking for is exactly the one you should be most worried about. I wish this article had offered a real solution to that rather than just waving its hands vaguely at larger bodies of evidence and more advanced statistical techniques.
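A rough sketch of that criticism, with invented numbers: suppose a treatment has no effect at all, a measured covariate `x` and an unmeasured one `u` both influence who gets treated and who has the outcome, and the analyst can only adjust for `x`. The variable names and probabilities are hypothetical. The estimate adjusted for `x` alone stays well away from zero, and it only returns to zero when `u` is also available.

```python
import random
from collections import defaultdict

rng = random.Random(2)
rows = []
for _ in range(100_000):
    u = rng.random() < 0.5                          # unmeasured confounder
    x = rng.random() < 0.5                          # measured covariate
    t = rng.random() < (0.2 + 0.5 * u + 0.1 * x)    # treatment uptake
    y = rng.random() < (0.1 + 0.3 * u + 0.1 * x)    # outcome; t has NO effect
    rows.append({"u": u, "x": x, "t": t, "y": y})

def stratified_risk_difference(rows, key):
    """Size-weighted average of treated-minus-untreated outcome rates within strata."""
    strata = defaultdict(lambda: {True: [], False: []})
    for row in rows:
        strata[key(row)][row["t"]].append(row["y"])
    total = 0.0
    for groups in strata.values():
        treated, untreated = groups[True], groups[False]
        diff = sum(treated) / len(treated) - sum(untreated) / len(untreated)
        total += (len(treated) + len(untreated)) / len(rows) * diff
    return total

# What the analyst can do: adjust for the measured covariate only.
adjusted_measured = stratified_risk_difference(rows, key=lambda r: r["x"])

# What the analyst would need: the confounder nobody recorded.
adjusted_full = stratified_risk_difference(rows, key=lambda r: (r["x"], r["u"]))

print(f"adjusted for x only:  {adjusted_measured:+.3f}")
print(f"adjusted for x and u: {adjusted_full:+.3f}")
```

With more rows the first number does not shrink; it just converges more tightly on the wrong answer, which is the gap the comment above is pointing at.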