Per-protocol analysis as medical malpractice

“Per-protocol analysis” is when medical trial researchers drop from the analysis the participants who didn’t follow the treatment protocol. It is outrageous and must be stopped.

For example: let’s say you want to know the impact of daily jogs on happiness. You randomly instruct 80 people to either jog daily or simply continue their regular routine. As a per-protocol analyst, you drop the many treated people who did not actually go jogging. You keep the whole control group, because it wasn’t as hard for them to follow their instructions.

At this point, your experiment is ruined. You’ve ended up with lopsided groups: the people able to jog versus the unfiltered control group. It would not be surprising if the eager joggers were happier, but the comparison is confounded: any difference could be due to preexisting factors that made them more able to jog, like being healthy. You’ve thrown away the random variation that makes experiments useful in the first place.
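
To make the confounding concrete, here’s a minimal simulation (all numbers hypothetical; the latent “health” mechanism is my assumption for illustration) in which jogging has zero true effect on happiness, yet the per-protocol comparison manufactures one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80  # total participants, as in the jogging example

# Hypothetical setup: a latent "health" trait drives both happiness and
# the ability to stick with daily jogging. Jogging itself has ZERO true
# effect on happiness in this simulation.
health = rng.normal(size=n)
happiness = health + rng.normal(size=n)  # true treatment effect = 0

treated = rng.permutation(n) < n // 2  # random 40/40 assignment
complied = treated & (health > 0)      # only the healthier treated people jog

# Per-protocol analysis: compare compliers against the full control group.
pp_effect = happiness[complied].mean() - happiness[~treated].mean()
print(f"per-protocol 'effect': {pp_effect:.2f}")  # ~0.8: pure selection bias
```

The entire “effect” comes from filtering the treatment group on a trait that also predicts the outcome.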

This sounds ridiculous enough that per-protocol analysis has been abandoned in many fields. But…not all fields. Enter the Harvard hot yoga study, which examined the effect of hot yoga on depression.

If the jogging example sounded contrived, this study did essentially the same thing, but with hot yoga. The treatment group was randomly assigned to do hot yoga. Only 64% (21 of 33) of the treatment group remained in the study until the week-8 endpoint, compared to 94% (30 of 32) of the control group. The result is striking graphs like this one, which could be entirely due to the selective dropping of treatment-group subjects.

What’s depressing is that there is a known fix for this: intent-to-treat analysis. It estimates effects based on the original assignment, regardless of whether someone complied. The core principle is that every comparison should be split on the original random assignment; otherwise you risk confounding. It should be standard practice to report the intent-to-treat estimate, and many medical papers do, at least somewhere in the appendix. The hot yoga study does not.
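
Sticking with the same hypothetical simulation from the jogging example, the intent-to-treat estimate is just a difference in means split on the original coin flip, and it correctly comes out near zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
health = rng.normal(size=n)
happiness = health + rng.normal(size=n)  # true effect is still zero
treated = rng.permutation(n) < n // 2    # the original random assignment

# Intent-to-treat: compare by assignment, compliers and non-compliers alike.
itt_effect = happiness[treated].mean() - happiness[~treated].mean()
print(f"intent-to-treat effect: {itt_effect:.2f}")  # hovers around 0
```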

To be fair, the intent-to-treat effect can be hard to estimate when you’re following people over time and there’s differential attrition: you’re missing outcome data for a selected chunk of people.
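
A small extension of the same sketch shows why. If the dropouts’ week-8 outcomes are simply never measured (the dropout mechanism here is again my assumption), a naive difference in observed means quietly reintroduces the selection bias:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
health = rng.normal(size=n)
happiness = health + rng.normal(size=n)  # true effect still zero
treated = rng.permutation(n) < n // 2

# Assumed attrition mechanism: less-healthy treated people drop out, so
# their final outcome is missing; everyone in the control group stays.
observed = ~treated | (health > 0)
naive = (happiness[treated & observed].mean()
         - happiness[~treated & observed].mean())
print(f"difference in observed means: {naive:.2f}")  # biased upward again
```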

Also, hot yoga could still really work! We just don’t know from this study. And with all the buzz, there’s a good chance this paper ends up being worse than useless: it could lead to scaled-up trials with null findings, trials that might not have been run if there had been more transparency to begin with.