For example, let’s say you want to know the impact of daily jogs on happiness. You randomly assign 80 people to either jog daily or simply continue their regular routine. As a per-protocol analyst, you drop the many treated people who did not go jogging. You keep the whole control group because it wasn’t as hard for them to follow instructions.
I didn’t realize this was a common practice. That does seem pretty bad!
Do you have a sense of how commonplace this is?
What’s depressing is that there is a known fix for this: intent-to-treat analysis. It looks at effects based on the original assignment, regardless of whether someone complied or not.
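To make the contrast concrete, here is a minimal sketch in Python comparing the two analyses on the jogging example. The eight records and their happiness scores are invented purely for illustration, and the function names are hypothetical; nothing here comes from a real study.

```python
from statistics import mean

# Hypothetical records: (assigned_to_jog, actually_jogged, happiness_score).
# All values are made up for illustration only.
people = [
    (True, True, 8), (True, True, 9), (True, False, 4), (True, False, 5),
    (False, False, 6), (False, False, 7), (False, False, 6), (False, False, 5),
]

def intent_to_treat(records):
    """Compare groups by original assignment, ignoring compliance."""
    treated = [y for assigned, _, y in records if assigned]
    control = [y for assigned, _, y in records if not assigned]
    return mean(treated) - mean(control)

def per_protocol(records):
    """Drop assigned people who did not jog, but keep every control."""
    treated = [y for assigned, jogged, y in records if assigned and jogged]
    control = [y for assigned, _, y in records if not assigned]
    return mean(treated) - mean(control)

itt = intent_to_treat(people)
pp = per_protocol(people)
print(f"intent-to-treat: {itt}, per-protocol: {pp}")
```

In this toy data the people who skipped jogging also report lower happiness, so dropping them from only the treated group makes the per-protocol difference much larger than the intent-to-treat difference.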
In my econometrics classes, we would have been instructed to take an instrumental variables approach, where “assignment to treatment group” is an instrument for “does the treatment”, and then you can use a two-stage least squares regression to estimate the effect of treatment on outcome. (My memory is blurry on the details.)
IIUC this sounds similar to intent-to-treat analysis, except allowing you to back out the effect of actually doing the treatment, which is presumably what you care about in most cases.
I don’t have a sense of the overall prevalence; I’m curious about that too! I’ve just seen it enough in high-profile medical studies to think it’s still a big problem.
Yes, this is totally related to two-stage least squares regression! The intent-to-treat estimate just gives you the effect of being assigned to treatment. The TSLS estimate scales up the intent-to-treat estimate by the effect that the randomization had on treatment (so, e.g., if the randomization increased the share doing yoga from 10% in the control group to 50% in the treatment group, the intent-to-treat effect divided by 0.40 would give you the TSLS estimate).
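The scaling described above can be sketched in a few lines of Python. This assumes the simplest setup, a single binary assignment with no covariates, where the TSLS estimate reduces to this ratio. The function name and the intent-to-treat effect of 0.2 are hypothetical placeholders; only the 10% and 50% shares come from the example.

```python
def scaled_itt_estimate(itt_effect, share_treated_if_assigned, share_treated_if_control):
    """Divide the intent-to-treat effect by the change in treatment take-up."""
    # First stage: how much assignment moved the share actually doing the treatment.
    first_stage = share_treated_if_assigned - share_treated_if_control
    if first_stage == 0:
        raise ValueError("assignment did not change take-up; the ratio is undefined")
    return itt_effect / first_stage

# Shares from the example: 10% in control, 50% in treatment.
# The intent-to-treat effect of 0.2 is an invented number for illustration.
estimate = scaled_itt_estimate(
    itt_effect=0.2,
    share_treated_if_assigned=0.50,
    share_treated_if_control=0.10,
)
print(estimate)
```

With a take-up difference of 0.40, any intent-to-treat effect is scaled up by a factor of 2.5.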