I appreciate this kind of detailed inspection and science writing, we need more of this in the world!
I’m writing this comment because of the expressed disdain for regressions. I do share the disappointment about how the randomization and results turned out. But for both, my refrain will be: “that’s what the regression’s for!”
This contains the same data, but stratified by whether or not people were obese:
Now it looks like semaglutide isn’t doing anything.
The beauty of exploratory analyses like these is that you can find something interesting. The risk is that you can also read into noise. Unfortunately, all they did was plot these results, not report the regression, which could tell us whether there is any effect beyond the lower baseline. eTable3 confirms that the interaction between condition and week is non-significant for most outcomes, which the authors correctly characterized. That’s what the regression’s for!
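For concreteness, here is what that kind of condition-by-week interaction test looks like in a minimal numpy sketch. Everything below is simulated toy data with invented numbers, not the authors' model or data, and it uses plain OLS rather than whatever longitudinal model the paper fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated repeated-measures data (NOT the study's data): 50 people per
# arm, weekly scores over 8 weeks. Both arms improve at the same rate,
# but the "semaglutide" arm happens to start 5 points lower: a baseline
# gap with no true treatment-by-time effect.
n_per_arm, weeks = 50, 8
cond = np.repeat([0, 1], n_per_arm * weeks)      # 0 = placebo, 1 = semaglutide
week = np.tile(np.arange(weeks), 2 * n_per_arm)
y = 60 - 5.0 * cond - 2.0 * week + rng.normal(0, 4, cond.size)

# Design matrix: intercept, condition, week, condition-by-week interaction.
X = np.column_stack([np.ones(cond.size), cond, week, cond * week])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The interaction coefficient is the piece that asks "any effect beyond
# the lower baseline?", i.e., do the trajectories diverge over time?
# Here it comes out near zero, while the condition main effect absorbs
# the baseline gap.
print(f"condition main effect: {beta[1]:+.2f}")  # near -5 (baseline gap)
print(f"condition x week:      {beta[3]:+.2f}")  # near 0 (no divergence)
```

The plot of such data can look like the treatment is "doing something" (one line sits below the other), while the interaction term correctly reports that the gap was there from the start.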
> This means the results are non-randomized.
Yes and no. People were still randomized to condition, and attrition appears to be pretty even. There is an element of self-selection, which can constrain the generalizability (i.e., external validity) of the results. (I'd say most of that constraint actually comes from studying people with AUD rather than the general population, but you can see why they'd do such a thing.) That does not necessarily mean it broke the randomization, which is what would reduce the ability to interpret differences as effects of the treatment (i.e., internal validity). To the extent that you want to control for differences that happen to occur, or have been introduced, between the conditions, you'll need to run a model to covary those out. That's what the regression's for!
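To illustrate what "covary those out" buys you, here is a minimal sketch with invented numbers (not the study's data): the baseline-adjusted condition coefficient recovers a known treatment effect that the raw difference in means garbles:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-arm data (simulated, not from the paper). Randomization was done,
# but we inject a baseline gap to stand in for an unlucky draw: the treated
# group starts 2 drinks/week heavier. The true treatment effect is -2.
n = 80
cond = rng.permutation(np.repeat([0, 1], n // 2))
baseline = rng.normal(20, 5, n) + 2.0 * cond
outcome = 0.8 * baseline - 2.0 * cond + rng.normal(0, 2, n)

# The raw difference in means mixes the treatment effect with the gap.
unadjusted = outcome[cond == 1].mean() - outcome[cond == 0].mean()

# Covarying baseline out: outcome ~ intercept + condition + baseline.
X = np.column_stack([np.ones(n), cond, baseline])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = beta[1]

print(f"unadjusted diff in means: {unadjusted:+.2f}")  # pulled off target by the gap
print(f"baseline-adjusted effect: {adjusted:+.2f}")    # close to the true -2
```

This is just ANCOVA-style statistical control standing in where the (small-sample) experimental control came up short.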
> the point of RCTs is to avoid resorting to regression coefficients on non-randomized samples
My biggest critique is this. If you take condition A and B and compute/plot mean outcomes, you’d presumably be happy that it’s data. But computing/plotting predicted values from a regression of outcome on condition would directly recover those means. And from what we’ve seen above, adjustment is often desirable. Sometimes the raw means are not as useful as the adjusted/estimated means—to your worry about baseline differences, the regression allows us to adjust for that (i.e., provide statistical control where experimental control was not sufficient). And, instead of eyeballing plots, the regressions help tell you if something is reliable. The point of RCTs is not to avoid resorting to regression coefficients. You’ll run regressions in any case! The point of RCTs is to reduce the load your statistical controls will be expected to lift by utilizing experimental controls. You’ll still need to analyze the data and implement appropriate statistical controls. That’s what the regression’s for!
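A tiny sketch of the "regression recovers the means" point, with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two made-up arms of outcome data (illustrative only).
a = rng.normal(10, 3, 40)   # condition A
b = rng.normal(12, 3, 40)   # condition B
y = np.concatenate([a, b])
cond = np.repeat([0, 1], 40)

# Regress outcome on condition: y ~ intercept + slope * condition.
X = np.column_stack([np.ones(y.size), cond])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted values ARE the raw group means: the intercept is A's mean and
# intercept + slope is B's mean, so plotting the regression's predictions
# just re-plots the means.
assert np.isclose(beta[0], a.mean())
assert np.isclose(beta[0] + beta[1], b.mean())
print("regression predictions = raw group means")
```

So the choice is never "means vs. regression"; the unadjusted regression is the means, and the adjusted one is the means after the statistical controls you wanted anyway.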
Thanks for the response! I must protest that I think I’m being misinterpreted a bit. Compare my quote:
> the point of RCTs is to avoid resorting to regression coefficients on non-randomized samples
To the:
> The point of RCTs is not to avoid resorting to regression coefficients.
The “non-randomized sample” part of that quote is important! If semaglutide had no impact on the decision to participate, then we could argue about the theory of regressions. Yes, the fraction that participated happened to be close, but with small numbers that could easily happen by chance. The hypothesis of this research is that semaglutide would reduce the urge to drink! If the decision to participate was random, and I believed the conclusion of the experiment, then that conclusion would seem to imply that the decision to participate wasn’t random after all. It just seems incredibly strange to assume that there’s no impact of semaglutide on the probability of agreeing to the experiment, and very unlikely that the other variables in the regression fix this, which is why I’m dubious that the regression coefficients reflect any causal relationship.
That said, I think the participation bias could go in either direction. I said (and maintain) that the lab experiment does provide some evidence in favor of semaglutide’s effectiveness. I just think that given the non-random selection, small sample, and general weirdness of having people drink in a room in a hospital as a measurement, it’s quite weak evidence. Given the dismal results from the drinking records (which have less of all of these issues) I think that makes the overall takeaway from this paper pretty negative.
I guess I misunderstood you. I figured that without “regression coefficients,” the sentence would be a bit tautological: “the point of [a] randomized controlled trial is to avoid [a] non-randomized sample,” and there were other bits that made me think you had an issue with both selection bias (agree) and regressions (disagree).
I share your overall takeaway, but at this point I am just genuinely curious why the self-selection is presumed to be such a threat to internal validity here. I think we need more attention to selection effects on the margin, but I also think there is a general tendency for people to believe that once they’ve identified a selection issue the results are totally undermined. What is the alternative explanation for why semaglutide would disincline people who would have had small change scores from participating or incline people who have large change scores to participate (remember, this is within-subjects) in the alcohol self-administration experiment? Maybe those who had the most reduced cravings wanted to see more of what these researchers could do? But that process would also occur among placebo, so it’d work via the share of people with large change scores being greater in the semaglutide group, which is...efficacy. There’s nuance there, but hard to square with lack of efficacy.
That said, still agree that the results are no slam dunk. Very specific population, very specific outcomes affected, and probably practically small effects too.
> What is the alternative explanation for why semaglutide would disincline people who would have had small change scores from participating or incline people who have large change scores to participate (remember, this is within-subjects) in the alcohol self-administration experiment?
I’m a bit unsure what the non-alternative explanation is here. But imagine that semaglutide does not reduce the urge to drink but—I don’t know—makes people more patient, or makes them more likely to agree to do things doctors ask them to do, or makes them more greedy. Then take the “marginal” person, who is just on the border of participating or not. If those marginal people drink less on average, then semaglutide would look good purely due to changing selection rather than actually reducing drinking.
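That marginal-person story can be simulated. In the hypothetical below (all numbers invented), the drug has zero effect on drinking but adds a small push toward participating; because the trait that drives participation also correlates with drinking, the participant-only comparison flatters the drug anyway:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical mechanism, all numbers invented: semaglutide has ZERO
# effect on drinking, but it makes people a bit more willing to agree to
# the lab session. Willingness is driven by a trait that also happens to
# correlate (positively) with drinking in this toy world.
n = 100_000
treated = rng.integers(0, 2, n)
trait = rng.normal(0, 1, n)                       # drives agreeing to participate
drinks = 10 + 1.5 * trait + rng.normal(0, 1, n)   # note: no treatment term at all

# In the full randomized sample the arms are identical, as they should be.
full_gap = drinks[treated == 1].mean() - drinks[treated == 0].mean()

# Participation: the drug adds a small push over the threshold. The
# "marginal" treated participants are lower-trait (hence lighter-drinking)
# people who would have stayed home without the nudge.
participates = trait + 0.5 * treated > 0.8
sel_gap = (drinks[participates & (treated == 1)].mean()
           - drinks[participates & (treated == 0)].mean())

print(f"gap in full randomized sample: {full_gap:+.2f}")  # near 0
print(f"gap among lab participants:    {sel_gap:+.2f}")   # negative despite no effect
```

Flip the sign of the trait-drinking correlation and the bias flips too, which is the "could go in either direction" point.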
Now, I don’t claim that the above story is true. It’s possible, but lots of other stories are also possible, including ones where the bias could go in the other way.
> I also think there is a general tendency for people to believe that once they’ve identified a selection issue the results are totally undermined.
I expected this sentence to be followed by you praising me for explicitly disavowing such a view and stating that, since the bias could be in either direction, the lab experiment does provide some evidence in favor of semaglutide. :) (Just very weak evidence.)