As a counter example to the idea that safety work isn’t compute constrained, here is a quote from an interpretability paper out of Anthropic, “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet” :
We don’t have an estimate of how many features there are or how we’d know we got all of them (if that’s even the right frame!). We think it’s quite likely that we’re orders of magnitude short, and that if we wanted to get all the features – in all layers! – we would need to use much more compute than the total compute needed to train the underlying models.
It seems like there probably are a lot of other compute-expensive experiments that would be helpful to run if safety compute was cheap for whatever reason.
As a counter example to the idea that safety work isn’t compute constrained, here is a quote from an interpretability paper out of Anthropic, “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet” :
It seems like there probably are a lot of other compute-expensive experiments that would be helpful to run if safety compute was cheap for whatever reason.