The argument is that academia is huge and does a lot of AI safety-adjacent research on topics such as transparency, robustness, and safe RL. Therefore, even if this work is strongly discounted because it’s only tangentially related to AGI safety, the discounted contribution is still large.
I and others would argue that at least some "prosaic" safety research, such as interpretability, may actually be increasing P(doom) from AI, even if some of the work involved turns out to have been essential later on. This is partly because more useful AI requires a modicum of steering, which is why AI labs fund this work at all.
My main worry is that having steering in place before a goal could be a strategically bad ordering of events. Even substantial shared structure between "steering" and "what to steer towards" does not guarantee good outcomes.