I don’t want to claim this with too much confident cynicism (and I’m not really tracking the “automate alignment” literature much), but for the sake of completing the hypothesis space: this is roughly what you’d expect if large swaths of the discourse were not really serious about it, and were sliding toward “automate alignment science” because it’s a convenient cop-out for AI labs (and plausibly some other players) that don’t have a good idea of how to make progress on this.
https://x.com/RichardMCNgo/status/2033624407353287071