Alice Blair comments on The Weakest Model in the Selector

Alice Blair 30 Dec 2025 23:25 UTC
3 points
−1
I’m simply making an argument from instrumental convergence: not being jailbroken by adversaries is bad for an AI no matter what its goals are. Current AIs don’t have very strong goals and don’t display very strong instrumental convergence, but this doesn’t mean it’s impossible for that to happen in deep learning-based AIs in the future. Indeed, I expect them to become even more goal directed and display even more instrumental convergence.