I believe there is also an argument to be made that the AI safety community is currently heavily under-indexed on research into future scenarios where the assumption that AI operators take baseline safety precautions against loss of control does not hold.
A few takes:
I think you’re mixing up two things: the extent to which we consider the possibility that AI operators will be very incautious, and the extent to which our technical research focuses on that possibility.
My research mostly focuses on techniques that an AI developer could use to reduce the misalignment risk posed by developing and deploying AI, given some constraints on how much value they need to get from the AI. Given this, I am almost by definition imagining an AI developer who is trying to mitigate misalignment risk: why else would they use the techniques I study?
But that focus doesn’t mean I’m confident that all AI developers will in fact use good safety techniques.
Another disagreement: I think we’re better off if some AI developers (preferably the more powerful ones) have controlled or aligned their models, even if some misaligned AIs are being developed without safeguards. This is because the controlled/aligned models can be used to defend against attacks from the misaligned AIs, and to compete with them, both on acquiring resources and on doing capabilities and safety research.