Thanks for writing this!

I agree that current frontier AI risk management frameworks (henceforth RMFs) are insufficient to ensure acceptably low risks from AI development. I also think that you’ve pointed out the key reasons why they’re inadequate: not addressing risks from development or internal deployment, a lack of planning for what happens if dangerous capabilities are detected, and (most importantly) a reliance on checking for only a few known risks (as operationalized by current evals). I think that the main value proposition of current RMFs is transparency—that is, it’s good to lay out the evals and plans, and it’s good for people to do very basic checks for model capabilities (and I think this is a common view at METR).
The underlying assumption seems to be that, even with DeepMind’s limited understanding, there must be a way to push forward safely without radical changes to their approach.
I also agree that this is one way to characterize an underlying assumption behind most RMFs (though not necessarily all, and the people writing them would not phrase it this way).
Another perspective on this underlying assumption is that people don’t think of AI development as a distinct, high-risk activity, but instead think of it as software development, and correctly note that a “real” RMF wouldn’t allow software to be developed the way it normally is. So actual RMFs end up as a compromise between what an RMF is in the traditional sense and what doesn’t overly encumber the normal software development cycle.