RSPs set a standard that roughly looks like: “we will continue scaling and deploying models if and only if our internal evals team fails to empirically elicit dangerous capabilities. If they do elicit dangerous capabilities, we will enact safety controls just sufficient for our models to be unsuccessful at, e.g., creating Super Ebola.”
I think it’s pretty clear that, at the very least, that’s not what I’m advocating for: my post lays out a very specific story of how I think RSPs go well.
While these provisions are focused on AI agents rather than generative AI, it seems feasible to set a standard for AIs to be generally law-abiding (even after jailbreaking or fine-tuning attempts), which would also help reduce their potential contribution to catastrophic risk. Setting liability for AI harms, as proposed by Senators Blumenthal and Hawley, would also motivate AI labs to be more cautious.
These seem like good interventions to me! I’m certainly not advocating for “RSPs are all we need”.