One way RSPs could fail to reduce risk, or even increase it, would be if they resulted in the following dynamic: “Cautious AI developers end up slowing down in order to avoid risks, while incautious AI developers move forward as fast as they can.”
The problem I have with this argument is that “cautious” and “incautious” are not some kind of fundamental/essential quality an AI developer has at their core. They are contingent qualities that are assigned based on the actions of the developer, and defined relative to the context of the situation. So if a relatively cautious AI developer A “releases themselves from the mast” and tries to keep up with a relatively incautious AI developer B who is moving forward as fast as they can—presumably by themselves also moving forward as fast as they can—then I think it is reasonable to no longer call developer A cautious.
On the object-level changes, I think it’s better to have at least one company committed to proving a positively low risk level for their AI systems and making positive claims about their mitigations being effective than to have no companies that do so. If no companies do so, this creates the idea that it is “industry standard practice” to not require positive proof that risks are contained before releasing a model, which would seem to be the opposite of Anthropic’s stated goals. That is to say, I find the changes very unfortunate, especially as model capabilities increase towards ever more dangerous levels.
this creates the idea that it is “industry standard practice”
I think countering this is basically the point of the recommendations for industry-wide safety? Part of the structure of RSP v3 is to clearly say, when diverging from the practices that would provide a positively low risk level, “this practice poses unacceptably high catastrophic risk, and it would be better if the industry were to collectively do this other thing instead”.
Publishing guidelines that you don’t follow because (some complicated sociopolitical hedging/reasoning goes here which amounts to “well, sir, everyone was speeding last night”) makes it very easy for those opposed to the guidelines to accuse you of hypocrisy, “do as I say, not as I do”, etc. I therefore have a low estimate of the impact of the industry-standard recommendations. Basically, it’s hard to preach veganism successfully as a non-vegan.