I'm revisiting this post after listening to this section of this recent podcast with Holden Karnofsky.

Seems like this post was overly optimistic about what RSPs would be able to enforce, and not quite clear on the different scenarios for what "RSP" could refer to. Specifically, the post was equivocating between "RSP as a regulation that gets put into place" and "RSP as a voluntary commitment". We got the latter, but not really the former (except maybe in the form of the EU Codes of Practice).
Even at Anthropic, the way the RSP is put into practice now basically excludes a scaling pause from the picture:
RSPs are pauses done right: if you are advocating for a pause, then presumably you have some resumption condition in mind that determines when the pause would end. In that case, just advocate for that condition being baked into RSPs!
Interview:
That was never the intent. That was never what RSPs were supposed to be; it was never the theory of change and it was never what they were supposed to be… So the idea of RSPs all along was less about saying, ‘We promise to do this, to pause our AI development no matter what everyone else is doing’
and
But we do need to get rid of some of this unilateral pause stuff.
Furthermore, what apparently happens now is that really difficult commitments either don't get made or get walked back:
Since the strictest conditions of the RSPs only come into effect for future, more powerful models, it’s easier to get people to commit to them now. Labs and governments are generally much more willing to sacrifice potential future value than realized present value.
Interview:
So I think we are somewhat in a situation where we have commitments that don’t quite make sense… And in many cases it’s just actually, I would think it would be the wrong call. In a situation where others were going ahead, I think it’d be the wrong call for Anthropic to sacrifice its status as a frontier company
and
Another lesson learned for me here is I think people didn’t necessarily think all this through. So in some ways you have companies that made commitments that maybe they thought at the time they would adhere to, but they wouldn’t actually adhere to. And that’s not a particularly productive thing to have done.
I guess the unwillingness of the government to turn RSPs into regulation is what ultimately blocked this. (Though maybe today even a US-centric RSP-like regulation would be considered "not that useful" because of geopolitical competition.) We got RSP-like voluntary commitments from a surprising number of AI companies (so good job on predicting the future on this one), but those didn't get turned into regulation.
The first RSP was also pretty explicit about Anthropic's willingness to unilaterally pause:
Note that ASLs are defined by risk relative to baseline, excluding other advanced AI systems.… Just because other language models pose a catastrophic risk does not mean it is acceptable for ours to.
Which was reversed in the second:
It is possible at some point in the future that another actor in the frontier AI ecosystem will pass, or be on track to imminently pass, a Capability Threshold… such that their actions pose a serious risk for the world. In such a scenario, because the incremental increase in risk attributable to us would be small, we might decide to lower the Required Safeguards.
I don’t think this was a big difference between the first and the second version. The first version already had this bullet point:
However, in a situation of extreme emergency, such as when a clearly bad actor (such as a rogue state) is scaling in so reckless a manner that it is likely to lead to imminent global catastrophe if not stopped (and where AI itself is helpful in such defense), we could envisage a substantial loosening of these restrictions as an emergency response. Such action would only be taken in consultation with governmental authorities, and the compelling case for it would be presented publicly to the extent possible.
I’m revisiting this post after listening to this section of this recent podcast with Holden Karnofsky.
Seems like this post was overly optimistic in what RSPs would be able to enforce/not quite clear on different scenarios for what “RSP” could refer to. Specifically, this post was equivocating between “RSP as a regulation that gets put into place” vs. “RSP as voluntary commitment”—we got the latter, but not really the former (except maybe in the form of the EU Codes of Practice).
Even at Anthropic, the way the RSP is put into practice is now basically completely excluding a scaling pause from the picture:
Interview:
and
Furthermore, what apparently happens now is that really difficult commitments either don’t get made or get walked back on:
Interview:
and
I guess the unwillingness of the government to turn RSPs into regulation is what ultimately blocked this. (Though maybe today even a US-centric RSP-like regulation would be considered “not that useful” because of geopolitical competition). We got RSP-like voluntary commitments from a surprising number of AI companies (so good job on predicting the future on this one) but that didn’t get turned into regulation.
The first RSP was also pretty explicit about their willingness to unilaterally pause:
Which was reversed in the second:
I don’t think this was a big difference between the first and the second version. The first version already had this bullet point:
BTW, that part of the interview is also why the claim that Anthropic violated its RSP by not stopping research/deployment of new models upon not having ASL-3 security is incorrect: RSPs were never a framework that allowed for pausing unilaterally.
More generally, it’s useful to keep this in mind the next time a controversy over a RSP violation happens, as I predict it will happen again.
Seems false? You can violate an RSP by developing or deploying models under conditions where your current RSP says you won’t.
It’s true that some have an escape clause that allows for deployment when others are racing ahead. (And more generally you can revise the RSP.) But this requires specific actions (public revisions or maybe making it clear when the company is relying on the escape clause), it’s not that anything goes.
Anthropic’s escape clause is footnote 17 here. Conditions are that anthropic will acknowledge risks and invest significant effort in regulation that mitigates them. (Technically that doesn’t require them to say that they’re relying on the escape clause, I guess, but if think it would be pretty egregious for them to say that they technically fulfil those criteria now. I don’t expect that anyone sees themselves as relying on the escape clause atm.)