It seems pretty misleading to describe the shift away from unilateral pausing as a natural extension of the RSP being a living document.
You’re welcome to this view, but it isn’t mine. Other large projects I’ve worked on involved fundamental pivots that left the projects almost unrecognizable from the original pitch, and if you’d asked me at the time I would’ve said I expected something similar for RSPs. On priors, a completely new kind of risk management framework should be expected to change dramatically. I never would have agreed to “The policies we’re coming up with today will only change in the details.”
I think doing so marks the breaking of a meaningful promise—something many people were relying on, making career decisions on the basis of, etc.
I’ve seen many claims along these lines, and my guess is that there is at least some truth to them, but I am somewhat surprised by how little I’ve seen in the way of tangible specifics re: who promised what and who made important plans and decisions based on this. I have mostly just seen the quote from Evan, which Evan claims is a mischaracterization. I genuinely can’t recall encountering this sort of thing firsthand (though I’ve only been at Anthropic for about a year). Again, I’m not saying such things didn’t happen, but I haven’t seen enough specifics to be affirmatively convinced by claims like this or to have a clear sense of who was responsible, who was harmed, etc.
In other words, because a pause would seriously damage the company, there was pressure to misrepresent the risk. I think this should seriously call Anthropic’s ability to self-govern into question.
I don’t think that feeling unhealthy pressures implies that there is a governance failure. For example, people regularly feel pressure to avoid admitting they were wrong—I don’t think this is a particular person’s fault or calls a particular governing structure into question. My statement here was about psychological pressures, not pressures imposed by e.g. executives.
Which is to say that the situation as you’ve presented it seems strictly worse relative to the one Anthropic was imagining two years ago: we’re closer to AGI, but we have much less hope of accurately assessing the risk, and the political landscape is less favorable. Yet it seems like your proposal, in response to an overall more dangerous situation, is to be even more reckless.
I think the situation today is worse on the dimensions being discussed here (e.g., political will), although better on some other dimensions (IMO, the technical picture looks at least somewhat better).
I don’t accept the framing that stricter commitments are necessarily less “reckless” while more flexible frameworks are more “reckless.” I think the new policy better positions us to reduce risk worldwide. If pausing were the only or clearly best path to risk reduction, I think it would make more sense to associate greater flexibility with greater recklessness. Perhaps you believe that’s the case; I don’t.
But my god, does a post which is fundamentally premised on the inevitability of this race do so little to grapple with it. Not once does this post mention the possibility of extinction, for example, as if the real stakes and the real casualties Anthropic might cause have been forgotten. Very little attention is given to whether the race to AGI is in fact inevitable, or if there might be something Anthropic—as a leading player in this race (!)—might be able to do about that. Nor is any mention made of the role Anthropic has played in shaping this unfortunate political landscape which they now report being so helplessly beholden to. What is the point of having a seat at the table, if one doesn’t use it to wield influence in situations like this?
I don’t believe a race is inevitable, and I think there may be things Anthropic can do to make a race less likely. But I think that those things are and will continue to be highly contingent on specific circumstances, and I also believe there are non-slowdown-oriented actions that can reduce risk significantly (and potentially more than slowdown-oriented actions, depending on the circumstances).
Hm. I had thought you were pointing to something like “There isn’t actually going to be a pause in this environment triggered by if-then commitments” as the main update/vindication of interest; I was basically responding by pointing out that there also isn’t going to be (and IMO there was never a promising path to) a pause in this environment triggered by advocacy for immediate pausing.
Instead it seems like you’re doing something more like “comparing the overall impact of talking about pauses—or, more broadly, existential risks from AI—with the overall impact of talking about if-then commitments.” I think this is a much muddier comparison where there is less clearly a big update to be had.
I don’t think we have seen much traction on attempts to slow down AI in any way. Meanwhile, I do think that the framework of “test for dangerous capabilities and implement commensurate mitigations” has had quite a significant impact on company behavior in a way that does seem to set up many policy possibilities that would otherwise be difficult (including much of what has already passed).
The comparison to “raise general awareness about risks of AI” (as opposed to “advocate for specific policies explicitly aimed at slowing down AI”) feels a bit harder to make—certainly I am, and long have been, positively disposed toward raising general awareness about risks of AI.
But I will probably leave that there as it seems like a pretty complex and tricky debate to have.
This doesn’t sound remotely right to me. I would say that the RSP has provided an organizing framework for a lot of safety work, but that’s different from something like “all of that safety work would make no sense if not for the RSP.”