HoldenKarnofsky comments on Responsible Scaling Policy v3

HoldenKarnofsky 10 Apr 2026 4:21 UTC
8 points
Hm. I had thought you were pointing to something like “There isn’t actually going to be a pause in this environment triggered by if-then commitments” as the main update/vindication of interest; I was basically responding by pointing out that there also isn’t going to be (and IMO there was never a promising path to) a pause in this environment triggered by advocacy for immediate pausing.

Instead it seems like you’re doing something more like “comparing the overall impact of talking about pauses—or, more broadly, existential risks from AI—with the overall impact of talking about if-then commitments.” I think this is a much muddier comparison where there is less clearly a big update to be had.
I don’t think we have seen much traction on attempts to slow down AI in any way. Meanwhile, I do think that the framework of “test for dangerous capabilities and implement commensurate mitigations” has had quite a significant impact on company behavior in a way that does seem to set up many policy possibilities that would otherwise be rough (including much of what has already passed).
The comparison to “raise general awareness about risks of AI” (as opposed to “advocate for specific policies explicitly aimed at slowing down AI”) feels a bit harder to make; certainly I am, and long have been, positively disposed toward raising general awareness about risks of AI.

But I will probably leave that there as it seems like a pretty complex and tricky debate to have.
> My understanding is that most safety efforts at Anthropic were oriented around the RSP in 2024/2025, and my current model is that almost 50% of total safety talent in the ecosystem work at Anthropic.

This doesn’t sound remotely right to me. I would say that the RSP has provided an organizing framework for a lot of safety work, but that’s different from something like “all of that safety work would make no sense if not for the RSP” or something.
- habryka 10 Apr 2026 7:09 UTC
  4 points
  > But I will probably leave that there as it seems like a pretty complex and tricky debate to have.
  Seems reasonable. I’ll leave a few quick clarifications.
  > Instead it seems like you’re doing something more like “comparing the overall impact of talking about pauses—or, more broadly, existential risks from AI—with the overall impact of talking about if-then commitments.” I think this is a much muddier comparison where there is less clearly a big update to be had.
  The thing that I am comparing is “resources invested into advocating for direct regulation, and actions that would directly slow down AI development” vs. “resources invested into getting companies to adopt RSPs and get policies around RSPs passed”.
  > I don’t think we have seen much traction on attempts to slow down AI in any way. Meanwhile, I do think that the framework of “test for dangerous capabilities and implement commensurate mitigations” has had quite a significant impact on company behavior in a way that does seem to set up many policy possibilities that would otherwise be rough (including much of what has already passed).
  I think traction has been not great, but also not terrible. Honestly, I would have been confused if many very concrete useful things had passed by now, since buy-in takes a while to build, but the things that do seem to show motion seem not very RSP-flavored. I do currently think it’s pretty unlikely that regulation that does get passed has much of any grounding in if-then commitments or RSPs, and I am not sure what you are talking about with “set up many policy possibilities that would otherwise be rough”.
  > certainly I am, and long have been, positively disposed toward raising general awareness about risks of AI.
  I agree and am deeply grateful for your work in the space, and your support of work in the space.
  > This doesn’t sound remotely right to me. I would say that the RSP has provided an organizing framework for a lot of safety work, but that’s different from something like “all of that safety work would make no sense if not for the RSP” or something.
  Hmm, I agree this comparison is tricky, and on reflection I think I overstated the ratios here. The RSP has been responsible for quite a lot of safety-adjacent work (including a lot of effort spent on cyber-security, various comms efforts, and the prioritization of various mitigations), but I agree that most of the safety-adjacent work at Anthropic is driven more by other risk models (and is, IMO, mostly downstream of beliefs about what the tractable parts of the general AI alignment problem are, and which aspects of alignment-oriented work are most helpful for commercialization). I think RSP prioritization probably accounts for something like 15%-20% of the work at Anthropic.
  - HoldenKarnofsky 16 Apr 2026 16:56 UTC
    4 points
    > The thing that I am comparing is “resources invested into advocating for direct regulation, and actions that would directly slow down AI development” vs. “resources invested into getting companies to adopt RSPs and get policies around RSPs passed”.
    
    This feels like an unfair comparison in that one of these things is much more specific/narrow than the other.
    
    I think fairer comparisons would be:
    
    1. A comparison of two specific/narrow goals along the lines of “Enact policy explicitly aimed at doing X.” Perhaps (a) “Advocating for pausing or stopping AI development, or perhaps for directly slowing it down via deliberate, explicit attempts to slow it down (not e.g. regulation largely or even putatively motivated by something else that happens to increase frictions to AI development)” vs. (b) “advocating for companies to adopt RSPs and get policies around RSPs passed.” I think SB53 and the EU Code of Practice have significant elements that look like the latter, whereas I’d guess we agree the former has gotten nowhere.
    
    2. A comparison of two broad approaches to spreading general messages or frameworks with many possible policy implications. Perhaps (a) “Advocating for direct regulation [which I assume refers to the sort of examples you gave before], and actions that would directly slow down AI development” vs. (b) “Advocating for the general paradigm of doing evaluations to assess current danger, implementing mitigations to reduce current danger, and having some sort of transparency and accountability around these activities.” Here too I think we’ve seen quite a bit more motion from the latter, both from legislation and via voluntary actions from companies, though perhaps you could frame a bit of this as the former, and if you have a certain picture of the risk, you may consider all of the motion on the latter to be worthless.
    
    > I agree and am deeply grateful for your work in the space, and your support of work in the space.
    
    I appreciate your saying so.