Hi Habryka, thank you for holding us accountable. We do extend ASL-3 protections to all of our deployment environments, and cloud environments are no different. We haven’t made exceptions to ASL-3 requirements for any of the named deployments, nor have we said we would treat them differently. If we had, I’d agree that we would have been in violation. But we haven’t. Eventually, we will do the same for ASL-4+. I hope you appreciate that I cannot say anything about specific partnerships.
Thanks for responding! I understand you to be saying that you feel confident that even with high-level executive buy-in at Google, Microsoft, or Amazon, none of the data center providers you use would be able to extract the weights of your models. Is that correct?
If so, I totally agree that that would put you in compliance with your ASL-3 commitments.[1] I understand that you can’t provide details about how you claim to be achieving that, so I am not going to ask further questions about the specifics (but would appreciate more information nevertheless).
I do find myself skeptical given just your word, but with cybersecurity issues like this it can often be tricky to balance providing verifiable information against opening up more attack surface.
(Am sending the above as an email and will update this thread if Jason responds with something I can share)
[1] At least on any issues I am trying to point out in this post, outside of the postscript.
Jason Clinton responded very briefly, saying that the May 14th update, which excluded sophisticated insiders from the threat model, addresses the concerns in this post, and adding some short off-the-record comments.
Based on this and Holden’s comment, my best interpretation of Anthropic’s position on this issue is that they currently consider employees at compute providers to be “insiders” and executives to be “sophisticated insiders”, with the latter thereby excluded from Anthropic’s security commitments. They likely also think that compute providers lack the capacity to execute attacks like this without a very high chance of getting caught, and so this is not a threat model they are concerned about.
As I argue in the post, defining basically all employees at compute providers as “insiders” feels like an extreme stretch of the word, and I think it has all kinds of other tricky implications for the RSP, but it’s not wholly inconsistent!
To bring Anthropic back into compliance with what I think is a common-sense reading, I would suggest (in descending order of preference):
updating the RSP with a carve-out for major compute providers,
using a different word than “insider” for the class that Anthropic means to exclude in its May update,
or at the very least providing a clear definition of “insider” within the RSP (though I would really advocate against using the word “insider” at all here, since I don’t think people expect it to include compute provider employees).
I separately seem to disagree with Anthropic about whether executives at Google/Amazon/Microsoft could be motivated to do this, and whether they would succeed if they tried, but given the broad definition of “insider”, this doesn’t appear to be load-bearing for the thesis in the OP. I have written a bit more about it anyway in this comment thread.