By the “whitepaper,” are you referring to the RSP v2.2 that Zach linked to, or something else? If so, I don’t understand how a generic standard can “call out what kind of changes would need to be required” to their current environment if they’re also claiming they meet their current standard.
Also, just to get down to brass tacks here, can you describe the specific threat model that you think they are insufficiently responding to? By that, I don’t mean just the threat actor (insiders within their compute provider) and their objective of getting the weights, but rather the specific class or classes of attacks that you expect to occur, and why you believe that existing technical security + compensating controls are insufficient given Anthropic’s existing standards.
For example, AIUI the weights aren’t just sitting naively decrypted at inference. They’re running inside a fairly locked-down trusted execution environment, with keys provided only as needed (probably with an ephemeral keying structure?) from an HSM, and those trusted execution environments operate inside the physical security perimeter of a data center that is already designed to mitigate insider risk. Which parts of this are you worried are attackable? To what degree are the organizational boundaries between Anthropic and its compute providers salient to increasing this risk? Why should we expect that the compute providers don’t already have sufficient compensating controls here, given that, e.g., these compute providers also provide classified compute to the US government that is secured at the Top Secret / SCI level, and therefore presumably have best-in-class anti-insider-threat capabilities?
I’m extremely willing to buy into a claim that they’re not doing enough, but I would need a more specific argument than this.
To be clear, I think people should feel free to block for any reason, including literally no reason at all. I’m open to ways of describing people’s block decisions in the future that better convey that, but I definitely didn’t think others reading this would assume “oh, Zach’s the bad guy here” as opposed to the reverse.