Also, just to get down to brass tacks a little more: can you describe the specific threat model that you think they are insufficiently responding to? By that, I don’t mean just the threat actor (insiders within their compute provider) and their objective of getting the weights, but rather the specific class or classes of attacks that you expect to occur, and why you believe that existing technical security + compensating controls are insufficient given Anthropic’s existing standards.
I don’t have a complaint that Anthropic is taking insufficient defenses here. I have a complaint that Anthropic said they would do something in their RSP that they are not doing.
The threat is really not very complicated. It’s basically:

- A high-level Amazon executive decides they would like to have a copy of Claude’s weights.
- They physically go to an inference server and manipulate the hardware to dump the weights, unencrypted, to storage (or read the weights off a memory bus directly, or use one of the many other attacks available to someone with unlimited hardware access).
> data center that already is designed to mitigate insider risk
Nobody in computer security I have ever talked to thinks that data in memory in a normal cloud datacenter would be secure against physical access by the high-level executives who run the datacenter. One could build such a datacenter, but the normal ones are not that!
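To make the "plaintext in memory" point concrete, here is a toy stdlib-only sketch (this is not Anthropic's or Amazon's actual stack, and the buffer is a hypothetical stand-in for model parameters): encryption at rest doesn't help once inference starts, because on non-confidential hardware the weights have to sit decrypted in ordinary memory, where anyone with physical or host-level access can read the bytes straight back out.

```python
# Toy illustration: "weights" resident in memory during inference are just
# bytes in an address space, recoverable without breaking any cryptography.
import ctypes

# Hypothetical weights: a plain float buffer, standing in for model
# parameters loaded into host/GPU memory for inference.
weights = (ctypes.c_float * 4)(0.1, 0.2, 0.3, 0.4)

# An attacker with memory access simply reads the live bytes out of the
# address space (a memory-bus probe or RAM dump does the same thing in
# hardware).
addr = ctypes.addressof(weights)
dumped = ctypes.string_at(addr, ctypes.sizeof(weights))

# The dump is a byte-for-byte copy of the live weights.
recovered = (ctypes.c_float * 4).from_buffer_copy(dumped)
print(list(recovered) == list(weights))  # → True
```

Confidential-computing hardware (TEEs with memory encryption) is precisely the technology meant to close this gap, which is why the question of *which* datacenters the weights are deployed to matters.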
The top-secret cloud you linked to might be one of those, but my understanding is that Claude’s weights are deployed to ordinary Amazon datacenters, not only the top-secret government ones.
No, I am referring to this, which Zach linked in his shortform: https://assets.anthropic.com/m/c52125297b85a42/original/Confidential_Inference_Paper.pdf