By the “whitepaper,” are you referring to the RSP v2.2 that Zach linked to, or something else? If so, I don’t understand how a generic standard can “call out what kind of changes would need to be required” to their current environment if they’re also claiming they meet their current standard.
Also, just to cut a little more to brass tacks here, can you describe the specific threat model that you think they are insufficiently responding to? By that, I don’t mean just the threat actor (insiders within their compute provider) and their objective (getting the weights), but rather the specific class or classes of attacks that you expect to occur, and why you believe that existing technical security + compensating controls are insufficient given Anthropic’s existing standards.
For example, AIUI the weights aren’t just sitting naively decrypted at inference time: they’re running inside a fairly locked-down trusted execution environment, with keys provided only as needed (probably with an ephemeral keying structure?) from an HSM, and those trusted execution environments operate inside the physical security perimeter of a data center that already is designed to mitigate insider risk. Which parts of this are you worried are attackable? To what degree are the organizational boundaries between Anthropic and its compute providers salient to increasing this risk? And why should we expect that the compute providers don’t already have sufficient compensating controls here, given that, e.g., these same providers also supply classified compute to the US government secured at the Top Secret / SCI level, and so presumably have best-in-class anti-insider-threat capabilities?
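Concretely, the kind of attestation-gated key release being gestured at here might look something like the sketch below. The names are hypothetical (this is not Anthropic’s or AWS’s actual implementation); the point is just that the HSM-backed service only releases a short-lived weights key to an enclave whose measured code it recognizes.

```python
# Hypothetical sketch of an attestation-gated key-release flow, assuming
# roughly the setup described above (TEE + HSM-held weights key). None of
# these names correspond to a real Anthropic or AWS API.

import hashlib
import hmac
import os
from dataclasses import dataclass

# Stand-in for the hardware vendor's attestation root; a real key-release
# service would hold only a public verification key for this.
VENDOR_ATTESTATION_KEY = b"hypothetical-attestation-root"


@dataclass
class AttestationReport:
    enclave_measurement: bytes  # hash of the code actually loaded into the TEE
    nonce: bytes                # freshness challenge issued by the key service
    signature: bytes            # produced by the TEE hardware over the fields above

    def body(self) -> bytes:
        return self.enclave_measurement + self.nonce


class KeyReleaseService:
    """Stands in for the HSM-backed service guarding the model-weights key."""

    def __init__(self, approved_measurements: set[bytes]):
        self.approved_measurements = approved_measurements

    def release_ephemeral_key(self, report: AttestationReport, nonce: bytes) -> bytes:
        # 1. The report must be signed under the hardware attestation root.
        expected = hmac.new(VENDOR_ATTESTATION_KEY, report.body(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, report.signature):
            raise PermissionError("attestation signature invalid")
        # 2. It must answer the current challenge (no replaying an old report).
        if report.nonce != nonce:
            raise PermissionError("stale attestation report")
        # 3. Only an enclave running approved, measured code gets a key.
        if report.enclave_measurement not in self.approved_measurements:
            raise PermissionError("unrecognized enclave measurement")
        # 4. Return a fresh per-session key. In a real deployment it would be
        #    wrapped to a public key held only inside the enclave, so it never
        #    transits in the clear.
        return os.urandom(32)
```

The intent of a flow like this is that the data key, and therefore the plaintext weights, should only ever exist inside the attested enclave’s memory.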
I’m extremely willing to buy into a claim that they’re not doing enough, but I would actually need a more specific argument here.
Also, just to cut a little more to brass tacks here, can you describe the specific threat model that you think they are insufficiently responding to?
I don’t have a complaint that Anthropic’s defenses here are insufficient. My complaint is that Anthropic said in their RSP that they would do something, and they are not doing it.
The threat is really not very complicated; it’s basically:
1. A high-level Amazon executive decides they would like a copy of Claude’s weights.
2. They physically go to an inference server and manipulate the hardware to dump the weights, unencrypted, to storage (or read the weights out directly from a memory bus, or use one of the many other attacks available to someone with unrestricted physical access to the hardware).
data center that already is designed to mitigate insider risk
Nobody in computer security I have ever talked to thinks that data in memory in a normal cloud data center would be secure against physical access by the high-level executives who run that data center. One could build such a data center, but the normal ones are not that!
The top-secret cloud you linked to might be one of those, but my understanding is that Claude’s weights are deployed to normal Amazon data centers, not only to the top-secret government ones.
By the “whitepaper,” are you referring to the RSP v2.2 that Zach linked to, or something else?
No, I am referring to this, which Zach linked in his shortform: https://assets.anthropic.com/m/c52125297b85a42/original/Confidential_Inference_Paper.pdf