This feels like a quite natural category to me: there are a lot of common factors in what’s hard about achieving security from people with authorized access, and in why the marginal security benefits of doing so in this context are relatively limited (because the company has self-interested reasons to keep this set of people relatively contained and vetted).
But it’s possible that I’m the one with the idiosyncratic reading here. My reading is certainly colored by my picture of the threat models. My concern for AIs at this capability level is primarily about individual terrorists or small groups of terrorists; I think security that screens off most opportunistic attackers is what we need to contain the threat, and the threat model you’re describing does not seem to me like it represents an appreciable increase in relevant risks (though it could at higher AI capability levels).
In any case, I will advocate for the next iteration of this policy to provide clarification or revision to better align with what is (in my opinion) important for the threat model.
FWIW, this is part of a general update for me that the level of specific detail in the current RSP is unlikely to be a good idea. It’s hard to be confident in advance of what will end up making the most sense from a risk reduction POV, following future progress on threat modeling, technical measures, etc., at the level of detail the current RSP has.
Having talked to something like 5-15 people about this, many of whom had at least a non-trivial cybersecurity background, I am pretty confident that your reading is idiosyncratic!
The language models also seem to continue to think so:
Chat with Gemini Pro about RSP definitions
Me: Hey, can you help me clarify what Anthropic considers an “Insider” according to their RSP? Like, would employees or executives of datacenter providers count as “insiders”? I attached the latest version of their RSP.
Gemini Pro:
Based on the text of the Responsible Scaling Policy (Version 2.2), Anthropic differentiates between “Insiders” and third-party providers (such as datacenter staff).
Here is the breakdown of how the document defines these roles:
1. Definition of “Insider”
According to Page 13 and Footnote 11, Anthropic defines “insiders” specifically as internal employees.
The “In Scope” List (Page 13): The policy explicitly lists “internal employees” as a specific threat group alongside external groups like hacktivists and state-sponsored programs.
Insider Risk Classifications (Footnote 11): The document further breaks down “internal employees” into two categories based on their access to model weights:
Basic Insider: An insider who does not have persistent or time-limited access to systems that process model weights.
Sophisticated Insider: An insider who has persistent access or can request time-limited access to systems that process model weights.
2. Status of Datacenter Providers
Employees or executives of datacenter providers are not classified as “Insiders” under this policy. Instead, they fall under the categories of “Third-party environments,” “Supply chain,” or “Vendors.”
The RSP treats them as external entities that require their own specific set of safeguards:
Third-Party Environments (Page 12 & 14): The policy states that if models are deployed in a third-party partner’s environment (like a datacenter), Anthropic must “Document how all relevant models will meet the criteria… even if they are deployed in a third-party partner’s environment that may have a different set of safeguards.”
Supply Chain (Page 19, Appendix B): The ASL-2 Security Standard distinguishes “Supply chain” from “Workforce.”
Supply Chain: Involves “Vendor and supplier security” which must be regularly reviewed.
Workforce: Involves “internal employees” and the “insider risk program.”
Summary
To answer your question directly: No, employees or executives of datacenter providers would not count as “Insiders” according to this document.
“Insiders” = Anthropic internal staff/employees.
Datacenter Staff = Vendors/Third-party partners (subject to supply chain security reviews and third-party environment agreements).
Like, my understanding is that the definition of “sophisticated insider” you propose here includes on the order of 2,000-3,000 people, whereas when I read the RSP, and asked other people about it, they thought it would be on the order of 50-100 people. That is of course a huge difference in surface area.
I don’t think this change in surface area is the kind of change that should be left up to this much ambiguity in the RSP. Even if you update that the level of specific detail in the current RSP is unlikely to be a good idea, I think you should be able to end up with less counterintuitive definitions and less ambiguity[1] in future revisions of the RSP.
I haven’t thought as much about all the tradeoffs as you have, so maybe this is infeasible for some reason, but I currently believe that this was a pretty clear and preventable error, instead of just a case of diverging interpretations (and to be clear, it’s OK for there to be some errors, I don’t think this thing alone should update anyone that much, though this plus a few other things should).
In any case, I will advocate for the next iteration of this policy to provide clarification or revision to better align with what is (in my opinion) important for the threat model.
I appreciate it!
My concern for AIs at this capability level is primarily about individual terrorists or small groups of terrorists; I think security that screens off most opportunistic attackers is what we need to contain the threat, and the threat model you’re describing does not seem to me like it represents an appreciable increase in relevant risks (though it could at higher AI capability levels).
I think this is reasonable! I don’t think the current RSP communicates that super well, and “risk from competitor corporate espionage” is IMO a reasonable thing to be worried about, at least from an outside view[2]. It seems good for the RSP to be clear that it is currently not trying to be robust to at least major US competitors stealing model weights (which I think is a fine call to make given all the different tradeoffs).
[1] Though given that I have not met a single non-Anthropic employee, or language model, who considered the definition of “Insider” you use here natural given the context of the rest of the document, I struggle to call it “ambiguity” instead of simply calling it “wrong”.
[2] It is, for example, a thing that has come up in at least one scenario exercise game I have been part of, not too far from where current capability thresholds are.
Hi Oli, I think that people outside of the company falling under this definition would be outnumbered by people inside the company. I don’t think thousands of people at our partners have authorized access to model weights.
I won’t continue the argument about who has an idiosyncratic reading, but do want to simply state that I remain unconvinced that it’s me (though not confident either).
I don’t think thousands of people at our partners have authorized access to model weights.
I don’t understand the relevance of this. Of course almost no one at the partners has “authorized” access to model weights. This is in the cybersecurity section of the RSP.
The question is how many people have physical or digital access to machines that process model weights, which is how I understood you to define the “sophisticated” subset of “insiders” in the RSP. To quote directly from it:
We define “sophisticated insider risk” as risk from an insider who has persistent access or can request time-limited access to systems that process model weights.
Clearly datacenter executives at Amazon, Google and Microsoft have access to systems that process model weights. They can (approximately) just walk up to them, and probably log into accounts on the same machines. Possibly you meant here that “access to systems that process model weights” was intended as equivalent to “authorized access to model weights”, but those are of course very different in the case of a datacenter provider, and to me it seemed a very intentional choice to define this threshold as “access to systems that process model weights” not “access to model weights”.
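To make the two candidate readings of that footnote concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Person` record, the three example people, and the boolean flags are illustrative assumptions, not facts about any company, and whether provider staff really have meaningful system access is itself disputed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str                        # hypothetical label, not a real individual
    employer: str                    # "lab" or "cloud_provider" (assumed categories)
    system_access: bool              # can reach systems that process model weights
    authorized_weight_access: bool   # explicitly authorized to access the weights


def sophisticated_by_system_access(p: Person) -> bool:
    """Reading A: the footnote's wording, 'access to systems that process model weights'."""
    return p.system_access


def sophisticated_by_authorization(p: Person) -> bool:
    """Reading B: 'sophisticated insider' means authorized access to the weights themselves."""
    return p.authorized_weight_access


# Assumed example population; the flags encode the claim under discussion, not verified facts.
people = [
    Person("lab_researcher", "lab", True, True),
    Person("lab_recruiter", "lab", False, False),
    Person("provider_datacenter_exec", "cloud_provider", True, False),
]

reading_a = {p.name for p in people if sophisticated_by_system_access(p)}
reading_b = {p.name for p in people if sophisticated_by_authorization(p)}

# People whose classification depends on which reading is used.
disputed = reading_a - reading_b
print(sorted(disputed))
```

Under these assumptions the only person whose status changes between readings is the provider-side executive, which is exactly the group whose size the two sides of this thread estimate so differently.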
I won’t continue the argument about who has an idiosyncratic reading, but do want to simply state that I remain unconvinced that it’s me (though not confident either).
Seems good for you to state! I would be glad to take bets about what neutral third parties would consider true. I don’t think it’s a total slam-dunk either, but I feel about 75% confident that a mutually agreed third party would end up thinking the interpretation I advocate for here is the correct one.
I was in fact associating sophisticated insiders with actually having authorized access to model weights, and I’m not sure (even after asking around) why this is worded the way it is.
I don’t really understand your comment here: “I don’t understand the relevance of this. Of course almost no one at the partners has “authorized” access to model weights. This is in the cybersecurity section of the RSP.” How many people have authorized access to a given piece of sensitive info can vary enormously (making this # no bigger than necessary is among the challenges of cybersecurity), and people can have authorized access to things that they are nevertheless not able to exfiltrate for usage elsewhere. It is possible to have very good protection against people with authorized access to model weights, and possible to have very little protection against this.
My guess is that it is quite difficult for the people you’re gesturing at (e.g., people who can log in on the same machines but don’t have authorized access to model weights) to exfiltrate model weights, though I’m not personally confident of that.
I was in fact associating sophisticated insiders with actually having authorized access to model weights, and I’m not sure (even after asking around) why this is worded the way it is.
Ok, cool. This then of course makes my top level post very relevant again, since I think a large number of datacenter executives and other people high up at Google are likely able to exfiltrate model weights without too much of an issue (I am not that confident of this, but it’s my best guess from having thought about the topic for a few dozen hours now, and having decent familiarity with cybersecurity considerations).
This I think puts Anthropic in violation of its RSP, even given your clarifications, since you have now clarified those would not be considered “sophisticated insiders” and so are not exempt (with some uncertainties I go into at the end of this comment).
How many people have authorized access to a given piece of sensitive info can vary enormously (making this # no bigger than necessary is among the challenges of cybersecurity), and people can have authorized access to things that they are nevertheless not able to exfiltrate for usage elsewhere.
Sorry, I should have phrased things more clearly here. Let me try again:
I am describing a cybersecurity attack surface. Of course for the purpose of those attacks, we can assume that the attacker is willing to do things they are not authorized to do. People being willing to commit cybercrimes is one of the basic premises of the cybersecurity section of the RSP.
I am here describing an attack vector where anyone who has physical access to those systems is likely capable of exfiltrating model weights, at least as long as they get some amount of executive buy-in to circumvent supervision. It is extremely unlikely for most of the people who fit this description to have authorized access to model weights. As such, it is unclear what the relevance of people having “authorized access” is.
I also am honestly surprised that anyone at Google or Amazon or Microsoft is considered to have authorized access to weights. That itself is an update to me! I would have assumed nobody at those companies was allowed to look at the weights, or make inferences about e.g. the underlying architecture, etc.
I am now thinking you believe something like this: Yes, there are many people with physical access, but even if they succeed at exfiltrating the weights, realistically in order for them to do anything with the weights, word would reach the highest executive levels at these tech companies, and the highest executive levels at these tech companies all have authorized access to model weights. I.e. currently Satya or Demis have authorized access to model weights (not because you want to give them authorized access, but just because giving them authorized access is a necessity for using their compute infrastructure), and as such are considered sophisticated insiders.
Honestly, I find the idea of considering Satya or Demis “sophisticated insiders with authorized access to model weights” very confusing. Like, at least taken on its own that has pretty big implications about race dynamics and technical diffusion between frontier labs, since apparently Anthropic wouldn’t consider it a security incident if Satya or Demis were to download Claude weights to a personal computer of theirs and reverse-engineer architectural details from them (since I am inferring them both to have “authorized access”).
My guess is reality is complicated in a bunch of messy ways that are hard to capture with the RSP here. I do appreciate you taking the time to clarify things.
There are a number of things you say here that don’t seem right to me and/or aren’t capturing the intent of what I said. I prefer not to get into all of it, but just a couple of notes:
My current impression is that we are “highly protected against most attackers’ attempts at stealing model weights,” specifically highly protected against the groups listed as “in scope” (which I think of as including employees at partner orgs who have physical access to machines but not authorized access to weights), and broadly in line with the letter and spirit of the ASL-3 Security Standard. This isn’t my call and I am not up on all of the details of how we’ve vetted the security controls for partners, but it’s my impression.
An attacker being out of scope for the ASL-3 Security Standard does not mean “Anthropic wouldn’t consider it a security incident” if they stole (i.e., exfiltrated/improperly used) important assets.
In particular the first bullet point seems important and clear. I currently think this is unlikely to be true (assuming that e.g. most people in datacenter management and executives at these companies do not have authorized access to weights), but I don’t really know how to progress from here. I might write more if I happen to talk to more people in the field about it.
An attacker being out of scope for the ASL-3 Security Standard does not mean “Anthropic wouldn’t consider it a security incident” if they stole (i.e., exfiltrated/improperly used) important assets.
That makes sense, though to be clear I was not trying to equate those two. I was saying “Anthropic wouldn’t consider it a security incident if someone with authorized model access were to use those weights how they see fit”. I.e. I was equating “authorized access” with “Anthropic wouldn’t consider it a security incident if they did stuff with the weights”.
But thinking more about it, it does seem like there is a natural difference between “authorized access to model weights” and “authorized to transfer model weights to new machines” or “authorized to perform operations on model weights without extensive logging”, and it makes sense to treat the latter as a security breach even if someone is authorized to access model weights in some sense.
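The distinction drawn here can be sketched as separate permission scopes. This is a minimal illustration in Python, not a description of how any lab or cloud provider actually models access: the `WeightPermission` scopes and the `is_security_breach` helper are hypothetical names introduced only for this example.

```python
from enum import Flag, auto


class WeightPermission(Flag):
    """Hypothetical scopes; real access-control systems will differ."""
    READ = auto()          # may access weights on approved machines
    TRANSFER = auto()      # may copy weights to new machines
    UNLOGGED_OPS = auto()  # may operate on weights without extensive logging


def is_security_breach(granted: WeightPermission, action: WeightPermission) -> bool:
    """An action counts as a breach if it needs a scope the actor was not granted."""
    return action not in granted


# Someone with "authorized access" in the narrow sense of READ only (assumed example).
read_only = WeightPermission.READ

print(is_security_breach(read_only, WeightPermission.READ))
print(is_security_breach(read_only, WeightPermission.TRANSFER))
```

On this sketch, copying weights to a new machine is a breach for a read-only actor even though that actor is "authorized to access model weights in some sense", which is the separation the paragraph above describes.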
This still leaves me in a kind of confused spot with regard to the security model here. From my perspective this still leaves hundreds of people[1] in the world who have both opportunity and motive to gain access to Anthropic model weights, with a bunch of people clearly outside of Anthropic, and with interests misaligned with Anthropic’s, being labeled “sophisticated insiders” and therefore excluded from the threat model in a way that really isn’t obvious from reading the RSP.
And it’s not like I have no sympathy for the difficulty of getting this all right, but the attack surface here feels very different than the one I was expecting to be covered when reading the RSP.
Overall, again, thanks for taking the time to clarify things here. Given the first point it does seem like we have a disagreement about whether Anthropic is currently meeting its commitments, but it’s not super clear whether it’s worth either of our time to dig into it more.
[1] Maybe only tens, since I don’t actually know who you currently consider to have authorized access to model weights at these other companies. That would be less concerning, though it doesn’t change things that much if it e.g. includes all the top-level executives at these other companies, who have the biggest motive.
These systems are designed to resist individual operators subverting controls—competently built cloud infrastructure doesn’t allow subversion of access controls to production systems even with physical access to data halls. I’ll speak to AWS’s controls in particular as an example, but I want to emphasize that this is a metonym for any competently run CSP.
AWS’s Nitro System is specifically architected with “zero operator access”—there is no mechanism for any AWS personnel, including those with the highest privileges, to access customer data. These are designed and tested technical restrictions built into the hardware itself, not policy controls that can be overridden. The system uses tamper-resistant TPMs with hardware roots of trust, and there is no equivalent of a “root” user or administrative bypass—even for maintenance.
This has been independently validated by NCC Group, who found “no gaps in the Nitro System that would compromise these security claims” and “no indication that a cloud service provider employee can obtain such access...to any host.” You may also enjoy as a bonus a quick read through the Mantle whitepaper.
The assumption that datacenter executives could “just walk up to” machines and exfiltrate data conflates physical proximity with system access. Physical access to a server room doesn’t necessarily grant access to customer data.
You can’t just walk up, but there is an extremely long history of easily available exploits given unlimited hardware access to systems, and the datacenter hardware stack is not up to the task (yet). Indeed, Anthropic themselves published a whitepaper outlining what would be necessary for datacenters to actually promise security even with physical hardware violations, which IMO clearly implies they do not think current datacenters meet that requirement!
Like, this is not an impossible problem to solve, but based on having engaged with the literature here a good amount, and having talked to a bunch of people with experience in the space, my strong sense is that if you gave me unlimited hardware access to the median rack that has Anthropic model weights on it while it is processing them, it would only require a mildly sophisticated cybersecurity team to access the weights unencrypted.
Thanks Oli. Your reading is quite different from mine. I just googled “insider risk,” clicked the first authoritative-ish-looking link, and found https://www.cisa.gov/topics/physical-security/insider-threat-mitigation/defining-insider-threats which seems to support something more like my reading.
Out of curiosity about usage, I ctrl-f’d through the Securing Model Weights report to see how they use the word “insider”. I found:
“A key goal of SL3 is reducing the risks from insider threats (e.g., company employees)”
This seems to imply that it’s including non-employees.
Almost everywhere that the report talks about insider threats, it only mentions employees.
Thanks for the response!
Having talked to something like 5-15 people about this, many of whom had at least a non-trivial cybersecurity background, I am pretty confident that your reading is idiosyncratic!
The language models also seem to continue to think so:
Chat with Gemini Pro about RSP definitions
Me: Hey, can you help me clarify what Anthropic considers an “Insider” according to their RSP? Like, would employees or executives of datacenter providers count as “insiders”? I attached the latest version of their RSP.
Gemini Pro:
Based on the text of the Responsible Scaling Policy (Version 2.2), Anthropic differentiates between “Insiders” and third-party providers (such as datacenter staff).
Here is the breakdown of how the document defines these roles:
1. Definition of “Insider”
According to Page 13 and Footnote 11, Anthropic defines “insiders” specifically as internal employees.
The “In Scope” List (Page 13): The policy explicitly lists “internal employees” as a specific threat group alongside external groups like hacktivists and state-sponsored programs.
Insider Risk Classifications (Footnote 11): The document further breaks down “internal employees” into two categories based on their access to model weights:
Basic Insider: An insider who does not have persistent or time-limited access to systems that process model weights.
Sophisticated Insider: An insider who has persistent access or can request time-limited access to systems that process model weights.
2. Status of Datacenter Providers
Employees or executives of datacenter providers are not classified as “Insiders” under this policy. Instead, they fall under the categories of “Third-party environments,” “Supply chain,” or “Vendors.”
The RSP treats them as external entities that require their own specific set of safeguards:
Third-Party Environments (Page 12 & 14): The policy states that if models are deployed in a third-party partner’s environment (like a datacenter), Anthropic must “Document how all relevant models will meet the criteria… even if they are deployed in a third-party partner’s environment that may have a different set of safeguards.”
Supply Chain (Page 19, Appendix B): The ASL-2 Security Standard distinguishes “Supply chain” from “Workforce.”
Supply Chain: Involves “Vendor and supplier security” which must be regularly reviewed.
Workforce: Involves “internal employees” and the “insider risk program.”
Summary
To answer your question directly: No, employees or executives of datacenter providers would not count as “Insiders” according to this document.
“Insiders” = Anthropic internal staff/employees.
Datacenter Staff = Vendors/Third-party partners (subject to supply chain security reviews and third-party environment agreements).
Like, my understanding is that the definition of “sophisticated insider” you propose here includes on the order of 2,000-3,000 people, whereas when I read the RSP, and asked other people about it, they thought it would be on the order of 50-100 people. That is of course a huge difference in surface area.
I don’t think this change in surface area is the kind of change that should be left up to this much ambiguity in the RSP. I think even if you update that the level of specific detail in the current RSP is unlikely to be a good idea, I think you should be able to end up with less counterintuitive definitions and less ambiguity[1] in future revisions of the RSP.
I haven’t thought as much about all the tradeoffs as you have, so maybe this is infeasible for some reason, but I currently believe that this was a pretty clear and preventable error, instead of just a case of diverging interpretations (and to be clear, it’s OK for there to be some errors, I don’t think this thing alone should update anyone that much, though this plus a few other things should).
I appreciate it!
I think this is reasonable! I don’t think the current RSP communicates that super well, and I think “risk from competitor corporate espionage” is IMO a reasonable thing to be worried about, at least from an outside view[2]. It seems good for the RSP to be clear that it is currently not trying to be robust to at least major US competitors stealing model weights (which is I think a fine call to make given all the different tradeoffs).
Though given that I have not met a single non-Anthropic employee, or language model, who considered the definition of “Insider” you use here natural given the context of the rest of the document I struggle to call it “ambiguity” instead of simply calling it “wrong”
It is for example a thing that has come up in at least one scenario exercise game I have been part of, not too far from where current capability thresholds are at.
Hi Oli, I think that people outside of the company falling under this definition would be outnumbered by people inside the company. I don’t think thousands of people at our partners have authorized access to model weights.
I won’t continue the argument about who has an idiosyncratic reading, but do want to simply state that I remain unconvinced that it’s me (though not confident either).
I don’t understand the relevance of this. Of course almost no one at the partners has “authorized” access to model weights. This is in the cybersecurity section of the RSP.
The question is how many people have physical or digital access to machines that process model weights, which is how I understood you to define the “sophisticated” subset of “insiders” in the RSP. To quote directly from it:
Clearly datacenter executives at Amazon, Google and Microsoft have access to systems that process model weights. They can (approximately) just walk up to them, and probably log into accounts on the same machines. Possibly you meant here that “access to systems that process model weights” was intended as equivalent to “authorized access to model weights”, but those are of course very different in the case of a datacenter provider, and to me it seemed a very intentional choice to define this threshold as “access to systems that process model weights” not “access to model weights”.
Seems good for you to state! I would be glad to take bets about what neutral third parties would consider true. I don’t think it’s a total slam-dunk either, but feel like 75% confident that a mutually agreed third-party would end up thinking the interpretation I advocate for here is the correct one.
I was in fact associating sophisticated insiders with actually having authorized access to model weights, and I’m not sure (even after asking around) why this is worded the way it is.
I don’t really understand your comment here: “I don’t understand the relevance of this. Of course almost no one at the partners has “authorized” access to model weights. This is in the cybersecurity section of the RSP.” How many people have authorized access to a given piece of sensitive info can vary enormously (making this # no bigger than necessary is among the challenges of cybersecurity), and people can have authorized access to things that they are nevertheless not able to exfiltrate for usage elsewhere. It is possible to have very good protection against people with authorized access to model weights, and possible to have very little protection against this.
My guess is that it is quite difficult for the people you’re gesturing at (e.g., people who can log in on the same machines but don’t have authorized access to model weights) to exfiltrate model weights, though I’m not personally confident of that.
Ok, cool. This then of course makes my top level post very relevant again, since I think a large number of database executives and other people high up at Google seem likely able to exfiltrate model weights without too much of an issue (I am not that confident of this, but it’s my best guess from having thought about the topic for a few dozen hours now, and having decent familiarity with cybersecurity considerations).
This I think puts Anthropic in violation of its RSP, even given your clarifications, since you have now clarified those would not be considered “sophisticated insiders” and so are not exempt (with some uncertainties I go into at the end of this comment).
Sorry, I should have phrased things more clearly here. Let me try again:
I am describing a cybersecurity attack surface. Of course for the purpose of those attacks, we can assume that the attacker is willing to do things they are not authorized to do. People being willing to commit cybercrimes is one of the basic premises of the cybersecurity section of the RSP.
I am here describing an attack vector where anyone who has physical access to those systems is likely capable of exfiltrating model weights, at least as long as they get some amount of executive buy-in to circumvent supervision. It is extremely unlikely for most of the people who fit this description to have authorized access to model weights. As such, it is unclear what the relevance of people having “authorized access” is.
I also am honestly surprised that anyone at Google or Amazon or Microsoft is considered to have authorized access to weights. That itself is an update to me! I would have assumed nobody at those companies was allowed to look at the weights, or make inferences about e.g. the underlying architecture, etc.
I am now thinking you believe something like this: Yes, there are many people with physical access, but even if they succeed at exfiltrating the weights, realistically in order for them to do anything with the weights, word would reach the highest executive levels at these tech companies, and the highest executive levels at these tech companies all have authorized access to model weights. I.e. currently Satya or Demis have authorized access to model weights (not because you want to give them authorized access, but just because giving them authorized access is a necessity for using their compute infrastructure), and as such are considered sophisticated insiders.
Honestly, I find the idea of considering Satya or Demis “sophisticated insiders with authorized access to model weights” very confusing. Like, at least taken on its own that has pretty big implications about race dynamics and technical diffusion between frontier labs, since apparently Anthropic wouldn’t consider it a security incident if Satya or Demis were to download Claude weights to a personal computer of theirs and reverse-engineer architectural details from them (since I am inferring them both to have “authorized access”).
My guess is reality is complicated in a bunch of messy ways that is hard to capture with the RSP here. I do appreciate you taking the time to clarify things.
There are a number of things you say here that don’t seem right to me and/or aren’t capturing the intent of what I said. I prefer not to get into all of it, but just a couple of notes:
My current impression is that we are “highly protected against most attackers’ attempts at stealing model weights,” specifically highly protected against the groups listed as “in scope” (which I think of as including employees at partner orgs who have physical access to machines but not authorized access to weights), and broadly in line with the letter and spirit of the ASL-3 Security Standard. This isn’t my call and I am not up on all of the details of how we’ve vetted the security controls for partners, but it’s my impression.
An attacker being out of scope for the ASL-3 Security Standard does not mean “Anthropic wouldn’t consider it a security incident” if they stole (i.e., exfiltrated/improperly used) important assets.
Thank you for the clarification!
In particular the first bullet point seems important and clear. I currently think this is unlikely to be true (assuming that e.g. most people in datacenter management and executives at these companies do not have authorized access to weights), but I don’t really know how to progress from here. I might write more if I happen to talk to more people in the field about it.
That makes sense, though to be clear I was not trying to equate those two. I was saying “Anthropic wouldn’t consider it a security incident if someone with authorized model access were to use those weights how they see fit”. I.e. I was equating authorized access with “Anthropic wouldn’t consider it a security incident if they did stuff with the weights”.
But thinking more about it, it does seem like there is a natural difference between “authorized access to model weights” and “authorized to transfer model weights to new machines” or “authorized to perform operations on model weights without extensive logging”, and it makes sense to treat the latter as a security breach even if someone is authorized to access model weights in some sense.
This still leaves me in a kind of confused spot with regard to the security model here. From my perspective this still leaves hundreds of people[1] in the world who have both opportunity and motive to gain access to Anthropic model weights, with a bunch of people who are clearly outside of Anthropic, and whose interests are misaligned with Anthropic’s, being labeled “sophisticated insiders” and therefore excluded from the threat model in a way that really isn’t obvious from reading the RSP.
And it’s not like I have no sympathy for the difficulty of getting this all right, but the attack surface here feels very different than the one I was expecting to be covered when reading the RSP.
Overall, again, thanks for taking the time to clarify things here. Given the first point it does seem like we have a disagreement about whether Anthropic is currently meeting its commitments, but it’s not super clear whether it’s worth either of our time to dig into it more.
[1] Maybe only tens, since I don’t actually know who you currently consider to have authorized access to model weights at these other companies. That would be less concerning, though it doesn’t change things that much if it e.g. includes all the top-level executives at these other companies, who have the biggest motive.
Nothing more to add for now, thanks for the response!
These systems are designed to resist individual operators subverting controls—competently built cloud infrastructure doesn’t allow subversion of access controls to production systems even with physical access to data halls. I’ll speak to AWS’s controls in particular as an example, but I want to emphasize that this is a metonym for any competently run CSP.
AWS’s Nitro System is specifically architected with “zero operator access”—there is no mechanism for any AWS personnel, including those with the highest privileges, to access customer data. These are designed and tested technical restrictions built into the hardware itself, not policy controls that can be overridden. The system uses tamper-resistant TPMs with hardware roots of trust, and there is no equivalent of a “root” user or administrative bypass—even for maintenance. This has been independently validated by NCC Group, who found “no gaps in the Nitro System that would compromise these security claims” and “no indication that a cloud service provider employee can obtain such access...to any host.” You may also enjoy as a bonus a quick read through the Mantle whitepaper.
The assumption that datacenter executives could “just walk up to” machines and exfiltrate data conflates physical proximity with system access. Physical access to a server room doesn’t necessarily grant access to customer data.
You can’t just walk up, but there is an extremely long history of easily available exploits given unlimited hardware access to systems, and the datacenter hardware stack is not up to the task (yet). Indeed, Anthropic themselves published a whitepaper outlining what would be necessary for datacenters to actually promise security even with physical hardware violations, which IMO clearly implies they do not think current datacenters meet that requirement!
Like, this is not an impossible problem to solve, but based on having engaged with the literature here a good amount, and having talked to a bunch of people with experience in the space, my strong sense is that if you gave me unlimited hardware access to the median rack that has Anthropic model weights on it while it is processing them, it would only require a mildly sophisticated cybersecurity team to access the weights unencrypted.