If we cannot leave this up to the companies, what actions are you taking (and, as a separate question, what actions are Anthropic taking) to improve the chances that we don’t do this?
Uh, doing RSP v3?[1] Exactly the kind of stuff in the new RSP seems like a bunch of great actions a company can take for trying to promote third-party auditing ecosystems and external legislation: have a bunch of transparency about the risks posed by one’s systems, outline what kinds of mitigations would be needed at an industry-wide level to keep risks low, solicit robust third-party evaluators to be in a position to evaluate AI companies’ safety cases and publicly remark on their adequacy, etc.
Is there anyone following the output of Anthropic that thinks the company’s stance is “leaving AI development up to whatever the companies feel like will definitely go well”? I really feel like this is not something that has been particularly ambiguous!
TBC I think you can totally argue for specific ways Anthropic should do more of this, or pursue it in more effective ways, but I think it’s incredibly obvious that they are doing quite a lot to try and facilitate a world in which AI companies have meaningful checks on their behavior.
[1] And supporting regulation, funding PACs to support regulation, piloting third-party audits of safety cases, opposing state regulation moratoriums, …
First off, thanks for providing that list. I appreciate it. I do disagree with your last sentence, and I’ll write out why.
There are a couple of assumptions underlying my stance here which I ought to make explicit. These assumptions are:
If the race to ASI is not stopped, there is an unacceptable chance that it gets us all killed. Anything that cannot and will not do this is insufficient.
Anthropic will not voluntarily decide to do this in the absence of a binding requirement, and will not actively advocate for this to be done either.
Thus, my TL;DR is: Having a bunch of voluntary ways of gathering information about the risk of AI systems is not actually going to stop the companies from rushing headlong into danger. I don’t think that Anthropic is facilitating a world where AI companies have meaningful checks on their behavior, because I don’t think Anthropic views any of these requirements as “meaningful checks”. None of them stops Anthropic from doing the one thing it most wants to do: continue to train and deploy ever more powerful models that bring us closer to truly dangerous territory.
None of these criticisms are unique to Anthropic; they apply to all the frontier AI companies. But I don’t think Anthropic is doing meaningfully better on this than anyone else, in the sense of being considerably more likely to break my assumptions above. There are several ways Anthropic is unusually responsible in this space, such as Claude’s constitution, but I do not consider them relevant to the above assumptions, which are by far the most important. I know I’m hammering on about this a lot, but my views probably don’t make a lot of sense without keeping this in mind.
TL;DR ends here.
With that in mind, we can now take a look at the above items through this lens:
Promote third-party auditing systems / solicit third-party evaluators: This still leaves the decision up to Anthropic. Having this seems better than not having it, but the way I read Holden’s statement is “If the AI companies get to make the call, they are unacceptably likely to get it wrong”. I agree with that statement, and having third-party auditors doesn’t take the call away from the companies. In my view of the world, it doesn’t even really give us a saving throw: I do not imagine a situation where METR / Apollo / UK AISI tell Anthropic “This model is dangerous, do not deploy it under any circumstances” and Anthropic actually listens and avoids deploying it. Having third-party evaluators is great for Anthropic: they get useful information about model capabilities and they appear to take safety seriously, but they are never actually compelled to make costly decisions at any point.
Having a bunch of transparency about the risks: Similar to above, except this time it’s not even a third-party auditor so you have an additional step of Anthropic needing to say out loud that something is unacceptably dangerous before you reach the step of them choosing whether or not to act. It’s in the same arena. Supporting SB 53 falls under this category.
Outline what kind of mitigations would be needed at an industry-wide level: Outlining it is not the same as doing it. I think that a mitigation that involves delaying a new model on the order of months (or, God forbid, not training a new one at all) will be prohibitively expensive and promptly abandoned when the reality sets in that this is the choice. And a mitigation that never leads to this choice at all is not going to be enough.
Funding PACs to support regulation: What does Anthropic itself say this does? Here is a direct quote: “In circumstances like these, we need good policy: flexible regulation that allows us to reap the benefits of AI, keep the risks in check, and keep America ahead in the AI race.”
Under the assumptions I’ve mentioned above, this can be read as:
Flexible: Please don’t bind us in advance to making costly decisions.
Reap the benefits of AI: Let us have market share.
Keep America ahead in the AI race: Let us have market share and more chips. There is also very much the worry of authoritarian governments in there, but certainly “Keep America ahead in the AI race” is not the kind of rhetoric that helps stop the AI race.
Keep the risks in check: Let’s look at the next sentence for that one. What do they say this means?
“That means keeping critical AI technology out of the hands of America’s adversaries, maintaining meaningful safeguards, promoting job growth, protecting children, and demanding real transparency from the companies building the most powerful AI models.”
Maintaining meaningful safeguards: Human misuse is the problem. We implicitly dodge the idea that it might be the AI system itself that is inherently unsafe.
Promoting job growth: I assume this means using AI for productivity, aka, help advance adoption of our products.
Protecting children: Avoid CSAM. Straightforwardly good, but doesn’t meaningfully impact the race to ASI.
Demand real transparency: See the above section on transparency.
Adding this all up, I don’t think this makes any ask that would risk binding them to the kind of costly decisions they want to avoid, which are the same costly decisions that could actually prevent rushing to ASI as fast as possible. (Maybe these actions slow things down a little on the margin; after all, non-zero resources are spent on them! But I don’t see it as making a meaningful difference.)
Opposing moratoriums on state regulation is a straightforwardly positive action and I think Anthropic is doing the right thing by doing this. I appreciate this, but I do not think it is enough to prevent the outcomes I’m most worried about. From my point of view, approximately none of this is useful to the core problem of “Humanity is racing to unacceptably dangerous ASI as fast as possible”. And if it doesn’t address the core problem, it’s not a meaningful check. Thus, I don’t think Anthropic is doing quite a lot on the one axis that really matters, and this is why I disagree.
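Is there anyone following the output of Anthropic that thinks the company’s stance is “leaving AI development up to whatever the companies feel like will definitely go well”? I really feel like this is not something that has been particularly ambiguous!
It feels pretty ambiguous to me. I realize Anthropic does some stuff consistent with saying the opposite of this, but the public comms around it (including Dario’s recent think piece and the SB 1047 comments) seem quite cagey, and just really don’t look like the comms I would expect from an org that was really all that worried about leaving it up to the companies.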
The charitable interpretations I can imagine are:
a) “Dario thinks the risks from most ways the government might intervene heavily are worse than the risks of just leaving it up to the companies as long as possible.” (Which is plausible to me, but I think it still counts pretty clearly as “ambiguous”, given that in practice he opposes serious checks on companies at the moment.)
b) “Dario thinks there is some secret 3D chess he should be doing where he conserves weirdness points until later”.
The uncharitable interpretations I can think of are:
c) “Most of Anthropic’s efforts here are better thought of as PR campaigns and elaborate virtue signaling”.
d) “Dario explicitly believes charitable interpretation a) (or I guess b)) above, but is also extremely biased/deluded/has-poor-judgment when it comes to implementing it.”
(or, like, a mix of all 4)
I think it’s reasonable to argue it is “at least ambiguous, as opposed to overdeterminedly fake”, but the prior on companies being fake, misleading, or deluded here is just really high, yes, even when they have a missionary vibe.
Re b) - could Dario have an altruistic incentive to promote the company’s success, and specifically the road to IPO, given the 80% pledges and the employee donation matching and everything? Claude suggests, back of the envelope, that the donations might represent something like roughly an order of magnitude increase in yearly spending on AI safety compared to right now. Maybe there’s a frame like: given the transformative impact of that money, making hedged public statements about AI risk and softening some of the company’s stances to be more business-compatible (to increase the odds of a good IPO) doesn’t seem so bad?
I don’t know if I actually endorse this. I don’t know the actual cause allocation the donors have planned. And if I were holding Anthropic equity and using this line of reasoning to help make decisions, I’d be worried about the conflict of interest biasing my reasoning. But it’s an interpretation that sticks out to me.
I absolutely agree Anthropic’s public comms and revealed preferences are far from maximally “there should be extremely strong regulation passed right now, don’t give companies any leeway”. I think it’s super reasonable to say “I think the correct point on the spectrum of leaving decision-making up to companies and/or future legislation versus current policy is a more heavily-regulated one than what Anthropic appears to be going for, and they should advocate for X and Y instead of Z”.
I just think it’s very clear that the point Anthropic lies at on this spectrum is on the side of “the status quo poses unacceptable risks and should be more regulated”, as contrasted with a bunch of other actors. TBC, I am not trying to stake out the claim “Anthropic’s policy views are optimal” or whatever; I don’t think they are, and I personally would prefer somewhat more pause-flavored rhetoric from Anthropic. I just think it’s silly to be like “this one sentence in a blog post is the only time Anthropic has signaled that there might be something at all concerning about leaving it up to the companies, when will they do anything else to help”.
It’s not just any blog post. It’s a blog post outlining a new major strategic shift in the company, specifically in the direction of giving Anthropic far more leeway over how they decide what the risk is and how to deal with it. It seems especially important to state “we can’t leave this up to the companies” loudly and clearly here.