I at least didn’t interpret this poll to mean that deploying it was reasonable. I think given past Anthropic commitments it was pretty unreasonable (violating your deployment commitments seems really quite bad, and is IMO one of the most central things that Anthropic should be judged on). It’s just not really clear whether it directly increased risk. I would be quite sad if that poll result were seen as something like “approval of whether Anthropic made the right call”.
Sorry for using the poll to support a different proposition. Edited.
To make sure I understand your position (and Ben’s):
Dario committed to Dustin that Anthropic wouldn’t “meaningfully advance the frontier” (according to Dustin)
Anthropic senior staff privately gave AI safety people the impression that Anthropic would stay behind/at the frontier (although nobody has quotes)
Claude 3 Opus meaningfully advanced the frontier? Or slightly advanced it but Anthropic markets it like it was a substantial advance so they’re being similarly low-integrity?
...I don’t think Anthropic violated its deployment commitments. I mostly believe y’all about 2—I didn’t know 2 until people asserted it right after the Claude 3 release, but I haven’t been around the community, much less well-connected in it, for long—but that feels like an honest miscommunication to me. If I’m missing “past Anthropic commitments” please point to them.
I mostly believe y’all about 2—I didn’t know 2 until people asserted it right after the Claude 3 release, but I haven’t been around the community, much less well-connected in it, for long—but that feels like an honest miscommunication to me.
For the record, I have been around the community for a long time (since before Anthropic existed), in a very involved way, and I had also basically never heard of this before the Claude 3 release. I can recall only one time I ever heard someone mention something like this: a non-Anthropic person, who said they had heard it from someone else who was also a non-Anthropic person, asked me if I had heard the same thing, and I said no. So it certainly seems clear given all the reports that this was a real rumour going around, but it was definitely not the case that this was just an obvious thing that everyone in the community knew about or that Anthropic senior staff were regularly saying (I talked regularly to a lot of Anthropic senior staff before I joined Anthropic and I never heard anyone say this).
That seems concerning! Did you follow up with the leadership of your organization to understand to what degree they seem to have been making different (and plausibly contradictory) commitments to different interest groups?
It seems like it’s quite important to know what promises your organization has made to whom, if you are trying to assess whether your working there will positively or negatively affect how AI will go.
(Note, I talked with Evan about this in private some other times, so the above comment is more me bringing a private conversation into the public realm than me starting a whole conversation about this. I’ve already poked Evan privately asking him to please try to get better confirmation of the nature of the commitments made here, but he wasn’t interested at the time, so I am making the same bid publicly.)
I think it was an honest miscommunication coupled with a game of telephone—the sort of thing that inevitably happens sometimes—but not something that I feel particularly concerned about.
I would take pretty strong bets that that isn’t what happened based on having talked to more people about this. Happy to operationalize and then try to resolve it.
Here are three possible scenarios:
Scenario 1, Active Lying– Anthropic staff were actively spreading the idea that they would not push the frontier.
Scenario 2, Allowing misconceptions to go unchecked– Anthropic staff were aware that many folks in the AIS world thought that Anthropic had committed to not pushing the frontier, and they allowed this misconception to go unchecked, perhaps because they realized that it was a misconception that favored their commercial/competitive interests.
Scenario 3, Not being aware– Anthropic staff were not aware that many folks had this belief. Maybe they heard it once or twice but it never really seemed like a big deal.
Scenario 1 is clearly bad. Scenarios 2 and 3 are more interesting. To what extent does Anthropic have the responsibility to clarify misconceptions (avoid scenario 2) and even actively look for misconceptions (avoid scenario 3)?
I expect this could matter tangibly for discussions of RSPs. My opinion is that the Anthropic RSP is written in such a way that readers can come away with rather different expectations of what kinds of circumstances would cause Anthropic to pause/resume.
It wouldn’t be very surprising to me if we end up seeing a situation where many readers say “hey look, we’ve reached an ASL-3 system, so now you’re going to pause, right?” And then Anthropic says “no no, we have sufficient safeguards– we can keep going now.” And then some readers say “wait a second– what? I’m pretty sure you committed to pausing until your safeguards were better than that.” And then Anthropic says “no… we never said exactly what kinds of safeguards we would need, and our leadership’s opinion is that our safeguards are sufficient, and the RSP allows leadership to determine when it’s fine to proceed.”
In this (hypothetical) scenario, Anthropic never lied, but it benefitted from giving off a more cautious impression, and it didn’t take steps to correct this impression.
I think avoiding these kinds of scenarios requires some mix of:
Clear, specific falsifiable statements on behalf of labs.
Some degree of proactive attempts to identify and alleviate misconceptions.
One counterargument is something like “Anthropic is a company, and there are lots of things to do, and this is demanding an unusually high amount of attention-to-detail and proactive communication that is not typically expected of companies.” To which my response is something like “yes, but I think it’s reasonable to hold companies to such standards if they wish to develop AGI. I think we ought to hold Anthropic and other labs to this standard, especially insofar as they want the benefits associated with being perceived as the kind of safety-conscious lab that refuses to push the frontier or commits to scaling policies that include tangible/concrete plans to pause.”
I have been around the community for a long time (since before Anthropic existed), in a very involved way, and I had also basically never heard of this before the Claude 3 release.… So… it was definitely not the case that this was just an obvious thing that everyone in the community knew about or that Anthropic senior staff were regularly saying
For the record a different Anthropic staffer told me confidently that it was widespread in 2022, the year before you joined, so I think you’re wrong here.
(The staffer preferred that I not quote them verbatim in public so I’ve DM’d you a direct quote.)
Summarizing from the private conversation: the information there is not new to me and I don’t think your description of what they said is accurate.
As I’ve said previously, Anthropic people certainly went around saying things like “we want to think carefully about when to do releases and try to advance capabilities for the purpose of doing safety”, but it was always extremely clear at least to me that these were not commitments, just general thoughts about strategy, and I am very confident that was what was being referred to as being widespread in 2022 here.
Claude 3 Opus meaningfully advanced the frontier? Or slightly advanced it but Anthropic markets it like it was a substantial advance so they’re being similarly low-integrity?
I updated somewhat over the following weeks that Opus had meaningfully advanced the frontier, but I don’t know how much that is true for other people.
It seems like Anthropic’s marketing is in direct contradiction with the explicit commitment they made to many people, including Dustin, which seems to have quite consistently been the “meaningfully advance the frontier” line. I think it’s less clear whether their actual capabilities are, as opposed to their marketing statements. I think if you want to have any chance of enforcing commitments like this, the enforcement needs to happen at the latest when the organization publicly claims to have done something in direct contradiction to it, so I think the marketing statements matter a bunch here.
Anthropic has also continued to publish ads claiming that Claude 3 has meaningfully pushed the state of the art and is the smartest model on the market since the discussion around this happened, so it’s not just a one-time oversight by their marketing department.
Separately, multiple Anthropic staffers seem to think themselves no longer bound by their previous commitment and expect that Anthropic will likely unambiguously advance the frontier if they get the chance.
I guess I’m more willing to treat Anthropic’s marketing as not-representing-Anthropic. Shrug. [Edit: like, maybe it’s consistent-with-being-a-good-guy-and-basically-honest to exaggerate your product in a similar way to everyone else. (You risk the downsides of creating hype but that’s a different issue than the integrity thing.)]
It is disappointing that Anthropic hasn’t clarified its commitments after the post-launch confusion, one way or the other.
I guess I’m more willing to treat Anthropic’s marketing as not-representing-Anthropic. Shrug.
I feel sympathetic to this, but when I think of the mess of trying to hold an organization accountable when I literally can’t take the public statements of the organization itself as evidence, then that feels kind of doomed to me. It feels like it would allow Anthropic to weasel itself out of almost any commitment.
I guess I’m more willing to treat Anthropic’s marketing as not-representing-Anthropic.
Like, when OpenAI marketing says “GPT-4 is our most aligned model yet!” you could say this shows that OpenAI deeply misunderstands alignment, but I tend to ignore it. Even mostly when Sam Altman says it himself.
[Edit after habryka’s reply: my weak independent impression is that often the marketing people say stuff that the leadership and most technical staff disagree with, and if you use marketing-speak to substantially predict what-leadership-and-staff-believe you’ll make worse predictions.]
Oh, I have indeed used this to update that OpenAI deeply misunderstands alignment, and this IMO has allowed me to make many accurate predictions about what OpenAI has been doing over the last few years, so I feel good about interpreting it that way.