Periodically I’ve considered writing a post similar to this. One piece I think this post doesn’t fully dive into is “did Anthropic have a commitment not to push the capability frontier?”.
I once wrote a doc aimed at Anthropic employees, during the SB 1047 era, when I felt like Anthropic was advocating for changes to the law that were hard to interpret un-cynically.[1] I’ve had a vague intention to rewrite it into a more public-facing thing, but for now I’m just going to lift out the section on the “pushing the capability frontier” question.
When I chatted with several Anthropic employees at a happy hour roughly a year ago, at some point I brought up the “Dustin Moskowitz’s earnest belief was that Anthropic had an explicit policy of not advancing the AI frontier” thing. Some employees said something like: “That was never an explicit commitment. It might have been a thing we were generally trying to do a couple years ago, but that was more like ‘our de facto strategic priorities at the time’, not ‘an explicit policy or commitment.’”
When I brought it up, the vibe in the discussion-circle was “yeah, that is kinda weird, I don’t know what happened there”, and then the conversation moved on.
I regret that. This is an extremely big deal. I’m disappointed in the other Anthropic folk for shrugging and moving on, and disappointed in myself for letting it happen.
First, recapping the Dustin Moskowitz quote (which, FYI, I saw personally before it was taken down), as summarized in Gwern’s comment:
> Well, if Dustin sees no problem in talking about it, and it’s become a major policy concern, then I guess I should disclose that I spent a while talking with Dario back in late October 2022 (ie. pre-RSP in Sept 2023), and we discussed Anthropic’s scaling policy at some length, and I too came away with the same impression everyone else seems to have: that Anthropic’s AI-arms-race policy was to invest heavily in scaling, creating models at or pushing the frontier to do safety research on, but that they would only release access to second-best models & would not ratchet capabilities up, and it would wait for someone else to do so before catching up. So it would not contribute to races but not fall behind and become irrelevant/noncompetitive.
> And Anthropic’s release of Claude-1 and Claude-2 always seemed to match that policy—even if Claude-2 had a larger context window for a long time than any other decent available model, Claude-2 was still substantially weaker than ChatGPT-4. (Recall that the casus belli for Sam Altman trying to fire Helen Toner from the OA board was a passing reference in a co-authored paper to Anthropic not pushing the frontier like OA did.)
I get that y’all have more bits of information than I do about what Dario is like. But some major hypotheses you need to be considering here span a spectrum:
1. Dustin Moskowitz and Gwern both interpreted Dario’s claims as more like commitments than Dario meant, and a reasonable bystander would attribute this more to Dustin/Gwern reading too much into it.
2. Dario communicated poorly, in a way that was maybe understandable but would predictably leave many people confused.
3. Dario in fact changed his mind explicitly (making this more like a broken commitment, and subsequent claims that it was not a broken commitment more like lies).
4. Dario deliberately phrased things in an open-ended/confusing way, optimized to be reassuring to a major stakeholder without actually making the commitments that would have backed up that reassurance.
5. Dario straight up lied to both of them.
6. Dario is lying to/confusing himself.
This is important because:
a) Even option 2 seems pretty bad given the stakes. I might cut many people slack for communicating poorly by accident, but when someone is raising huge amounts of money and building technology that is likely to be very dangerous by default, accidentally misleading a key stakeholder is not something you can just shrug off.
b) If we’re in worlds with option 3, 4, 5, or 6 (and, really, even option 2), you should be more skeptical of other reassuring things Dario has said. It’s not that important to distinguish precisely between these, because the question isn’t “how good a person is Dario?”, it’s “how should you interpret and trust the things Dario says?”
In my last chat with Anthropic employees, people talked about meetings and Slack channels where people asked probing, important questions, and Dario didn’t shy away from actually answering them, in a way that felt compelling. But if Dario is skilled at saying things to smart people with major leverage over him that sound reassuring, but leave them with a false impression, you need to be a lot more skeptical of your-sense-of-having-been-reassured.
[1] In particular, advocating for removing the whistleblower clause, and simultaneously arguing “we don’t know how to make a good SSP yet, which is why there shouldn’t yet be regulations about how to do it” while also arguing “companies’ liability for catastrophic harms should depend on how good their SSP was.”
I keep checking back here to see if people have responded to this seemingly cut-and-dried breach of promise by the leadership, but the lack of commentary is somewhat worrying.