My current guess as to Anthropic’s effect:
0-8 months shorter timelines[1]
Much better chances of a good end in worlds where superalignment doesn’t require strong technical philosophy[2] (but I put very low odds on being in this world)
Somewhat better chances of a good end in worlds where superalignment does require strong technical philosophy[3]
[1] Shorter due to:
There being a number of people who might otherwise not have been willing to work for a scaling lab, or who would not have done so as enthusiastically/effectively (~55% weight)
Encouraging race dynamics (~30%)
Making it less likely that there’s a broad alliance against scaling labs (~15%)
Partly counterbalanced by encouraging better infosec practices and being more supportive of regulation than the alternatives.
[2] They’re trying a bunch of the things which, if alignment is easy, might actually work, and no other org has the level of leadership buy-in to invest in them as hard.
[3] Probably by using AI-assisted alignment schemes, but building org competence in doing this kind of research manually, so they can direct the systems to the right problems and sort slop from sound ideas, is going to need to be a priority.
how low?
Eh, <5%? It’s more that we might be able to get the AIs to do most of the heavy lifting of figuring this out, but there’s a sliding scale of how much oversight the automated research systems need so they don’t end up in the wrong places.
I basically agree with this.
Or rather, I’d put a 20% chance on us being in the worlds “where superalignment doesn’t require strong technical philosophy”, which is maybe not very low.
Overall I think the existence of Anthropic is a mild net positive, and it’s the only major lab for which this is true (major in the sense of building frontier models).
“The existence of” meaning: if they had shut down today or two years ago, it would not have increased our chance of survival, and might have lowered it.
I’m also somewhat more optimistic about the research they’re doing helping us in the case where alignment is actually hard.