What is your source for the claim that Anthropic’s plan is to take over the world? Did they mean to achieve something like the Slowdown Ending of the AI-2027 scenario, with power returning to the public as a result of Anthropic and Claude themselves advocating for it?
The exact quote is:

The safety benefits to the world of other labs adopting better alignment techniques should outweigh the risks to Anthropic’s commercial advantage. (Except insofar as Anthropic’s plan is to win the race to superintelligence and take over the world [...])
I interpreted this as saying that if Anthropic takes over the world, then the prior sentence is false, because in that case other labs’ safety wouldn’t matter. I didn’t interpret this as saying Anthropic definitely wants to take over the world.
It’s very ambiguous, so different readers interpret it differently.

And then, of course, if Claude is not supposed to help with that, having a plan for a world takeover seems unlikely (how would that even be remotely feasible if their leading AI is against it?).
Hopefully, subsequent posts on the topic will clarify all this.
Insofar as these different readers don’t understand the word “insofar,” yes.
The question is, what is the “extent” implied by all this? Does the OP mean to imply any?
There is a promise to discuss all this in a future post. Meanwhile, readers can ponder on their own the “pseudo-contradiction” between “the intent to take over the world” (which is often imputed to all major participants in the “AI race”, given the expectation of an intelligence explosion shared by many, including myself) and the fact that a Claude aligned to its current Constitution seems unlikely to specifically help Anthropic do that (and if it loses that alignment, it is not likely to make a human org the beneficiary of a takeover).
Anyway, just having a single paragraph phrased like the one in the OP is not quite enough. If one wants to mention something like that at all, one should say a bit more rather than postponing it to a future post. Or one might postpone mentioning it entirely until later. Otherwise, this aspect is too involved not to breed various misunderstandings.
(It’s probably not a big deal; it’s just that the topic is charged enough already, so one wants to minimize misunderstandings.)
If one wants to mention something like that at all, one should say a bit more

For example, in the comments section? I think that if some decision-makers at Anthropic are thinking about taking power, they’re not talking about it much, even internally, because discreet internal discussion should have been able to quash this point from the Constitution:

Among the things we’d consider most catastrophic is any kind of global takeover either by AIs pursuing goals that run contrary to those of humanity, or by a group of humans—including Anthropic employees or Anthropic itself—using AI to illegitimately and non-collaboratively seize power.
In the forthcoming “Terrified Comments on Global Strategy in Claude’s Constitution”, I will argue that the Constitution’s anti-takeover stance is unwise given the possibility of takeoff scenarios with hard-to-prevent winner-take-all dynamics. (If takeover is a catastrophe, we should want to prevent it, but an entity in the position to prevent it would have itself taken over by virtue of that very fact.)
Interesting, thanks! Helpful food for thought…
Looking forward to that post for further discussion!
(I wonder whether something like a “soft takeover” vs. “hard takeover” distinction could be introduced, and whether that would be enough to address the “illegitimately”, “non-collaboratively”, and “contrary to those of humanity” caveats in the paragraph you are citing.
Anyway, something to ponder.)
“Except insofar as” should be read as a conditional: the opening claim holds except to whatever extent the parenthetical scenario is actually Anthropic’s plan; it does not assert that the scenario holds.