As far as I understand, this is a bias similar to the one that has historically caused conventional wars. Unlike Agent-5/Safer-4 and DeepCent-2 from the AI-2027 scenario, who came up with a peace treaty and only needed the humans to accept the treaty's visible part, real humans are biased towards overestimating the probability of their own success and/or towards warfare or competition. Or they may have a utility function with convex parts.
Returning to the example of the AI race, mankind would need to dismantle all of these mechanisms.
First of all, the Anthropic Consensus mocked by Kokotajlo and Greenblatt is that alignment is likely easy for Anthropic-like methods. If this is actually the case, then the AI race between those who care about alignment is just a zero-sum game where each company has to take over as big a share of power as possible while avoiding bankruptcy, which in turn requires releasing increasingly impressive results and products (or, in China's case, releasing home-made products close to the leaders' capabilities as a defensive measure; if DeepCent's AI were aligned, then the AI-2027 forecast wouldn't have ended with China being sold out or genocided).
If Anthropic and OpenAI each lock in 50% of the world's resources, then they might implicitly view this as a worse result than each having a 49% chance to take over the world, with a 2% chance of the world being destroyed. Alternatively, coexistence might be implicitly viewed as genuinely impossible.
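To make the arithmetic concrete, here is a minimal sketch, assuming an illustrative convex utility u(x) = x² over a company's share of the world's resources (the specific function and payoffs are my assumptions, not anything stated in the scenario):

```python
def u(share: float) -> float:
    """Illustrative convex utility over a share of the world's resources (an assumption)."""
    return share ** 2

# Option A: a guaranteed 50/50 split between the two labs.
eu_split = u(0.5)

# Option B: racing -- 49% chance to take over everything, 49% chance the
# rival takes over (payoff 0 here), 2% chance the world is destroyed
# (also modeled as payoff 0, which is charitable to racing).
eu_race = 0.49 * u(1.0) + 0.49 * u(0.0) + 0.02 * u(0.0)

print(f"EU(split) = {eu_split:.2f}")  # 0.25
print(f"EU(race)  = {eu_race:.2f}")   # 0.49 -- the gamble wins under convex u
```

Under a sufficiently convex utility function, the race dominates the guaranteed split even with a nonzero probability of mutual destruction, which is exactly the bias described above.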
A special mention goes to the case where Anthropic believes that p(ASI is misaligned | xAI creates it) is close to 100%. Then xAI HAS to be destroyed, put under control thorough enough to ensure that it doesn't dare to release a misaligned model, or at least outcompeted, even if this means that p(Anthropic's ASI is misaligned) reaches 50%.
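The same logic in a hedged sketch, with made-up numbers standing in for those beliefs (including a hypothetical probability that the race/suppression succeeds at all):

```python
# Made-up numbers standing in for the beliefs described in the text.
p_doom_if_xai_wins = 0.99        # "close to 100%"
p_doom_if_anthropic_wins = 0.50  # Anthropic's own risk, inflated by racing
p_anthropic_wins_race = 0.60     # hypothetical odds that racing/suppression works

# Baseline: stand down and let xAI build the ASI.
risk_stand_down = p_doom_if_xai_wins

# Race: win with some probability; otherwise xAI builds it anyway.
risk_race = (p_anthropic_wins_race * p_doom_if_anthropic_wins
             + (1 - p_anthropic_wins_race) * p_doom_if_xai_wins)

print(f"p(doom | stand down) = {risk_stand_down:.2f}")  # 0.99
print(f"p(doom | race)       = {risk_race:.2f}")        # 0.70 -- racing looks better
```

Under these beliefs, racing looks strictly better even though it leaves the absolute misalignment risk at coin-flip levels.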