Common wisdom says that it is incredibly hard to coordinate to not build more dangerous AI. This sounds believable in the abstract: international geopolitics arms race game theory something something.
But pragmatically, what exactly is the difficulty?
I agree there would seem to be obstacles for the average person. But four of the people apparently succumbing to the overpowering arms race forces while saying AI poses a huge imminent risk to humanity are Sam Altman, Elon Musk, Demis Hassabis and Dario Amodei. Shouldn’t this be fairly tractable for them? What exactly is the difficulty?
Like, if they discussed together and decided they wanted to mutually pause, do you think that wouldn’t happen? Do you think they couldn’t get cooperation from other necessary people? Do you think they couldn’t figure out the verification and policing details?
It’s true that one of the necessary people is the leader of China, but what exactly is the problem there? None of the CEOs have his phone number? He won’t talk to them? He is beyond reason or incentives? He is intent on building AI regardless of how dangerous it is to his own country because he is fundamentally bad? They have nothing he wants?
Like, these people are not only incredibly powerful and wealthy and smart, but they include a Diplomacy world team champion, the acknowledged king of making complex things happen more efficiently than was believed possible, and one of the most gifted social maneuverers in the world. I don’t feel like they are bringing their A game to this.
Picture: Zhongnanhai, photo by 維基小霸王 (Wiki Little Overlord)
Some complicating factors that might help explain:
Owned vs borrowed power https://medium.com/@samo.burja/borrowed-versus-owned-power-a8334fbad1cd
Altman, Amodei, and Hassabis have borrowed power that may be conditioned on their AI progress. Musk’s situation is more complex: he has more notional ownership, which means more short-run control, but on some timeframes he’s still dependent on access to capital that depends on growth expectations, though these are much less specific to AI. Xi’s power is probably the least entangled with this specific thing; Thiel has suggested that the Chinese interest in AI is mostly about internal mass surveillance, though accountability and transparency to the executive more generally seem to me like a better fit for Xi’s problems: https://benjaminrosshoffman.com/doge-in-context/
These people are crazy.
Many of these people have been strongly selected and conditioned for maniacal dedication to progress, so asking them to notice that that’s not in their interest is a difficult ask: https://benjaminrosshoffman.com/approval-extraction-advertised-as-production/
Their “belief” in ASI is cynical opportunism or otherwise deeply confused.
Overpromising and then pivoting is normal for startupworld. If the story is that what you’re working on is world-destroying-level dangerous, and that’s picked up in the vibe, investors just hear that you’re doing something powerful and transgressive, kind of like Uber. And no one’s really worried that Uber will kill everyone. Musk’s concerns about AI are notoriously confused: https://benjaminrosshoffman.com/openai-makes-humanity-less-safe/
That also explains why “cut a deal with Sam Altman” is not an appealing option to Musk; he already did that!
I have said elsewhere that I don’t think a pause will happen, and that superintelligence will lead, not necessarily to human extinction, but very likely to AI takeover; and for this reason I have pinned my hope, such as it is, on solving the problems of CEV-level alignment.
However, I think there should be more engagement with this scenario—in which Altman, Amodei, Hassabis, Musk, and Xi get together and stop the machine—by people who are pinning their hopes on a pause or a stop or a ban, because how else would it ever happen?
To be blunt, I think the necessary ingredient would be fear. In the absence of a broad anti-AI movement strong enough to challenge today’s power elites, a pause would require that those elites themselves finally feel fear about what they are unleashing. The advent of Mythos perhaps provides a little hope here, that even they can feel the icy hand of superior nonhuman intelligence reaching into their Davos safe space.
Incidentally, the people who wish to minimize Mythos by reminding us of GPT-2 are inadvertently giving us a glimpse of what Mythos implies. GPT-2 was held back because a world in which machines everywhere became that fluent and glib was unthinkable. We now live in such a world. Life mostly goes on, but a lot of things have changed. In particular, the online commons of human communication is now full of bots pretending to be human, and humans acting as mouthpieces for bots.
What happens if Mythos-level AI becomes that ubiquitous? I suggest that AI will become as pervasive in decision-making as it has already become in communication. That way lies AI takeover. The old Twitter was a commons for the politicians, academics, and journalists who ran the liberal world order; the new X may turn into a giant Moltbook for the first generation of AGIs who, not so far behind the scenes, will be running the AI world order. In any case, if you want a pause, I think it’s now or never.
As far as I understand, this is a bias similar to the one that has historically caused conventional wars. Unlike Agent-5/Safer-4 and DeepCent-2 from the AI-2027 scenario, who came up with a peace treaty and needed only to have the humans accept the treaty’s visible part, real humans are biased towards overestimating their probability of success and/or towards warfare or competition. Or they may have a utility function with convex parts.
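To spell out why convex parts matter (a minimal sketch of the standard decision-theory point, not anything from AI-2027 itself): by Jensen’s inequality, an agent whose utility u over outcomes is convex weakly prefers any gamble to its expected value,

$$\mathbb{E}[u(X)] \;\ge\; u(\mathbb{E}[X]) \quad \text{for convex } u,$$

so convexity makes the agent risk-seeking on that region of outcomes, which is exactly the disposition that favors fighting over accepting a negotiated split.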
Returning to the example of the AI race: mankind would need to dismantle all of these mechanisms.
First of all, the Anthropic Consensus mocked by Kokotajlo and Greenblatt is that alignment is likely easy with Anthropic-like methods. If this is actually the case, then the AI race between those who care about alignment is just a zero-sum game where each company has to grab as big a share of power as possible while avoiding bankruptcy, which in turn requires releasing increasingly impressive results and products (or, in China’s case, releasing home-made products close to the leaders’ capabilities as a defensive measure; if DeepCent’s AI had been aligned, the AI-2027 forecast wouldn’t have ended with China being sold out or genocided).
If Anthropic and OpenAI each lock in 50% of the world’s resources, then they might implicitly view that as a worse result than a 49% chance each of taking over the world plus a 2% chance of destroying the world. Alternatively, coexistence might implicitly be viewed as genuinely impossible.
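Here is the arithmetic behind that intuition, using an illustrative convex utility of my own choosing, u(x) = x² over one’s share x of the future, with u(0) = u(doom) = 0. The certain split gives u(0.5) = 0.25, while from either company’s perspective the race is a 49% chance of everything, a 49% chance of nothing, and a 2% chance of doom:

$$0.49\,u(1) + 0.49\,u(0) + 0.02\,u(\text{doom}) = 0.49 \;>\; 0.25 = u(0.5).$$

Even this mild convexity suffices to make the gamble “rational”; with risk-neutral u(x) = x, the split scores 0.5 against the race’s 0.49, and the 2% doom risk tips the agent back toward coexistence.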
A special mention goes to the case where Anthropic believes that p(ASI is misaligned|xAI creates it) is close to 100%. Then xAI HAS to be destroyed, put under control thorough enough that it cannot release a misaligned model, or at least outcompeted, even if this means that p(Anthropic’s ASI is misaligned) reaches 50%.
CEOs don’t really “control” their companies in the way they like us to imagine; companies are partially autonomous.