Relitigating the Race to Build Friendly AI

Recently I’ve been relitigating some of my old debates with Eliezer, to right the historical wrongs. Err, I mean to improve the AI x-risk community’s strategic stance. (Relevant to my recent theme of humans being bad at strategy—why didn’t I do this sooner?)

Of course the most central old debate was over whether MIRI’s circa 2013 plan, to build a world-altering Friendly AI[1], was a good one. If someone were to defend it today, I imagine their main argument would be that back then, there was no way to know how hard solving Friendliness/alignment would be, so it was worth a try in case it turned out to be easy. This may seem plausible because new evidence about the technical difficulty of alignment was the main reason MIRI pivoted away from their plan, but I want to argue that, even without this evidence, there were already good enough arguments back then to conclude that the plan was bad:

  1. MIRI was rolling their own metaethics (deploying novel or controversial philosophy), which is not a good idea even if alignment had turned out to be not that hard in a narrow technical sense.

  2. The plan was very risky given the possibility of illegible safety problems. What were the chances that a small team would be able to find and make legible all of the relevant problems in time? Even if alignment were actually easy and had no hidden traps, there was no way that a small team could reach high enough justified confidence in this to warrant pushing the “launch” button, making the plan either pointless (if the team was rational/cautious enough to ultimately not push the button) or reckless (if the team would have pushed the button anyway).

  3. If otherwise successful, the plan would have caused a small group to have world-altering (or world-destroying) power, somewhere along the way.

  4. Most of the world would not have trusted MIRI (or any similar group) to do this, had they been informed, so MIRI would have had to break some widely held ethical constraints. (This is the same argument behind the current Statement on Superintelligence: that nobody should be building SI without “1. broad scientific consensus that it will be done safely and controllably, and 2. strong public buy-in.”)

  5. It predictably inspired others (such as DeepMind and OpenAI) to join the race[2], and made it very difficult for voices calling for an AI pause/stop to attract attention and resources.

(The main rhetorical innovation in my current arguments that wasn’t available back then is the concept of “illegible safety problems”, but the general idea that there could be hidden traps that a small team could easily miss had been brought up, or should have been obvious to MIRI and the nearby community.)

Many of these arguments are still relevant today, considering the plans of the remaining and new race participants, but they are not well known for historical reasons (i.e., MIRI and its supporters argued against them to defend MIRI’s plan, so they were never established as part of the LW consensus or rhetorical toolkit). This post is in part an effort to correct this, and to help shift the rhetorical strategy away from staking everything on technical alignment difficulty.


(This post was pulled back into draft so that I could find more supporting evidence for my claims, which also gave me a chance to articulate some further thoughts.)

My regular hobby horse in recent years has been how terrible humans are at making philosophical progress relative to our ability to innovate in technology, how terrible AIs may also be at this (or even worse, in the same relative sense), and how this greatly contributes to x- and s-risks. But I’ve recently come to realize (or pay more attention to) how terrible we also are at strategic thinking, how terrible AIs may also be at this (in a similar relative sense), and how this may be an even greater contributor to x- and s-risks.[3]

(To spell this out more: if MIRI’s plan was in fact a bad one, even from our past perspective, why didn’t more people argue against it? Wasn’t there anyone whose day job was to think strategically about how humanity should navigate complex and highly consequential future technologies/events like the AI transition, and if so, why weren’t they trying to talk Eliezer/MIRI out of what they were planning? Either way, if you were observing this course of history in an alien species, how would you judge their strategic competence and chances of successfully navigating such events?)

A potential implication of all of this is that improving AI strategic competence (relative to AIs’ technological abilities) may be of paramount importance (so that they can help us with strategic thinking and/or avoid making disastrous mistakes of their own), but this is clearly even more of a double-edged sword than AI philosophical competence. Improving human strategic thinking is more robustly good, but suffers from the same lack of obvious tractability as improving human philosophical competence. Perhaps the conclusion remains the same as it was 12 years ago: we should be trying to pause or slow down the AI transition to buy time to figure all this out.

  1. ^

    This was edited from “to build a Friendly AI to take over the world in service of reducing x-risks” after discussion with @habryka and @jessicata. Jessica also found this passage to support this claim: “MIRI co-founder Eliezer Yudkowsky usually talks about MIRI in particular — or at least, a functional equivalent — creating Friendly AI.” (Interestingly, what was common knowledge on LW just 12 years ago now requires hard-to-find evidence to establish.)

  2. ^

    According to the linked article, Shane Legg was introduced to the idea of AGI through a 2000 talk by Eliezer, and then co-founded DeepMind in 2010 (following an introduction by Eliezer to investor Peter Thiel, which is historically interesting, especially as to Eliezer’s motivations for making the introduction, which I’ve been unable to find discussed online). I started arguing against SIAI/MIRI’s plan to build FAI in 2004: “Perhaps it can do more good by putting more resources into highlighting the dangers of unsafe AI, and to explore other approaches to the Singularity, for example studying human cognition and planning how to do IA (intelligence amplification) once the requisite technologies become available.”

  3. ^

    If we’re bad at philosophy but good at strategy, we can do things like realize the possibility of illegible x-risks (including ones caused by philosophical errors), and decide to stop or slow down the development of risky technologies on this basis. If we’re good at philosophy but bad at strategy, we might avoid making catastrophic philosophical errors but still commit all kinds of strategic errors in the course of making highly consequential decisions.