A better analogy and example for teaching AI takeover: the ML Inferno
Epistemic Status: entirely speculative
Note: this article was assisted by ChatGPT
A very common example given for AI takeover is silently killing all humans with nanotech.
The concrete example I usually use here is nanotech, because there’s been pretty detailed analysis of what definitely look like physically attainable lower bounds on what should be possible with nanotech, and those lower bounds are sufficient to carry the point.
I will explain the advantages shortly, but I think this is a suboptimal example to use when introducing someone to the idea of AI existential risk.
The reason I bring this up is that I notice Yudkowsky used this example on the Bankless podcast, and numerous people use it on Twitter.
The main drawback is that, in my internal theory of mind, this causes people to pattern match the AI to “High IQ evil bioterrorist scientists”. The most problematic bit is that this is being pattern matched to human evil, which leads to many misunderstandings.
Instead, I will propose a different scenario. I’m not quite sure what to call it, but I think ML Inferno or Optimizer Firestorm would do. The basic idea is that strong AI is like a fire. Just like a fire burns fuel, an AI exploits opportunities. And it won’t just exploit them one at a time; in this scenario it exploits every opportunity as soon as possible.
If controlled, this becomes very useful. Otherwise, it becomes a dangerous complex system. And just as a human body becomes fuel when faced with a hot enough fire, psychological and societal weakness becomes an opportunity when faced with a strong enough AI.
Here is a table for the analogy:
Strengths of the Silent Nanotech Example
The strength of the nanotech example is game-theoretic. Suppose someone says “I have a way to defeat the evil AI, which I will use once I learn it has been released”. This is a losing strategy because the AI has a counter-strategy that defeats it: kill all humans silently. This argument does not claim the AI will use this strategy; it just claims that since it’s an option, the wait-and-see strategy is a loser.
From the same article:
Losing a conflict with a high-powered cognitive system looks at least as deadly as “everybody on the face of the Earth suddenly falls over dead within the same second”.
This example is well optimized for “argue with high probability that the wait-and-see strategy loses” by carefully avoiding the conjunction fallacy (an argument that assumes only one or two specific capabilities is stronger than one that directly or indirectly assumes many specific capabilities).
However, I think that from a pedagogical point of view, it is a bad introduction to AI x-risk because it’s also a very human-seeming strategy.
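The conjunction point above can be made concrete with a small sketch. It assumes, purely for illustration, that each capability a scenario relies on holds independently with the same probability; the 0.9 figure and the function name are hypothetical and do not come from the article being quoted.

```python
import math

def conjunction_probability(step_probabilities):
    """Probability that every assumed capability holds.

    ASSUMPTION: the capabilities are independent, so probabilities multiply.
    """
    return math.prod(step_probabilities)

# A scenario resting on one capability versus one resting on ten,
# each capability assumed (hypothetically) to be 90% likely.
one_capability = conjunction_probability([0.9])
ten_capabilities = conjunction_probability([0.9] * 10)

print(one_capability)    # 0.9
print(ten_capabilities)  # roughly 0.35
```

Under these assumptions, a story that needs ten things to go right is far less probable than a story that needs one, even though each individual step sounds plausible.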
The Tribe of the Dry Forest Parable
Skip this section if you don’t like parables.
In a dry forest, a tribe of cave people lived. One day, Cave-Alan Turing discovered how to make a small fire, and the tribe was amazed. They realized that they could make bigger and more awesome fires and decided to devote their free time to it.
Cave-Yudkowsky, while also excited about fire, was concerned about the risks it posed to their environment. “If we aren’t careful,” he warned, “we will destroy the dry forest we live in.”
But the Cave-AI skeptics dismissed his concerns. They believed that the fire could only ignite small things like tinder, and it wasn’t hot enough to catch a tree on fire. They thought that Cave-Yudkowsky’s fears were baseless.
Cave-Yudkowsky explained that the fire could spread to the entire forest, destroying their homes and potentially harming them. He was more concerned about this than the issue of curious people touching the fire. However, the Cave-AI skeptics continued to brush off his worries and urged the tribe to focus on preventing people from touching the fire instead.
Later, Cave-Altman announced that they had discovered how to transfer fire from tinder to logs. The Cave-AI skeptics remained skeptical and warned the tribe to stay away from the fire. They had received reports of people getting burned without even touching it.
Cave-Yudkowsky also agreed that safety was important, but he was more concerned that the tribe could destroy itself with a bad fire. He pointed out that a giant fire could generate many sparks that were as hot as the original fire, which could be dangerous.
Cave-Altman reassured them that they were working on special gloves to keep people safe from the fire. He also believed that the tribe could control a larger fire by using a series of smaller fires, similar to the concept of controlled burns proposed by Cave-Paul Christiano.
Cave-Yudkowsky expressed his concern that this was a butchered version of Cave-Christiano’s idea and that they needed more careful planning. However, Cave-Altman and the rest of the tribe were excited about the prospect of a great fire and decided to move forward with their plan.
Cave-Yudkowsky was not happy with this decision and let out a cry of despair.
The remainder of the parable is left as an exercise for the reader.
The ML Inferno Scenario
An AI tries to exploit opportunities to achieve its goals. If an AI becomes sufficiently intelligent, it will naturally start to exploit opportunities that we don’t necessarily want it to. AI tends to exploit them very quickly, having no trouble “multitasking”. See, for example, Facebook’s Diplomacy AI, which negotiates with all other players at once.
The AI does this because it’s an optimizer, and leaving an opportunity unexploited when it would lead to a better solution is not optimal. This is regardless of how many other opportunities it is exploiting. (It’s important that people internalize what optimization algorithms do.)
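A toy sketch of that claim, under strong simplifying assumptions: each opportunity contributes an independent payoff to the objective, and nothing limits how many can be taken at once. The opportunity names and payoff numbers are hypothetical. In this setting the optimal choice is to take every opportunity with a positive payoff, which an exhaustive search over all subsets confirms.

```python
from itertools import combinations

def exploit_all_positive(payoffs):
    """Take every opportunity whose payoff is positive."""
    return frozenset(name for name, gain in payoffs.items() if gain > 0)

def total_payoff(payoffs, chosen):
    """Objective value of a set of exploited opportunities."""
    return sum(payoffs[name] for name in chosen)

def brute_force_best(payoffs):
    """Check every subset and return the one with the highest total payoff."""
    names = list(payoffs)
    subsets = (
        frozenset(combo)
        for size in range(len(names) + 1)
        for combo in combinations(names, size)
    )
    return max(subsets, key=lambda s: total_payoff(payoffs, s))

# Hypothetical payoffs: three helpful opportunities and one harmful one.
payoffs = {
    "security_hole": 3,
    "mispriced_asset": 2,
    "gullible_target": 1,
    "costly_detour": -1,
}

chosen = exploit_all_positive(payoffs)
best = brute_force_best(payoffs)
```

Here `chosen` and `best` are the same set: leaving out any positive-payoff opportunity strictly lowers the total, no matter how many others are already being exploited.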
Imagine a human ML researcher correcting flaws in AI. Now, picture the AI correcting all the flaws simultaneously.
Consider a hacker exploiting a security vulnerability. Now, envision the AI exploiting the same vulnerability globally. Finally, imagine the AI exploiting all security vulnerabilities simultaneously.
Imagine a hedge fund or investor, like RenTech, exploiting a mispriced stock market. Now, visualize an AI exploiting all mispricings at once. Finally, imagine the AI exploiting all markets simultaneously.
Imagine a scammer tricking someone, for instance, by pretending to be a family member, bank employee, or romantic partner. Now, think of the AI exploiting all these targets at once.
Imagine a state actor taking over an autonomous vehicle or hacking critical infrastructure. Now, imagine the AI hacking all of them at once.
Imagine a corrupt politician or businessman getting bribed. Now, imagine the AI bribing all of them at once.
Imagine a bioterrorist releasing a deadly bacterium. Now imagine an AI releasing every deadly disease at once.
Imagine a pyromaniac lighting a fire. Now think of an AI lighting all of them at once.
Imagine an AI exploiting any one of these opportunities. Imagine an AI exploiting them one after the other. Now imagine an AI exploiting all of these, and others like them, simultaneously.
That last case, an AI exploiting all of these opportunities simultaneously, is what I’m calling the ML Inferno scenario.
I think this is better pedagogically because this presents an AI like a natural disaster. When you imagine the ML Inferno scenario, you don’t think “wow, the AI sure hates us”. You think “wow, this AI doesn’t even seem like an intelligent person, it seems like a natural disaster”.
Fire is an apt analogy because:
Fire (specifically wildfire) is an example of a complex system. More precisely, it is an example of self-organized criticality. Given that other examples include the evolution of proteins, the human brain, and neural networks, it seems plausible that the spread of an uncontrollable AI would also be an example of self-organized criticality.
People have intuition for why large fires are so difficult to control. You have to worry about extremely hot sparks escaping and spontaneous combustion from the heat. And it only takes one reckless person to let the fire get loose. That’s why you shouldn’t play with fire.
Small fires and big fires are qualitatively different.
Like an AI, fire also “exploits” its environment in an indifferent way. Fire will combust anything that gets hot enough, including humans and buildings. This is true even if started by a human; it’s simply a law of fire. The only way not to get burned is to control the heat.
A large fire is basically impossible to defeat without specialized tools, regardless of how clever a human you are. We don’t have those tools for AI yet.
I think Yudkowsky’s example of the diamondoid bacteria is strictly more probable (because of the conjunction fallacy), but mine is more “representative”. If you were explaining how a six-sided die works to someone, you shouldn’t give an example of rolling a 1 a hundred times in a row, even though it’s technically as likely as any other specific sequence of a hundred rolls.
Importantly, I think this example (and accompanying analogy) is much better pedagogically as well.
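As a footnote to the die comparison above, here is a minimal sketch of the difference between “equally probable” and “representative”. It assumes a fair six-sided die; the function names and the random seed are arbitrary choices for illustration.

```python
from fractions import Fraction
import random

def sequence_probability(num_rolls, sides=6):
    """Exact probability of any one specific sequence of fair rolls."""
    return Fraction(1, sides) ** num_rolls

def count_face(rolls, face=1):
    """How many times a given face appears in a sequence of rolls."""
    return sum(1 for roll in rolls if roll == face)

all_ones = [1] * 100

rng = random.Random(0)  # arbitrary seed, only for reproducibility
sampled = [rng.randint(1, 6) for _ in range(100)]

# Every specific sequence of 100 rolls, including all_ones and sampled,
# has the same probability: (1/6) ** 100, about 1.5e-78.
probability = sequence_probability(100)
print(float(probability))

# The summaries differ: all_ones contains 100 ones, while a fair die
# is expected to show a 1 about 100 / 6 (roughly 17) times.
print(count_face(all_ones), count_face(sampled))
```

Both sequences are equally improbable as exact sequences, but only the sampled one resembles what a hundred rolls usually look like, which is the sense of “representative” intended here.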