Alignment Risk Doesn’t Require Superintelligence

Outsize destructive coordination is rare

I was 9 years old when 9/​11 happened. My gut reaction was, basically, “wait, doesn’t this sort of thing happen all the time?” I don’t mean to suggest I was wise or world-weary or anything. I was, in fact, mistaken. But I had an ambient view that small groups of human beings wreaked massive destruction on each other all the time, randomly. It certainly seemed to me, a kid with a tendency towards worry, that they could. But no. Events like 9/​11, where a small number of people have a gigantic destructive impact, are pretty rare.

If this mindset is new to you, I strongly recommend Gwern’s On The Absence Of True Fanatics. It’s the best treatment I’ve read of the idea that truly dedicated human beings, if they set their minds to it, can cause a huge amount of harm.

Indeed, it wouldn’t surprise me if 50 perfectly coordinated 16-year-olds of top 1% (but not top 0.1%) intelligence, with total trust in each other and a shared monomaniacal mission to bring down human civilization within their lifetimes, could succeed. Not that I think it would be a sure thing; plenty of opportunity for them to get noticed or caught, or to make a planning error and get themselves killed by accident. But I can think of ways, including ones that aren’t particularly esoteric.

A quick, unpleasant sketch

I don’t like dwelling on awful hypotheticals, so I’ll trace out just one possibility. To be clear, I do not think this is anywhere near the best strategy for such a group, but I don’t like writing about, or encouraging thinking about, better ones.

The 50 teenagers could coordinate to advance a few of their number as far as possible in politics. Ideally they’d get one to be President of the US. That sounds very hard, but remember that in this hypothetical we’ve granted the 50 teenagers perfect coordination and monomania, two superpowers. Normal humans want other things, like to be cool, have sex, and eat tasty food. They are vulnerable to scandals. But our teens could pick 10 of them, maybe the most charismatic, to go hard in politics, while the other 40 pursue whatever careers would best support them.

If they get the presidency, plus a few key positions nearby, they slowly flush out non-confederates and then trigger a nuclear war. Pretty bad! My gut reaction is that our group could have maybe a 5% chance of success. But even if I’m totally wrong about that, I bet you can imagine other ways 50 perfectly dedicated and reasonably intelligent teens could wreak havoc.

Back to reality

Fortunately for us, human teens don’t work that way. Humans in general don’t work that way. Very few of us want to cause mass destruction, and even fewer of us have perfect willpower. There are lots of safeguards in the human brain that keep us from even really considering actions that would be extremely bad for everybody. As an academic exercise, sure, maybe. But for real? It just doesn’t happen very often. The evidence for this is that 9/​11s don’t happen all the time—people either aren’t that imaginative, aren’t that destructive, or aren’t that effective, or at least the same people are very rarely in the center of that Venn diagram.

Unfortunately, we don’t have any assurance that other sorts of intelligent minds will share this property. Whatever combination of evolutionarily-developed traits causes human beings to just not do 9/​11s that often probably won’t be present in AI. That’s not to say AI will necessarily want to do bad things. Just that by default, there are no safeguards.

I know the idea that AI might be misaligned is pretty old news for this crowd. But I think it’s relevant to note that just like x-risk doesn’t require longtermism, catastrophic misalignment doesn’t require superintelligence. Ordinary intelligence will do. No foom necessary, no minds that are to us as we are to lions. Just smart inhuman teenagers with no safeguards built in, and we could be in a significant amount of trouble.