Alignment Risk Doesn’t Require Superintelligence

Outsize destructive coordination is rare

I was 9 years old when 9/​11 happened. My gut reaction was, basically, “wait, doesn’t this sort of thing happen all the time?” I don’t mean to suggest I was wise or world-weary or anything. I was, in fact, mistaken. But I had an ambient view that small groups of human beings wreaked massive destruction on each other all the time, randomly. It certainly seemed to me, a kid with a tendency towards worry, that they could. But no. Events like 9/​11, where a small number of people have a gigantic destructive impact, are pretty rare.

If this mindset is new to you, I strongly recommend Gwern’s On The Absence Of True Fanatics. It’s the best treatment I’ve read of the idea that truly dedicated human beings, if they set their minds to it, can cause a huge amount of harm.

Indeed, it wouldn’t surprise me if 50 perfectly coordinated 16-year-olds of top 1% (but not top 0.1%) intelligence, with total trust in each other and a shared monomaniacal mission to bring down human civilization within their lifetimes, could succeed. Not that I think it would be a sure thing; plenty of opportunity for them to get noticed or caught, or to make a planning error and get themselves killed by accident. But I can think of ways, including ones that aren’t particularly esoteric.

A quick, unpleasant sketch

I don’t like dwelling on awful hypotheticals, so I’ll trace out just one possibility. To be clear, I do not think this is anywhere near the best strategy for such a group, but I don’t like writing about, or encouraging thinking about, better ones.

The 50 teenagers could coordinate to advance a few of their number as far as possible in politics. Ideally they’d get one to be President of the US. That sounds very hard, but remember that in this hypothetical we’ve granted the 50 teenagers perfect coordination and monomania, two superpowers. Normal humans want other things, like to be cool, have sex, and eat tasty food. They are vulnerable to scandals. But our teens could pick 10 of them, maybe the most charismatic, to go hard in politics, while the other 40 pursue whatever careers would best support them.

If they get the presidency, plus a few key positions nearby, they slowly flush out non-confederates and then trigger a nuclear war. Pretty bad! My gut reaction is that our group could have maybe a 5% chance of success. But even if I’m totally wrong about that, I bet you can imagine other ways 50 perfectly dedicated and reasonably intelligent teens could wreak havoc.

Back to reality

Fortunately for us, human teens don’t work that way. Humans in general don’t work that way. Very few of us want to cause mass destruction, and even fewer of us have perfect willpower. There are lots of safeguards in the human brain that keep us from even really considering actions that would be extremely bad for everybody. As an academic exercise, sure, maybe. But for real? It just doesn’t happen very often. The evidence for this is that 9/​11s don’t happen all the time—people either aren’t that imaginative, aren’t that destructive, or aren’t that effective, or at least the same people are very rarely in the center of that Venn diagram.

Unfortunately, we don’t have any assurance that other sorts of intelligent minds will share this property. Whatever combination of evolutionarily-developed traits causes human beings to just not do 9/​11s that often probably won’t be present in AI. That’s not to say AI will necessarily want to do bad things. Just that by default, there are no safeguards.

I know the idea that AI might be misaligned is pretty old news for this crowd. But I think it’s relevant to note that just like x-risk doesn’t require longtermism, catastrophic misalignment doesn’t require superintelligence. Ordinary intelligence will do. No foom necessary, no minds that are to us as we are to lions. Just smart inhuman teenagers with no safeguards built in, and we could be in a significant amount of trouble.