AGI deployment as an act of aggression

I started a draft of this post some days ago, but then a lot of things happened, so I’m rewriting it from scratch. Most importantly, a TIME editorial in which Eliezer Yudkowsky talks of bombing AI training data centres happened, which both makes AI doom discourse fairly mainstream and throws a giant rock through the Overton window on the topic. This has elicited some ridicule, and enough worry that a few clarifications may be needed.

A lot of the discourse here usually focuses on alignment: is it easy, is it hard, what happens if we can’t achieve it, and how do we achieve it. I want to make a broader point that I feel has not received as much attention. Essentially, my thesis is that from the viewpoint of the average person, developing and deploying agentic AGI at all might be viewed as a hostile act. I think the AI industry may be liable to be regulated into economic unviability, and/or that MAD-like equilibria such as the one Eliezer suggested might form not because everyone is scared of unaligned AGI, but because everyone is almost equally scared of aligned AGI, and not for no reason. As such, I think that perhaps the sanest position for AI-safety-minded people to take in the upcoming public debate is “We should just not build AGI (and rather focus on more specialised, interpretable, non-agentic AI tools that merely empower humans, but leave the power to define their goals always firmly in our hands)”. In other words, I think that at this stage powerful friendly AGI is simply a mirage that holds us back from supporting the solutions with the best chance of success, and makes us look hostile or untrustworthy to a large part of the public, including potential allies.

The fundamental steps for my thesis are:

  1. building AGI probably comes with a non-trivial existential risk. This, in itself, is enough for most people to consider it an act of aggression;

  2. even if the powerful AGI is aligned, there are many scenarios in which its mere existence transforms the world in ways that most people don’t desire or agree with; whatever value system it encodes gets an immense boost and essentially Wins Culture; very basic evidence from history suggests that people don’t like that;

  3. as a result of this, lots of people (and institutions, and countries, possibly of the sort with nukes) might turn out to be willing to resort to rather extreme measures to prevent even an aligned AGI takeoff, simply because it’s not aligned with their values.

Note that points 2 and 3 can hold even if, for whatever reason, AGI doesn’t trigger a takeoff that leads to an intelligence explosion and ASI. The stakes are less extreme in that case, but there are still plenty of potentially very undesirable outcomes, which might trigger instability and violent attempts to prevent AGI’s rise.

I’ll go through the steps one by one in more detail.

Non-aligned AGI is bad

First comes, obviously, the existential risk. This one I think is pretty straightforward. If you want to risk your life on some cockamamie bet that will make you a fortune if you win, go ahead. If you want to also risk my life on the same bet that will make you a fortune, we may need to have words. I think that is a pretty sound principle that even the most open-minded people on the planet would agree with. There’s a name for what happens when the costs of your enterprise fall on other people, even just on expectation: we call them negative externalities.

“But if I won the bet, you would benefit too!” you could say. “Aligned AGI would make your life so much better!” But that doesn’t fly either. First, if you’re a for-profit company trying to build AGI, it still seems like even success will benefit you far more than me. But more importantly, it’s just not a good way to behave in general. I wear glasses; I am short-sighted. If you grabbed me by force in a dark alley, drugged me, then gave me LASIK while I was unconscious, I wouldn’t exactly be happy with it even if it turned out fine. What if the operation went badly? What if you overdosed me with the anaesthetic and killed me? There are obvious reasons why this kind of pseudo-utilitarian thinking doesn’t work, chiefly that however positive the outcome for my material well-being, simply by doing that you have taken away my ability to choose for myself, and that is in itself a harm you have visited upon me. Whether things go well or badly doesn’t change that.

If you bet on something that could cause the destruction of the world, you are betting the lives of every single living being on this planet. Every old man, every child, every animal. Everyone who never even heard of AI or owned a computer, everyone who never asked you to do this nor consented to it but was thrown into the pot as a wager regardless. You are also risking the destruction of human heritage, of the biosphere, and of its potential to ever spawn intelligent life again, all things that many agree have intrinsic value above and beyond that of even our own personal survival (if I had to die, I’d rather do so knowing the rest of humanity will live; if I had to die along with the rest of humanity, I’d rather do so knowing that at least maybe one day something else will look at the ruins we left behind and wonder, and maybe think of us). That is one mighty hefty bet.

But I am no deontologist, and I can imagine that perhaps the odds of extinction are low enough, and the benefits of winning the bet so spectacular, that maybe you could make a case that they offset that one harm (and it’s a big harm!) and make it at best a necessary evil. Unfortunately, I really don’t think that’s the case, because...

Aligned AGI is not necessarily that good either

If you want to change the world, your best bet is probably to invent something useful. Technology gets to change the world even from very humble beginnings: sometimes a few people and modest resources are enough to get the ball rolling, and at that point, if the conditions are right, nothing can stop it. Investors will sniff the opportunity and fund it, early adopters will get into it for the advantage it gives them; eventually it spreads enough that the world itself reshapes around the new thing, and the last holdouts have to either adapt or be left hopelessly behind. You could live in 1990 without the internet, but in 2023 you would likely have trouble finding a job, a house or a date without it. Moloch made sure of that.

Summoning Moloch to change the world on your behalf is a seductive proposition. It is also a dangerous one. There is no guarantee that the outcomes will be precisely what you hoped for, regardless of your intentions; there is no guarantee that the outcomes will be good at all, in fact. You might just as easily trigger a race to the bottom in which any benefits are only temporary, and eventually everything settles on an equilibrium where everyone is worse off. What will it be, penicillin or nuclear weapons? When you open your Pandora’s Box, you’ve just decided to change the world for everyone, for good or for bad, including billions of people who had absolutely no say in what will now happen around them. We can’t hold a worldwide referendum every time we want to invent something, of course, so there’s no getting around that. But while your hand is on the lid, at least, you ought to give it a think.

AGI has been called the last invention that humanity will ever need to make. It is thus very appropriate that it comes with all these warnings turned up to eleven: it promises to be more transformative than any other invention, and to spread more quickly and more irreversibly than any other invention (in fact, it would be able to spread itself). And if you are the one creating it, you have the strongest and possibly last word on what the world will become. Powerful AGI isn’t like any other invention. Regular inventions are usually passive tools, separate from the will of their creator (I was about to make a snide remark about how the inventor of the guillotine died by it, but apparently that’s a myth). Some inventions, like GMOs, are agents in their own way, but much less smart than us, so we engineer ways to control them and prevent them from spreading too much. AGI, however, would be a smart agent; aligned AGI would be a smart agent imbued with the full set of values of its creator. It would change the world with absolute fidelity to that vision.

Let’s go over some possible visions that such an AGI might spread into the world:

  • the creator is an authoritarian state that wants to simply rule everything with an iron fist;

  • the creator is a private corporation that comes up with some set of poorly-thought-out rules, designed by committee and mostly centred on its profit;

  • the creator is a strong ideologue who believes imposing their favourite set of values on everyone on Earth will be the best for everyone regardless of their opinion;

  • the creator is a genuinely well-intentioned person who only wishes for everyone to have as much freedom as possible, but nevertheless has blind spots that they fail to identify and that slip their way into the rules;

  • the creator is a genuinely well-intentioned person who somehow manages the nigh-superhuman task of coming up with the minimal and sufficient set of rules that do indeed optimally satisfy everyone’s preferences, to such a degree that it offsets any harms done in the process of unilaterally changing the world.

I believe some people might class some of these scenarios as cases of misalignment, but here I want to stress the difference between not being able to determine what the AI will do, and being able to determine it but just being evil (or incompetent). I think we can all agree that the last scenario feels like the one possible lucky outcome at the end of a long obstacle course of pitfalls. I also suspect (though I haven’t really tried to formalize it) that there is a fundamental advantage to encoding something as simple and nuance-free as “make Dave the God-King of Earth and execute his every order, caring for no one else” over something much more sophisticated, which gives the worst possible actors another leg up in this race (Dave, of course, might then paperclip the Earth by mistake with a wrongly worded order, which makes the scenario even worse).

So from my point of view, as a person who’s not creating the AGI, many aligned-AGI scenarios might still be less than ideal. In some cases, the material benefits might be somewhat lessened by these effects, but not so much that the outcome isn’t still a net positive for me (silly example: in the utopia in which I’m immortal and have all I wish for, but am no longer allowed to say the word “fuck”, I might be slightly miffed, but I’ll take what I get). In other cases the restrictions might be so severe and oppressive that, to me, they essentially make life a net negative, which would actually turn even immortality into a curse (not-so-silly example: in the dystopia in which everyone is a prisoner in a fascistic panopticon, there might be no escape at all from compliance or torture). Still, I think that on net, I and most people reading this would be more OK than not with most of the non-blatantly-oppressive varieties of this sort of takeoff. There are a lot of varieties of the oppressive sort, though, and my guess is that they are more likely than the other kind (both because many powerful actors lack the insight and/or moral fibre to actually succeed at creating a good one, and because the bad ones might be easier to create).

It gets even worse, though. Among relatively like-minded peers, we might at least roughly agree on which scenarios count as bad and which as good, and perhaps even on how likely the latter are. But that all crumbles on a global scale, because in the end...

People are people

“It may help to understand human affairs to be clear that most of the great triumphs and tragedies of history are caused, not by people being fundamentally good or fundamentally bad, but by people being fundamentally people.”

Good Omens

Suppose you had your aligned powerful AGI, ready to be deployed and change the world at the push of a big red button. Suppose then that someone paraded in front of you each and every one of the eight billion people in this world, calmly explained the situation to them and what would happen if you pushed the button, then gave them a gun and told them that if they wanted to stop you from pushing the button, the only way was to shoot you, and that they would suffer no consequences for it. You’re not allowed to push the button until every single last person has left.

My guess is that you’d be dead before the hundredth person.

I’d be very surprised if you reached one thousand.

There are A Lot of cultures and systems of belief in this world[1]. Many of these are completely at odds with each other on very fundamental matters. Many will certainly be at odds with yours in one way or another. There are people who will oppose making work obsolete. There are people who will oppose making death obsolete. Lots of them, in fact. You can think that some of these beliefs are stupid or evil, but that doesn’t change the fact that they think the same of yours, and will try to stop you if they can. You don’t need to look far into history to see how many people have regularly put their lives on the line, sometimes explicitly put them second, when it came to defending some identity or belief they held dear; it’s a very obvious revealed preference. If you are about to simply override all those values with an act of force, by using a powerful AGI to reshape the world in your image, they will feel that it is an act of aggression, and they will be right.

There are social structures and constructs born of these beliefs. Religions, institutions, states. You may conceptualize them as memetic superorganisms that have a kind of symbiotic (or parasitic) relationship with their human hosts. Even if their hosts might be physically fine, your powerful AGI is like a battery of meme-tipped ICBMs aimed to absolutely annihilate them. To these social constructs, an aligned AGI might as well be as much of an existential threat as a misaligned one, and they’ll react and defend themselves to avoid being destroyed. They’ll strike pre-emptively, if that’s the only thing they can do. Even if you think the people involved might eventually grow to like the post-singularity state of affairs, they won’t necessarily be of that opinion beforehand, because they believe strongly in the necessity and goodness of those constructs, and that’s all that matters.

If enough people feel threatened enough, regardless of whether the alignment problem was solved, AGI training data centres might get bombed anyway.

I think we’re beginning to see this; talk of AGI has already started taking on the tones of geopolitics. “We can’t let China get there first!” is a common argument in favour of spurring a faster race and against slowing down. I can imagine similar arguments on the other side. To the democracy, the autocracy ruling the world would be a tragedy; to the autocracy, democracy winning would be equally repulsive. We might think neither outcome is worth destroying the world over, but that’s not necessarily a shared sentiment either; just as in the Cold War, someone might genuinely think “better dead than red”.

I’m not saying here that I have no opinion, that I think all value systems are equally valid, or any other strawman notion of perfect centrism. I am saying it doesn’t much matter who’s right if all sides feel cornered enough and are armed well enough to lash out. If you start a fight, someone else might finish it, and seeking to create powerful AGI is effectively starting a fight. Until now it seems to me like the main plan from the people involved in this research has been “let’s look like a bunch of innocuous, overenthusiastic nerds tinkering with software right until the very end, when it’s conquerin’ the world time… haha just kidding… unless...”, which honestly strikes me as offensively naïve and more than a bit questionable. But that ship may well have sailed for good. Now AI risk is in the news, Italy has banned ChatGPT over privacy concerns (with more EU countries possibly to follow), and people are pushing the matter to the Federal Trade Commission. If anyone had been sleeping until now, it’s wake-up time.

Not everyone will believe that AGI can trigger an intelligence explosion, of course. But even if for some reason it didn’t, it might still be enough to create plenty of tensions, externally and internally. From an international viewpoint, a country with even just regular human-level AGI would command an immense amount of cognitive labour, might field an almost entirely robotic army, and perhaps sophisticated intelligent defence systems able to shield it effectively from a nuclear strike. The sheer increase in productivity and available intelligence would be an insurmountable strategic and economic advantage. On the internal front, AGI could have a uniquely disruptive impact on the economy; automation usually has a way of displacing the freed labour towards higher-level tasks, but with AGI there would be no task left to displace workers to. The best value a human worker might have left to offer would be that their body is still cheaper than a robot’s, and that’s really not a great bargaining position. A country with “simple” human-level AGI might thus face challenges on both the external and internal fronts, and those might materialize even before AGI itself does. The dangers would be lesser than with superintelligence, but the benefits would be proportionally reduced too, so I think it still roughly cancels out.

I don’t think that having a peaceful, coordinated path to powerful aligned AGI is completely hopeless, overall. But I just don’t think that as a society we’re nearly there yet. Even beyond the technical difficulties of alignment, we lack the degree of cooperation and harmonization on a global scale that would allow us to organize the transition to a post-ASI future with enough shared participation that no one feels like they’re getting such a harsh deal they’d rather blow everyone up than suffer the future to come. As things stand, a race to AGI is a race to supremacy: the only ways it ends are with everyone dead, with the suppression of one side (if we’re lucky, via powerful aligned AGI; if we’re not, via nuclear weapons), or with all sides begrudgingly acknowledging that the situation is too dangerous for all involved and somehow slowly de-escalating, possibly leading to a MAD-like equilibrium in which AGI is simply banned for all parties. The only way to accept that you can’t have it, after all, is if no one else can have it either.

Conclusion

The usual argument from people who are optimistic about AGI alignment is that even if there’s <insert percentage> of X-risk, the upsides in case of success are so spectacular that they are worth the risk. Here I am taking a somewhat more sombre view, suggesting that if you want to weigh the consequences of AGI you also have to consider the harms to the agency of the many people who would be impacted by it without having had a say in its creation. These harms might be so acute that some people might expect an AGI future to be a net negative for them, and thus actively seek to resist or stop the creation of AGI; states might become particularly dangerous if they feel existentially threatened by it. This then compounds the potential harms of AGI for everyone else, since if you get caught in a nuclear strike before it’s deployed, you don’t get to enjoy whatever comes afterwards anyway.
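To make that weighing a bit more concrete, here is a minimal, purely illustrative sketch in Python (all the names and numbers below are placeholders I chose for illustration, not figures from this post or from any real estimate): the optimist’s ledger compares the success benefit against the extinction cost alone, while the view argued here also subtracts agency harms that land even in the success branch, plus the expected cost of conflict triggered before or during deployment.

```python
# Purely illustrative expected-value sketch; every name and number is a placeholder.
def agi_expected_value(p_success, benefit, doom_cost,
                       agency_harm=0.0, p_conflict=0.0, conflict_cost=0.0):
    """Toy ledger for the argument above.

    The usual optimist's ledger is p_success * benefit - (1 - p_success) * doom_cost.
    This version also charges agency_harm even when things go well, plus the
    expected cost of violent conflict triggered before or during deployment.
    """
    success_branch = p_success * (benefit - agency_harm)
    doom_branch = (1 - p_success) * doom_cost
    conflict_branch = p_conflict * conflict_cost
    return success_branch - doom_branch - conflict_branch


# With only the optimist's terms, the bet can look clearly worth taking...
print(agi_expected_value(p_success=0.9, benefit=1000, doom_cost=1000))        # roughly 800

# ...but the same bet can flip sign once agency harms and conflict risk are priced in.
print(agi_expected_value(p_success=0.9, benefit=1000, doom_cost=1000,
                         agency_harm=600, p_conflict=0.5, conflict_cost=800))  # roughly -140
```

The point is not the specific numbers, of course, but that the extra terms can flip the sign of the bet even when the two-term optimist version looks favourable.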

As AGI discourse becomes more mainstream, it’s important to appreciate perspectives beyond our own and not fall into the habit of downplaying or ignoring them. This is necessary both morally (revealed preferences matter, and are about the only window we have into other people’s utility!) and strategically: AI research and development still exists embedded in the social and political realities of this world, however much it may wish to transcend them via a quick electronic apotheosis.

The good news is that if you believe that AI will likely destroy the world, this actually opens a possible path to survival. Everyone’s expectations about AI’s value will be different, but it’s becoming clear that many, many people see it as a net negative. In general, people place themselves at different spots along the “expected AI power” axis based on their knowledge, experience, and general feelings; some don’t expect AI to get any worse than a tool for systematically concentrating value produced by individuals (e.g. art) into the hands of corporations via scraping and inference training. Others fear its misinformation potential, or its ability to rob people of their jobs on a massive scale, or its deployment as a weapon of war. Others believe its potential is great enough for it to eventually be an extinction event. Some worry about AI being out of control, others about it being controlled far too well but for bad goals. Different expected levels of power affect people’s expectations about how much good or bad it can do, but in the end, many seem to land on the belief that it will still do mostly harm, not because of technical reasons involved in the AI’s workings but because the social structures within which the AI is being created don’t allow for a good outcome.

The same holds for powerful AGI: aligning it wouldn’t just be a prodigious technical challenge, but a social one on a global scale. Trying to race to it as a way to etch one’s supremacy into eternity is just about the worst reason and the worst way to go about it. We should be clear about this to both others and ourselves, avoid the facile trap of hoping for an outcome so arbitrarily good that it entirely offsets its improbability, and focus on a more realistic short-term goal and path for humanity. We’re not yet quite willing or ready to hand off the reins of our future to something else, and perhaps we may never be.

  1. ^

    Citation needed