Bringing Agency Into AGI Extinction Is Superfluous

Epistemic status: Maybe I am missing something here. Normally I’d publish this on my own blog, but I’d love to get some feedback before I signal-boost it.

I’ve seen a lot of debates around AGI risk hinge on the question of whether or not AGI will be “agentic” in a meaningful sense and, consequently, on whether AGI can be aligned well enough not to accidentally or intentionally kill all humans.

I personally buy the argument that, if something like AGI comes about, it will be alien in terms of its goals and necessarily agentic. It seems obvious. But it doesn’t seem obvious to many people and it seems like a very bad hill to die on, because it’s entirely unnecessary.


What the AGI risk debate really hinges on, in my view, is the limit of intelligence itself: whether knowledge about the world, reasoning ability, and thinking speed are sufficient to gain a large edge in influencing the material world.

I doubt this is true, but if you believe it is true, then the conclusion that AGI will drive the human race extinct needn’t require any agency or ill intent on the part of this intelligent system.

Once you assume that an AGI can, say:

  • Construct a virus more virulent and deadly than smallpox that persists for hundreds of years on surfaces

  • Design self-replicating nanobots that can make use of almost any substrate to propagate themselves

  • Genetically engineer an archaeon that, once seeded into the oceans, will quickly fill the atmosphere with methane

then the question of whether an AGI will “want” to do any of this is about as moot as the question of whether a nuclear weapon “wants” to murder a large number of Japanese civilians.

There will be people, millions if not billions of them, who will at one time or another desire to murder the entire human race. There are entire cults, with thousands or tens of thousands of members each, that would like to see most humans eradicated, cults that have tried building nuclear weapons and attacking civilian centers with bioweapons.

Every day, across the world, terrorists, from those in structured organizations to random 12-year-olds shooting up schools, try to inflict maximal harm upon their fellow humans with no regard for their own safety, out of a primordial need for vengeance over an abstract feeling of having been wronged.

These people would gladly use, support, encourage, and pour all of their resources into an AGI system that could aid them in destroying humanity.


And this is not counting the people who are indifferent to, or even slightly in favor of, the human race going extinct by omission, people whose fear for their own hides leads them into unethical actions. That group likely includes you and me, dear friend.

If you own, say, any part of a wide-reaching enough Vanguard or BlackRock ETF, you are, at this very moment, contributing to corporations that are poisoning the skies and the oceans, or building weapons that escalate conflicts through their mere need to be sold.

If you have ever bought a fun but useless toy from China, you’ve contributed to the launch of an inefficient shipping vessel abusing international waters to burn the vilest of chemicals to propel its pointless cargo forward.

And this is not to mention our disregard for the inherent worthwhileness of consciousness, and our indifference to, or even encouragement of, inflicting suffering upon others.

That same toy from China, the one that provides no real value to you or anyone, has helped build concentration camps.

If you’ve ever had unprotected sex, you’ve taken a huge gamble on creating a new human being with no regard for whether that human being will have a life of misery.

If you’ve ever eaten meat from an animal that you believe is conscious without considering the conditions of its farming, you’ve literally been paying torturers to inflict suffering upon a conscious being for a moment of mild culinary delight.

Indeed, if you’ve ever spent money on anything without considering its long-term implications, you’ve frivolously thrown dice into the wind, letting them dictate the course of the entire human race in a direction that may well not be beneficial to its flourishing.

And please don’t take all of this as a “holier than thou” type argument, because it’s not; the above is simply a list of things I do that I know are bad and do regardless. Why? I couldn’t tell you coherently. Maybe it’s inertia, maybe it’s fear, maybe I’m a bad person. But regardless, you and all those dear to you probably do them too.

One way to translate this into a thought experiment: most of us, if presented with a button that would yield our deepest desire, be that love or money or power or whatever, in exchange for a tiny 0.01% risk of all humans going extinct the next moment, would gladly press that button. And if we wouldn’t, we’d gladly press it once that 0.01% risk of extinction is obfuscated by sufficient layers of abstraction.
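To make the arithmetic behind that intuition concrete, here is a minimal sketch. The per-press risk is the 0.01% from the thought experiment; the number of presses is an illustrative assumption, not a claim about the real world. The point is just that tiny, independent risks compound toward near-certain extinction once enough buttons get pressed:

```python
# Illustrative only: how a tiny per-press extinction risk compounds
# across many independent "button presses". Both numbers are assumptions
# made purely for the sake of the thought experiment.
per_press_risk = 0.0001  # the 0.01% risk from the thought experiment

for presses in (1, 1_000, 10_000, 100_000):
    p_survival = (1 - per_press_risk) ** presses
    print(f"{presses:>7,} presses -> {1 - p_survival:.2%} chance of extinction")

# Roughly: 0.01% after 1 press, 9.52% after 1,000, 63.21% after 10,000,
# and effectively 100% after 100,000 presses.
```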


AGI is risky because it increases our capabilities to act upon the world, and most of our actions have nothing to do with the preservation of the human species or of our civilizations. Our existence as a species is predicated on such a fragile balance that, by simply acting randomly and with enough force, we are bound at one time or another to push hard enough in some direction that we’ll be dead before we even realize where we’re headed.

We have already done this many times, the most famous of which might have been detonating our first nuclear device after imperfect calculations of the then-seemingly-reasonable possibility of it starting a chain reaction that would have destroyed our atmosphere and, together with it, in mere minutes, every single human.

We currently don’t have the tools to push hard enough, so we can still notice that the changes we inflict upon the world are negative and act on that. But with increased capabilities comes increased risk, and if our capabilities advance at an amazing rate, we’ll be taking a “will a nuclear weapon light up the atmosphere” style gamble every year, month, day, or even minute, for the most frivolous of reasons.


All of this is not to say that AGI will actually lead to those kinds of capability increases. I’m pretty well established in the camp that thinks it won’t, or that by the time it does, the world will be weird enough that whatever we do now to prevent this will be of no consequence.

Nor do I think that the “systemic” take on AGI risk is the best stance to take in a public debate, since most people will resolutely refuse to believe they are acting in immoral ways and will contrive the most complex of internal defenses to obfuscate such facts.

However, the “there are death cults” take seems easier to defend; after all, we know that the capabilities to destroy and to protect are highly asymmetrical, so destruction is easier. At that point we’d be more clearly debating the capabilities of an AGI system, as opposed to debating whether the system would agentically use those capabilities to do us harm.

I assume that most people arguing for doomsday AGI might predict that such an AGI will so trivially “take the controls” from the hands of humans that it will be more reasonable to think about “it” and ignore any human desires or actions. I for one agree, and I think this argument even applies to tool-like AGIs insofar as the way we use them is through systems similar to markets and governments, systems with little regard for long-term human welfare.

But alas, this is a harder-to-stomach and harder-to-grasp take. I personally find it harder to reason about because there are a lot of added “ifs”. So why not strip everything away and just think about the capabilities of AGI to do harm once in the hands of people who wish nothing more than to do harm? Such people are plentiful, and there’s no reason to think they’d be excluded from using or building these sorts of systems, even if they’d be a few years behind the SOTA.