Seth Herd comments on Notes on fatalities from AI takeover

Seth Herd 23 Sep 2025 19:49 UTC
2 points
−2
Interesting and good breakdown.
I place much higher odds on the “death due to takeover” for a pretty specific reason. We seem to have an excellent takeover mechanism in place which kills all or most of us: nukes. We have a gun pointed at our collective heads, and it’s deadlier to humans than AGIs.
Igniting a nuclear exchange and having just enough working robots, power sources, and industrial resources (factories, etc) to rebuild seems like a pretty viable route to fast takeover.
Igniting that exchange could be done via software intrusion or perhaps more easily by spoofing human communications to launch. This is basically the only use of deepfakes that really concerns me, but it concerns me a lot.
This becomes increasingly concerning to me in a multipolar scenario, which in turn seems all too likely at this point. Then every misaligned AGI is incentivized to take over as quickly as possible. A risky plan becomes more appealing if you have to worry that another AGI with different goals may launch their coup at any point.
This logic also applies if we solve alignment and have intent-aligned AGIs in a multipolar scenario with different human masters. I guess it also favors everyone who can, getting themselves or a backup to a safe location.
This is also conditional on progress in robotics vs. timelines; even with short timelines it seems like robotics will probably be far enough along for clumsy robots to build better robots. But here my knowledge of robotics, EMP effects, etc, fails. It does seem like a nontrivial chance that triggering a nuclear exchange is the easiest/fastest route to takeover.
One very well-informed individual told me nobody knows if any humans would survive a nuclear winter; but that’s probably less important than whether the “winning” AGI/human wanted them to survive.
The number of likely deaths given takeover seems higher than your estimate if that logic about nukes as a route to takeover mostly goes through.
But the question of extinction still hinges largely on whether AGI has any interest in humanity surviving. I think you’re assuming it will have a distribution of interests like humans do; I don’t think that’s a safe assumption at all, even given LLM-based AGI and noting that LLMs do really seem to have a distribution of motivations, including kindness toward humans.
I think how they reason about their goals once they’re capable of doing so is very hard to predict, so I’d give it more like a 50% chance they wind up being effectively maximizers for whatever their top motivation happens to be. There seems to be some instrumental pressure toward reflective stability, and that might favor one motivation winning outright vs. sharing power within that mind. I wrote a little about that in Section 5 and more in the remainder of LLM AGI may reason about its goals and discover misalignments by default, but it’s pretty incomplete.
Like the rest of alignment, it’s uncertain. LLMs have some kindness, but that doesn’t mean that LLM-based AGI will retain it. If we could be confident they would, we’d be a lot better set to solve alignment.
- ryan_greenblatt 23 Sep 2025 21:05 UTC
  4 points
  −2
  Parent
  
  I place much higher odds on the “death due to takeover” for a pretty specific reason. We seem to have an excellent takeover mechanism in place which kills all or most of us: nukes. We have a gun pointed at our collective heads, and it’s deadlier to humans than AGIs.
  
  I ended up feeling moderately persuaded by an argument that human coups typically kill a much smaller fraction of people in that country. I think some of the relevant reference classes don’t look that deadly and then there are some specific arguments for thinking AIs will kill lots of people as part of takeover, and I overall come to a not-that-high bottom line.
  
  The number of likely deaths given takeover seems higher than your estimate if that logic about nukes as a route to takeover mostly goes through.
  
  My understanding is that a large scale nuclear exchange would probably kill well less than half of people (I’d guess <25%). The initial exchange would probably directly kill <1 billion and then I don’t see a strong argument for nuclear winter killing far more people. This winter would be occuring during takeoff which has an unclear effect on the number of deaths.
  
  But the question of extinction still hinges largely on whether AGI has any interest in humanity surviving. I think you’re assuming it will have a distribution of interests like humans do
  
  I don’t think I’m particularly assuming this. I guess I think there is a roughly 50% chance that AI will be slightly kind in the sense needed to want to keep humans alive. Then, the rest is driven by trade. (Edit: I think trade is more important than slightly kind, but both are >50% likely.)
  - Seth Herd 23 Sep 2025 22:59 UTC
    2 points
    0
    Parent
    GPT5T roughly agrees with your estimate of nuclear winter deaths, so I’m probably way off. Looks like my information was well out of date; that conversation was from 2017 or so. And yes, deaths from winter-induced famine might be substantially mitigated if someone with AGI-developed tech bothered.
    
    Thanks for the clairifications on trade vs kindness. I am less optimistic about trade, causal or acausal, than you, but it’s a factor.