I’m looking forward to seeing your post, because I think this deserves more careful thought.
I think that’s right, and that there are some more tricky assumptions and disanalogies underlying that basic error.
Before jumping in, let me say that I think multipolar scenarios are pretty obviously more dangerous, to a first approximation. There may be more carefully thought-out routes to equilibria that could work and are worth exploring. But just giving everyone an AGI and hoping it works out would probably be very bad.
Here’s where I think the mistake usually comes from. Looking around, multiagent systems seem to be working out fairly well for humanity: cooperation beats defection on average, and civilization works better as we get smarter about it.
The disanalogy is that humans need to cooperate because we have sharp limits on our individual capacities. We can’t go it alone. An AGI can. AGI, including AGI intent-aligned to individuals or small groups, has no such limitations; it can expand relatively easily with more compute and run multiple robotic “bodies.” So the smart move for an individual actor who cares about the long term (and they will, because immortality is now in reach) is to defect: have their AGI self-improve and develop weapons and strategies before someone else does the same to them. Basically, it’s easier to blow stuff up than to protect it from every possible source of physical attack, so those willing to take the conflict into the physical world have a first-mover advantage (barring some new form of mutually assured destruction). A toy payoff model of that incentive is sketched below.
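To make the defection incentive concrete, here is a minimal toy model, with payoff numbers invented purely for illustration (they are assumptions, not estimates): when offense beats defense, striking first is each actor’s best response no matter what the other does, so mutual conflict is the only equilibrium even though mutual cooperation pays more.

```python
# Toy illustration (hypothetical payoffs): two actors, each with an
# intent-aligned AGI, choose to "cooperate" or "strike" first.
# With offense-dominant payoffs, "strike" strictly dominates, so the
# only Nash equilibrium is mutual conflict.

from itertools import product

# payoffs[(my_move, their_move)] = my payoff (made-up numbers)
payoffs = {
    ("cooperate", "cooperate"): 3,    # shared growth
    ("cooperate", "strike"):   -10,   # I get hit first
    ("strike",    "cooperate"): 5,    # first-mover advantage
    ("strike",    "strike"):   -5,    # mutual damage, still beats being hit first
}

def best_response(their_move):
    # Pick the move that maximizes my payoff given their move.
    return max(["cooperate", "strike"], key=lambda m: payoffs[(m, their_move)])

# A symmetric pure-strategy Nash equilibrium: each move is a best response
# to the other's move.
equilibria = [
    (a, b) for a, b in product(["cooperate", "strike"], repeat=2)
    if a == best_response(b) and b == best_response(a)
]
print(equilibria)  # -> [('strike', 'strike')]
```

The specific numbers don’t matter; the point is only that as long as striking first pays better than being struck first, the cooperative outcome isn’t stable without some enforcement mechanism.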
I’ve written about this more in “Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours” and “Whether governments will control AGI is important and neglected.” But those are just starting points on the logic. So I look forward to your post and the resulting discussions.
The problem is that a unipolar situation with a single intent-aligned AGI (or few enough that their masters can coordinate) is still very dangerous; vicious humans will try to seize control of those systems, and they may succeed.
If we get intent-aligned AGI for our first takeover-capable systems (and that seems very likely: having humans selfless enough and competent enough to build value-aligned systems on the first critical try looks highly unlikely on careful inspection of the incentives and technical issues; see this and followups for more), we’re giving humans a lot of power in a brand-new situation. That has led to a lot of violence when it’s happened in the past.
That’s why I’d like to see some more safety brainpower going into the issue.