Draft thought, posting for feedback:
Many people (e.g., e/acc) believe that although a single very strong future AI might result in bad outcomes, a multi-agent system with many strong AIs will turn out well for humanity. To others, including myself, this seems clearly false.
Why do people believe this? Here’s my thought:
Conditional on having existed for an extended period of time, complex multi-agent systems have reached some sort of equilibrium (in the general sense, not the thermodynamic sense; it may be a dynamic equilibrium, like classic predator-prey dynamics).
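To make "dynamic equilibrium" concrete, here is a minimal sketch using the classic Lotka-Volterra predator-prey model; the parameters and starting populations are purely illustrative rather than drawn from any real ecosystem:

```python
# Minimal Lotka-Volterra predator-prey simulation (simple Euler integration).
# All parameters are illustrative; the point is that neither population
# settles to a fixed value -- they keep cycling around each other.

def simulate(prey=10.0, pred=5.0, steps=10_000, dt=0.001,
             alpha=1.1, beta=0.4, delta=0.1, gamma=0.4):
    """dx/dt = alpha*x - beta*x*y ; dy/dt = delta*x*y - gamma*y"""
    xs, ys = [prey], [pred]
    for _ in range(steps):
        x, y = xs[-1], ys[-1]
        xs.append(x + dt * (alpha * x - beta * x * y))
        ys.append(y + dt * (delta * x * y - gamma * y))
    return xs, ys

xs, ys = simulate()
print(f"prey range: {min(xs):.2f}-{max(xs):.2f}, "
      f"predator range: {min(ys):.2f}-{max(ys):.2f}")
```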
Therefore, when we look around us, the complex multi-agent systems we see are nearly all in some equilibrium, with each participant occupying some niche.
This does not mean that most complex multi-agent systems will reach equilibria, or that all of the initial participants will become part of such an equilibrium; it's just that the ones that don't aren't around for us to look at[1].
This survivorship bias is what leads many people to mistakenly expect that having many strong AI systems around will result in a stable equilibrium.
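A toy simulation of that selection effect (every number here is invented purely for illustration): even if only a small fraction of systems ever reach an equilibrium, essentially all of the systems still around to observe are in one.

```python
import random

# Toy illustration of the selection effect. Each "system" independently has
# some chance of reaching a stable equilibrium; the rest collapse and are no
# longer around to observe. The probability is invented for illustration.
random.seed(0)
N = 100_000
stabilize_prob = 0.15  # assumed: most systems never reach equilibrium

reached_equilibrium = [random.random() < stabilize_prob for _ in range(N)]
observable = [s for s in reached_equilibrium if s]  # only survivors remain visible

print(f"All systems that reached equilibrium: {sum(reached_equilibrium) / N:.0%}")
print(f"Observable (surviving) systems in equilibrium: "
      f"{sum(observable) / len(observable):.0%}")
# ~15% vs. 100%: conditioning on survival makes equilibrium look universal.
```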
It’s certainly not an original insight that multi-agent systems aren’t necessarily stable and don’t necessarily work out well for participants. If anything here is original, it’s the attempt to identify why so many people’s intuitions about this are wrong. If that guess is correct, it suggests an approach for helping people build better intuitions about what might happen in future multi-agent systems of strong AI.
Some examples of multi-agent systems where one or more participants have been destroyed rather than becoming part of a long-term equilibrium include ecosystems (see hundreds of cichlid species in Lake Victoria, or most forest bird species in Guam), the business community (most startups fail), and the financial markets.
I’m looking forward to seeing your post, because I think this deserves more careful thought.
I think that’s right, and that there are some more tricky assumptions and disanalogies underlying that basic error.
Before jumping in, let me say that I think multipolar scenarios are pretty obviously more dangerous, to a first approximation. There may be more carefully thought-out routes to equilibria that might work and are worth exploring. But just giving everyone an AGI and hoping it works out would probably be very bad.
Here’s where I think the mistake usually comes from: looking around, multi-agent systems seem to be working out fairly well for humanity. Cooperation seems to beat defection on average; civilization seems to be working rather well, and better as we get smarter about it.
The disanalogy is that humans need to cooperate because we have sharp limitations in our individual capacities. We can’t go it alone. But AGI can. AGI, including AGI intent-aligned to individuals or small groups, has no such limitations; it can expand relatively easily with compute, and run multiple robotic “bodies.” So the smart move for an individual actor who cares about the long term (and they will, because now immortality is in reach) is to defect, by having their AGI self-improve and create weapons and strategies before someone else does the same to them. Basically, it’s easier to blow stuff up than to protect it from all possible sources of physical attack. So those willing to take the conflict into the physical world have a first-mover advantage (barring some new form of mutually assured destruction).
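A toy payoff comparison makes the structure of that incentive explicit. Every number below is invented purely for illustration; the only structural assumptions are that a successful first strike removes the other actor and that offense is cheaper than comprehensive defense:

```python
# Toy expected-value comparison: cooperate vs. strike first, when a successful
# strike permanently removes the other actor. All payoffs and probabilities
# are invented purely to illustrate the structure of the argument.

p_strike_succeeds = 0.8       # assumed: offense beats defense more often than not
p_other_strikes_first = 0.5   # assumed: risk that the other actor defects first

V_coop = 10.0        # long-run value of stable cooperation
V_monopoly = 15.0    # value of being the only actor left standing
V_destroyed = 0.0    # value if you are the one removed

# Expected value of striking first.
ev_strike = p_strike_succeeds * V_monopoly + (1 - p_strike_succeeds) * V_coop

# Expected value of cooperating while the other actor may strike first.
ev_wait = ((1 - p_other_strikes_first) * V_coop
           + p_other_strikes_first * (p_strike_succeeds * V_destroyed
                                      + (1 - p_strike_succeeds) * V_coop))

print(f"EV(strike first):   {ev_strike:.1f}")  # 14.0 with these numbers
print(f"EV(wait/cooperate): {ev_wait:.1f}")    #  6.0 with these numbers
# Striking first dominates unless something like mutually assured destruction
# changes the payoffs -- the instability being pointed at above.
```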
I’ve written about this more in Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours and Whether governments will control AGI is important and neglected. But those are just starting points on the logic. So I look forward to your post and resulting discussions.
The problem is that a unipolar situation with a single intent-aligned AGI (or few enough that their masters can coordinate) is still very dangerous; vicious humans will try to seize control of those systems, and may succeed.
If we get intent-aligned AGI for our first takeover-capable systems (and it seems very likely; having humans selfless enough and competent enough to build value-aligned systems on the first critical try seems highly unlikely on careful inspection of the incentives and technical issues; see this and followups for more), we’re giving humans a lot of power in a brand-new situation. That has led to a lot of violence when it’s happened in the past.
That’s why I’d like to see some more safety brainpower going into the issue.