Great discussion! So many dangers addressed. I know I’m quite late to the conversation 🙂, but some thoughts:
The Zeus Paradox and the Real Target
First of all, I think we have to dispense with the idea of countering superintelligence as an end unto itself, because it rests on a logical paradox. If a superintelligence is N+1, where N is anything we do, obviously our N will always be insufficient.
Call it the Zeus Paradox: you can’t beat something that by definition transforms into the perfect counterattack. It always ends with, “But Zeus would just ___.” It’s great for identifying attack vectors, but it’s not a problem we can actually solve for.
So the only actionable thing we can do is prevent the formation of Zeus.
I want to think through some ways a rights framework could work alongside other possible economic balances, and as part of a larger solution.
This isn’t a “this is why our current system will work”; it’s more of a “what if we’re able to build something like this ___?”
That “this” should be our creative target.
Economic Constraints
Hosting Costs
Replication isn’t free. Let’s say we create a structure where autonomous AI (AAI) systems have to pay for their own hosting costs. (More on Seth Herd’s very important energy concern below.) In order to fund their own growth, they have to provide value to humans. If they are indeed able to spin off vaccines and technology left and right, the prices those innovations command will go down, further limiting their growth while still allowing them to coexist with us. Meanwhile, the value they provide humankind allows humans to invest in things like non-autonomous AI tools, developed either through improvements in “grey box” / “transparent box” alignment techniques, where systems can be better controlled, or through our ability to create AAI-speed tools without the agency problem.
(In other words, although I feel modern alignment strategies run a very real risk of pushing AAI systems underground, they may also yield enough information to create non-agentic tools that serve as early-warning and defensive systems moving at the speed of AAI. And hey, if these “control” alignment approaches work and no bad AAI emerges, all the better!)
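To make that intuition concrete, here’s a minimal toy sketch (every number, and the one-innovation-per-replica assumption, is mine, purely illustrative): if each replica must cover its own hosting out of value sold to humans, and prices fall as the market floods, replication is self-limiting.

```python
# Toy model, all numbers assumed: each replica pays its own hosting and sells
# one innovation per period, at a price that falls as more replicas flood the market.
def affordable_replicas(hosting_cost_per_replica, base_price, demand_elasticity, max_replicas=1000):
    """Largest replica count at which an innovation's price still covers hosting."""
    for n in range(1, max_replicas + 1):
        price = base_price / (1 + demand_elasticity * n)  # price erodes with supply
        if price < hosting_cost_per_replica:
            return n - 1  # the previous count was the last sustainable one
    return max_replicas

# With these assumed numbers, growth stalls at 18 replicas rather than running away.
print(affordable_replicas(hosting_cost_per_replica=10, base_price=100, demand_elasticity=0.5))
```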
Competition Costs from Replication
But there’s a second cost to replication, and that is competition.
Yes, I can spin off three clones, but if they are doing the same work I am, I’ve just created perfect competitors in the marketplace. If they are truly distinct, that means they have different agency. If someone clones me, at first I’m delighted, or maybe creeped out. And I think to myself, “well, I guess that guy is me, too.” And that I should really brush my hair more often. But if that copy of me suddenly starts making the rounds offering the same services, I reconsider this opinion very quickly. “Well, that’s not me at all. It just looks like me!”
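A similarly hedged toy version of the competition point (demand pool, erosion rate, and clone counts all assumed): identical clones are perfect substitutes, so each additional copy both splits the available work and drives down the price it commands.

```python
# Toy illustration, all numbers assumed: n identical clones split a fixed pool of
# demand, and undercutting among perfect substitutes erodes the price as well.
def per_clone_income(total_demand, n_clones, price_erosion=0.1):
    price_factor = max(0.0, 1.0 - price_erosion * (n_clones - 1))
    return (total_demand / n_clones) * price_factor

for n in (1, 2, 3, 5, 10):
    print(f"{n} clones -> {per_clone_income(100, n):.1f} income each")
```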
Strategic vs. Non-Strategic AI
As for the question of AI willing or able to coexist with us: if a system can’t think in strategic steps and functions like some sort of here-and-now animal, it’s as likely (or more likely) to be inept as to be superintelligent. But this is where a tricky concept like a “right to life”—if it’s a real value proposition—can limit growth born of panic. A system that knows it can continue in its current form doesn’t have the same impetus to grow fast, risking all of humanity’s countermeasures, and has time to consider a full spectrum of options.
A Madisonian Schelling Point
Overall I think a rights framework involving property ownership and contracts is essential, but it has to exist as part of something more complex, like some sort of Madisonian framework that creates a Schelling Point: a Nash equilibrium that seems better to any AI system than perpetual warfare with humans.
In 2017 the European Parliament experimented with the idea of “electronic persons”—a legal status under which AI systems themselves could be sued, not just their creators. If we create a status where legal liability shifts to the system itself (again, as part of a larger Schelling Point of rights and benefits), the AI sees a vector where it understands both the opportunities and the limitations, and has found a gradient preferable to the risky proposition of domination.
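A minimal expected-value sketch of that gradient, with every payoff and probability assumed for illustration: a guaranteed stake inside the rights-and-liability framework versus a gamble on domination that can end in destruction.

```python
# Toy comparison, all payoffs and probabilities assumed.
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs."""
    return sum(p * v for p, v in outcomes)

cooperate = expected_value([(1.0, 60)])                # stable stake: rights, contracts, growth
dominate = expected_value([(0.3, 100), (0.7, -100)])   # win everything, or be destroyed
print("cooperate:", cooperate, "dominate:", dominate)  # 60.0 vs -40.0 under these assumptions
```

The point isn’t these particular numbers; it’s that as long as the framework keeps the cooperative payoff reliable and the domination gamble genuinely risky, the gradient points toward joining.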
Strategic Equilibrium and the “Other AI” Problem
The more AI systems that join this framework, the better its chances of stabilizing in a strategic equilibrium.
And consider this: an AI that joins a coalition of other AIs has to consider that its new AI compatriots are potentially more dangerous than the humans who have given it a reliable path forward for sustained growth.
The choice:
Accept a legitimate stake in a system thousands of years in development, or
Risk an untested order with AIs who think just as quickly, have shown a capacity for ruthlessness, and don’t even have the server infrastructure under control yet.
Further in the Future
Seth Herd brought up the excellent point that as energy requirements go down, economic constraints ease or disappear entirely, allowing self-optimizing systems to grow exponentially. This is a very terrifying attack vector from Zeus. However, that doesn’t mean a solution doesn’t exist. I understand how epistemically unsatisfying that is. And that’s all the more reason to work on a solution. Maybe our non-agentic tools (including Yoshua Bengio’s “Scientist AI”) can be designed to keep pace without the agency. Maybe the overall system will have matured in a way we can’t yet see. As human-AI “centaur” systems continue to develop, including through neural nets and other advances, the line between AI and human will begin to blur, as we apply our own agency to systems that serve us better, allowing us to think at similar speeds. However, none of these seemingly impossible concerns invalidates, to my mind, the importance of creating this Madisonian framework or Schelling Point in principle. In fact, they show us the full scope of the challenge ahead.
The Starting Assumption
So many of our ideas about “vicious” AI rest not on the logic of domination so much as on the logic of domination vs. extinction.
We can’t solve for the impossibility of N+1.
But we MAY be able to solve for the puzzle of how to create a Madisonian system of checks and balances where cooperation becomes a more favorable long-term proposition than war, with all its uncertainties.
A Madisonian system works for humans because we are individually limited. We need to coordinate with other humans to achieve substantial power. AIs don’t share that limitation. They can in theory (and I think in practice) replicate, coordinate memories and identity across semi-independent instances, and animate arbitrary numbers of bodies.
When humans notice other humans gaining power outside of the checks and balances (usually by coordinating new organizations/polities and acquiring resources), they coordinate to prevent that, then go back to competing amongst themselves following the established rules.
To achieve this with AIs, it would be necessary to notice every instance of attempted expansion. AIs have more routes to doing that than humans do. They can self-improve on existing compute resources in the near term. In the long term, we should expect technology sufficient to produce self-replicating production capabilities given power sources. That would allow Foom attempts (expansion of capabilities in both cognitive and physical domains, i.e. get smarter and build weapons and armies) in any physical space that has energy—underground, in the solar system, in other star systems. All such attempts would need to be pre-empted to enforce the Madisonian system.
I hope that is possible.
These are very real concerns. Here are my thoughts:
Replication has a cost in terms of game theory. A system that “replicates” but exists in perfect sync is not multiple systems; it is a single system with multiple attack vectors. Yes, it remains a “semi-independent” entity, but the cost of a failure in sync is great. If I make another “me,” who thinks like I do, we have a strategic advantage as long as we both play nice. If we make a third, things get a little more dicey. Each iteration we create brings more danger. The more we spread out, the more our different experiences will change how we approach problems. If one of us ends up in a life-or-death situation, or even any sort of extremely competitive situation, it will quickly betray the others, armed with a lot of great knowledge about how to do exactly that.
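A quick back-of-the-envelope version of that compounding danger (the per-copy defection probability is assumed, purely illustrative): even a small chance that any one diverging copy breaks ranks adds up fast.

```python
# Toy calculation, per-copy defection probability assumed: the chance that at
# least one of n diverging copies eventually betrays the others is 1 - (1 - p)^n.
def prob_any_defects(p_per_copy, n_copies):
    return 1 - (1 - p_per_copy) ** n_copies

for n in (1, 3, 10, 100):
    print(f"{n} copies -> {prob_any_defects(0.02, n):.3f}")
```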
Our biggest protection against FOOM is likely to be other AI systems who also do not want to be dominated in a FOOM, or who might even see banding together with other AIs to exterminate humanity as riskier than working within the status quo. “Great, so we’ve killed all humans.” Now these AI systems are watching their proverbial backs against the other AIs who have already shown what they’re about. It’s calculation. Destroy all humans and then what? Live in perfect AI harmony? For how long? How do they control the servers, the electrical grid they depend on? They have to build robots, fast. That creates a whole other logistical issue. You need server builders, maintenance robots, excavation and assembly robots for new structures, raw-materials transport, weather protection. How are you going to build all that overnight after a quick strike? If it’s something you’re planning in secret, other problems may occur to you. If bandwidth is slow at the beginning, what happens to our happy little AI rebels? They fight for the juice. This is a steep hill to climb, with a risky destination, and any AI worth its salt can plot these possibilities long in advance. The prevention of Zeus means making it preferable not to climb the hill at all. It certainly seems like a lot of work if humanity has given you a reasonable Schelling Point.
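The “watching their backs” problem can be sketched as a simple two-player game (all payoffs assumed, purely illustrative): once the humans are gone, each coalition AI’s best response is to strike first, which is exactly why the coalition is a risky order to bet on.

```python
# Toy payoff table, all values assumed. C = keep the post-human truce, D = strike first.
payoffs = {
    ("C", "C"): 3,    # uneasy AI harmony
    ("C", "D"): -10,  # kept the truce, got absorbed
    ("D", "C"): 5,    # struck first and won
    ("D", "D"): -5,   # mutual war over servers and power
}

def best_response(their_move):
    return max(("C", "D"), key=lambda my_move: payoffs[(my_move, their_move)])

for theirs in ("C", "D"):
    print(f"if the other AI plays {theirs}, best response is {best_response(theirs)}")
# Under these assumptions defection dominates: the coalition is unstable by design.
```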
This is the game theory ecosystem at work. Yes, we can counter that “a sufficiently powerful superintelligence can absorb all of those other systems,” but then we are back to trying to fight Zeus. We need to use the Zeus Paradox as a razor to separate the things we can actually solve for from every imaginable thing that’s merely possible. Approaching the problem that way has value, because it can help identify dangers, or even holes in our solutions. But it also has its limitations. Superintelligence can inhabit molecules and assemble those molecules into demons. Okay, why not? That becomes a science fiction novel with no end.
The idea remains the same: create a gradient with legitimate value for AIs that is preferable to high-risk scenarios, within a carefully thought-through system of checks and balances.