So I note that our industrial civilization has not in fact been plunged into nuclear fire. With that in mind, do you think that von Neumann’s model of the world was missing anything? If so, does that missing thing also apply here? If not, why hasn’t there been a nuclear war?
The missing piece is mutually assured destruction (MAD). Given that we did not play the Nash equilibrium von Neumann suggested, the next best thing was MAD plus various counterproliferation treaties, which happened to work okay for humans. With an AGI counterparty we can hope to build in a MAD-like assurance, but it will be a lot more challenging. The equilibrium move right now is to not build AGI.
I think this is basically right on the object level. Specifically, I think what von Neumann missed was that by changing the game a little bit, it was possible to reach a much less deadly equilibrium: second-strike capabilities, plus a pre-commitment to use them, ensure that the expected payoff of a first strike is negative.
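To make the second-strike point concrete, here is a toy two-player sketch. The payoff numbers are my own illustrative choices, not anything from the strategic literature: without a credible second strike, striking first strictly dominates, so the only equilibrium is the horrifying one; with a credible retaliation pre-commitment, any first strike has negative expected payoff and mutual restraint becomes the unique pure-strategy equilibrium.

```python
from itertools import product

# Illustrative payoffs for (row player, column player) under each action
# profile. Numbers are made up; only their ordering matters.
# Without second-strike capability, a first strike disarms the opponent.
no_second_strike = {
    ("strike", "strike"): (-5, -5),    # both launch: mutual devastation
    ("strike", "hold"):   (1, -10),    # first striker wins outright
    ("hold",   "strike"): (-10, 1),
    ("hold",   "hold"):   (0, 0),
}
# With a credible second-strike pre-commitment, any strike triggers
# retaliation, so the striker is always destroyed too.
second_strike = {
    ("strike", "strike"): (-10, -10),
    ("strike", "hold"):   (-10, -9),   # retaliation makes first strike a loss
    ("hold",   "strike"): (-9, -10),
    ("hold",   "hold"):   (0, 0),
}

def pure_nash_equilibria(payoffs, actions=("strike", "hold")):
    """Profiles where neither player gains by deviating unilaterally."""
    eqs = []
    for a, b in product(actions, repeat=2):
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(d, b)][0] for d in actions)
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, d)][1] for d in actions)
        if row_ok and col_ok:
            eqs.append((a, b))
    return eqs

print(pure_nash_equilibria(no_second_strike))  # [('strike', 'strike')]
print(pure_nash_equilibria(second_strike))     # [('hold', 'hold')]
```

The game itself changed between the two matrices; the solution concept didn't.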
On the meta level, I think very smart people who learn some game theory have a pretty common failure mode, which looks like this:
Look at some real-world situation
Figure out how to represent it as a game (in the game theory sense)
Find a Nash Equilibrium in that game
Note that the Nash Equilibrium they found is horrifying
Shrug and say “I can’t argue with math, I guess it’s objectively correct to do the horrifying thing”
In some games, multiple Nash equilibria exist. In others, it may be possible to convince the players to play a slightly different game instead.
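A quick illustration of the multiple-equilibria point, using a standard stag hunt with payoff numbers I made up: the same game has two pure-strategy Nash equilibria, one good and one mediocre, so "I found a Nash equilibrium" does not settle which outcome you should expect.

```python
from itertools import product

# Stag hunt: hunting stag together pays best, but hunting hare is safe.
payoffs = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}
actions = ("stag", "hare")

# A profile is a pure Nash equilibrium if neither player gains by
# deviating unilaterally.
equilibria = [
    (a, b)
    for a, b in product(actions, repeat=2)
    if all(payoffs[(a, b)][0] >= payoffs[(d, b)][0] for d in actions)
    and all(payoffs[(a, b)][1] >= payoffs[(a, d)][1] for d in actions)
]
print(equilibria)  # [('stag', 'stag'), ('hare', 'hare')]
```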
In this game, I think our loss condition is “an AGI gains a decisive strategic advantage, and is able to maintain that advantage by destroying any entities that could oppose it, and determines humans are such entities, and, following that logic, destroys human civilization”.
The “make sure that future AIs are aligned with humanity” strategy seems, to me, to target the “determines humans are such entities” step of the above loss condition. But I think there are two additional stable Nash equilibria, namely “no single entity is able to obtain a strategic advantage” and “attempting to destroy anyone who could oppose you will, in expectation, leave you worse off in the long run than not doing so”. If there are three I have thought of, there are probably more that I haven’t.
You are correct that my argument would be stronger if I could prove that the NE I identified is the only one.
I do not think it is plausible that an AGI would fail to obtain a strategic advantage if it sought one, unless we built in MAD-style assurances beforehand. But perhaps under my assumptions a stable “no one manages to destroy the other” outcome results; to rule that out, I would need to bring in further assumptions about the AGI becoming vastly more powerful and winning decisively. I think that is in fact the case, but maybe I should make it more explicit.
Similarly, if we can achieve provable alignment, rather than merely probabilistic alignment, then the game simply never arises: the AGI would never be in a position to protect its own existence at the expense of ours, because provable alignment rules that move out.
In each case I think you are changing the game, which is something we can do, and I think should do; but barring actual work to change it, I think we are left with the game as I’ve described it, perhaps without sufficient technical detail.
Correct. Are you intending for this to be a reductio ad absurdum?
I totally agree with your diagnosis of how some smart people sometimes misuse game theory, and I agree that that’s the loss condition.