Hell is Game Theory Folk Theorems

Link post

[content warning: simulated very hot places; extremely bad Nash equilibria]

(based on a Twitter thread)

Rowan: “If we succeed in making aligned AGI, we should punish those who committed cosmic crimes that sufficiently decreased the chance of a positive singularity.”

Neal: “Punishment seems like a bad idea. It’s pessimizing another agent’s utility function. You could get a pretty bad equilibrium if you’re saying agents should be intentionally harming each others’ interests, even in restricted cases.”

Rowan: “In iterated games, it’s correct to defect when others defect against you; that’s tit-for-tat.”

Neal: “Tit-for-tat doesn’t pessimize, though, it simply withholds altruism sometimes. In a given round, all else being equal, defection is individually rational.”

Rowan: “Tit-for-tat works even when defection is costly, though.”

Neal: “Oh my, I’m not sure if you want to go there. It can get real bad. This is where I pull out the game theory folk theorems.”

Rowan: “What are those?”

Neal: “They’re theorems about Nash equilibria in iterated games. Suppose players play a normal-form game G repeatedly, and are infinitely patient, so they don’t care about their positive or negative utilities being moved around in time. Then a given payoff profile (that is, an assignment of utilities to players) can arise as each player’s average per-round utility in some Nash equilibrium of the iterated game if it satisfies two conditions: feasibility and individual rationality.”

Rowan: “What do those mean?”

Neal: “A payoff profile is feasible if it can be produced by some mixture of payoff profiles of the original game G. This is a very logical requirement. The payoff profile could only be the average of the repeated game if it was some mixture of possible outcomes of the original game. If some player always receives between 0 and 1 utility, for example, they can’t have an average utility of 2 across the repeated game.”

Rowan: “Sure, that’s logical.”

Neal: “The individual rationality condition, on the other hand, states that each player must get at least as much utility in the profile as they could guarantee getting by min-maxing (that is, picking their strategy assuming other players make things as bad as possible for them, even at their own expense), and at least one player must get strictly more utility.”

Rowan: “How does this apply to an iterated game where defection is costly? Doesn’t this prove my point?”

Neal: “Well, if defection is costly, it’s not clear why you’d worry about anyone defecting in the first place.”

Rowan: “Perhaps agents can cooperate or defect, and can also punish the other agent, which is costly to themselves, but even worse for the other agent. Maybe this can help agents incentivize cooperation more effectively.”

Neal: “Not really. In an ordinary prisoner’s dilemma, the (C, C) utility profile already dominates both agents’ min-max utility, which is the (D, D) payoff. So, game theory folk theorems make mutual cooperation a possible Nash equilibrium.”
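Neal’s claim can be checked with a short Python sketch. The payoff numbers (3 for mutual cooperation, 1 for mutual defection, 5 and 0 for temptation and sucker) are standard textbook values not given in the dialogue, and only pure strategies are considered; in the prisoner’s dilemma the pure and mixed min-max values coincide.

```python
# Pure-strategy min-max values in a 2x2 game.
# Rows = player 0's action, columns = player 1's action.
# Actions: 0 = cooperate, 1 = defect. Payoffs are illustrative.
P0 = [[3, 0],
      [5, 1]]  # player 0's payoffs
P1 = [[3, 5],
      [0, 1]]  # player 1's payoffs

def minmax_p0():
    # Player 1 picks a column to minimize player 0's best response.
    return min(max(P0[r][c] for r in range(2)) for c in range(2))

def minmax_p1():
    # Player 0 picks a row to minimize player 1's best response.
    return min(max(P1[r][c] for c in range(2)) for r in range(2))

cc = (P0[0][0], P1[0][0])  # mutual-cooperation payoffs
print("min-max values:", (minmax_p0(), minmax_p1()))  # (1, 1)
print("(C, C) payoffs:", cc)                          # (3, 3)
# (3, 3) strictly dominates (1, 1), so the folk theorems allow
# mutual cooperation as an equilibrium average payoff profile.
```

The min-max value here equals the (D, D) payoff, as Neal says: the opponent’s harshest choice is to defect, against which your best response is also to defect.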

Rowan: “Hmm. It seems like introducing a punishment option makes everyone’s min-max utility worse, which makes more bad equilibria possible, without making more good equilibria possible.”

Neal: “Yes, you’re beginning to see my point that punishment is useless. But, things can get even worse and more absurd.”

Rowan: “How so?”

Neal: “Let me show you my latest game theory simulation, which uses state-of-the-art generative AI and reinforcement learning. Don’t worry, none of the AIs involved are conscious, at least according to expert consensus.”

Neal turns on a TV and types some commands into his laptop. The TV shows 100 prisoners in cages, some of whom are screaming in pain. A mirage effect appears across the landscape, as if the area is very hot.

Rowan: “Wow, that’s disturbing, even if they’re not conscious.”

Neal: “I know, but it gets even worse! Look at one of the cages more closely.”

Neal zooms into a single cage. It shows a dial that selects a value between 30 and 100; it is currently set to 99.

Rowan: “What does the dial control?”

Neal: “The prisoners have control of the temperature in here. Specifically, the temperature in Celsius is the average of the temperature selected by each of the 100 denizens. This is only a hell because they have made it so; if they all set their dial to 30, they’d be enjoying a balmy temperature. And their bodies repair themselves automatically, so there is no release from their suffering.”

Rowan: “What? Clearly there is no incentive to turn the dial all the way to 99! If you set it to 30, you’ll cool the place down for everyone including yourself.”

Neal: “I see that you have not properly understood the folk theorems. Let us assume, for simplicity, that everyone’s utility in a given round, which lasts 10 seconds, is the negative of the average temperature. Right now, everyone is getting −99 utility in each round. Clearly, this is feasible, because it’s happening. Now, we check whether it’s individually rational. Each prisoner’s min-max payoff is −99.3: they set their temperature dial to 30, and since everyone else is min-maxing against them, everyone else sets their temperature dial to 100, leading to an average temperature of (30 + 99 × 100)/100 = 99.3. And so, the utility profile resulting from everyone setting the dial to 99 is individually rational.”
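Neal’s min-max arithmetic is easy to verify directly, using the numbers from the dialogue:

```python
# Min-max payoff for one prisoner in the temperature game:
# they set their dial to 30 while the other 99 prisoners,
# min-maxing against them, set theirs to 100.
N = 100
my_dial, others_dial = 30, 100
avg_temp = (my_dial + (N - 1) * others_dial) / N
print(avg_temp)   # 99.3
print(-avg_temp)  # min-max utility: -99.3
# Everyone setting the dial to 99 yields utility -99 > -99.3 for
# every prisoner, so the all-99 profile is individually rational.
```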

Rowan: “I see how that follows. But this situation still seems absurd. I only learned about game theory folk theorems today, so I don’t understand, intuitively, why such a terrible equilibrium could be in everyone’s interest to maintain.”

Neal: “Well, let’s see what happens if I artificially make one of the prisoners select 30 instead of 99.”

Neal types some commands into his laptop. The TV screen splits to show two different dials. The one on the left turns to 30; the prisoner attempts to turn it back to 99, but is dismayed at it being stuck. The one on the right remains at 99. That is, until 6 seconds pass, at which point the left dial releases; both prisoners set their dials to 100. Ten more seconds pass, and both prisoners set the dial back to 99.

Neal: “As you can see, both prisoners set the dial to the maximum value for one round, and so did everyone else. This more than compensated for the left prisoner setting the dial to 30 for one round, in terms of average temperature. So it was never in the interest of that prisoner to set the dial to 30, which is why they struggled against it.”

Rowan: “That just passes the buck, though. Why does everyone set the dial to 100 when someone set it to 30 in a previous round?”

Neal: “The way it works is that, in each round, there’s an equilibrium temperature, which starts out at 99. If anyone puts the dial less than the equilibrium temperature in a round, the equilibrium temperature in the next round is 100. Otherwise, the equilibrium temperature in the next round is 99 again. This is a Nash equilibrium because it is never worth deviating from. In the Nash equilibrium, everyone else selects the equilibrium temperature, so by selecting a lower temperature, you cause an increase of the equilibrium temperature in the next round. While you decrease the temperature in this round, it’s never worth it, since the higher equilibrium temperature in the next round more than compensates for this decrease.”
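The incentive calculation behind this trigger strategy can be sketched in a few lines of Python. The two-round comparison below is a simplification (the full argument sums over the infinite game, but with infinitely patient players the one-round punishment already outweighs the one-round gain), using the numbers from the dialogue:

```python
# Trigger strategy: everyone plays the "equilibrium temperature" (99);
# any dial set below it pushes next round's equilibrium temperature
# to 100, after which it resets to 99.
N = 100

def round_temp(my_dial, others_dial):
    # Average temperature when I pick my_dial and the other 99
    # prisoners all pick others_dial.
    return (my_dial + (N - 1) * others_dial) / N

# Conform: play 99 in both rounds.
conform = -round_temp(99, 99) - round_temp(99, 99)

# Deviate: play 30 once; in the punishment round everyone (including
# the deviator, for whom conforming is now the best response) plays 100.
deviate = -round_temp(30, 99) - round_temp(100, 100)

print(conform)  # -198.0
print(deviate)  # -198.31
# Deviating loses 0.31 utility over the two rounds, so no single
# prisoner ever benefits from turning the dial down.
```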

Rowan: “So, as a singular individual, you can try to decrease the temperature relative to the equilibrium, but others will compensate by increasing the temperature, and they’re much more powerful than you in aggregate, so you’ll avoid setting the temperature lower than the equilibrium, and so the equilibrium is maintained.”

Neal: “Yes, exactly!”

Rowan: “If you’ve just seen someone else violate the equilibrium, though, shouldn’t you rationally expect that they might defect from the equilibrium in the future?”

Neal: “Well, yes. This is a limitation of Nash equilibrium as an analysis tool, if the unnecessarily horrible outcome of this situation hadn’t already convinced you it needed revisiting. Possibly, combining Nash equilibrium with Solomonoff induction might allow agents to learn each others’ actual behavioral patterns even when they deviate from the original Nash equilibrium. This gets into some advanced state-of-the-art game theory (1, 2), and the solution isn’t worked out yet. But we know there’s something wrong with current equilibrium notions.”

Rowan: “Well, I’ll ponder this. You may have convinced me of the futility of punishment, and the desirability of mercy, with your… hell simulation. That’s… wholesome in its own way, even if it’s horrifying, and ethically questionable.”

Neal: “Well, I appreciate that you absorbed a moral lesson from all this game theory!”