[Question] Game theory of “Nuclear Prisoner’s Dilemma”—on nuking rocks

Eliezer Yudkowsky wrote, in a place where I can’t ask a follow-up question:

  • A rational agent should always do at least as well for itself as a rock, unless it’s up against some other agent that specifically wants to punish particular decision algorithms and will pay costs itself to do that; just doing what a rock does isn’t very expensive or complicated, so a rational agent which isn’t doing better than a rock should just behave like a rock instead. An agent benefits from building into itself a capacity to respond to positive-sum trade offers; it doesn’t benefit from building into itself a capacity to respond to threats.

  • Consider the Nuclear Prisoner’s Dilemma, in which as well as Cooperate and Defect there’s a third option called Nuke, which if either player presses it causes both players to get (−100, −100). Suppose that both players are programs each allowed to look at each other’s source code (a la our paper “Robust Cooperation in the Prisoner’s Dilemma”), or political players with track records of doing what they say. If you’re up against a naive counterparty, you can threaten to press Nuke unless the opponent presses Cooperate (in which case you press Defect). But you’d have no reason to ever press Nuke if you were facing a rock; the only reason you’d ever set up a strategy of conditionally pressing Nuke is because of a prediction about how your opponent would respond in a complicated way to that strategy by their pressing Cooperate (even though you would then press Defect, and they’d know that). So a rational agent does not want to build into itself the capacity to respond to threats of Nuke by choosing Cooperate (against Defect); it would rather be a rock. It does want to build into itself a capacity to move from Defect-Defect to Cooperate-Cooperate, if both programs know the other’s code, or two entities with track records can negotiate.
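To make the quoted argument concrete, here is a minimal toy sketch of the game. It assumes ordinary Prisoner’s Dilemma payoffs of 3/3 for mutual Cooperate, 5/0 for Defect against Cooperate, and 1/1 for mutual Defect; those numbers are my assumption, and only the (−100, −100) Nuke outcome comes from the quote. “Looking at the other’s source code” is modeled crudely by letting each strategy call the opponent’s strategy function. The names `threatener`, `naive`, and `defection_rock` are hypothetical labels, not anything from the paper.

```python
# Toy model of the Nuclear Prisoner's Dilemma. The ordinary PD payoffs
# below are assumptions; only the Nuke outcome (-100, -100) is from the quote.
COOPERATE, DEFECT, NUKE = "C", "D", "N"

PD_PAYOFFS = {  # (row player's payoff, column player's payoff)
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}
NUKE_PAYOFF = (-100, -100)


def payoff(a, b):
    """Payoffs for moves a and b; a Nuke by either player overrides the PD."""
    if NUKE in (a, b):
        return NUKE_PAYOFF
    return PD_PAYOFFS[(a, b)]


# "Reading the opponent's source code" is modeled by calling the opponent's
# strategy function with oneself as its argument.
def defection_rock(opponent):
    """Always Defect, ignoring the opponent entirely."""
    return DEFECT


def naive(opponent):
    """Gives in to the threatener; defects against everyone else."""
    return COOPERATE if opponent is threatener else DEFECT


def threatener(opponent):
    """Defect if the opponent would Cooperate in response; otherwise Nuke.

    Not meant to face another threatener: the mutual simulation would
    recurse without end.
    """
    return DEFECT if opponent(threatener) == COOPERATE else NUKE


def play(s1, s2):
    """Run one game between strategies s1 and s2 and return both payoffs."""
    return payoff(s1(s2), s2(s1))


if __name__ == "__main__":
    print(play(threatener, naive))               # (5, 0)
    print(play(threatener, defection_rock))      # (-100, -100)
    print(play(defection_rock, defection_rock))  # (1, 1): plain Defect vs the rock
```

Under these assumed numbers the threat earns 5 against the naive counterparty, but against the rock it earns −100 where plain Defect would have earned 1. That is the quote’s point: conditionally pressing Nuke only pays because of how a complicated opponent is predicted to respond.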

Well, what if I told you that I had a perfectly good reason to become someone who would threaten to nuke Defection Rock: I want to make it clear that agents that self-modify into a rock get nuked anyway, so there’s no advantage to adopting any strategy other than playing Cooperate while I play Defect. I want to keep my other victims convinced that surrendering to me is their best option, and nuking the occasional rock is a price I’m willing to pay to achieve that. In other words, I’ve transformed the game you’re playing from a Prisoner’s Dilemma into Hawk/Dove, and I’m a rock that always plays Hawk. So what does LDT (logical decision theory) have to say about that? Are you going to use a strategy that plays “Hawk” (anything other than Cooperate) against a rock that always plays Hawk, and get us both nuked, or are you going to do the sensible thing and play Dove (Cooperate)?
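The “price I’m willing to pay” can be made concrete with a little arithmetic. As a sketch under assumed payoffs (5 for defecting against a cooperator, 1 for mutual Defect, 0 for cooperating against a defector, and the quoted −100 for Nuke), a player committed to nuking every refuser averages (1 − p)·5 − 100p when a fraction p of opponents refuse to Cooperate. That beats simply defecting against everyone (assumed to earn 1 per game) only while p < 4/105, roughly 3.8%. The sketch also records the Hawk/Dove structure from a single victim’s side: facing the committed player, Cooperate (0) beats refusing (−100). All function names here are hypothetical.

```python
from fractions import Fraction

# Assumed payoffs (not from the post, except NUKED = -100 from the quote):
TEMPTATION = 5     # committed player Defects while the victim Cooperates
MUTUAL_DEFECT = 1  # both Defect; what plain defection earns against a refuser
SUCKER = 0         # victim's payoff for Cooperating against a Defector
NUKED = -100       # both players' payoff once Nuke is pressed


def committed_nuker_value(p_refuse):
    """Average payoff to a player who nukes every refuser, rocks included,
    when a fraction p_refuse of opponents refuse to Cooperate."""
    return (1 - p_refuse) * TEMPTATION + p_refuse * NUKED


def break_even_refusal_rate():
    """Refusal rate at which committing to Nuke earns the same as simply
    Defecting against everyone: solve (1 - p)*T + p*N = P for p."""
    return Fraction(TEMPTATION - MUTUAL_DEFECT, TEMPTATION - NUKED)


def victim_best_response():
    """A single victim facing the committed player: Cooperate (Dove) or refuse (Hawk)."""
    return "Cooperate" if SUCKER > NUKED else "refuse"


if __name__ == "__main__":
    p = break_even_refusal_rate()
    print(p)                         # 4/105 (about 3.8%)
    print(committed_nuker_value(p))  # 1, the same as plain mutual Defect
    print(victim_best_response())    # Cooperate
```

This only covers the arithmetic of the commitment under these assumed numbers. It does not settle which policy LDT recommends, because the refusal rate p is exactly what the committed player is trying to influence.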
