The true prisoner's dilemma with a skewed payoff matrix

Related to: The True Prisoner's Dilemma; Let's split the cake, lengthwise, upwise and slantwise; If you don't know the name of the game, just tell me what I mean to you

tl;dr: In the true PD, there may be situations where you should co-operate while expecting the other player to defect, or vice versa, even against agents capable of superrationality. This is because the relative weight of the outcomes for the two parties can vary. Agents that take this into account could outperform even superrational ones.

So, it happens that our benevolent Omega actually has an evil twin, who is just as trustworthy as his sibling, but abducts people into far worse hypothetical scenarios. Here is one:

You wake up in a strange dimension. Evil-Omega smiles at you and explains that you're about to play a game with an unknown paperclip maximizer from another dimension, one you have never interacted with before and will never interact with again. The alien is like a GLUT when it comes to consciousness: it runs a simple approximation of a rational decision algorithm, but has nothing you could think of as a "personality" or "soul". And since it doesn't have a soul, you have absolutely no reason to feel bad about its losses. This is the true PD.

You are also told some specifics about the algorithm the alien uses to reach its decision, and you're told that the alien has been told about as much about you. At this point I don't want to nail the opposing alien's algorithm down to one specific choice; we're looking for a method that wins when summed over all these possibilities. In particular, we're looking at the class of agents that are capable of superrationality, since against all others the game is trivial.

The payoff matrix is like this:

DD = (you lose 3 billion lives and are tortured; the alien loses 4 paperclips)
CC = (you lose 2 billion lives and are made miserable; the alien loses 2 paperclips)
CD = (you lose 5 billion lives and are tortured a lot; the alien loses nothing)
DC = (you lose nothing; the alien loses 8 paperclips)

So, what do you do? Your opponent is capable of superrationality. In the post "The True Prisoner's Dilemma", it was (kinda, vaguely, implicitly) assumed for simplicity's sake that this information is enough to decide whether to defect or not. The answer, based on this information alone, could be to co-operate. However, I argue that the information given is not enough.
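To make the naive superrational reasoning concrete, here is a minimal sketch; the code and the numeric utilities are my own illustration (the post only states the outcomes qualitatively). A superrational agent expects an equally rational opponent to mirror its choice, so it only compares the CC outcome against the DD outcome:

```python
# Minimal sketch of naive superrational reasoning in the first game.
# The numbers are made-up placeholders standing in for the qualitative
# outcomes above (lives lost, torture, paperclips).

# (human_utility, clippy_utility) indexed by (human_move, clippy_move)
game_1 = {
    ("D", "D"): (-3_000_000_001, -4),  # lose 3 billion lives and be tortured; lose 4 paperclips
    ("C", "C"): (-2_000_000_000, -2),  # lose 2 billion lives and be made miserable; lose 2 paperclips
    ("C", "D"): (-5_000_000_002, 0),   # lose 5 billion lives and be tortured a lot; lose nothing
    ("D", "C"): (0, -8),               # lose nothing; lose 8 paperclips
}

def naive_superrational_move(payoffs, player):
    """A superrational agent expects an equally rational opponent to mirror
    its choice, so it only compares the CC and DD diagonal outcomes."""
    cc = payoffs[("C", "C")][player]
    dd = payoffs[("D", "D")][player]
    return "C" if cc > dd else "D"

print(naive_superrational_move(game_1, player=0))  # the human's choice -> "C"
print(naive_superrational_move(game_1, player=1))  # the clippy's choice -> "C"
```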

Back to the hypothetical: in-hypothetical you is still pondering the decision, but we zoom out and observe that, unbeknownst to you, Omega has also abducted a fellow LW reader and another paperclip maximizer from that same dimension, and is making them play a PD. This time their payoff matrix looks like this:

DD = (your friend loses $0.04; the alien suffers 2 small random changes to its utility function and loses 200 paperclips)
CC = (your friend loses $0.02; the alien suffers 1 change and loses 100 paperclips)
CD = (your friend loses $0.08; the alien loses nothing)
DC = (your friend loses nothing; the alien suffers 4 changes and loses 400 paperclips)

Now, if it's not "rational" to take the relative losses into account, we're bound to end up in a situation where billions of humans die. You might even come to regret your rationality. It should be obvious by now that you'd wish you could somehow negotiate across both of these PDs so that you defect in your game while your friend co-operates in theirs. You'd be entirely willing to take the $0.08 hit for that, perhaps paying it in full on your friend's behalf. And as it happens, the paperclip maximizers have an incentive to make the same trade in the other direction.
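To see concretely why both sides would prefer this trade, here is the bookkeeping as a sketch. The utilities are again illustrative placeholders of my own, not figures from the scenario (one utility-function change is arbitrarily valued at 50 paperclips; the comparison doesn't hinge on that choice):

```python
# Rough bookkeeping for the cross-game trade, with placeholder utilities.

# (a) Both pairs naively co-operate (CC in both games):
humans_if_both_cc   = -2_000_000_000 + -0.02  # 2 billion lives and misery, plus $0.02
clippies_if_both_cc = -2 + (-50 - 100)        # 2 paperclips, plus 1 change and 100 paperclips

# (b) The negotiated swap: you defect while your alien co-operates (DC in
#     game 1), and your friend co-operates while its alien defects (CD in
#     game 2):
humans_if_swap   = 0 + -0.08                  # only your friend's $0.08
clippies_if_swap = -8 + 0                     # only 8 paperclips

print(humans_if_both_cc, clippies_if_both_cc) # roughly -2e9, -152
print(humans_if_swap, clippies_if_swap)       # -0.08, -8
# Both sides prefer (b): the humans lose eight cents instead of billions of
# lives, and the clippies lose 8 paperclips instead of 102 plus a change to
# their utility function.
```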

But of course the players don't know about this whole situation, so they might not be able to act optimally in this specific scenario. However, if they take into account how much the other side cares about the outcomes, using some as-yet-unknown method, they just might be able to systematically perform better (if we posed more problems of this sort, or selected the payoffs at random for the one-shot game) than "naive" PD players playing against each other. Naivety here means simply and blindly co-operating against equally rational opponents. How to achieve this is the open question.
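Purely to illustrate the shape of the idea, and emphatically not as an answer to that open question, here is a toy rule of my own. It hard-codes the interpersonal comparison of stakes that we don't know how to make; the function, the ratio threshold, and the "common scale" stake numbers are all placeholders:

```python
# Toy illustration only, not a solution: the open question is precisely how
# two agents with incomparable utility scales could agree on a common
# measure of "who has more at stake". Here that measure is simply assumed,
# which sweeps the whole problem under the rug.

def stakes_aware_moves(human_stake, clippy_stake, ratio=100.0):
    """Toy rule: if one side's (somehow commonly measured) stake dwarfs the
    other's, the low-stakes side concedes (C) and the high-stakes side
    defects (D); otherwise both co-operate."""
    if human_stake > ratio * clippy_stake:
        return "D", "C"
    if clippy_stake > ratio * human_stake:
        return "C", "D"
    return "C", "C"

# Hand-waved "common scale" stakes for the two games:
print(stakes_aware_moves(human_stake=2e9, clippy_stake=8))     # game 1 -> ("D", "C")
print(stakes_aware_moves(human_stake=0.08, clippy_stake=600))  # game 2 -> ("C", "D")
# Across the two games this reproduces the trade described above, whereas
# naive superrational players would have played ("C", "C") in both.
```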

-

Stuart Armstrong, for example, has an actual idea of how to co-operate when the payoffs are skewed, while I'm just pointing out that there's a problem to be solved, so this isn't really news. Still, I think this topic has not been explored as much as it should be.

Edit. Corrected some huge errors here and there, like mixing up the hypothetical you and the hypothetical LW friend.