I think the headline (“Rational Agents Cooperate in the Prisoner’s Dilemma”) is assuming that “Rational Agent” is pointing to a unique thing, such that if A and B are both “rational agents”, then A and B are identical twins whose decisions are thus perfectly correlated. Right? If so, I disagree; I don’t think all pairs of rational agents are identical twins with perfectly-correlated decisions. For one thing, two “rational agents” can have different goals. For another thing, two “rational agents” can run quite different decision-making algorithms under the hood. The outputs of those algorithms have to satisfy certain properties, in order for the agents to be labeled “rational”, but those properties are not completely constraining on the outputs, let alone on the process by which the output is generated.
(I do agree that it’s insane not to cooperate in the perfect identical twin prisoner’s dilemma.)
I think OP might be thinking of something along the lines of an Aumann agreement—if both agents are rational, and possess the same information, how could they possibly come to a different conclusion?
Honestly I think the math definition of Nash equilibrium is best for the Defect strategy: it’s the strategy that maximises your expected outcome regardless of what the other does. That’s more like a total ignorance prior. If you start positing stuff about the other prisoner, like you know they’re a rational decision theorist, or you know they adhere to the same religion as you whose first commandment is “snitches get stitches”, then things change, because your belief distribution about their possible answer is not of total ignorance any more.
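A quick sketch of the property being described, with payoff numbers I'm making up for illustration (the usual T > R > P > S ordering): Defect beats Cooperate whichever move the opponent makes, i.e. it's strictly dominant, which is also why Defect/Defect is the unique Nash equilibrium of the one-shot game.

```python
# Illustrative payoffs (assumed for this sketch): T=5 > R=3 > P=1 > S=0.
# payoff[(my_move, their_move)] = my payoff in the one-shot prisoner's dilemma.
C, D = "C", "D"
payoff = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

# Defect strictly dominates Cooperate: it does better against either opponent move.
for their_move in (C, D):
    assert payoff[(D, their_move)] > payoff[(C, their_move)]
print("Defect is strictly dominant under these payoffs.")
```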
Well, the situation is not actually symmetric, because A is playing against B whereas B is playing against A, and A’s beliefs about B’s decision-making algorithm need not be identical to B’s beliefs about A’s decision-making algorithm. (Or if they exchange source code, they need not have the same source code, nor the same degree of skepticism that the supposed source code is the real source code and not a sneaky fake source code.) The fact that A and B are each individually rational doesn’t get you anything like that—you need to make additional assumptions.
Anyway, symmetry arguments are insufficient to prove that defect-defect doesn’t happen, because defect-defect is symmetric :)
Changing the game to a different game doesn’t mean the answer to the original game is wrong. In the typical prisoner’s dilemma, the players have common knowledge of all aspects of the game.
Just to be clear, are you saying that “in the typical prisoner’s dilemma”, each player has access to the other player’s source code?
It’s usually not stated in computational terms like that, but to my understanding yes. The prisoner’s dilemma is usually posed as a game of complete information, with nothing hidden.
Oh wow, that’s a completely outlandish thing to believe, from my perspective. I’ll try to explain why I feel that way:
I haven’t read very much of the open-source game theory literature, but I have read a bit, and everybody presents it as “this is a weird and underexplored field”, not “this is how game theory works and has always worked and everybody knows that”.
Here’s an example: an open-source game theory domain expert writes “I sometimes forget that not everyone realizes how poorly understood open-source game theory is … open-source game theory can be very counterintuitive, and we could really use a lot more research to understand how it works before we start building a lot of code-based agents that are smart enough to read and write code themselves”
Couple more random papers: this one & this one. Both make it clear from their abstracts that they are exploring a different problem than the normal problems of game theory.
You can look up any normal game theory paper / study ever written, and you’ll notice that the agents are not sharing source code.
Game theory is purported to be at least slightly relevant to humans and human affairs and human institutions, but none of those things can share source code. Granted, everybody knows that game theory is idealized compared to the real world, but its limitations in that respect are frequently discussed, and I have never seen “humans can’t share source code / read each other’s minds” listed as one of those limitations.
The prisoner’s dilemma is usually posed as a game of complete information, with nothing hidden.
Wikipedia includes “strategies” as being common knowledge by definition of “complete information”, but I think that’s just an error (or a poor choice of words—see next paragraph). EconPort says complete information means each player is “informed of all other players payoffs for all possible action profiles”; Policonomics says “each agent knows the other agent’s utility function and the rules of the game”; Game Theory: An Introduction says complete information is “the situation in which each player i knows the action set and payoff function of each and every player j, and this itself is common knowledge”; Game Theory for Applied Economists says “a game has incomplete information if one player does not know another player’s payoffs”. I also found several claims that chess is a complete-information game, despite the fact that chess players obviously don’t share source code or explain to their opponent what exactly they were planning when they sacrificed their bishop.
I haven’t yet found any source besides Wikipedia that says anything about the players reading each other’s minds (not just utility functions, but also immediate plans, tactics, strategies, etc.) as being part of “complete information games” universally and by definition. Actually, upon closer reading, I think even Wikipedia is unclear here, and probably not saying that the players can read each other’s minds (beyond utility functions). The intro just says “strategies” which is unclear, but later in the article it says “potential strategies”, suggesting to me that the authors really mean something like “the opponents’ space of possible action sequences” (which entails no extra information beyond the rules of the game), and not “the opponents’ actual current strategies” (which would entail mind-reading).
Yep, a game of complete information is just one in which the structure of the game is known to all players. When Wikipedia says
The utility functions (including risk aversion), payoffs, strategies and “types” of players are thus common knowledge.
it’s an unfortunately ambiguous phrasing but it means
The specific utility function each player has, the specific payoffs each player would get from each possible outcome, the set of possible strategies available to each player, and the set of possible types each player can have (e.g. the set of hands they might be dealt in cards) are common knowledge.
It certainly does not mean that the actual strategies or source code of all players are known to each other player.
Well in that case classical game theory doesn’t seem up to the task, since in order to make optimal decisions you’d need a probability distribution over the opponent’s strategies, no?
Right, vanilla game theory is mostly not a tool for making decisions.
It’s about studying the structure of strategic interactions, with the idea that some kind of equilibrium concept should have predictive power about what you’ll see in practice. On the one hand, if you get two humans together and tell them the rules of a matrix game, Nash equilibrium has relatively little predictive power. But there are many situations across biology, computer science, economics and more where various equilibrium concepts have plenty of predictive power.
But doesn’t the calculation of those equilibria require making an assumption about the opponent’s strategy?
The situation is slightly complicated, in the following way. You’re broadly right; source code sharing is new. But the old concept of Nash equilibrium is I think sometimes justified like this: We assume that not only do the agents know the game, but they also know each other. They know each other’s beliefs, each other’s beliefs about the other’s beliefs, and so on ad infinitum. Since they know everything, they will know what their opponent will do (which is allowed to be a stochastic policy). Since they know what their opponent will do, they’ll of course (lol) do a causal EU-maxxing best response. Therefore the final pair of strategies must be a Nash equilibrium, i.e. a mutual best-response.
This may be what Isaac was thinking of when referring to “common knowledge of everything”.
OSGT then shows that there are code-reading players who play non-Nash strategies and do better than Nashers.
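A minimal version of that “mutual best response” reading of Nash equilibrium, again with made-up payoffs: enumerate the four pure-strategy profiles and keep the ones where neither player gains by unilaterally deviating. Only Defect/Defect survives.

```python
# Illustrative payoffs (assumed): each entry is (A's payoff, B's payoff).
C, D = "C", "D"
payoff = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}
moves = (C, D)

def is_mutual_best_response(a, b):
    # Neither player can do strictly better by unilaterally switching moves.
    a_ok = all(payoff[(a, b)][0] >= payoff[(alt, b)][0] for alt in moves)
    b_ok = all(payoff[(a, b)][1] >= payoff[(a, alt)][1] for alt in moves)
    return a_ok and b_ok

print([profile for profile in payoff if is_mutual_best_response(*profile)])  # [('D', 'D')]
```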
This only needs knowledge of each other’s policy, not knowledge of each other’s knowledge, yes?
Yes, but the idea (I think!) is that you can recover the policy from just the beliefs (on the presumption of CDT EU maxxing). Saying that A does xyz because B is going to do abc is one thing; it builds in some of the fixpoint finding. The common knowledge of beliefs instead says: A does xyz because he believes “B believes that A will do xyz, and therefore B will do abc as the best response”; so A chooses xyz because it’s the best response to abc.
But that’s just one step. Instead you could keep going:
--> A believes that
----> B believes that
------> A believes that
--------> B believes that A will do xyz,
--------> and therefore B will do abc as the best response
------> and therefore A will do xyz as the best response
----> and therefore B will do abc as the best response
so A does xyz as the best response.
And then you go to infinityyyy.
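A toy rendering of that regress (my own sketch, using made-up payoffs): each level of the hierarchy just best-responds to the level below, and in the prisoner’s dilemma the iteration collapses to Defect no matter what belief you seed it with.

```python
# Illustrative payoffs (assumed), from the responding player's point of view.
C, D = "C", "D"
payoff = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

def best_response(their_move):
    # The causal best response to a fixed belief about the other's move.
    return max((C, D), key=lambda my_move: payoff[(my_move, their_move)])

belief = C  # seed the bottom of the hierarchy with "the other will cooperate"
for _ in range(10):          # "and then you go to infinity" -- ten levels is plenty here
    belief = best_response(belief)
print(belief)  # 'D': the fixed point the regress converges to
```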
Being able to deduce a policy from beliefs doesn’t mean that common knowledge of beliefs is required.
The common knowledge of policy thing is true but is external to the game. We don’t assume that players in prisoner’s dilemma know each other’s policies. As part of our analysis of the structure of the game, we might imagine that in practice some sort of iterative responding-to-each-other’s-policy thing will go on, perhaps because players face off regularly (but myopically), and so the policies selected will be optimal wrt each other. But this isn’t really a part of the game, it’s just part of our analysis. And we can analyse games in various different ways e.g. by considering different equilibrium concepts.
In any case it doesn’t mean that an agent in reality in a prisoner’s dilemma has a crystal ball telling them the other’s policy.
Certainly it’s natural to consider the case where the agents are used to playing against each other so they have the chance to learn and react to each other’s policies. But a case where they each learn each other’s beliefs doesn’t feel that natural to me—might as well go full OSGT at that point.
Being able to deduce a policy from beliefs doesn’t mean that common knowledge of beliefs is required.
Sure, I didn’t say it was. I’m saying it’s sufficient (given some assumptions), which is interesting.
In any case it doesn’t mean that an agent in reality in a prisoner’s dilemma has a crystal ball telling them the other’s policy.
Sure, who’s saying so?
But a case where they each learn each other’s beliefs doesn’t feel that natural to me
It’s analyzed this way in the literature, and I think it’s kind of natural; how else would you make the game be genuinely perfect information (in the intuitive sense), including the other agent, without just picking a policy?
It’s assumed that both agents are offered equal utility from each outcome. It’s true that one of them might care less about “years in jail” than the other, but then you can just offer them something else. In the prisoner’s dilemma with certainty (no probabilistic reasoning) it actually doesn’t matter what the exact payouts are, just their ordering of D/C > C/C > D/D > C/D.
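A small sanity check of that claim (my own sketch): any payoff numbers with the ordering D/C > C/C > D/D > C/D give the same structure, i.e. Defect beats Cooperate against either opponent move for each player, and yet mutual cooperation still beats mutual defection.

```python
# For any (T, R, P, S) with T > R > P > S (i.e. D/C > C/C > D/D > C/D from one
# player's point of view), the one-shot game has the same structure.
def check_pd_structure(T, R, P, S):
    assert T > R > P > S
    assert T > R and P > S   # Defect beats Cooperate against either opponent move
    assert R > P             # ...yet mutual cooperation beats mutual defection

for payoffs in [(5, 3, 1, 0), (100, 99, 2, 1), (1.5, 1.0, 0.5, 0.0)]:
    check_pd_structure(*payoffs)
print("Only the ordering matters for the structure of the dilemma.")
```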
For another thing, two “rational agents” can run quite different decision-making algorithms under the hood. The outputs of those algorithms have to satisfy certain properties, in order for the agents to be labeled “rational”, but those properties are not completely constraining on the outputs, let alone on the process by which the output is generated.
My claim is that the underlying process is irrelevant. Either one output is better than another, in which case all rational agents will output that decision regardless of the process they used to arrive at it, or two outputs are tied for best in which case all rational agents would calculate them as being tied and output indifference.
My claim is that the underlying process is irrelevant.
OK, then I disagree with your claim. If A’s decision-making process is very different from B’s, then A would be wrong to say “If I choose to cooperate, then B will also choose to cooperate.” There’s no reason that A should believe that; it’s just not true. Why would it be? But that logic is critical to your argument.
And if it’s not true, then A gets more utility by defecting.
Either one output is better than another, in which case all rational agents will output that decision regardless of the process they used to arrive at it, or two outputs are tied for best in which case all rational agents would calculate them as being tied and output indifference.
Being in a prisoner’s dilemma with someone whose decision-making process is known by me to be very similar to my own, is a different situation from being in prisoner’s dilemma with someone whose decision-making process is unknown by me in any detail but probably extremely different from mine. You can’t just say “either C is better than D or C is worse than D” in the absence of that auxiliary information, right? It changes the situation. In one case, C is better, and in the other case, D is better.
The situation is symmetric. If C is better for one player, it’s better for the other. If D is better for one player, it’s better for the other. And we know from construction that C-C is better for both than D-D, so that’s what a rational agent will pick.
All that matters is the output, not the process that generates it. If one agent is always rational, and the other agent is rational on Tuesdays and irrational on all other days, it’s still better to cooperate on Tuesdays.
The rational choice for player A depends on whether (RCC × P(B cooperates | A cooperates) + RCD × P(B defects | A cooperates)) is larger or smaller than (RDC × P(B cooperates | A defects) + RDD × P(B defects | A defects)). (“R” for “Reward”) Right?
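Restating the same comparison in symbols (this is just the sentence above, nothing new added): cooperating is the rational choice for A exactly when

$$R_{CC}\,P(B{=}C \mid A{=}C) + R_{CD}\,P(B{=}D \mid A{=}C) \;>\; R_{DC}\,P(B{=}C \mid A{=}D) + R_{DD}\,P(B{=}D \mid A{=}D).$$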
So then one extreme end of the spectrum is that A and B are two instantiations of the exact same decision-making algorithm, a.k.a. perfect identical twins, and therefore P(B cooperates | A defects) = P(B defects | A cooperates) = 0.
The opposite extreme end of the spectrum is that A and B are running wildly different decision-making algorithms with nothing in common at all, and therefore P(B cooperates | A defects) = P(B cooperates | A cooperates) = P(B cooperates) and ditto for P(B defects).
In the former situation, it is rational for A to cooperate, and also rational for B to cooperate. In the latter situation, it is rational for A to defect, and also rational for B to defect. Do you agree? For example, in the latter case, if you compare A to an agent A’ with minimally-modified source code such that A’ cooperates instead, then B still defects, and thus A’ does worse than A. So you can’t say that A’ is being “rational” and A is not—A is doing better than A’ here.
(The latter counterfactual is not an A’ and B’ who both cooperate. Again, A and B are wildly different decision-making algorithms, spacelike separated. When I modify A into A’, there is no reason to think that someone-very-much-like-me is simultaneously modifying B into B’. B is still B.)
In between the former and the latter situations, there are situations where the algorithms A and B are not byte-for-byte identical but do have something in common, such that the output of algorithm A provides more than zero but less than definitive evidence about the output of algorithm B. Then it might or might not be rational to cooperate, depending on the strength of this evidence and the exact payoffs.
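A small numerical sketch of that spectrum (payoff numbers are my own, R_CC=3, R_CD=0, R_DC=5, R_DD=1): perfect correlation makes cooperating the better bet, full independence makes defecting better no matter what P(B cooperates) is, and in between there is a crossover that depends on how strong the evidence is.

```python
# Illustrative payoffs to A (assumed): R_CC=3, R_CD=0, R_DC=5, R_DD=1.
R_CC, R_CD, R_DC, R_DD = 3, 0, 5, 1

def eu_cooperate(p_c_given_c):           # p_c_given_c = P(B cooperates | A cooperates)
    return R_CC * p_c_given_c + R_CD * (1 - p_c_given_c)

def eu_defect(p_c_given_d):              # p_c_given_d = P(B cooperates | A defects)
    return R_DC * p_c_given_d + R_DD * (1 - p_c_given_d)

# Sweep the "strength of evidence": the gap P(B=C|A=C) - P(B=C|A=D),
# from 0 (wildly different algorithms, B independent of A) to 1 (perfect twins).
for gap in (0.0, 0.25, 0.5, 0.75, 1.0):
    p_c_given_d = (1 - gap) * 0.5        # both conditionals shrink toward 0.5
    p_c_given_c = p_c_given_d + gap
    choice = "cooperate" if eu_cooperate(p_c_given_c) > eu_defect(p_c_given_d) else "defect"
    print(f"gap={gap:.2f}: EU(C)={eu_cooperate(p_c_given_c):.2f}, "
          f"EU(D)={eu_defect(p_c_given_d):.2f} -> {choice}")
```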
Hmm, maybe it will help if I make it very concrete. You, Isaac, will try to program a rational agent—call it AI—and I, Steve, will try to program my own rational agent—call it AS. As it happens, I’m going to copy your entire source code because I’m lazy, but then I’ll add in a special-case that says: my AS will defect when in a prisoner’s dilemma with your AI.
Now let’s consider different cases:
You screwed up; your agent AI is not in fact rational. I assume you don’t put much credence here—you think that you know what rational agents are.
Your agent AI is a rational agent, and so is my agent AS. OK, now suppose there’s a prisoner’s dilemma with your agent AI and my agent AS. Then your AI will cooperate because that’s presumably how you programmed it: as you say in the post title, “rational agents cooperate in the prisoner’s dilemma”. And my AS is going to defect because, recall, I put that as a special-case in the source code. So my agent is doing strictly better than yours. Specifically: My agent and your agent take the same actions in all possible circumstances except for this particular AI-and-AS prisoner’s dilemma, where my agent gets the biggest prize and yours is a sucker. So then I might ask: Are you sure your agent AI is rational? Shouldn’t rationality be about systematized winning?
Your agent AI is a rational agent, but my agent AS is not in fact a rational agent. If that’s your belief, then my question for you is: On what grounds? Please point to a specific situation where my agent AS is taking an “irrational” action.
Hmm, yeah, there’s definitely been a miscommunication somewhere. I agree with everything you said up until the cases at the end (except potentially your formula at the beginning; I wasn’t sure what “RCC” denotes).
You screwed up; your agent AI is not in fact rational. If this is intended to be a realistic hypothetical, this is where almost all of my credence would be. Nobody knows how to formally define rational behavior in a useful way (i.e. not AIXI), many smart people have been working on it for years, and I certainly don’t think I’d be more likely to succeed myself. (I don’t understand the relevance of this bullet point though, since clearly the point of your thought experiment is to discuss actually-rational agents.)
Your agent AI is a rational agent, and so is my agent AS. N/A, your agent isn’t rational.
Your agent AI is a rational agent, but my agent AS is not in fact a rational agent. Yes, this is my belief. Your agent is irrational because it’s choosing to defect against mine, which causes mine to defect against it, which results in a lower payoff than if it cooperated.
Sorry, RCC is the Reward / payoff to A if A Cooperates and if B also Cooperates, etc.
Nobody knows how to formally define rational behavior in a useful way
OK sure, let’s also imagine that you have access to a Jupiter-brain superintelligent oracle and can ask it for advice.
Your agent is irrational because it’s choosing to defect against mine, which causes mine to defect against it
How does that “causes” work?
My agent has a module in the source code, that I specifically added when I was writing the code, that says “if I, AS, am in a prisoner’s dilemma with AI specifically, then output ‘defect’”.
Your agent has no such module.
How did my inserting this module change the behavior of your AI? Is your AI reading the source code of my AS or something? (If the agents are reading each other’s source code, that’s a very important ingredient to the scenario, and needs to be emphasized!!)
More generally, I understand that your AI follows the rule “always cooperate if you’re in a prisoner’s dilemma with another rational agent”. Right? But the rule is not just “always cooperate”, right? For example, if a rational agent is in a prisoner’s dilemma against cooperate-bot ( = the simple, not rational, agent that always cooperates no matter what), and if the rational agent knows for sure that the other party to the prisoner’s dilemma is definitely cooperate-bot, then the rational agent is obviously going to defect, right?
And therefore, AI needs to figure out whether the other party to its prisoner’s dilemma is or is not “a rational agent”. How does it do that? Shouldn’t it be uncertain, in practice, in many practical situations?
And if two rational agents are each uncertain about whether the other party to the prisoner’s dilemma is “a rational agent”, versus another kind of agent (e.g. cooperate-bot), isn’t it possible for them both to defect?
It’s specified in the premise of the problem that both players have access to the other player’s description; their source code, neural map, decision theory, whatever. My agent considers the behavior of your agent, sees that your agent is going to defect against mine no matter what mine does, and defects as well. (It would also defect if your additional module said “always cooperate with IsaacBot”, or “if playing against IsaacBot, flip a coin”, or anything else that breaks the correlation.)
“Always cooperate with other rational agents” is not the definition of being rational, it’s a consequence of being rational. If a rational agent is playing against an irrational agent, it will do whatever maximizes its utility; cooperate if the irrational agent’s behavior is nonetheless correlated with the rational agent, and otherwise defect.
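To make that mechanism concrete, here is a toy sketch (entirely my own construction, and much simpler than real open-source game theory, which needs proof search or quining tricks to avoid infinite regress): each agent is handed the other’s policy as a function; the Steve-style bot unconditionally defects against IsaacBot, and the Isaac-style bot probes its opponent, sees that the correlation is broken, and defects as well.

```python
# Toy sketch only: naive mutual simulation can recurse forever in general;
# here steve_bot ignores its input, so direct probing works.
C, D = "cooperate", "defect"

def steve_bot(opponent_policy):
    # Steve's special-case module: defect when facing IsaacBot, no matter what.
    return D

def isaac_bot(opponent_policy):
    # Probe the opponent: what does it do against an unconditional cooperator,
    # and against an unconditional defector?
    vs_cooperator = opponent_policy(lambda _: C)
    vs_defector = opponent_policy(lambda _: D)
    # Cooperate only if the opponent's move tracks ours in the right direction;
    # anything that breaks the correlation gets a defection.
    if vs_cooperator == C and vs_defector == D:
        return C
    return D

print(isaac_bot(steve_bot), steve_bot(isaac_bot))  # defect defect
```

(This naive probe wouldn’t even get two copies of isaac_bot to cooperate with each other, which is part of why the real literature needs fancier machinery; it is only meant to illustrate the “correlation is broken, so defect” step.)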
OK cool. If the title had been “LDT agents cooperate with other LDT agents in the prisoner’s dilemma if they can see, trust, and fully understand each other’s source code; and therefore it’s irrational to be anything but an LDT agent if that kind of situation might arise” … then I wouldn’t have objected. That’s a bit verbose though I admit :) (If I had seen that title, my reaction would have been “That might or might not be true; seems plausible but maybe needs caveats, whatever, it’s beyond my expertise”, whereas with the current title my immediate reaction was “That’s wrong!”.)
I think I was put off because:
The part where the agents see each other’s source code (and trust it, and can reason omnisciently about it) is omitted from the title and very easy to miss IMO even when reading the text [this is sometimes called “open-source prisoner’s dilemma”—it has a special name because it’s not the thing that people are usually talking about when they talk about “prisoner’s dilemmas”];
Relatedly, I think “your opponent is constitutionally similar to you and therefore your decisions are correlated” and “your opponent can directly see and understand your source code and vice-versa” are two different reasons that an agent might cooperate in the prisoner’s dilemma, and your post and comments seem to exclusively talk about the former but now it turns out that we’re actually relying on the latter;
I think everyone agrees that the rational move is to defect against a CDT agent, and your title “rational agents cooperate in the prisoner’s dilemma” omits who the opponent is;
Even if you try to fix that by adding “…with each other” to the title, I think that doesn’t really help because there’s a kind of circularity, where the way you define “rational agents” (which I think is controversial, or at least part of the thing you’re arguing for) determines who the prisoner’s dilemma opponent is, which in turn determines what the rational move is, yet your current title seems to be making an argument about what it implies to be a rational agent, so you wind up effectively presupposing the answer in a confusing way.
See Nate Soares’s “Decision theory does not imply that we get to have nice things” for a sense of how the details really matter and can easily go awry in regards to “reading & understanding each other’s source code”.