Which decision theory should we use? CDT? UDT? TDT? What exactly do we mean by a “better” decision theory?
To get some practice in answering this kind of question, let's look first at a simpler set of questions: Which play should I make in the game PSS? Paper? Stone? Scissors? What exactly do we mean by a better play in this game?
Bear with me on this. I think that a careful look at the process that game theorists went through in dealing with game-level questions may be very helpful in our current confusion about decision-theory-level questions.
The first obvious thing to notice about the PSS problem is that there is no universal “best” play in the game. Sometimes one play (“stone”, say) works best; sometimes another play works better. It depends on what the other player does. So we make our first conceptual breakthrough. We realize we have been working on the wrong problem. It is not “which play produces the best results?”. It is rather “which play produces the best expected results?” that we want to ask.
Well, we are still a bit puzzled by that new word “expected”, so we hire consultants. One consultant, a Bayesian/MAXENT theorist, tells us that the appropriate expectation is that the other player will play each of “paper”, “stone”, and “scissors” equally often, and hence that all plays on our part are equally good. The second consultant, a scientist, actually goes out and observes the other player. He comes back with the report that out of 100 PSS games, the other player played “paper” 35 times, “stone” 32 times, and “scissors” 33 times. So the scientist recommends “scissors” as the best play. Our MAXENT consultant has no objection. “That choice is no worse than any other”, says he.
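The scientist's recommendation is easy to check: against the observed frequencies, “scissors” has the highest average payoff. A quick sanity check in Python, assuming the usual +1 win / 0 tie / −1 loss scoring (the story doesn't state a scoring rule):

```python
# Expected payoff of each pure play against the observed frequencies.
# Payoffs assumed: win = +1, tie = 0, loss = -1 (an assumption, not from the text).
freqs = {"paper": 0.35, "stone": 0.32, "scissors": 0.33}
beats = {"paper": "stone", "stone": "scissors", "scissors": "paper"}

def expected_payoff(my_play, opponent_freqs):
    """Average payoff of a pure play against a known opponent distribution."""
    return sum(p * ((1 if beats[my_play] == their else 0)
                    - (1 if beats[their] == my_play else 0))
               for their, p in opponent_freqs.items())

for play in freqs:
    print(play, round(expected_payoff(play, freqs), 4))
# "scissors" comes out highest (+0.03): it beats the opponent's most frequent play.
```

Note that the margin is tiny; the recommendation only dominates while the opponent's frequencies stay fixed, which is exactly what goes wrong next.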
So we adopt the strategy of always playing scissors, which works fine at first, but soon starts returning abysmal results. The MAXENT fellow is puzzled. “Do you think maybe the other guy found out about our strategy?” he asks. “Maybe he hired our scientist away from us. But how can we possibly keep our strategy secret if we use it more than once?” And this leads to our second conceptual breakthrough.
We realize that it is both impossible and unnecessary to keep our strategy secret (just as a cryptographer knows that it is difficult and unnecessary to keep the encryption algorithm secret). But it is both possible and essential to keep the plays secret until they are actually made (just as a cryptographer keeps keys secret). Hence, we must have mixed strategies, where the strategy is a probability distribution and a play is a one-point sample from that distribution.
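Concretely, “strategy as distribution, play as sample” is just this (a minimal sketch; the uniform weights here are only illustrative, not a claim about the optimal mix):

```python
import random

# A mixed strategy is a probability distribution over plays; a play is a
# one-point sample from it. The distribution can be public; only the sample
# drawn for each round needs to stay secret until the round is played.
strategy = {"paper": 1/3, "stone": 1/3, "scissors": 1/3}  # illustrative weights

def draw_play(mixed_strategy, rng=random):
    """Sample a single play from a mixed strategy."""
    plays = list(mixed_strategy)
    weights = [mixed_strategy[p] for p in plays]
    return rng.choices(plays, weights=weights, k=1)[0]

print(draw_play(strategy))
```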
Take a step back and think about this. Non-determinism of agents is an inevitable consequence of having multiple agents whose interests are not aligned (or more precisely, agents whose interests cannot be brought into alignment by a system of side payments). Lesson 1: Any decision theory intended to work in multi-agent situations must handle (i.e. model) non-determinism in other agents. Lesson 2: In many games, the best strategy is a mixed strategy.
Think some more. Agents whose interests are not aligned often should keep secrets from each other. Lesson 3: Decision theories must deal with secrecy. Lesson 4: Agents may lie to preserve secrets.
But how does game theory find the best mixed strategy? Here is where it gets weird. It turns out that, in some sense, it is not about “winning” at all. It is about equilibrium. Remember back when we were at the PSS stage where we thought that “Always play scissors” was a good strategy? What was wrong with this, of course, was that it induced the other player to switch his strategy toward “Always play stone” (assuming, of course, that he has a scientist on his consulting staff). And that shift on his part induces (assuming we have a scientist too) us to switch toward paper.
So, how is this motion brought to a halt? Well, there is one particular strategy you can choose that at least removes the motivation for the motion. There is one particular mixed strategy which makes your opponent not really care what he plays. And there is one particular mixed strategy that your opponent can play which makes you not really care what you play. So, if you both make each other indifferent, then neither of you has any particular incentive to stop making each other indifferent, so you both just stick to the strategy you are currently playing.
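The indifference claim is easy to verify numerically: against a uniform mix over the three plays, every pure reply has the same expected payoff, namely zero (same +1/0/−1 scoring assumption as before):

```python
# If I play each of paper/stone/scissors with probability 1/3, my opponent's
# expected payoff is identical (zero) for every pure reply -- he is indifferent.
beats = {"paper": "stone", "stone": "scissors", "scissors": "paper"}
my_mix = {"paper": 1/3, "stone": 1/3, "scissors": 1/3}

def opponent_payoff(reply, mix):
    """Opponent's expected payoff for a pure reply against my mixed strategy."""
    return sum(p * ((1 if beats[reply] == mine else 0)
                    - (1 if beats[mine] == reply else 0))
               for mine, p in mix.items())

for reply in beats:
    print(reply, opponent_payoff(reply, my_mix))  # 0.0 for every reply
```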
This is called Nash equilibrium. It also works in non-zero-sum games where the two players’ interests are not completely misaligned. The decision theory at the heart of Game Theory—the source of eight economics Nobel prizes so far—is not trying to “win”. Instead, it is trying to stop the other player from squirming so much as he tries to win. Swear to God. That is the way it works.
Alright, in the last paragraph, I was leaning over backward to make it look weird. But the thing is, even though you no longer look like you are trying to win, you still actually do as well as possible, assuming both players are rational. Game theory works. It is the right decision theory for the kinds of decisions that fit into its model.
So, was this long parable useful in our current search for “the best decision theory”? I guess the answer to that must depend on exactly what you want a decision theory to accomplish. My intuition is that Lessons #1 through #4 above cannot be completely irrelevant. But I also think that there is a Lesson 5 that arises from the Nash equilibrium finale of this story. The lesson is: in any optimization problem with multi-party dynamics, you have to look for the fixpoints.
There’s probably no single-player decision theory that, if all players adopted it, would lead to Nash equilibrium play in all games. The reason is that many games have multiple Nash equilibria, and equilibrium selection (aka bargaining) is often “indeterminate”: it requires you to go outside the game and look at the real-world situation that generated it.
Here on LW we know how to implement “optimal” agents, who cooperate with each other and share the wins “fairly” while punishing defectors, in only two cases: symmetric games (choosing the Pareto-best symmetric correlated equilibrium), and games with transferable utility (using the Shapley value). The general case of non-symmetric games with non-transferable utility is still open. I’m very skeptical that any single-player decision theory can solve it, and have voiced my skepticism many times.
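For the transferable-utility case, the Shapley value mentioned above can be computed directly as each player's average marginal contribution over all orders of joining the coalition. A toy three-player example (the characteristic function is made up for illustration):

```python
from itertools import permutations

# Shapley value for a small transferable-utility game: average each player's
# marginal contribution over all join orders. This 3-player characteristic
# function is invented for illustration: any pair earns 60, the grand
# coalition earns 120, singletons earn nothing.
players = ("A", "B", "C")
v = {(): 0, ("A",): 0, ("B",): 0, ("C",): 0,
     ("A", "B"): 60, ("A", "C"): 60, ("B", "C"): 60,
     ("A", "B", "C"): 120}

def worth(coalition):
    return v[tuple(sorted(coalition))]

def shapley(players):
    value = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        so_far = []
        for p in order:
            value[p] += worth(so_far + [p]) - worth(so_far)
            so_far.append(p)
    return {p: value[p] / len(orders) for p in players}

print(shapley(players))  # symmetric game, so each player gets 40.0
```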
It has been a while since I looked at Aumann’s Handbook, and I don’t have access to a copy now, but I seem to recall discussion of an NTU analog of the Shapley value. Ah, I also find it in Section 9.9 of Myerson’s textbook. Perhaps the problem is they don’t collectively voluntarily punish defectors quite as well as you would like them to. I’m also puzzled by your apparent restriction of correlated equilibria to symmetric games. You realize, of course, that symmetry is not a requirement for a correlated equilibrium in a two-person game?
It wasn’t my intention, at least in this posting, to advocate standard Game Theory as the solution to the FAI decision theory question. I am not at all sure I understand what that question really is. All I am doing here is pointing out the analogy between the “best play” problem and the “best decision theory” metaproblem.
I’m also puzzled by your apparent restriction of correlated equilibria to symmetric games. You realize, of course, that symmetry is not a requirement for a correlated equilibrium in a two-person game?
Yes, I realize that. The problem lies elsewhere. When you pit two agents using the same “good” decision theory against each other in a non-symmetric game, some correlated play must result. But which one? Do you have a convention for selecting the “best” correlated equilibrium in an arbitrary non-symmetric game? Because your “good” algorithm (assuming it exists) will necessarily give rise to just such a convention.
About values for NTU games: according to my last impressions, the topic was a complete mess and there was no universally agreed-upon value. Unlike the TU case, there seems to be a whole zoo of competing NTU values with different axiomatic justifications. Maybe our attempts to codify “good” algorithms will someday cut through this mess, but I don’t yet see how.
Do you have a convention for selecting the “best” correlated equilibrium in an arbitrary non-symmetric game?
What is wrong with the Nash bargaining solution (with threats)? Negotiating an acceptable joint equilibrium is a cooperative game. It is non-cooperatively enforceable because you limit yourself to only correlated equilibria rather than the full Pareto set of joint possibilities.
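For readers who haven't met it: the Nash bargaining solution picks the feasible point that maximizes the product of each player's gain over the disagreement point. A toy sketch over a discretized frontier (the frontier and disagreement values are made up for illustration):

```python
# Nash bargaining: on the feasible frontier, pick the point maximizing the
# "Nash product" (u1 - d1) * (u2 - d2), where (d1, d2) is the disagreement
# point. The frontier and disagreement points below are invented examples.
def nash_bargain(frontier, disagreement):
    """Pick the feasible point maximizing the Nash product."""
    d1, d2 = disagreement
    return max(frontier, key=lambda u: (u[0] - d1) * (u[1] - d2))

# Splitting a unit of utility; a threat-skewed disagreement point shifts the split.
frontier = [(i / 100, 1 - i / 100) for i in range(101)]
print(nash_bargain(frontier, (0.0, 0.0)))  # -> (0.5, 0.5): symmetric split
print(nash_bargain(frontier, (0.2, 0.0)))  # the threat-advantaged player gets more
```

This is also where the question below bites: the whole construction presupposes a well-defined disagreement point, which Nash's threat game is meant to supply.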
I must be missing something. You are allowing communication and (non-binding) arbitration, aren’t you? And a jointly trusted source of random numbers.
Um, maybe it’s me who’s missing something. Does the Nash bargaining solution uniquely solve all games? How do you choose the “disagreement point” used for defining the solution, if the game has multiple noncooperative equilibria? Sorry if I’m asking a stupid question.
Nash’s 1953 paper covers that, I think. Just about any game theory text should explain. Look in the index for “threat game”. In fact, Googling on the string “Nash bargaining threat game” returns a host of promising-looking links.
Of course, when you go to extend this 2-person result to coalition games, it gets even more complicated. In effect, the Shapley value is a weighted average of values for each possible coalition structure, with the division of spoils and responsibilities within each coalition also being decided by bargaining. The thing is, I don’t see any real justification for the usual convention of giving equal weights to each possible coalition. Some coalitions seem more natural to me than others—one most naturally joins the coalitions with which one communicates best, over which one has the most power to reward and punish, and which has the most power over oneself. But I’m not sure exactly how this fits into the math. Probably a Rubinstein-style answer could be worked out within the general framework of Nash’s program.
Well, sorry. You were right all along and I’m a complete idiot. For some reason my textbook failed to cover that, and I never stumbled on it anywhere else.

(reads paper, goes into a corner to think)
Does this paper have what you’re looking for? I’m not in the office, so can’t read it at the moment—and might not be able to anyway, as my university’s subscriptions tend not to include lots of Science Direct journals—it does at least seem to provide one plausible answer to your question.
(no idea if that link will work—the paper is “Bargained-Correlated Equilibria” by Piero Tedeschi)
Thanks a lot. RobinZ sent me the paper and I read it. The key part is indeed the definition of the disagreement point, and the reasoning used to justify it is plausible. The only sticky issue is that the disagreement point defined in the paper is unattainable; I’m not sure what to think about that, and not sure whether the disagreement point must be “fair” with respect to the players.
The “common priors” property used in the paper gave me the idea that optimal play can arise via Aumann agreement, which in turn can emerge from individually rational behavior! This is really interesting and I’ll have to think about it.
The abstract looks interesting, but I can’t access the paper because I’m a regular schmuck in Russia, not a student at a US university or something :-)
I have it. PM me with email address for PDF.
Done!
ETA: and received. Thanks!