In There are no coherence theorems EJT defines a “coherence theorem” as a theorem that proves maximizing expected utility is the only way to play a game that is not dominated. Ie, any strategy that is not a utility maximisation could be made strictly better. I will make the underlying claim more precise, at least according to my intuition, and prove that it is false in general. I will then discuss what variations of the claim we could think off.
There are no coherence theorems was mostly a list of many pre-existing results, none of which proved the underlying claim. The term “coherence theorem” itself gave rise to some discussions and might have been unfortunate. But I agree that something like the following claim has been floating around, more or less implicitely, in multiple documents I read and discussions I had:
Intuitive claim
Under very little general modelisation assumption, it is in the best interest of any agent playing a game to act as a bayesian utility maximizer. If it is not doing so, the agent could become strictly better by becoming a utility maximizer.
If this is true then I find it important, at least somewhat. And if it is false I also wish to know it.
An issue with this is that as it stands the claim is too vague to be disproven. There was some discussion in EJT’s post claiming that the result he sought was a trivial corollary of the complete class theorem. But even then I wasn’t sure what the exact claim was.
In this post I will:
Give a more precise model for the kind of games we are speaking about and use it to state a conjecture, relfecting the intuitive claim in the general case.
Disprove this conjecture with a counterexample.
Discuss what alternative precise theorems we might seek to prove or find instead.
LaTEX document with a more formal model
This post will avoid using too much math. For a more formal discussion of the same ideas see the draft LaTEX document.
Iterative game model and conjecture
The conjecture bellow is my attempt at translating the intuitive claim to a more precise framework. I will now state that conjecture and then spend the rest of this section on defining the terms that I use.
Conjecture
In an iterative game, any strategy that is not a utility maximisation is dominated by some other strategy.
We model games with a single player that are played for an infinity of turns. Because I need something to call them, for the duration of this post I will use “iterative game” to refer to the kind of game I describe.
The player knows the rules of the game. All of his uncertainty is described by a hidden state w which takes its value in a set of all possible states Ω.
For example if the game needs to roll dice, the result of all the dices that will be rolled is included in w.
At the begining of each turn the player receives information that depends on w and on the actions he previously chose.
This information helps him guess the value of w with more accuracy. This process is repeated an infinity of times, which defines an infinite series of actions u.
An outcome is then decided by a function of w and u.
The outcome takes its value in a set of all outcomes O. Not all outcomes can be compared but the set of outcomes is partially ordered.
This means that some outcomes can be preferred to others and that the “being preferred” relation satisfies the following properties:
reflexivity
antisymmetry
transitivity
Let look at an example of iterative game:
A dice roll gives a number between 1 and 20. You are given the unit digit of this roll but not the full number (so if the roll was a 12, you know the second digit was a 2, but you don’t know whether the roll was a 2 or a 12).
You can decide to pass or to bet 1 point. If you bet then a second dice is rolled and the result is compared to the first. If it is higher you gain 1 point. Otherwise you lose 1 point.
This game wouldn’t be very interesting to play, but let’s try to model it.
The set of states Ω is the set of all pairs of numbers between 1 and 20. For example it contains (1,2), (12,1), and (20,15).
The set of outcomes O is {−1,0,+1}, with the obvious order −1<0<+1.
The only action taken by the player that matters is the first, which can be BET or PASS. On every ulterior turn the only legal action is PASS_TURN. This allows to model a finite game with our general model of infinite games.
Finally, the outcome is decided as follows:
If the player chose PASS on the first turn the outcome is 0.
If the player chose BET and w is of the form (a,b) with a>b the outcome is −1.
If the player chose BET and w is of the form (a,b) with a≤b the outcome is +1.
Hopefully this is enough to make the base game model clear. If you want to read a more mathematically formal presentation you can look at the pdf.
Finally we need to speak about strategies.
A strategy is a function that decides an action depending on an information state and a history. Note that we assume the player has a perfect memory. This notably means he remembers all his past actions.
If we know the player’s strategy and the hidden state w then we know how the entire game run will go (we know all the actions that will be played and the associated outcome).
A strategy s1dominates a strategy s2 if and only if it gives a better (or equal) outcome on all possible hidden states w in Ω and gives a strictly better outcome in at least one case.
A strategy s is a utility maximisation for some probability P and utility U if:
P is a probability function on Ω (we can say it is a prior on the hidden state)
U is a utility function on O that gives more utility to o1 than to o2 whenever o1 is preferred to o2
s is the strategy that gives the best expected utility for P and U, among all strategies
This concludes the definition of all that was needed for conjecture to make sense. I will now show with a counterexample that the conjecture is false. Alternative modelisation choices that could have been made will be discussed at the end of the post.
Counterexample
Take Ω=N and O={−1,0,+1}, with −1≺0≺+1. We write w the (unknown) hidden state (w∈Ω).
The game is as follows:
On the first turn you decide either PASS or PLAY. If you decide PASS you drop out of the game and get outcome 0 in all cases.
Otherwise on each turn you have two actions: STOP or INCREMENT. Once you choose STOP the game ends (all future actions are irrelevant) and your total choice number T is equal to the number of times you picked INCREMENT. If you never pick STOP, T is arbitrarily set to 0.
If T=w you lose and get −1, otherwise you win and get +1.
Remark
Nothing says how bad it is to get −1 or how good it is to get +1. Maybe −1 stands for “lose everything and die” and +1 for “get a cookie”. The only constraint defined by the game is that getting +1 is better than getting 0, which is better than −1.
This is all equivalent to you either refusing to bet (get outcome 0) or betting and choosing a number (any integer). When betting, if the integer you picked is equal to the hidden state you lose and get −1; otherwise you win and get +1.
The available strategies boil down to s:”refuse to play” or strategies of the form sn:”play and pick n”.
In this game the strategy s:”refuse to play” is not dominated. I will show it is not a utility maximization.
Proof
Let’s assume that it is for some probability function P and utility U. We write a=U(+1)−U(0) and b=U(0)−U(−1). We only know that they are strictly positive.
Because Ω is infinite there are values n∈Ω with arbitrarily small values for P({n}).
So there is some n∈Ω such that P({n})b<(1−P({n}))a.
The expected difference between s and the strategy sn:”pick number n” is (1−P({n}))a−P({n})b.
This is positive so sn has a better expected utility than s. qed.
Alternative hypothesis and other discussion
Different hypothesis for the outcome ordering
I assumed the existence of a partial order on O. This notably implies transitivity, which was somewhat controversial in the post “There are no coherence theorems”. EJT argued that only acyclicity came freely from dutch book arguments and not transitivity.
I personally find transitivity of preferences intuitive enough to be interested in results that take it for granted.
In any case, I think it is a good start to see what we can deduce if we assume transitivity of preference. If we find good results with that hypothesis then we can try to restrict assumptions further.
One turn games
We could restrict the conjecture to games that only last one turn. I.e by taking games for which no action except the first one has an impact on the outcome.
We can prove the theorem for these cases as an application of the complete class theorem.
But that restriction is too strong and not ambitious enough to cover the intuitive claim.
Bounded games
We could instead look at games for which there is some integer k such that all runs of the game last less than k turns. I do not know if the conjecture is true in that case. Maybe it can be reduce to the case of one turn games.
Finite but unbounded games
That restriction would cover games for which all runs are over for some k (they reach a point at which not further action can change the outcome), but without assuming that there is one integer k that applies to all possible runs.
The counterexample above applies to this category.
Call for results
I have searched for a result in the literature that would settle the question and so far I have found none. But I think it is entirely possible that I just missed it. Maybe it was pointless to provide my own take on a conjecture.
Please do tell if you know a result that either proves or disproves the intuitive claim. However I think the topic is at least nontrivial enough that such a result should be stated clearly, if it can be found.
Remark: gender of the hypothetical player determined by a coin toss.
Let’s look for coherence theorems
In There are no coherence theorems EJT defines a “coherence theorem” as a theorem that proves maximizing expected utility is the only way to play a game that is not dominated. Ie, any strategy that is not a utility maximisation could be made strictly better. I will make the underlying claim more precise, at least according to my intuition, and prove that it is false in general. I will then discuss what variations of the claim we could think off.
There are no coherence theorems was mostly a list of many pre-existing results, none of which proved the underlying claim. The term “coherence theorem” itself gave rise to some discussions and might have been unfortunate. But I agree that something like the following claim has been floating around, more or less implicitely, in multiple documents I read and discussions I had:
Intuitive claim
If this is true then I find it important, at least somewhat. And if it is false I also wish to know it. An issue with this is that as it stands the claim is too vague to be disproven. There was some discussion in EJT’s post claiming that the result he sought was a trivial corollary of the complete class theorem. But even then I wasn’t sure what the exact claim was.
In this post I will:
Give a more precise model for the kind of games we are speaking about and use it to state a conjecture, relfecting the intuitive claim in the general case.
Disprove this conjecture with a counterexample.
Discuss what alternative precise theorems we might seek to prove or find instead.
LaTEX document with a more formal model
This post will avoid using too much math. For a more formal discussion of the same ideas see the draft LaTEX document.
Iterative game model and conjecture
The conjecture bellow is my attempt at translating the intuitive claim to a more precise framework. I will now state that conjecture and then spend the rest of this section on defining the terms that I use.
Conjecture
We model games with a single player that are played for an infinity of turns. Because I need something to call them, for the duration of this post I will use “iterative game” to refer to the kind of game I describe.
The player knows the rules of the game. All of his uncertainty is described by a hidden state w which takes its value in a set of all possible states Ω. For example if the game needs to roll dice, the result of all the dices that will be rolled is included in w. At the begining of each turn the player receives information that depends on w and on the actions he previously chose. This information helps him guess the value of w with more accuracy. This process is repeated an infinity of times, which defines an infinite series of actions u. An outcome is then decided by a function of w and u.
The outcome takes its value in a set of all outcomes O. Not all outcomes can be compared but the set of outcomes is partially ordered. This means that some outcomes can be preferred to others and that the “being preferred” relation satisfies the following properties:
reflexivity
antisymmetry
transitivity
Let look at an example of iterative game:
This game wouldn’t be very interesting to play, but let’s try to model it.
Hopefully this is enough to make the base game model clear. If you want to read a more mathematically formal presentation you can look at the pdf.
Finally we need to speak about strategies.
A strategy is a function that decides an action depending on an information state and a history. Note that we assume the player has a perfect memory. This notably means he remembers all his past actions. If we know the player’s strategy and the hidden state w then we know how the entire game run will go (we know all the actions that will be played and the associated outcome).
A strategy s1 dominates a strategy s2 if and only if it gives a better (or equal) outcome on all possible hidden states w in Ω and gives a strictly better outcome in at least one case.
A strategy s is a utility maximisation for some probability P and utility U if:
P is a probability function on Ω (we can say it is a prior on the hidden state)
U is a utility function on O that gives more utility to o1 than to o2 whenever o1 is preferred to o2
s is the strategy that gives the best expected utility for P and U, among all strategies
This concludes the definition of all that was needed for conjecture to make sense. I will now show with a counterexample that the conjecture is false. Alternative modelisation choices that could have been made will be discussed at the end of the post.
Counterexample
Take Ω=N and O={−1,0,+1}, with −1≺0≺+1. We write w the (unknown) hidden state (w∈Ω). The game is as follows:
On the first turn you decide either PASS or PLAY. If you decide PASS you drop out of the game and get outcome 0 in all cases. Otherwise on each turn you have two actions: STOP or INCREMENT. Once you choose STOP the game ends (all future actions are irrelevant) and your total choice number T is equal to the number of times you picked INCREMENT. If you never pick STOP, T is arbitrarily set to 0. If T=w you lose and get −1, otherwise you win and get +1.
Remark Nothing says how bad it is to get −1 or how good it is to get +1. Maybe −1 stands for “lose everything and die” and +1 for “get a cookie”. The only constraint defined by the game is that getting +1 is better than getting 0, which is better than −1.
This is all equivalent to you either refusing to bet (get outcome 0) or betting and choosing a number (any integer). When betting, if the integer you picked is equal to the hidden state you lose and get −1; otherwise you win and get +1.
The available strategies boil down to s:”refuse to play” or strategies of the form sn:”play and pick n”.
In this game the strategy s:”refuse to play” is not dominated. I will show it is not a utility maximization.
Proof Let’s assume that it is for some probability function P and utility U. We write a=U(+1)−U(0) and b=U(0)−U(−1). We only know that they are strictly positive. Because Ω is infinite there are values n∈Ω with arbitrarily small values for P({n}).
So there is some n∈Ω such that P({n})b<(1−P({n}))a.
The expected difference between s and the strategy sn:”pick number n” is (1−P({n}))a−P({n})b.
This is positive so sn has a better expected utility than s. qed.
Alternative hypothesis and other discussion
Different hypothesis for the outcome ordering
I assumed the existence of a partial order on O. This notably implies transitivity, which was somewhat controversial in the post “There are no coherence theorems”. EJT argued that only acyclicity came freely from dutch book arguments and not transitivity.
I personally find transitivity of preferences intuitive enough to be interested in results that take it for granted. In any case, I think it is a good start to see what we can deduce if we assume transitivity of preference. If we find good results with that hypothesis then we can try to restrict assumptions further.
One turn games
We could restrict the conjecture to games that only last one turn. I.e by taking games for which no action except the first one has an impact on the outcome. We can prove the theorem for these cases as an application of the complete class theorem. But that restriction is too strong and not ambitious enough to cover the intuitive claim.
Bounded games
We could instead look at games for which there is some integer k such that all runs of the game last less than k turns. I do not know if the conjecture is true in that case. Maybe it can be reduce to the case of one turn games.
Finite but unbounded games
That restriction would cover games for which all runs are over for some k (they reach a point at which not further action can change the outcome), but without assuming that there is one integer k that applies to all possible runs. The counterexample above applies to this category.
Call for results
I have searched for a result in the literature that would settle the question and so far I have found none. But I think it is entirely possible that I just missed it. Maybe it was pointless to provide my own take on a conjecture.
Please do tell if you know a result that either proves or disproves the intuitive claim. However I think the topic is at least nontrivial enough that such a result should be stated clearly, if it can be found.
Remark: gender of the hypothetical player determined by a coin toss.