Absent-Minded Driver and Self-Locating Probabilities


The absent-minded driver problem is this:

Fig 1. (Taken from The Absent-Minded Driver by Robert J. Aumann, Sergiu Hart, and Motty Perry)

“An absent-minded driver starts driving at START in Figure 1. At X he can either EXIT and get to A (for a payoff of 0) or CONTINUE to Y. At Y he can either EXIT and get to B (payoff 4), or CONTINUE to C (payoff 1). The essential assumption is that he cannot distinguish between intersections X and Y, and cannot remember whether he has already gone through one of them.”

It is easy to show that the best strategy is to CONTINUE with probability 2/3 at each intersection. Let p be the probability of CONTINUE at an intersection; the expected payoff is then 4p(1-p)+p^2, which is maximized when p=2/3.
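As a quick check of this maximization, the following sketch evaluates the planning payoff exactly on a grid of rational values of p. The function name `planning_payoff` and the grid resolution are illustrative choices, not part of the original problem.

```python
from fractions import Fraction

def planning_payoff(p):
    """Expected payoff when CONTINUE is chosen with probability p at each intersection."""
    # EXIT at X pays 0, EXIT at Y pays 4, CONTINUE through both pays 1.
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

# Exact search over p = k/300 (an arbitrary grid that happens to contain 2/3).
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=planning_payoff)

print(best_p, planning_payoff(best_p))  # 2/3 4/3
```

The grid search is only a sanity check; since the payoff 4p-3p^2 is a concave quadratic, setting its derivative 4-6p to zero gives the same answer directly.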

Yet there is a paradox. Imagine you are the driver. On arriving at an intersection, you could assign some probability to “here is X”. Call this probability α. The payoff function then becomes α[4(1-p)p+p^2]+(1-α)[4(1-p)+p]. Maximizing it over p gives a value different from 2/3 no matter how one chooses α, except when α=1, i.e. when the probability of “here is X” is 100%, which is impossible to justify.

For example, a common approach is to let α=1/(1+p). The reasoning is that the driver always passes X (probability 1) and passes Y with probability p, so the probability of “here is X” should be the relative weight of the two. Maximizing the payoff function with α held fixed, and then requiring α=1/(1+p) at the maximizer, gives p=4/9, which differs from 2/3.
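The value p=4/9 can be reproduced with a short sketch. It assumes one particular reading of the calculation: α is held fixed while the payoff is maximized over p, and the result is then required to agree with α=1/(1+p). The helper names `best_response` and `belief` are hypothetical.

```python
from fractions import Fraction

def best_response(alpha):
    """Maximizer of alpha*(4p - 3p^2) + (1 - alpha)*(4 - 3p) with alpha held fixed."""
    # First-order condition: alpha*(4 - 6p) - 3*(1 - alpha) = 0.
    # This is an interior maximum only for 3/7 <= alpha <= 1.
    return (7 * alpha - 3) / (6 * alpha)

def belief(p):
    """The 'relative weight' belief that the current intersection is X."""
    return 1 / (1 + p)

p = Fraction(4, 9)
alpha = belief(p)
print(alpha, best_response(alpha))  # 9/13 4/9
print(best_response(Fraction(1)))   # 2/3
```

Under this reading, p=4/9 is the only value that is its own best response, and the planning value 2/3 is recovered only when α=1.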

Aumann’s Answer

Aumann, Hart, and Perry disagree with the above reasoning and deny that the problem presents any paradox. First, they distinguish between the planning optimal and the action optimal. The original p=2/3 is the planning optimal. The rational decision for the driver on arriving at an intersection is dubbed the action optimal. It is derived from the following observations:

  1. The decision the driver makes only determines his action at that particular intersection. It does not affect the action at the other intersection.

  2. The driver is aware he will make (has made) an identical decision at the other intersection too.

I will skip the details (Eliezer Yudkowsky made a post about it here). Their method of determining the action optimal essentially treats it as a game of coordination between “me at this intersection” and “me at the other intersection”. The result is a stable point of the payoff function, similar to a Nash equilibrium, which indeed gives p=2/3. Furthermore, the planning optimal is always an action optimal.
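The stable-point idea can be illustrated with a minimal sketch. It is a simplified reading of the two observations, not the authors' full derivation: the driver chooses a CONTINUE probability q for this intersection only, takes the other intersection's probability p as given, and holds the belief α=1/(1+p) that this intersection is X. A stable point is a p at which there is no gain from moving q away from p. The function name `deviation_slope` is hypothetical.

```python
from fractions import Fraction

def deviation_slope(p):
    """Derivative, with respect to q, of the action-stage payoff.

    q is the CONTINUE probability at this intersection only; the other
    intersection is assumed to use p, and alpha = 1/(1+p) is the belief
    that this intersection is X.
    """
    alpha = 1 / (1 + p)
    # If here is X: payoff = (1-q)*0 + q*((1-p)*4 + p*1), slope in q is 4 - 3p.
    # If here is Y: payoff = (1-q)*4 + q*1, slope in q is -3.
    return alpha * (4 - 3 * p) + (1 - alpha) * (-3)

for p in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    print(p, deviation_slope(p))
```

The payoff is linear in q, so the slope is the whole story: it is positive below p=2/3 (the driver would rather CONTINUE more), negative above it, and zero exactly at p=2/3, where any q is a best response and q=p is therefore stable.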

However, problems remain when there is more than one action optimal. For example:

Fig 2. (Taken from the same source as before)

For this problem, the planning optimal is p=0: always choosing EXIT at an intersection gives the highest payoff, 7. However, there are other stable points, i.e. other action optimals, notably p=1/2. In the planning stage, p=1/2 gives a payoff of 6.5 (1/2*7 + 1/4*0 + 1/8*22 + 1/8*2). Yet in the action stage its payoff changes to 50/7. To see this, notice that for p=1/2 the probabilities of “here is X/Y/Z” are 4/7, 2/7 and 1/7 respectively, so the payoff is 4/7*(1/2*7 + 1/4*0 + 1/8*22 + 1/8*2) + 2/7*(1/2*0 + 1/4*22 + 1/4*2) + 1/7*(1/2*22 + 1/2*2) = 50/7. In the action stage, the payoff for p=0 is still 7; the value is unchanged because under an always-EXIT strategy the probability of “here is X” is indeed 100%. So why shouldn’t the driver disregard the planning optimal and CONTINUE with p=1/2 on arriving at an intersection, since that gives a higher expected payoff than always exiting (50/7 > 7)?
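The arithmetic above can be checked with exact fractions. The sketch below reads the payoffs off the calculation in the text (EXIT at X, Y, Z pays 7, 0, 22 and continuing past Z pays 2); the function names are hypothetical.

```python
from fractions import Fraction

# Payoffs read off the calculation in the text: EXIT at X, Y, Z pays 7, 0, 22,
# and continuing past Z pays 2. p is the probability of CONTINUE.
EXIT_X, EXIT_Y, EXIT_Z, PASS_ALL = 7, 0, 22, 2

def value_from(start, p):
    """Expected payoff for a driver known to be at intersection `start` (0=X, 1=Y, 2=Z)."""
    exits = [EXIT_X, EXIT_Y, EXIT_Z][start:]
    value, reach = Fraction(0), Fraction(1)
    for payoff in exits:
        value += reach * (1 - p) * payoff
        reach *= p
    return value + reach * PASS_ALL

def action_stage_payoff(p):
    """Weight each intersection by the chance of reaching it: 1, p, p^2."""
    weights = [Fraction(1), p, p * p]
    total = sum(weights)
    return sum(w / total * value_from(i, p) for i, w in enumerate(weights))

half = Fraction(1, 2)
print(value_from(0, half), action_stage_payoff(half))  # 13/2 50/7
print(value_from(0, Fraction(0)), action_stage_payoff(Fraction(0)))  # 7 7
```

Here `value_from(0, p)` is the planning-stage payoff, since the driver always starts at X, while `action_stage_payoff` mixes in the payoffs from Y and Z using the self-locating weights.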

Aumann’s explanation is based on game theory. When there are multiple equilibrium points, an actor cannot actively “choose” which one to take (observation 1 reflects this). The equilibrium is determined by outside forces such as customs and traditions. In his view, because of the absent-mindedness, coordination between “the driver at this intersection” and “the driver at the other intersections” can only happen in the planning stage, which gives p=0 even though that yields a lower expected payoff at the action stage.

I find this explanation unconvincing. Even if a game of coordination is played between different persons, each person picking the stable point with the highest payoff is still a very plausible outcome. The given rationale for picking a lower-payoff strategy at the action stage, i.e. “due to the absent-mindedness”, lacks a compelling reason and seems ad hoc. To me, the root of the problem is how p=1/2 can be regarded as the highest-payoff strategy at all. Clearly, even if the drivers at the three intersections somehow do coordinate on p=1/2, the average payoff would still only be 6.5. The payoff of 50/7 at the action stage could never be realized or verified. Furthermore, by Aumann’s reasoning, 50/7 being the highest payoff is never relevant to the decision anyway.

Another Problem Caused by Self-Locating Probability

I think this paradox is caused by accepting a probability for “here is X”. It is a self-locating probability. As I have argued in anthropic paradoxes, self-locating probabilities are not valid concepts. For an oversimplified summary, “here” is a location primitively understood from a particular perspective due to its immediacy to subjective experience. Because of its primitive nature, there is no way to define an inherent reference class nor assign any probability distribution.

This problem requires us to think from the specific perspective of the driver as he arrives at an intersection to comprehend “here”. So any notion of probability for “Here is X/​Y/​Z” is fallacious. As a result, the expected payoff calculation during the action stage (50/​7) is misguided. There is only one legitimate payoff function (the one at the planning stage). The driver should keep his original strategy when arriving at an intersection simply because there is no new information.

An Embarrassment for Double Halfers?

The invalidity of self-locating beliefs also resolves the major problem for double halferism in the sleeping beauty problem: why not do a Bayesian update upon learning that today is Monday/Tuesday? The answer is that there is no probability for “today is Monday/Tuesday” in the first place.

Michael Titelbaum formulated an example to demonstrate that double halfers cannot reasonably keep their belief about a yet-to-be-tossed coin at 1/2 (its objective chance). This experiment, dubbed “an embarrassment for double halfers”, is highly similar to the original sleeping beauty problem. Here the original coin is tossed on Monday afternoon, after the first awakening. A second, inconsequential coin toss is added on Tuesday afternoon. Now, after waking up in the experiment, answer the question “what is the probability that today’s coin toss will land on heads?”.

Titelbaum argues that “today’s coin lands heads” includes two possibilities: “today is Monday and the original coin lands heads” or “today is Tuesday and the new coin lands heads”. So double halfers can either keep their answer for the original coin at 1/2, which makes the probability for the yet-to-be-tossed coin bigger than 1/2, or say that today’s coin has a probability of 1/2, which makes the probability for the original coin smaller than 1/2. The only escape is to believe that the probability of “today is Monday” is 100%, which is impossible to justify.
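The trade-off in this argument can be made concrete with a small sketch. It rests on assumptions not spelled out above: the standard setup in which heads on the original coin means there is only a Monday awakening, and a Tuesday coin that is fair and independent of which day it is. The function names and the sample values of P(“today is Tuesday”) are hypothetical.

```python
from fractions import Fraction

def todays_coin_heads(p_original_heads, p_tuesday):
    """P("today's coin lands heads") under the decomposition in the text."""
    # Original heads implies today is Monday (there is no Tuesday awakening),
    # so P(Monday and original heads) = P(original heads).
    # Assumption: the Tuesday coin is fair and independent of which day it is.
    return p_original_heads + p_tuesday * Fraction(1, 2)

def original_heads_given_fair_today(p_tuesday):
    """Credence in original heads forced by setting today's coin to 1/2."""
    return Fraction(1, 2) - p_tuesday * Fraction(1, 2)

for p_tuesday in (Fraction(0), Fraction(1, 4), Fraction(1, 3)):
    print(p_tuesday,
          todays_coin_heads(Fraction(1, 2), p_tuesday),
          original_heads_given_fair_today(p_tuesday))
```

Both answers equal 1/2 only when P(“today is Tuesday”) is 0; any positive value pushes one of them away from the objective chance.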

This format is similar to the absent-minded driver problem. It can be resolved the same way, by recognizing “today” as a primitive concept, which invalidates the self-locating probability. Nonetheless, I think it is an excellent example. It shows that the typical defense of double-halving, i.e. inventing new update rules for self-locating information, is not going to work. As long as self-locating probabilities are regarded as valid, double halfers’ answers will inevitably deviate from the objective chance.