Anthropic Decision Theory V: Linking and ADT

A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I’ll be presenting its arguments and results in this, subsequent, and previous posts 1 2 3 4 5 6.

Now that we’ve seen what the ‘correct’ decision is for various Sleeping Beauty Problems, let’s see a decision theory that reaches the same conclusions.

Linked decisions

Identical copies of Sleeping Beauty will make the same decision when faced with the same situation (technically true only until quantum and chaotic effects cause a divergence between them, but most decision processes will not be sensitive to random noise like this). Similarly, Sleeping Beauty and a random man on the street will make the same decision when confronted with a twenty pound note: they will pick it up. However, while we could say that the first pair of decisions is linked, the second is merely coincidental: were Sleeping Beauty to refrain from picking up the note, the man on the street would not so refrain, while her copy would.

The above statement brings up subtle issues of causality and counterfactuals, a deep philosophical debate. To sidestep it entirely, let us recast the problem in programming terms, seeing the agent's decision process as a deterministic algorithm. If agent α follows an automated decision algorithm A, and A knows its own source code (by quining, for instance), then A might have a line saying something like:

Module M: If B is another algorithm, belonging to agent β, identical with A (‘yourself’), assume A and B will have identical outputs on identical inputs, and base your decision on this.
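As a rough sketch of module M (my own illustration, not from the paper), consider two copies of a deterministic Python agent that quine their own source, compare it with their opponent's, and then play a symmetric Prisoner's Dilemma; the function names are assumptions of the sketch:

```python
import inspect

def agent(my_source, other_source):
    """A deterministic agent with module M, playing a symmetric
    Prisoner's Dilemma against another algorithm."""
    # Module M: if the other algorithm's source is identical to mine,
    # assume we will produce identical outputs on identical inputs.
    if my_source == other_source:
        # My choice then fixes the other's, and mutual cooperation
        # beats mutual defection, so cooperate.
        return "cooperate"
    return "defect"

# Each copy 'quines' its own source and exchanges it with the other.
src = inspect.getsource(agent)
print(agent(src, src))  # both copies output: cooperate
```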

This could lead, for example, to α and β cooperating in a symmetric Prisoner’s Dilemma. And there is no problem with A believing the above assumption, as it is entirely true: identical deterministic algorithms on the same input do produce the same outputs. With this in mind, we give an informal definition of a linked decision as:

Linked decisions: Agent α's decisions are linked with agent β's if both can prove that they will make the same decision, even after taking into account the fact that they know they are linked.

An example of agents that are not linked would be two agents α and β, running identical algorithms A and B on identical data, except that A has module M while B doesn't. Then A's module might correctly deduce that they will output the same decision, but only if A disregards the difference between them, i.e. module M. So A can 'know' they will output the same decision, but if it acts on that knowledge, it makes that knowledge incorrect. If A and B both had module M, then they could both act on the knowledge and it would remain correct.
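A toy sketch of that asymmetry (my own illustration, with Prisoner's Dilemma actions standing in for whatever the decision happens to be): the prediction "we will output the same decision" only survives as long as A never acts on it.

```python
def agent_B():
    # The same algorithm as A, but without module M: it just takes
    # the default action.
    return "defect"

def agent_A(act_on_link):
    # Module M, disregarding its own presence, notes that the rest of A
    # is identical to B and predicts "we will output the same decision".
    if act_on_link:
        # Acting on that prediction changes A's output and falsifies it.
        return "cooperate"
    return "defect"

for act in (False, True):
    a, b = agent_A(act), agent_B()
    verdict = "prediction holds" if a == b else "prediction broken"
    print(f"A acts on the link: {act} -> A: {a}, B: {b} ({verdict})")
```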

ADT

Given the above definition, anthropic decision theory (ADT) can be simply stated as:

Anthropic Decision Theory (ADT): An agent should first find all the decisions linked with their own. Then they should maximise expected utility, acting as if they simultaneously controlled the outcomes of all linked decisions, and using the objective (non-anthropic) probabilities of the various worlds.
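As a hedged sketch of how the ADT calculation runs in the coupon setup from the earlier posts (a coupon paying £1 in the tails world, with the agent woken once on heads and twice on tails), the snippet below uses the objective coin probabilities and treats the agent as controlling both linked purchases; the function name and the sample prices are my own choices:

```python
def adt_expected_utility(price, n_linked_copies_tails=2, p_heads=0.5):
    """Expected utility of buying a coupon that pays £1 in the tails
    world, computed ADT-style: act as if controlling every linked
    purchase, and use the objective probability of the coin."""
    heads_world = p_heads * (-price)            # one copy buys, coupon pays nothing
    tails_world = (1 - p_heads) * n_linked_copies_tails * (1 - price)
    return heads_world + tails_world            # all linked copies buy and are paid

for price in (0.50, 0.60, 0.66, 0.70):
    eu = adt_expected_utility(price)
    print(f"price £{price:.2f}: EU(buy) = {eu:+.3f} -> {'buy' if eu > 0 else 'refuse'}")
```

The break-even price comes out at £2/3, consistent with the threshold used further below.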

ADT is similar to SSA in that it makes use of reference classes. However, SSA needs to have the reference class information established separately before it can calculate probabilities, and different reference classes give very different results. In contrast, the reference class for ADT is part of the definition. It is not the class of identical or similar agents; instead, it is the class of linked decisions, which (by definition) is the class of decisions that the agent can prove are linked. Hence the whole procedure is perfectly deterministic, and known for a given agent.

It can be seen that ADT obeys all the axioms used in the Sleeping Beauty problems, so it must reach the same conclusions as were reached there.

Linking non-identical agents

Now, module M is enough when the agents/algorithms are strictly identical, but it fails when they differ slightly. For instance, imagine a variant of the selfless Sleeping Beauty problem where the two agents aren't exactly identical in the tails world. The first agent has the same utility as before, while the second agent has some personal displeasure in engaging in trade: if she buys the coupon, she will suffer a one-off £0.05 penalty for doing so.

Then if the coupon is priced at £0.60, something quite interesting happens. If the agents do not believe they are linked, they will refuse the offer: their expected returns are 0.5(-0.60 + (1-0.60)) = -0.10 and -0.10 - 0.05 = -0.15 respectively. If however they believe their decisions are linked, they will calculate the expected return from buying the coupon as 0.5(-0.60 + 2(1-0.60)) = 0.10 and 0.10 - 0.05 = 0.05 respectively. Since these are positive, they will buy the coupon: meaning their assumption that they were linked was actually correct!

If the coupon is priced at £0.66, things change. If the two agents assume their decisions are linked, then they will calculate their expected return from buying the coupon as 0.5(-0.66 + 2(1-0.66)) = 0.01 and 0.01 - 0.05 = -0.04 respectively. The first agent will buy and the second will not: they were wrong to assume their decisions were linked.
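The arithmetic of both cases can be checked with a short sketch (the helper name and structure are mine):

```python
def expected_return(price, linked, trade_penalty=0.0):
    """Expected return from buying the coupon in the variant above:
    objective 1/2 probabilities, two purchases controlled in the tails
    world if the decisions are linked, one otherwise, and an optional
    fixed penalty for trading at all."""
    copies = 2 if linked else 1
    return 0.5 * (-price) + 0.5 * copies * (1 - price) - trade_penalty

for price in (0.60, 0.66):
    for linked in (False, True):
        first = expected_return(price, linked)
        second = expected_return(price, linked, trade_penalty=0.05)
        print(f"price £{price:.2f}, linked={linked}: "
              f"first {first:+.2f}, second {second:+.2f}")
```

At £0.60 the 'linked' returns are positive for both agents, so acting on the linkage assumption keeps it correct; at £0.66 only the first agent's return is positive, so acting on the assumption falsifies it.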

A more general module that gives this kind of behaviour is:

Module N: Let H be the hypothesis that the decision of A (‘myself’) and those of algorithm B are linked. I will then compute what each of us will decide if we were both to accept H. If our ultimate decisions are indeed the same, and if the other agent also has a module N, then I will accept H.

Module N gives correct behaviour. It only triggers if the agents can prove that accepting H will ensure that H is true; then N makes them accept H, hence making H true.
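A rough Python rendering of module N's fixed-point check, reusing the coupon arithmetic from the sketch above (again, the names and structure are my own); the printed cases match the three price regimes discussed next:

```python
def expected_return(price, linked, trade_penalty=0.0):
    # Same arithmetic as the sketch above.
    copies = 2 if linked else 1
    return 0.5 * (-price) + 0.5 * copies * (1 - price) - trade_penalty

def module_N(decide_A, decide_B, other_has_module_N):
    """Module N (sketch): accept the linkage hypothesis H only if, on the
    assumption that both accept H, both algorithms reach the same
    decision, and the other agent also runs module N."""
    if not other_has_module_N:
        return False
    return decide_A(assume_linked=True) == decide_B(assume_linked=True)

def coupon_decider(price, trade_penalty=0.0):
    # The decision rule each agent feeds into module N.
    def decide(assume_linked):
        return "buy" if expected_return(price, assume_linked, trade_penalty) > 0 else "refuse"
    return decide

for price in (0.60, 0.66, 0.70):
    A = coupon_decider(price)
    B = coupon_decider(price, trade_penalty=0.05)
    linked = module_N(A, B, other_has_module_N=True)
    print(f"price £{price:.2f}: {'linked' if linked else 'not linked'}; "
          f"A {A(assume_linked=linked)}, B {B(assume_linked=linked)}")
```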

For the coupon priced at £0.60, it will correctly tell them they are linked, and they will both buy it. For the coupon priced at £0.66, it will not trigger, and both will refuse to buy it: though they reach the same decision, they would not have done so had they assumed they were linked. For a coupon priced above £2/3, module N will correctly tell them they are linked again, and they will both refuse to buy it.