Problematic Problems for TDT

A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents who two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave as a CDT agent would wish it had pre-committed itself to behave before being faced with the problem.

However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" (at least as fair as the original Newcomb problem) because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:

Discrimination: Omega presents the usual two boxes. Box A always contains $1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains $1 million.

So what are some fair "problematic problems"?

Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

Analysis: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win $1000, whereas if they one-box they will win $1 million. So the agent will choose to one-box and win $1 million.

However, any CDT agent can just take both boxes and win $1,001,000. In fact, any other agent who is not running TDT (e.g. an EDT agent) will be able to reconstruct the chain of logic and reason that the simulation one-boxed, and so Box B contains the $1 million. So any other agent can safely two-box as well.

Note that we can modify the contents of Box A so that it contains anything up to $1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
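Here is a minimal sketch of Problem 1's payoff structure in Python. The function names and the explicit "one-box"/"two-box" encoding are purely my own illustration; it just tabulates how Omega fills the boxes from the simulated TDT agent's choice, and what each real decider then walks away with.

```python
def omega_fill_boxes(simulated_tdt_choice):
    """Omega fills the boxes using only the simulated TDT agent's choice."""
    box_a = 1000  # Box A always contains $1000
    box_b = 1_000_000 if simulated_tdt_choice == "one-box" else 0
    return box_a, box_b

def payoff(choice, box_a, box_b):
    """One-boxers take only Box B; two-boxers take both boxes."""
    return box_b if choice == "one-box" else box_a + box_b

# TDT's decision is logically linked to the simulation's, so the simulated
# agent chooses whatever the real TDT agent chooses: here, one-boxing.
box_a, box_b = omega_fill_boxes("one-box")

print("TDT agent (one-boxes):", payoff("one-box", box_a, box_b))  # 1000000
print("CDT agent (two-boxes):", payoff("two-box", box_a, box_b))  # 1001000
```

The asymmetry is visible in the last two lines: the boxes are filled identically for both deciders, since Omega only looks at the simulation, yet the two-boxer collects $1000 more.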

Problem 2: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains $1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed $1 million in the selected box. Please choose your box."

Analysis: A TDT agent will reason that whatever it does, it cannot have more than a 10% chance of winning the $1 million: whatever probabilities it assigns to the ten boxes, it only wins if it happens to pick the box Omega selected, and that is by construction a box it takes with the lowest probability, so its chance of winning is at most 1/10. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the $1 million under box number 1, and the TDT agent has exactly a 10% chance of winning it.

But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, thereby winning the $1 million. By increasing the number of boxes, we can ensure that TDT has an arbitrarily low chance of winning, compared to CDT, which always wins.
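Again a minimal sketch, this time of Problem 2, assuming purely for illustration that Omega can read off the simulated TDT agent's mixed strategy as an explicit list of probabilities; the function and variable names are my own, not anything standard.

```python
def omega_pick_box(tdt_probabilities):
    """Return the (0-based) index of the lowest-numbered box among those the
    simulated TDT agent is least likely to take; Omega puts the $1M there."""
    lowest = min(tdt_probabilities)
    return tdt_probabilities.index(lowest)

n = 10
tdt_strategy = [1 / n] * n               # uniform mixing: TDT's best reply
prize_box = omega_pick_box(tdt_strategy)

tdt_win_prob = tdt_strategy[prize_box]   # chance TDT happens to pick that box
cdt_win_prob = 1.0                       # CDT re-runs the argument and takes box 1

print("Prize is in box", prize_box + 1)           # box 1
print("TDT wins with probability", tdt_win_prob)  # 0.1
print("CDT wins with probability", cdt_win_prob)  # 1.0
```

Any non-uniform strategy only hurts the TDT agent, since its win probability is always the minimum of its own box probabilities; increasing n in the sketch shows the gap widening as described.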


Some questions:

1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I searched on Less Wrong but couldn't find anything obviously like them.

2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently than described?

3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.

4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms), so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)

5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

6. Finally, is it more likely that Omegas (or things like them) will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?

Edit: I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains $1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).