Example decision theory problem: “Agent simulates predictor”

Some people on LW have expressed interest in what’s happening on the decision-theory-workshop mailing list. Here’s an example of the kind of work we’re trying to do there.

In April 2010 Gary Drescher proposed the “Agent simulates predictor” problem, or ASP, which shows how agents with lots of computational power sometimes fare worse than agents with limited resources. I’m posting it here with his permission:

There’s a version of Newcomb’s Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.

Suppose:

  • The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent’s decision theory and computational power.

  • The agent runs UDT, and has the ability to fully simulate the predictor.

Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent’s computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.

Meanwhile, the predictor, knowing that the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent’s detailed simulation of the predictor shows it correctly predicting two-boxing).
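Here is a minimal toy of that loop in Python. It’s my own sketch rather than part of Gary’s statement: I assume the standard Newcomb payoffs ($1,000,000 and $1,000), and the predictor’s high-level reasoning is hard-coded rather than derived, so the code only illustrates the shape of the outcome.

```python
# Toy model of the ASP scenario; payoffs and names here are my assumptions.

BIG, SMALL = 1_000_000, 1_000   # assumed contents: large box B, small box A

def predictor():
    # The predictor never simulates the agent. It applies the high-level
    # argument above: an agent that can fully simulate the predictor treats
    # the prediction as a settled fact, and a settled fact gives two-boxing
    # a guaranteed $1,000 edge, so such an agent two-boxes.
    return "two-box"

def agent():
    prediction = predictor()                       # full simulation of the predictor
    box_b = BIG if prediction == "one-box" else 0  # a known quantity, not a variable
    utility = {"one-box": box_b, "two-box": box_b + SMALL}
    return max(utility, key=utility.get)           # two-boxing wins by SMALL either way

prediction, action = predictor(), agent()
payoff = (BIG if prediction == "one-box" else 0) + (SMALL if action == "two-box" else 0)
print(prediction, action, payoff)   # two-box two-box 1000
```

The asymmetry is the whole point: the agent’s extra computing power only lets it confirm a prediction that the predictor’s coarse model of it had already pinned down.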

The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.

About a month ago I came up with a way to formalize the problem, along the lines of my other formalizations:

a) The agent generates all proofs of length up to M, then picks the action for which the greatest utility was proven.

b) The predictor generates all proofs of length up to N, which is much less than M. If it finds a provable prediction about the agent’s action, it fills the boxes accordingly. Also the predictor has an “epistemic advantage” over the agent: its proof system has an axiom saying the agent’s proof system is consistent.

Now the predictor can reason as follows. It knows that the agent will find some proof that the predictor will put X dollars in the second box, for some unknown value of X, because the agent has enough time to simulate the predictor. Therefore, it knows that the agent will find proofs that one-boxing leads to X dollars and two-boxing leads to X+1000 dollars. Now what if the agent still chooses one-boxing in the end? That means it must have found a different proof saying one-boxing gives more than X+1000 dollars. But if the agent actually one-boxes, the existence of these two different proofs would imply that the agent’s proof system is inconsistent, which the predictor knows to be impossible. So the predictor ends up predicting that the agent will two-box, the agent two-boxes, and everybody loses.
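To make the bookkeeping in that argument explicit, here is the same comparison with X left as a parameter. Again this is only my restatement: there is no proof search below, just the utilities the argument says the agent will end up proving.

```python
# The predictor's argument with X as a parameter (no actual proof search).

def predictor_argument(X):
    # By simulating the predictor, the agent proves "box B contains X dollars",
    # and hence proves the two conditionals below about its own action.
    proven_utility = {"one-box": X, "two-box": X + 1000}

    # The agent picks the action with the greatest proven utility, so to one-box
    # it would also need a proof that one-boxing yields some Y > X + 1000.
    # If the agent then actually one-boxed, that proof plus "one-boxing yields X"
    # would pin two different values on the same outcome, so the agent's proof
    # system would be inconsistent; the predictor's extra axiom rules that out.
    return max(proven_utility, key=proven_utility.get)

# Whatever X turns out to be, the predicted action is the same.
assert all(predictor_argument(X) == "two-box" for X in (0, 1_000, 1_000_000))
```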

Also Wei Dai has a tentative new decision theory that solves the problem, but this margin (and my brain) is too small to contain it :-)

Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?