# Example decision theory problem: “Agent simulates predictor”

Some people on LW have expressed interest in what’s happening on the decision-theory-workshop mailing list. Here’s an example of the kind of work we’re trying to do there.

In April 2010 Gary Drescher proposed the “Agent simulates predictor” problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources. I’m posting it here with his permission:

There’s a version of Newcomb’s Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.

Suppose:

• The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent’s decision theory and computational power.

• The agent runs UDT, and has the ability to fully simulate the predictor.

Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent’s computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.

Meanwhile, the predictor, knowing that the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent’s detailed simulation of the predictor correctly shows the predictor correctly predicting two-boxing).

The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.

About a month ago I came up with a way to formalize the problem, along the lines of my other formalizations:

a) The agent generates all proofs of length up to M, then picks the action for which the greatest utility was proven.

b) The predictor generates all proofs of length up to N, which is much less than M. If it finds a provable prediction about the agent’s action, it fills the boxes accordingly. Also, the predictor has an “epistemic advantage” over the agent: its proof system has an axiom saying the agent’s proof system is consistent.

Now the predictor can reason as follows. It knows that the agent will find some proof that the predictor will put X dollars in the second box, for some unknown value of X, because the agent has enough time to simulate the predictor. Therefore, it knows that the agent will find proofs that one-boxing leads to X dollars and two-boxing leads to X+1000 dollars. Now what if the agent still chooses one-boxing in the end? That means it must have found a different proof saying one-boxing gives more than X+1000 dollars. But if the agent actually one-boxes, the existence of these two different proofs would imply that the agent’s proof system is inconsistent, which the predictor knows to be impossible. So the predictor ends up predicting that the agent will two-box, the agent two-boxes, and everybody loses.
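The payoff logic of this argument can be illustrated with a toy numeric sketch (assuming the standard Newcomb amounts of $1,000,000 and $1,000; the `predict` function below simply hard-codes the deductions described above rather than doing actual proof search):

```python
# Toy model of the ASP payoff structure. The predictor's "reasoning" is
# hard-coded from the argument above, not derived by real proof search.
BIG, SMALL = 1_000_000, 1_000

def predict(agent_kind):
    """What the predictor deduces about each kind of agent."""
    if agent_kind == "proof-search agent":   # the consistency argument above
        return "two-box"
    if agent_kind == "committed one-boxer":  # deducible by inspection
        return "one-box"
    return "two-box"                         # default: no proof found

def payoff(prediction, action):
    # Big box is filled only if one-boxing was predicted.
    big = BIG if prediction == "one-box" else 0
    return big + SMALL if action == "two-box" else big

# The proof-search agent is predicted to two-box, two-boxes, and loses:
print(payoff(predict("proof-search agent"), "two-box"))    # 1000
# A "stupid" committed one-boxer fares far better:
print(payoff(predict("committed one-boxer"), "one-box"))   # 1000000
```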

Also, Wei Dai has a tentative new decision theory that solves the problem, but this margin (and my brain) is too small to contain it :-)

Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?

• Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?

Yes, especially if matt adds a ‘decision theory’ subreddit. That way those from the clique with a narrow interest in decision theory can just follow that page instead. I know I would be more likely to engage with the topics on a reddit-style medium than via email. Email conversations are just a lot harder to follow, especially given that, let’s be honest, the ‘small clique’ is barely active and can go for weeks or a month between emails.

Having lesswrong exposed to people who are actively thinking through and solving a difficult problem would be an overwhelmingly good influence on the lesswrong community, and would expose potential new thinkers to the problem, hopefully inspiring some of them to get involved in the problem solving themselves. There will, of course, be a lowering of the average standard of the discussion, but I would expect a net increase in high-quality contributions as well. Heavy upvoting of the clear thinking and commensurate downvoting would make new insights accessible.

(Mind you, there is an obvious advantage to having decision theory conversations that are not on a site run by SIAI too.)

• I think for the same reason we had meetup posts on the front page, we should just post stuff to the main LW (this will maximize the recruitment of new people), and we should move on to creating a sub-reddit only when the volume becomes distracting. The decision theory list can continue to serve its current purpose as a place to discuss particularly arcane ideas (that are not results), which new recruits won’t typically understand anyway (and so why waste anyone’s time). Publicly opening the list’s archives (but not subscription; we don’t have voting over there to defend ourselves) can cover the rest.

But we might want to adopt a standard tag on LW so that people can subscribe to this part only, which I can commit to consistently setting on all posts that qualify, say “decision_theory” (just fixed for this post).

• Can LW generate the kind of insights needed to make progress on problems like ASP?

Yes, especially if matt adds a ‘decision theory’ subreddit.

From context I suppose you’re talking about a subreddit of LessWrong, not a subreddit of Reddit. Is there a list of existing LessWrong subreddits somewhere, or is this a proposal to create the first one?

• In April 2010 Gary Drescher proposed the “Agent simulates predictor” problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources.

Just to give due credit: Wei Dai and others had already discussed Prisoner’s Dilemma scenarios that exhibit a similar problem, which I then distilled into the ASP problem.

• Speaking only for myself: Eliezer’s sequences first lured me to Less Wrong, but your posts on decision theory were what convinced me to stick around and keep checking the front page.

I confess I don’t understand all of the math. It’s been decades since I studied mathematics with any rigour; these days I can follow some fairly advanced theory, but have difficulty reproducing it, and cannot in general extend it. I have had nothing substantial to add, and so I haven’t previously commented on one of your posts. Somehow LW didn’t seem like the best place to go all fangirl…

It’s not only the contributors who are following your work on decision theory. I hope you’ll continue to share it with the rest of us.

• I would also like to follow the discussion on decision theory.

• This whole thing seems like an artifact of failing to draw a boundary between decision theories and strategies. In my own work, I have, for deterministic problems:

```
World: Strategy -> Outcome
Preference: (Outcome, Outcome) -> bool (an ordering)
Strategy: Observation -> Decision
DecisionTheory: (World, Preference) -> Strategy
```

Phrased this way, the predictor is part of the World function, which means it is only allowed to simulate a Strategy, not a DecisionTheory, and so the problem as stated is ill-posed. This structure is required for decision theory in general to have correct answers, because otherwise you could construct a problem for any decision theory, no matter how ridiculous, which only that decision theory can win at.
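Under this typing, Newcomb’s problem might be rendered as a minimal Python sketch (the names and amounts here are my own illustration, not the commenter’s; the point is that the predictor inside the world may only run the strategy it was handed):

```python
from typing import Callable

Decision = str                        # "one-box" or "two-box"
Strategy = Callable[[str], Decision]  # Observation -> Decision
Outcome = int                         # dollars

def newcomb_world(strategy: Strategy) -> Outcome:
    # The predictor lives inside World, so it may only simulate the Strategy:
    prediction = strategy("Newcomb")
    big = 1_000_000 if prediction == "one-box" else 0
    action = strategy("Newcomb")      # the agent's actual choice
    return big + 1_000 if action == "two-box" else big

one_boxer: Strategy = lambda obs: "one-box"
two_boxer: Strategy = lambda obs: "two-box"
print(newcomb_world(one_boxer))   # 1000000
print(newcomb_world(two_boxer))   # 1000
```

Note that in this framing a strategy is rewarded or punished only through what it returns, which is exactly the restriction the comment is describing.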

• This structure is required for decision theory in general to have correct answers, because otherwise you could construct a problem for any decision theory, no matter how ridiculous, which only that decision theory can win at.

I disagree. An agent can use any considerations whatsoever in making its decisions, and these considerations can refer to the world, or to its own algorithm, or to the way the world depends on the agent’s algorithm, or to the way the dependence of the world on the agent’s algorithm depends on the agent’s decision in a counterfactual world.

You can object that it’s not fair to pose before the agent problems that ask for recognition of facts outside some predefined class of non-ridiculous facts, but asking which situations we are allowed to present before an agent is a wrong perspective. It is wrong because making an agent with certain characteristics automatically determines the facts of its success or failure in all of the possible scenarios: fair, unfair, plausible, and ridiculous.

So the only consideration that is allowed to dictate which considerations we are allowed to ignore is the agent’s own preference. If the agent doesn’t care about the influence of some fact, then it can ignore it. Typically, we won’t be able to formally point out any class of facts to which the agent is guaranteed to be even in principle indifferent. And so decision theory must not be typed.

(You can see a certain analogy with not demanding a particular kind of proof. The agent is not allowed to reject my argument that a certain action is desirable, or undesirable, on the basis of the considerations I refer to not belonging to a particular privileged class, unless it really doesn’t care about those considerations or any of their logical consequences (according to the agent’s normative theory of inference). See also explicit dependence bias.)

• I think you’ve misunderstood just what restrictions this type schema imposes on problems. Could you provide a specific example of something you think it excludes, that it shouldn’t?

• The ASP problem described in the OP is just such an example. Are you perhaps not convinced that it represents an aspect of important real-world problems?

• What’s World? What do you mean by “World: Strategy->Outcome”? The problem is that if World is a function, it’s given by some syntactic specification, and the agent, being part of the world, can control which function this syntactic specification refers to, or which value this function has for a particular argument, in a way unrelated to its type or to the passing of this Strategy thing as its argument. This is a typical example of explicit dependence bias: you declare that World (as a mathematical structure) depends on nothing, but it really does.

See example “world3” in my post for an example with Newcomb’s problem. There, World is given by a program “world3” that takes the agent’s action as a parameter, but also depends on the agent’s action implicitly in a way that isn’t captured by the interface of the function. The function in fact shows that two-boxing dominates one-boxing, and one can only see that it’s really worse by recognizing that the function’s value, when given the agent’s actual action as a parameter, is controlled by the agent’s action in a way that makes one-boxing preferable. And so only seeing program “world3” as specifying a function is unhelpful.

• What do you mean by “World: Strategy->Outcome”?

It means “World is the type of function that takes a Strategy as input and returns an Outcome.”

In the linked article, you’ve defined world, world2, and world3 as things that are not actually functions; they have unbound references to agent, which are parameters in disguise. You then show that if you mix parameters-as-unbound-references with real parameters, you can get confused into thinking they’re independent. Which just means you shouldn’t use unbound references.

• Which just means you shouldn’t use unbound references.

How do you know that a given syntactic specification of a function doesn’t “use unbound references”? The presence of a logical dependence on the agent’s action looks like something algorithmically impossible to detect. You can of course focus on a class of syntactic specifications that are known to not depend on the agent’s action (for example, GLUTs), but this is too restrictive for an FAI-grade decision theory that can handle the actual world, and it ignores the problem of whether the process that specified that GLUT could itself logically depend on the agent’s action. (The very setup of a thought experiment could be controlled by an agent acting inside it, for example, for strange enough or sensitive-to-detail enough thought experiments.)

• How do you know that a given syntactic specification of a function doesn’t “use unbound references”?

Give it as input to a compiler and see if it gives an error message or not. Or apply the same trivial procedure compilers use: read it and look for a definition for every symbol that is not itself the left-hand side of a definition.

• Give it as input to a compiler and see if it gives an error message or not. Or apply the same trivial procedure compilers use: read it and look for a definition for every symbol that is not itself the left-hand side of a definition.

It doesn’t refer to any symbols; see in particular the difference between “world” and “world2”, and notice that “world3” doesn’t refer to “action()”, but instead to “action2()”, which you can assume to be a copy-paste of “action()”’s source code, with all the symbols renamed.

• Ok, I think I see where our formalizations differ. In the formalization I’m using, the decision theory produces a strategy, which is a function that’s given as an argument to the world program. The world program invokes the strategy zero or more times, each time passing in some arguments that give whatever information is available to the agent at some point, and getting back a (real or predicted) decision. The world program is completely self-contained; other than through the argument it receives, it may not contain references to the agent’s choices at all. The strategy is similarly self-contained; it receives no information about the world except through the arguments the world program passes to it. Then, separately from that, a “decision theory” is a function that takes a symbolic representation of a world program, and returns a symbolic representation of a strategy.

Ultimately, this amounts to a refactoring; results that hold in one system still hold in the other, if you map the definitions appropriately. However, I’ve found that structuring problems this way makes the theory easier to build on, and makes underspecified problems easier to notice.
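A brute-force caricature of this pipeline (my own sketch: it passes the world program itself rather than a symbolic representation, purely for brevity, and enumerates candidate strategies instead of reasoning symbolically):

```python
from typing import Callable

Strategy = Callable[[str], str]

def newcomb(strategy: Strategy) -> int:
    # World program: invokes the strategy, once as the predictor's
    # simulation and once for the agent's actual move.
    prediction = strategy("Newcomb")
    big = 1_000_000 if prediction == "one-box" else 0
    return big + 1_000 if strategy("Newcomb") == "two-box" else big

def decision_theory(world: Callable[[Strategy], int]) -> Strategy:
    # Stand-in for the symbolic step: enumerate candidate strategies and
    # return whichever the world program scores highest.
    candidates: list[Strategy] = [lambda o: "one-box", lambda o: "two-box"]
    return max(candidates, key=world)

best = decision_theory(newcomb)
print(best("Newcomb"))   # one-box
```

The interface is the point: the world only ever sees a strategy, and the decision theory only ever sees the world program.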

• The world program is completely self-contained; other than through the argument it receives, it may not contain references to the agent’s choices at all.

Can you formalize this requirement? If I copy the agent’s code, rename all the symbols, obfuscate it, simulate its execution in a source code interpreter that runs in a hardware emulator running on an emulated Linux box running on JavaScript inside a browser running on Windows running on a hardware simulator implemented (and then obfuscated again) in the same language as the world program, and insert this thing into the world program (along with a few billion people and a planet and a universe), how can you possibly make sure that there is no dependence?

• Can you formalize this requirement? If I copy the agent’s code … and insert this thing into the world program, how can you possibly make sure that there is no dependence?

You don’t get to do that, because when you’re writing World, the Strategy hasn’t been determined yet. Think of it as a challenge-response protocol; World is a challenge, and Strategy is a response. You can still do agent-copying, but you have to enlarge the scope of World to include the rules by which that copying was done, or else you get unrelated agents instead of copies.

• To copy the agent’s code, you don’t need to know its strategy. The world naturally changes if you change it, and the strategy might change as well if you run the agent on a changed world, but the agent’s code is still the same, and you know this code. The new world will only depend on the new strategy, not the old one, but now we have a world that depends on its agent’s strategy, and you won’t be able to find how it does, if you don’t already know.

In any case, all this copying is irrelevant, because the point is that there can exist very convoluted worlds that depend on the agent’s action, but it’s not feasible to know that they do or how they do. And we don’t get to choose the real world.

• Yeah, in ASP the predictor looks inside the agent. But the problem still seems “fair” in a certain sense, because the predictor’s prediction is logically tied to the agent’s decision. A stupid agent that one-boxed no matter what would get rewarded, and a stupid agent that two-boxed would get punished. In other words, I’d still like a decision theory that does well on some suitably defined class of ASP-like problems, even if that class is wider than the class of “TDT-fair” problems that Eliezer envisioned. Of course we need a lot of progress to precisely define such classes of problems, too.

• In other words, I’d still like a decision theory that does well on some suitably defined class of ASP-like problems, even if that class is wider than the class of “TDT-fair” problems that Eliezer envisioned. Of course we need a lot of progress to precisely define such classes of problems, too.

It would be useful to have a list of problems that TDT can handle, a list that current specifications of UDT can handle, and a list of those that are still in the grey area of not quite resolved. Among other things, that would make the difference between TDT and UDT far more intuitively clear!

• The class of problems which are well-posed in this type system is exactly the class of problems that would not change if you gave the agent a chance to self-modify in between receiving the problem statement and the start of the world program. Problems outside this class are neither fair, nor solvable in principle, nor interesting.

• I work under the assumption that the problem statement is a world program or a prior over world programs. Maybe it’s a bad assumption. Can you suggest a better one?

• I use that same assumption, with only the slight caveat that I keep the world program and the preference function separate for clarity’s sake (you need both, but I often see them combined into one function).

The big difference, I think, is that the way I do it, the world program doesn’t get a decision theory directly as input; instead, the world program’s source is given to the decision theory, the decision theory outputs a strategy, and then the strategy is given to the world program as input. This is a better match for how we normally talk about decision theory problems, and prevents a lot of shenanigans.

• Of course the world program shouldn’t get the decision theory as input! In the formulation I always use, the world program doesn’t have any inputs; it’s a computation with no arguments that returns a utility value. You live in the world, so the world program contains your decision theory as a subroutine :-)
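As a toy rendering of that formulation (hypothetical names, and with the agent’s one-boxing simply hard-coded; UDT proper reasons about such programs symbolically rather than just running them):

```python
def agent() -> str:
    # The agent's decision procedure, embedded in the world as a subroutine.
    return "one-box"

def world() -> int:
    # Zero-argument world program: it calls agent() both for the
    # predictor's deduction and for the agent's actual move, then
    # returns a utility value.
    prediction = agent()
    big = 1_000_000 if prediction == "one-box" else 0
    action = agent()
    return big + 1_000 if action == "two-box" else big

print(world())   # 1000000
```

Here there is no input slot at all: the only way the agent influences the outcome is by being the very subroutine the world computes with.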

• That’s very wrong. You’ve given up the distinctions between question and answer, and between answer and solution procedure. This also lets in stupid problems, like “you get a dollar iff the source code to your decision theory is prime”, which I would classify as not a decision theory problem at all.

• Yeah, you can’t control arbitrary world programs that way, but remember that our world really is like that: a human mind runs within the same universe that it’s trying to control. One idea would be to define a class of “fair” world programs (perhaps those that can be factorized in appropriate ways) and care only about solving those. But I guess I’d better wait for your writeup, because I don’t understand what kind of formalism you prefer.

• This also lets in stupid problems, like “you get a dollar iff the source code to your decision theory is prime”, which I would classify as not a decision theory problem at all.

In this case, the outcome is the same for every action you make. A decision theory can still consider this problem, and rightfully conclude that it’s irrelevant which action it takes (or maybe it infers that jumping on one’s right foot has a slightly higher chance of making the Dark Lords of the Matrix have written your source code so that it’s prime). I don’t see how this is a problem.

• That’s very wrong. You’ve given up the distinctions between question and answer, and between answer and solution procedure.

But the world really is like that. It is essential to capture this.

• I don’t know of any important aspect of the world that can’t be captured in question-answer-procedure format. Could you give an example?

• But who does the capturing? (To push the analogy one step up, I know of no solvable question that doesn’t admit an answer written on a sheet of paper.)

See the section on “utility functions” in this post. UDT/ADT seeks exactly to infer functional dependencies that can then be used in the usual manner (at least, this is one way to look at what’s going on). It is intended to solve the very problem which you argue must always be solvable.

My examples of explicit dependence bias show that there are wrong and right ways of parsing the outcome as depending on the agent’s decision. If we are guaranteed to be given a right parsing, then that part is indeed covered, and there is no need to worry about the wrong ones. Believing that a right parsing merely exists doesn’t particularly help in finding one.

So I guess the problem you are having with UDT is that you assume that the problem of finding a correct explicit dependence of outcome on action is already solved, and we have a function World ready-specified in a way that doesn’t in any way implicitly depend on the agent’s actions. But UDT is intended to solve the problem while not making this assumption, and instead tries to find an unbiased dependence on its own. Since you assume the goal of UDT as a prerequisite in your thinking about decision theory, you don’t see the motivation for UDT; indeed there would be none, had we assumed this problem solved.

• We’re talking past each other, and this back-and-forth conversation isn’t going anywhere because we’re starting from very different definitions. Let’s restart this conversation after I’ve finished the post that builds up the definitions from scratch.

• At least address this concern, which suggests that our difficulty is probably an easy one of technical confusion, and not of communicating an intuitive understanding of the relevance of studying a certain question.

• Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent’s computation.

I’m confused on this point; would you mind checking if my thinking on it is correct?

My initial objection was that this seems to assume that the predictor doesn’t take anything into account, and that the agent was trying to predict what the predictor would do without trying to figure out what the predictor would predict the agent would choose.

Then I noticed that the predictor isn’t actually waiting for the agent to finish making its decision; it was using a higher-level representation of how the agent thinks. Taking this into account, the agent’s ability to simulate the predictor implicitly includes the ability to compute what the predictor predicts the agent will do.

So then I was confused about why this is still a problem. My intuition was banging its head against the wall insisting that the predictor still has to take into account the agent’s decision, and that the agent couldn’t model the predictor’s prediction as not contingent on the agent’s decision.

Then I noticed that the real issue isn’t any particular prediction that the agent thinks the predictor would make, so much as the fact that the agent sees this happening with probability one, and happening regardless of what it chooses to do. Since the agent already “knows” what the predictor will predict, it is free to choose to two-box, which will always be higher utility once it can’t causally affect the boxes.

• Your thinking seems to be sort of correct, but informal. I’m not sure if my proof in the second part of the post was parseable, but at least it’s an actual proof of the thing you want to know, so maybe you could try parsing it :-)

• Thanks for the response, that was fast.

I can parse it, but I don’t really think that I understand it in a mathematical way.

A is a statement that makes sense to me, and I can see why the predictor needs to know that the agent’s proof system is consistent.

What I don’t get about it is why you specify that the predictor computes proofs up to length N, and then just say how the predictor will do its proof.

Basically, I have no formal mathematics education in fields that aren’t a direct prerequisite of basic multivariable calculus, and my informal mathematics education consists of Gödel, Escher, Bach. And a wikipedia education in game theory.

Do you have any suggestions on how to get started in this area? Like, introductory sources, or even just terms that I should look up on wikipedia or google?

• What I don’t get about it is why you specify that the predictor computes proofs up to length N, and then just say how the predictor will do its proof.

If the outlined proof is less than N symbols long (which is true if N is large enough), the predictor will find it, because it enumerates all proofs up to that length. Since the predictor’s proof system is consistent, it won’t find any other proofs contradicting this one.

• The N < M is necessary to guarantee that the agent predicts the predictor’s proof, right?

What happens if the outlined proof is more than N symbols long?

• The N < M is necessary to guarantee that the agent predicts the predictor’s proof, right?

Yeah. Actually, N must be exponentially smaller than M, so the agent’s proofs can completely simulate the predictor’s execution.

What happens if the outlined proof is more than N symbols long?

No idea. :-) Maybe the predictor will fail to prove anything, and fall back to filling only one box, I guess? Anyway, the outlined proof is quite short, so the problem already arises for not very large values of N.

• What the agent tries to infer is the predictor’s behavior for each of the agent’s possible actions, not just the predictor’s unconditional behavior. Being able to infer the predictor’s decision without assuming the agent’s action is equivalent to concluding that the predictor’s decision is a constant function of the agent’s action (in the agent’s opinion, given the kind of decision-maker our agent is, which is something that it should be able to control better, but the current version of the theory doesn’t support that).

• This problem is underspecified, in that it does not say what happens if the predictor fails to find a proof either way. Which is exactly what will happen if the agent simulates the predictor.

• Do you understand the second part of the post where I gave concrete implementations of the predictor and the agent that simulates it? In that case the predictor successfully finds a proof that the agent two-boxes.

• On an extremely similar note, as I’ve argued here, I’m pretty sure that AIXI two-boxes on Newcomb’s problem.

Basically, my argument is that AIXI can fairly trivially work out whether the box is full or empty, and hence it two-boxes due to causal assumptions implicit in the AIXI algorithm. Moreover, that line of reasoning is very easy for Omega to come across, and so Omega predicts two-boxing and the box ends up being empty, and then AIXI takes two boxes and ends up with $1000.

• It can be argued that the “epistemic advantage” of the predictor over the agent is an unfair one. After all, if the agent had an equivalent axiom for the predictor’s consistency, both would be inconsistent.

In the absence of this advantage, the predictor won’t be able to find a proof of the agent’s action (if it is consistent).

• Yes, that’s a valid reason to discount problems like ASP. It’s awful how much we don’t know...

• It’s awful how much we don’t know...

Well, yes, but… is there a feeling at SI that these kinds of problems are relevant to AGI, friendly or no? Or is this just something generally interesting (maybe with the hope that these problems may point to something tangential to them, but which would turn out to be highly relevant)?

I mean, generally I would say that ideas connected to these approaches fall into the “symbolic AI” paradigm, which is supposed to be discredited by some seriously revered people, like Hofstadter...

• Well, to make a guaranteed friendly AI you probably need to prove theorems about your AI design. And our universe most likely contains many copies of everything. So figuring out the right decision theory in the presence of copies seems to be a necessary step on the road to FAI. I don’t speak for SingInst here; this is just how I feel.

• to make a guaranteed friendly AI you probably need to prove theorems about your AI design.

Wouldn’t this be a level mismatch in a multi-level AI architecture? Like, proving things about the low-level neural computational substrate instead of about the conceptual level where the actual cognition would take place, and where the actual friendliness would be defined? [And this level can’t be isomorphic to any formal logical system, except in symbolic AI, which doesn’t work.]

figuring out the right decision theory in the presence of copies seems to be a necessary step on the road to FAI

And again, a conceptual-level understanding should do the trick. Like, knowing that I play the PD against myself would be sufficient to cooperate. Besides, as EY frequently says, it’s really hard to find oneself in a true PD. Usually it’s an iterated PD, or one of Schelling’s conflict games. [BTW, huge thanks for recommending his book in one of your posts!]

• If a multilevel architecture (whatever it is) makes provable friendliness impossible, then FAI can’t use it.

I imagine the future FAI as closer to AIXI, which works fine without a multilevel architecture, than to the Lisp programs of the 70s.

• AFAICT, the gen­eral ar­chi­tec­ture that EY ad­vo­cates (-ed?) in “Levels of Or­ga­ni­za­tion in GI” is mul­ti­level. But this doesn’t au­to­mat­i­cally mean that it’s im­pos­si­ble to prove any­thing about it. Maybe it’s pos­si­ble, just not us­ing the for­mal logic meth­ods. [And so maybe get­ting not a 100% cer­tainty, but 100-1e-N%, which should be suffi­cient for large enough N].

AIXI doesn’t work so much more than sym­bolic AI Lisp pro­grams of the 70s. I mean, the Gen­eral Prob­lem Solver would be su­per­in­tel­li­gent given in­finite com­put­ing power.

• Eliezer says here:

A good deal of the material I have ever produced – specifically, everything dated 2002 or earlier – I now consider completely obsolete. (...) I no longer consider LOGI’s theory useful for building de novo AI.

To make the General Problem Solver or any other powerful computing device do anything interesting in the real world, you need to give it a formal description that contains the real world as a special case. You could use the universal prior, which gives you AIXI. Or you could use the yet-unspecified prior of UDT, which gives you the yet-unspecified UDT-AIXI.

The central difficulty of decision theory doesn’t get easier if you have lots of computing power. Imagine you’re playing the PD against someone. You both know each other’s source code, but you have a halting oracle and your opponent doesn’t. With so much power, what do you do? I simulate the other guy and… whoops, didn’t think this through. Looks like I must avoid looking at the result. Hmmm.
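The regress behind “I simulate the other guy and… whoops” can be seen in a deliberately naive toy sketch (my own hypothetical setup, not anyone’s actual formalization): two agents that each decide by fully simulating the other never bottom out.

```python
import sys

def simulator(opponent):
    # A naive agent: decide by fully simulating the opponent playing against us.
    # But the opponent's decision procedure simulates us simulating it...
    their_move = opponent(simulator)
    return "D" if their_move == "D" else "C"

sys.setrecursionlimit(1000)
try:
    simulator(simulator)  # self-play: the mutual simulation recurses forever
except RecursionError:
    print("mutual simulation never bottoms out")
```

Dodging this regress is exactly why the ASP predictor reasons from a high-level understanding instead of simulating, and why an agent with surplus computing power must decide what not to look at.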

• Oh… LOGI’s totally relinquished then? They should mark the paper as completely obsolete in the list of SI publications, otherwise it’s confusing :) I was under the impression that I read some relatively recent text by Eliezer where he says that prospective FAI researchers must thoroughly understand LOGI before moving on to the current, even more advanced, undisclosed architecture...

The central difficulty of decision theory doesn’t get easier if you have lots of computing power

Yes, this is an interesting problem. And it appears to produce some neat metaphors. Like: maintain the illusion of free will by deliberately avoiding knowing your own decision in advance [or go crazy]. And avoid dehumanizing your opponents [or get defected against].

But does it remain relevant given limits on the computing power? [= assuming neither simulation nor any kind of formal proof is feasible]

• I was under the impression that I read some relatively recent text by Eliezer where he says that prospective FAI researchers must thoroughly understand LOGI before moving on to the current, even more advanced, undisclosed architecture...

That sounds weird. Can you find a link?

But does it remain relevant given limits on the computing power? [= assuming neither simulation nor any kind of formal proof is feasible]

That seems to be a much stronger assumption than just limiting computing power. It can be broken by one player strategically weakening themselves, if they can benefit from being simulated.

• That sounds weird. Can you find a link?

This.

It can be broken by one player strategically weakening themselves, if they can benefit from being simulated.

Are you sure this is possible? I tried to do this with the “impersonate other agents” strategy, but it does not seem to work if the opponent has your source code. The other agent knows you’re not actually them, just impersonating :)

There is the possibility of sending out a different simple program instead of yourself (or fully self-modifying into said program, there is no difference), but that would be a wholly different problem (and an easily solvable one) from the original.

• Ouch, that text sounds painful; it’s probably about as old as LOGI.

• Well, not quite that old, but yes, not very recent. The Internet Archive says the page was created at the end of 2009, but it was probably not done by EY himself. The earliest reference Google gives is from 2007...

So, you’re saying, the party line is now on single-level, formal-system-style architectures? But does it even make sense to try to define FAI-meaningful concepts in such an architecture? Isn’t it like trying to define ‘love’, ‘freedom’, and ‘justice’ in terms of atoms?

I remember EY saying somewhere (can’t find where now) that the AIXI design was very commendable in the sense that here, finally, is a full AGI design that can be clearly shown to kill you :)

• Here is a 2003 reference to the original SL4 wiki post, which is still online but for some reason not indexed by Google.

• I only know what the decision theory folks are doing; I don’t know about the SingInst party line.

Formally defining “love” may be easier than you think. For example, Paul Christiano’s blog has some posts about using “pointers” to our world: take a long bitstring, like the text of Finnegans Wake, and tell the AI to influence whatever algorithm was most likely to produce that string under the universal prior. Also, I have played with the idea of using UDT to increase the measure of specified bitstrings. Such ideas don’t require knowing correct physics down to the level of atoms, and I can easily imagine that we may find a formal way of pointing the AI at any human-recognizable idea without going through atoms.

• Thanks for the reference! I skimmed the blog, and wow! The amount of seriously considered weirdness is staggering :) (like: acausal counterfactual takeover by a simulating UFAI!). It is of huge entertainment value, of course, but… most of it appears to be conditioned on blatantly impossible premises, so it’s hard to take the concerns seriously. Maybe it’s a lack of imagination on my part...

Regarding the solution of defining complex concepts via low-level inputs: as far as I understood the idea, you do not remove the multi-leveledness, you just let it be inferred internally by the AI and refuse to look at how it is done. Besides, it does not appear to solve the problem: metaphorically speaking, we are not really interested in getting the precise text (Finnegans Wake) down to its last typo, but in a probability measure over all possible texts which is concentrated on texts that are “sufficiently similar”. In fact, we are most interested in defining this similarity, which is extremely intricate and non-trivial (it may include, for example, translations into other languages).

• Your comment reminded me of a post I’ve long wanted to write. The idea is that examining assumptions is unproductive. The only way to make intellectual progress, either individually or as a group, is to stop arguing about assumptions and instead explore their implications wherever they might lead. The #1 reason why I took so long to understand Newcomb’s Problem or Counterfactual Mugging was my insistence on denying the assumptions behind these problems. Instead I should have said to myself, okay, is this direction of inquiry interesting when taken on its own terms?

Many assumptions seemed divorced from real life at first. E.g., people dismissed the study of electromagnetism as an impractical toy, and considered number theory hopelessly abstract until cryptography arrived. People seem unable to judge the usefulness of assumptions before exploring their implications in detail, but they absolutely love arguing about assumptions instead of getting anything done.

There, thanks for encouraging me to write the first draft :-)

• Absolutely, I agree of course. If a line of inquiry is interesting in itself and progress is being made, why not pursue it? My question was only about its direct relevance to FAI. Or, rather, about whether the arguments I made to myself about its non-relevance can be easily refuted.

And, you know, questioning assumptions can sometimes be useful too. From a false assumption anything follows :)

In any case, I’m glad to be of service, however small. Your posts are generally excellent.

• Interesting. Do you see any current arguments over assumptions that we should stop (on LW or elsewhere)?

• Hmm, looks like I sometimes attack people for starting from (what I consider) wrong assumptions. Maybe I should rethink my position on those issues.

• It seems to me that this problem assumes that the predictor both does and does not predict correctly.

When determining the predictor’s actions, we assume that it foresees the agent’s two-boxing.

When determining the agent’s actions, we assume that the simulated predictor behaves the same regardless of the agent’s decision.

The question thus seems to contradict itself.

• In this problem the predictor predicts correctly. Can you explain why you think it predicts incorrectly?

• the agent does not regard the prediction outcome as contingent on the agent’s computation.

In trying to explain why the simulation ought to show that the prediction outcome is in fact contingent, I realized that I was confused, so I’m going to disregard what I previously tried to think, and start over.

The results are messy and full of wrong ideas; I suggest skimming to get the gist of it. That is: the following text is a noisy signal of what I’m trying to think, so don’t read the details too closely.

--

I may have to reconsider whether I properly grokked the sequence on free will and determinism.

To the extent that the agent has not yet made up its mind as to whether it will one-box or two-box, the simulation ought to reflect that.

The effect of possessing the simulation should be that the agent has (1) unusually high confidence in the predictor’s accuracy, and (2) exceptional luminosity, which ought to be assumed of decision-theoretic agents anyway.

Being luminous about my decision-making process does not mean that I have made up my mind. Similarly, being able to run a program that contains a highly accurate high-level description of myself does not mean that I already know what decision I’m going to eventually make.

All this adds up to circumstances that, if anything, more strongly support one-boxing than vanilla Newcomb does.

--

Alternatively, if we accept as given that the model’s mind is made up when the agent runs the simulation, we must equally infer that the agent’s decision is likewise determined. From the agent’s point of view, this means that its key choice occurred earlier, in which case we can just ask about its decision back then instead of now, and we again have a standard Newcomb/Parfit situation.

--

In short, two-boxing requires the agent to be stupid about causality and determinism, so the question is fundamentally nothing new.

• I don’t even know a fraction of the math you people know, but this problem seems obvious to me: one-box, fully expecting the box to be empty (or something impossible to happen).

More generally: if expecting A implies B, expecting B implies A or C, expecting C implies C, and U(B) > U(C) > U(A), then expect A, unless the cost of “being wrong” is larger than the difference. (Here A is one-boxing the empty box, C is two-boxing with one box empty, and B is one-boxing a full box; in this case B → C would go via D, expecting to two-box with both boxes full, which is not implied by any expectation and is therefore unreachable.)

In the model, I suppose the agent would prove that the utility of the various actions depends on whether the agent always chooses the action for which the greatest utility was proven, go into the branch that makes an exception here, and have this correctly predicted by the predictor.

Is there something important I’m missing?
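For what it’s worth, the payoff structure under discussion can be written down as a tiny toy model (a hypothetical sketch of my own; the agent-type names are made up, and the predictor reasons only from the agent’s high-level type, as in the problem statement):

```python
# Toy ASP payoffs, in dollars. The predictor never simulates the agent
# in detail; it predicts straight from the agent's decision-theoretic type.
def predictor(agent_type):
    return "one-box" if agent_type == "committed one-boxer" else "two-box"

def payoff(agent_type, action):
    big = 1_000_000 if predictor(agent_type) == "one-box" else 0
    small = 1_000
    return big if action == "one-box" else big + small

# The agent that deduces the prediction and treats it as fixed two-boxes,
# and so gets only the small box; the committed one-boxer gets the million.
assert payoff("simulating UDT", "two-box") == 1_000
assert payoff("committed one-boxer", "one-box") == 1_000_000
```

On these (assumed) payoffs, two-boxing only looks dominant to the simulating agent because the prediction is already fixed by the agent’s type, which is the point of the problem.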

• Another way to phrase it is that this problem is isomorphic to the transparent-box Newcomb problem, and you are trying to find a formalized decision theory that will one-box the empty box, “knowing” it is empty. (Just that instead of “knowing” through updating on seeing the empty box, which UDT refuses to do, there is an equivalent trick with the dependence.) The only way you can do that is if you either don’t actually try to maximize the money at that point or expect a contradiction. Not trying to maximize the money in such situations is probably easier to deal with.

• Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent’s computation.

What happens if the UDT agent generates a proof that using any proof longer than N results in only $1000? Is that level of self-reference allowed?

• What happens if the UDT agent generates a proof that using any proof longer than N results in only $1000? Is that level of self-reference allowed?

Yes, but current versions of the decision theory can’t respond to this conclusion of the meta-proof by not generating the proof. And really, the problem is not that the agent generates the proof, but that it uses it. It could generate it and then ignore the result (based on the meta-proof), which would result in success, but this level of decision rules is not currently supported.

• I was thinking that if proofs are allowed to use their own length as a premise, then the expected payoff from each proof would depend on the length of that proof. Both agent and predictor can prove that any proof which is too long results in $1000 or $0; therefore the agent would choose a shorter proof that gives $1000000.

I have no idea whether that’s correct, though.

• I’d really like to have a decision theory that would do such things automatically when the situation calls for it. Unfortunately, the one in my formalization doesn’t :-(

• If your proof works, I would expect that Omega also knows the agent is consistent and can follow the same logic, and so the UDT agent two-boxes on Newcomb’s problem. Unless you use a version of UDT that (effectively) optimizes over decisions rather than actions (like TDT), which would solve both problems.

EDIT: On solving both problems: my understanding of UDT comes from AlephNeil’s post. If you look at his “generalization 2,” it is exactly what I mean by a problem where you need to optimize over decisions rather than actions, and he claims that a UDT agent does so 5 diagrams later: that is, it treats the action as also “controlling telepathic robots.”

So going by that understanding of UDT, cousinIt’s proof is incorrect. If we can find proofs shorter than N, we can treat the predictor as a “robot,” and so two-boxing is regarded as worse than one-boxing if it “controls the robot” into not filling the box. So to predict the agent’s action we probably need to fully define the problem: what does the predictor do in cases with no proofs, and in cases where the agent’s action depends on the predictor’s?

• If your proof works, I would expect that Omega also knows the agent is consistent and can follow the same logic

Omega knows everything, but unfortunately he isn’t available right now. We are stuck with a far more limited predictor.

• The reference to Omega comes from contrasting this post with prior claims that a UDT agent will one-box on Newcomb’s problem.

• UDT may two-box in the above scenario if it simulates the predictor once, but what if the UDT agent simulates the predictor twice: it simulates itself using the above reasoning and two-boxing in simulation 1, and simulates itself one-boxing for whatever reason in simulation 2? The UDT agent that one-boxes “for whatever reason” does better, and thus the real UDT agent will realize this upon running these two simulations and one-box, and the predictor will reason that it would.