# [Question] Is Agent Simulates Predictor a “fair” problem?

It’s a simple question, but I think it might help if I add some context. In the paper introducing Functional Decision Theory, it is noted that it is impossible to design an algorithm that performs well on all decision problems, since some of them can be specified to be blatantly unfair, e.g. punishing every agent that isn’t an alphabetical decision theorist.

The question then arises: how do we define which problems are or are not fair? We start by noting that some people consider Newcomb-like problems to be unfair, since your outcome depends on a predictor’s prediction, which is rooted in an analysis of your algorithm. So what makes this case any different from only rewarding the alphabetical decision theorist?

The paper answers that the prediction depends only on the decision you end up making and that any other internal details are ignored. Since the predictor only cares about your decision and not how you come to it, the problem seems fair. I’m inclined to agree with this reasoning, but a similar line of reasoning doesn’t seem to hold for Agent Simulates Predictor. Here the algorithm you use is relevant: the predictor can only predict the agent if its algorithm is below a certain level of complexity; otherwise it may make a mistake.

Please note that this question isn’t about whether this problem is worth considering; life is often unfair and we have to deal with it as best we can. The question is about whether the problem is “fair”, where I roughly understand “fair” to mean that it belongs to a certain class of problems, which I can’t specify at this moment (I suspect it would require its own separate post), in which we should be able to achieve the optimal result in each problem.

• My thinking about this is that a problem is fair if it captures some aspect of a real world problem. I believe Gary Drescher came up with ASP as a distillation of the following problem, which itself tries to capture some essence of bargaining in the real world (similar to how Newcomb’s Problem is a distillation of the Prisoner’s Dilemma, which tries to capture some essence of cooperation in the real world):

Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of UDT. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid \$4 while the latter gets paid \$3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:

“I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear.”

The human writes down 9, and UDT, predicting this, prescribes writing down 1. This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.

(It looks like the citation here is wrong, since I can’t find a description of this game in Slepnev (2011). As far as I know, I was the first person to come up with this game as something that UDT seems to handle poorly.)
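The payoff rule of the game quoted above is simple enough to write down directly. Here is a minimal sketch in Python; the function name is my own, not from the original description:

```python
# Payoff rule for the bargaining game described above: each player
# writes down an integer from 0 to 10. If the two numbers sum to 10
# or less, each player is paid their own number; otherwise neither
# player gets anything.

def payoffs(a, b):
    """Return the (player_a, player_b) payoffs in dollars."""
    if not (0 <= a <= 10 and 0 <= b <= 10):
        raise ValueError("each player must write an integer from 0 to 10")
    if a + b <= 10:
        return (a, b)
    return (0, 0)

print(payoffs(4, 3))  # → (4, 3)
print(payoffs(9, 1))  # → (9, 1): the human writes 9, UDT writes 1
print(payoffs(6, 6))  # → (0, 0): neither player gets paid
```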

• “My thinking about this is that a problem is fair if it captures some aspect of some real world problem”—I would say that you have to accept that the real world can be unfair, but that doesn’t make real world problems “fair” in the sense gestured at in the FDT paper. Roughly, it is possible to define a broad class of problems such that a single algorithm can handle all of them optimally, for example problems where the reward depends only on your choice or on predictions of your choice.

• “It seems unsatisfactory that increased predictive power can harm an agent”—that’s just life when interacting with other agents. Indeed, in some games, exceeding a certain level of rationality provides an incentive for other players to take you out. That’s unfair, but that’s life.

• ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems), so why define a class of “fair” problems that excludes it? (I had an idea that I called UDT2 which I think does better on it than UDT1.1, but it’s not as elegant as I hoped.) Defining such problem classes may be useful for talking about the technical properties of specific decision theories, but that doesn’t seem to be what you’re trying to do here. The only other motivation I can think of is finding a way to justify not solving certain problems, but I don’t think that makes sense in the case of ASP.

• “ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it?”—my intuition is the opposite, that doing well on such problems means doing poorly on others.

• Can you explain your intuition? (Even supposing your intuition is correct, it still doesn’t seem like defining a “fair” class of problems is that useful. Shouldn’t we instead try to find a decision theory that offers the best trade-offs on the actual distribution of decision problems that we (or our AIs) will be expected to face?)

To explain my intuition: suppose we had one decision theory that does well on ASP-like problems and badly on others, and a second decision theory that does badly on ASP-like problems and well on others. Then we could create a meta decision theory that first tries to figure out what kind of problem it is facing and then selects one of these decision theories to solve it. This meta decision theory would itself be a decision theory that does well on both types of problems, so such a decision theory ought to exist.
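The construction in the previous paragraph can be sketched as a simple dispatcher. Everything here is a hypothetical stand-in (the component theories and the classifier are placeholders, not real decision theory implementations):

```python
# A minimal sketch of the meta decision theory argument: given one
# theory that does well on ASP-like problems and one that does well
# on the rest, a dispatcher that classifies the problem and then
# delegates does well on both kinds.

def asp_specialist(problem):
    # Placeholder: pretend this handles ASP-like problems optimally.
    return "asp-optimal action"

def general_theory(problem):
    # Placeholder: pretend this handles all other problems optimally.
    return "generally optimal action"

def meta_theory(problem, looks_asp_like):
    # Classification step: decide which kind of problem we face,
    # then hand it to the appropriate component theory.
    if looks_asp_like(problem):
        return asp_specialist(problem)
    return general_theory(problem)

print(meta_theory("ASP", lambda p: p == "ASP"))      # → asp-optimal action
print(meta_theory("Newcomb", lambda p: p == "ASP"))  # → generally optimal action
```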

BTW, you can quote others by putting a quote in a separate paragraph and putting “>” in front of it.

• “It still doesn’t seem like defining a “fair” class of problems is that useful”—discovering one class of fair problems led to CDT. Another led to TDT. This theoretical work is separate from the problem of producing pragmatic algorithms that deal with unfairness, but both approaches produce insights.

“This meta decision theory would itself be a decision theory that does well on both types of problems so such a decision theory ought to exist”—I currently have a draft post that does allow some kinds of rewards based on algorithm internals to be considered fair, and which basically does the whole meta-decision-theory thing. (That section of the draft post was written a few hours after I asked this question, which is why my views in it are slightly different.)

• I’ve defined three classes of “fair” problems for UDT, which are all basically equivalent: single player extensive form games, programs with a halting oracle, and formulas in provability logic. But none of these are plain old programs without oracles or stuff. I haven’t been able to define any class of “fair” problems involving plain old programs. The most I can do is agree with you: ASP doesn’t seem “fair” in spirit and doesn’t translate into any of the classes I mentioned. This is an open question—maybe you can find a better “fair” class!

• There are some formal notions of fairness that include ASP. See Asymptotic Decision Theory.

Here’s one way of thinking about this. Imagine a long sequence of instances of ASP. Both the agent and predictor in a later instance know what happened in all the earlier instances (say, because the amount of compute available in later instances is much higher, such that all previous instances can be simulated). The predictor in ASP is a logical inductor predicting what the agent will do this time.

Looking at the problem this way, it looks pretty fair. Since logical inductors can do induction, if an agent takes actions according to a certain policy, then the predictor will eventually learn this, regardless of the agent’s source code. So only the policy matters, not the source code.
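The induction claim above can be illustrated with a toy simulation. Note that a simple frequency-based predictor is standing in for a logical inductor here (which is far more general), and the agent’s fixed policy is a made-up example:

```python
# Toy illustration: a frequency-based predictor faces repeated rounds
# against an agent following a fixed policy. The predictor learns the
# policy from the history of actions alone, regardless of how the
# agent's source code computes it.

def agent(round_number):
    # Hypothetical fixed policy: always take the same action (True).
    return True

history = []
predictions = []
for t in range(100):
    # Predict the majority action seen so far (defaulting to True).
    prediction = 2 * sum(history) >= len(history)
    predictions.append(prediction)
    history.append(agent(t))

# The predictions track the policy, so only the policy mattered.
print(all(p == a for p, a in zip(predictions, history)))  # → True
```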

• To my mind, what seems unfair about some problems is that they propose predictors that, to the best of our knowledge, are physically impossible, like a Newcomb Omega that never makes a mistake. These are only unfair in the sense that they depict scenarios we won’t ever encounter (perfect predictors), not in the sense that they ask us something mathematically unfair.

Other, more mundane types of unfairness, such as a predictor that simply demands something so specific that no general algorithm could always find a way to satisfy it, seem more fair to me because they are the sorts of things we actually encounter in the real world. If you haven’t encountered this sort of thing, just spend some time with a toddler, and you will quickly be disabused of the notion that there could not exist an agent which demands impossible things.

• I already acknowledged in the original post that there exist problems that are unfair, so I don’t know why you think we disagree there.

• I don’t think we disagree.

• Where does the ‘Agent Simulates Predictor’ problem come from?

• See my answer here. Or see here for a description of ASP, if that’s what you’re asking about.

• How can a predictor be unfair to an algorithm that enumerates possible worlds and picks the best one, without any “decision theory” whatsoever? Unless by “unfair” you mean something like “you will get a coin that always lands tails, but heads wins, while everyone else gets a fair coin”.

• I don’t quite understand the question, but “unfair” refers to the environment requiring the internals to be a particular way. I actually think it is possible to allow some internal requirements to be considered fair, and I discuss this in one of my draft posts. Nonetheless, it works as a first approximation.

• Say you have certain information about the world and calculate the odds of different outcomes and their utilities. For example, in the twin Prisoner’s Dilemma the odds of DC and CD are zero, so the choice is between DD and CC. In Newcomb’s problem the odds of getting \$1001000 are zero, so the choice is between \$1000000 (one-box) and \$1000 (two-box). In the Death in Damascus problem the odds of escaping Death are zero, so the choice is whether or not to spend money on travel. What would be a concrete example of an unfair problem for this approach?

• I think this comment does a better job of explaining the notion of fairness you’re trying to point at than other words here.

• BTW, I published the draft, although fairness isn’t the main topic and only comes up towards the end.

• It’s impossible to enumerate possible worlds and pick the best one without a decision theory, because your decision process gives the same output in every possible world where you have a given epistemic state. We obviously need counterfactuals to make decisions, and the different decision theories can be seen as different theories about how counterfactuals work.
