“UDT2” and “against UD+ASSA”

I’m reposting some old posts that I originally sent to the “decision theory workshop” mailing list and the “everything-list”. I occasionally want to reference these posts, but the former mailing list is private, and while the latter is public, I can’t figure out how to create direct links to posts that are viewable without becoming a member.

UDT2 is a decision theory idea that I came up with to try to solve some problems in UDT1.1; however, I’m not very happy with it currently. UD+ASSA or UDASSA is an anthropic reasoning idea that I came up with and then moved away from prior to UDT. See also this post for further discussion of UDASSA.

UDT2 (originally “toward a solution of the ‘unintentional simulation’ problem”, 1/25/2011)

(I think this approach potentially solves several problems besides “unintentional simulation”, but I’ll start there since it provides the clearest motivation.)

I first described this problem (without naming it) at http://lesswrong.com/lw/15z/ingredients_of_timeless_decision_theory/120y. Here’s a condensed version:

Two UDT1 (or UDT1.1) agents play one-shot PD. It’s common knowledge that agent A must make a decision in 10^100 ticks (computation steps), whereas agent B has 3^^^3 ticks. While B is trying to derive the logical consequences of returning ‘C’ or ‘D’ on the world program P, it is likely to come up with a proof by simulation of A’s output, after which it will decide to play D.

I think what A should have done (if it were running a smarter decision theory) is, instead of deciding directly on C or D, modify itself into a program K = “simulate the original agents A and B and output ‘C’ if and only if both of the simulated agents self-modify into K within some time limit”. And B (if it were also running a smarter decision theory) would also self-modify into K, whether or not it happens to simulate A’s decision to self-modify into K prior to its own self-modification, and would do so before the time limit built into K expires.
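To make the shape of K concrete, here is a minimal sketch in Python. The `simulate` helper, the agent sources, and the particular time limit are all hypothetical stand-ins; the original description above leaves those details unspecified.

```python
# Hypothetical sketch of the joint program K described above. `simulate`,
# A_SOURCE, B_SOURCE and K_SOURCE are placeholder names, not part of the
# original post; the time limit value is also a placeholder.

TIME_LIMIT = 10**99  # "some time limit" built into K

def K():
    # Simulate the *original* agents A and B up to K's time limit.
    a_run = simulate(A_SOURCE, ticks=TIME_LIMIT)
    b_run = simulate(B_SOURCE, ticks=TIME_LIMIT)
    a_becomes_k = (a_run.self_modified_into == K_SOURCE)
    b_becomes_k = (b_run.self_modified_into == K_SOURCE)
    # Output 'C' if and only if both simulated agents self-modify into K
    # within the time limit; otherwise output 'D' (the natural default in
    # a PD, though the post only states the 'C' condition).
    return "C" if (a_becomes_k and b_becomes_k) else "D"
```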

So that’s my starting intuition, and I want to try to answer: what is this smarter decision theory? It seems that at least two changes need to be made to UDT1:

  1. An agent must take the space of possible decisions to be the set of possible programs it can self-modify into, instead of the set of outputs or input/output maps. (This change is needed anyway if we want the agent to be able to self-improve in general.)

  2. An agent must consider not just the consequences of eventually reaching some decision, but also the consequences of the amount of time it spends on that decision. (This change is needed anyway if we want the agent to be economical with its computational resources.)

So, while UDT1 optimizes over possible outputs to its input and UDT1.1 optimizes over possible input/output mappings it could implement, UDT2 simultaneously optimizes over possible programs to self-modify into and the amount of time (in computation steps) to spend before self-modification.

How to formulate UDT2 more precisely is not entirely clear yet. Assuming the existence of a math intuition module which runs continuously to refine its logical uncertainties, one idea is to periodically interrupt it, and during the interrupt, ask it about the logical consequences of statements of the form “S, upon input X, becomes T at time t”, for all programs T, with t being the time at the end of the current interrupt. At the end of the interrupt, return T(X) for the T that has the highest expected utility according to the math intuition module’s “beliefs”. (One of these Ts should be equivalent to “let the math intuition module run for another period and ask again later”.)
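Here is one way the interrupt loop just described might look in code. This is only a sketch of my reading of the idea: the math intuition module, its query interface, and the enumeration of candidate programs are all left unspecified above, so every name below is a placeholder.

```python
# Rough sketch of the periodic-interrupt idea. `math_intuition`, its
# methods, and `candidate_programs` are hypothetical placeholders.

KEEP_DELIBERATING = "let the math intuition module run for another period"

def udt2_decide(S, X, candidate_programs, math_intuition, period):
    while True:
        # Let the math intuition module refine its logical uncertainties
        # for one more period, then interrupt it.
        math_intuition.run(ticks=period)
        t = math_intuition.clock()  # time at the end of this interrupt

        # Ask for the expected utility of the logical consequences of
        # "S, upon input X, becomes T at time t" for each candidate T.
        best_T, best_eu = None, float("-inf")
        for T in list(candidate_programs) + [KEEP_DELIBERATING]:
            eu = math_intuition.expected_utility(
                f"{S}, upon input {X}, becomes {T} at time {t}")
            if eu > best_eu:
                best_T, best_eu = T, eu

        if best_T == KEEP_DELIBERATING:
            continue  # run another period and ask again later
        # The agent self-modifies into best_T and then returns best_T(X).
        return best_T
```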

Suppose agents A and B above are running UDT2 instead of UDT1. It seems plausible that A would decide to self-modify into K, in which case B would not suffer from the “unintentional simulation” problem, since if it does prove that A self-modifies into K, it can then easily prove that if B does not self-modify into K within K’s time limit, A will play D, and therefore “B becomes K at time t” is the best choice for some t.

It also seems that UDT2 is able to solve the problem that motivated UDT1.1 without having “ignore the input until the end” hard-coded into it, which perhaps makes it a better departure point than UDT1.1 for thinking about bargaining problems. Recall that the problem was:

Suppose Omega appears and tells you that you have just been copied, and each copy has been assigned a different number, either 1 or 2. Your number happens to be 1. You can choose between option A or option B. If the two copies choose different options without talking to each other, then each gets $10; otherwise they get $0.

The idea here is that both agents, running UDT2, would self-modify into T = “return A if input is 1, otherwise return B” if their math intuition modules say that “S, upon input 1, becomes T” is positively correlated with “S, upon input 2, becomes T”, which seems reasonable to assume.
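For concreteness, the program T that both copies would self-modify into is just the following (names are illustrative only):

```python
def T(copy_number):
    # Both copies run the same code and break symmetry using only the
    # number Omega assigned them.
    return "A" if copy_number == 1 else "B"

# If both copies end up implementing T, their choices differ and each
# copy's world gets the $10: T(1) == "A" and T(2) == "B".
assert T(1) != T(2)
```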

I think UDT2 also correctly solves Gary’s Agent-Simulates-Predictor problem and my “two more challenging Newcomb variants”. (I’ll skip the details unless someone asks.)

To me, this seems to be the most promising approach to try to fix some of UDT1’s problems. I’m curious if others agree/disagree, or if anyone is working on other ideas.

two more challenging Newcomb variants (4/12/2010)

On Apr 11, 2:45 pm, Vladimir Nesov wrote:

There, I need the environment to be presented as function of the agent’s strategy. Since predictor is part of agent’s environment, it has to be seen as function of the agent’s strategy as well, not as function of the agent’s source code.

It doesn’t seem possible, in general, to represent the environment as a function of the agent’s strategy. I applied Gary’s trick of converting multi-agent problems into Newcomb variants to come up with two more single-agent problems that UDT1 (and perhaps Nesov’s formulation of UDT as well) does badly on.

In the first Newcomb variant, Omega says he used a predictor that ran an exact simulation of you for 10^100 ticks and output “one-box” if and only if the simulation output “one-box” within 10^100 ticks. While actually making the decision, you are given 10^200 free ticks.

In the second example (which is sort of the opposite of the above), Omega shows you a million boxes, and you get to choose one. He says he used 10^100 ticks and whatever computational shortcuts he could find to predict your decision, and put $1 million in every box except the one he predicted you would choose. You get 10^100 + 10^50 ticks to make your decision, but you don’t get a copy of Omega’s predictor’s source code.
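To make the first variant’s structure explicit, here is a hedged sketch. `simulate` is a hypothetical exact simulator, and the dollar amounts are the usual Newcomb payoffs, which the problem statement above doesn’t actually spell out.

```python
# Sketch of the first variant's payoff structure. `simulate` is a
# hypothetical exact simulator; the $1,000,000/$1,000 payoffs are the
# standard Newcomb amounts, assumed here rather than given in the post.

PREDICTOR_TICKS = 10**100   # budget Omega's predictor used
AGENT_TICKS = 10**200       # free ticks you get to actually decide

def omega_predicts_one_box(agent_source):
    # Exact simulation of you, cut off at 10^100 ticks: the prediction is
    # "one-box" iff the simulation outputs "one-box" within that budget.
    run = simulate(agent_source, ticks=PREDICTOR_TICKS)
    return run.finished and run.output == "one-box"

def payoff(agent_source):
    # Your real deliberation has 10^200 ticks, so you can output "one-box"
    # long after the predictor's cutoff; what gets rewarded is producing
    # the answer quickly enough, not just producing it eventually.
    big_box = 1_000_000 if omega_predicts_one_box(agent_source) else 0
    choice = simulate(agent_source, ticks=AGENT_TICKS).output
    return big_box if choice == "one-box" else big_box + 1_000
```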

In these two examples, the actual decision is not more important than how predictable or unpredictable the computation that leads to the decision is. More generally, it seems that many properties of the decision computation might affect the environment (in a way that needs to be taken into account) besides its final output.

At this point, I’m not quite sure if UDT1 fails on these two problems for the same reason it fails on Gary’s problem. In both my first problem and Gary’s problem, UDT1 seems to spend too long “thinking” before making a decision, but that might just be a superficial similarity.

against UD+ASSA, part 1 (9/26/2007)

I promised to summarize why I moved away from the philosophical position that Hal Finney calls UD+ASSA. Here’s part 1, where I argue against ASSA. Part 2 will cover UD.

Consider the following thought experiment. Suppose your brain has been destructively scanned and uploaded into a computer by a mad scientist. Thus you find yourself imprisoned in a computer simulation. The mad scientist tells you that you have no hope of escaping, but he will financially support your survivors (spouse and children) if you win a certain game, which works as follows. He will throw a fair 10-sided die with sides labeled 0 to 9. You are to guess whether the die landed with the 0 side up or not. But here’s a twist: if it does land with “0” up, he’ll immediately make 90 duplicate copies of you before you get a chance to answer, and the copies will all run in parallel. All of the simulations are identical and deterministic, so all 91 copies (as well as the 9 copies in the other universes) must give the same answer.

ASSA implies that just before you answer, you should think that you have 0.91 probability of being in the universe with “0” up. Does that mean you should guess “yes”? Well, I wouldn’t. If I were in that situation, I’d think “If I answer ‘no’, my survivors are financially supported in 9 times as many universes as if I answer ‘yes’, so I should answer ‘no’.” How many copies of me exist in each universe doesn’t matter, since it doesn’t affect the outcome that I’m interested in.
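The arithmetic behind both ways of reasoning, assuming (as the story implies) ten equally weighted universes, one per die face:

```python
# ASSA-style credence vs. outcome-counting, for the die-and-copies game.
# Assumes ten equally weighted universes, one per die face.

copies_if_zero_up = 1 + 90   # you plus 90 duplicates in the "0 up" universe
copies_otherwise = 9         # one copy in each of the other nine universes

# ASSA: probability of being an observer-moment in the "0 up" universe.
p_assa = copies_if_zero_up / (copies_if_zero_up + copies_otherwise)
print(p_assa)  # 0.91

# Outcome counting: in how many universes do the survivors get supported?
universes_won_by_guessing_yes = 1  # only the "0 up" universe
universes_won_by_guessing_no = 9   # the nine other universes
print(universes_won_by_guessing_no > universes_won_by_guessing_yes)  # True
```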

Notice that in this thought experiment my reasoning mentions nothing about probabilities. I’m not interested in “my” measure, but in the measures of the outcomes that I care about. I think ASSA holds intuitive appeal for us because, historically, copying of minds hasn’t been possible, so the measure of one’s observer-moment and the measures of the outcomes that are causally related to one’s decisions have been strictly proportional. In that situation, it makes sense to continue to think in terms of subjective probabilities defined as ratios of measures of observer-moments. But in the more general case, ASSA doesn’t hold up.

against UD+ASSA, part 2 (9/26/2007)

In part one I argued against ASSA. Here I first summarize my argument against UD, then against the general possibility of any single objective measure.

  1. There is an infinite number of universal Turing machines, so there is an infinite number of UDs. If we want to use one UD as an objective measure, there has to be a universal Turing machine that is somehow uniquely suitable for this purpose. Why that UTM and not some other? We don’t even know what such a justification might look like. (For concreteness, a standard way of defining the UD from a specific UTM is sketched after this list.)

  2. Computation is just a small subset of math. I knew this was the case, having learned about oracle machines in my theory of computation class. But I didn’t realize just how small a subset until I read Theory of Recursive Functions and Effective Computability, by Hartley Rogers. Given that there is so much mathematical structure outside of computation, why should non-computable structures not exist? How can we be sure that they don’t exist? If we are not sure, then we have to take the possibility of their existence into account when making decisions, in which case we still need a measure that assigns them non-zero measure.

  3. At this point I started looking for another measure that could replace UD. I came up with what I called “set theoretic universal measure”, where the measure of a set is inversely related to the length of its description in a formal set theory. Set theory covers a lot more math, but otherwise we still have the same problems. Which formal set theory do we use? And how can we be sure that all structures that can possibly exist can be formalized as sets? (An example of something that can’t would be a device that can decide the truth value of any set theoretic statement.)

  4. Besides the lack of good candidates, the demise of ASSA means we don’t need an objective measure anymore. There is no longer an issue of sampling, so we don’t need an objective measure to sample from. The thought experiment in part 1 of “against UD+ASSA” points out that in general, it’s not the measure of one’s observer-moment that matters, but the measures of the outcomes that are causally related to one’s decisions. Those measures can be interpreted as indications of how much one cares about the outcomes, and therefore can be subjective.
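As promised under point 1: the standard definition of the universal distribution relative to a particular (prefix) universal Turing machine U is

$$ m_U(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-\lvert p \rvert}. $$

This formula is standard background rather than something from the original posts. Two universal machines U and V give measures that agree only up to a machine-dependent multiplicative constant (there is some c_{U,V} > 0 with m_U(x) ≥ c_{U,V} · m_V(x) for all x), and since nothing singles out a particular machine, the relative weights the measure assigns can vary substantially with the choice of UTM, which is what point 1 is objecting to.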

So where does this chain of thought lead us? I think UD+ASSA, while flawed, can serve as a kind of stepping stone towards a more general rationality. Somehow UD+ASSA is more intuitively appealing, whereas truly generalized rationality looks very alien to us. I’m not sure any of us can really practice the latter, even if we can accept it philosophically. But perhaps our descendants can. One danger I see with UD+ASSA is that we’ll program it into an AI, and the AI will be forever stuck with the idea that non-computable phenomena can’t exist, no matter what evidence it might observe.