Doubts about Updatelessness

Epistemic status: crystallizing uncertain lore

(this is mostly a writeup of a discussion with Abram, plus some additional thoughts of my own)

Note: Counterlogical Mugging will be used as a term for the counterfactual mugging problem that uses a digit of some mathematical constant.

Lately there's been a bit more doubt about whether updatelessness (in full generality, as opposed to the partial version attained by policy selection) is an achievable desideratum. This came from thinking a bit more about what universal inductors do on the counterlogical mugging problem.

A universal inductor can be thought of as "what happens when you take a logical inductor, but don't show it anything", subject to the extra constraint that it's a probability distribution at every stage. Surprisingly enough, this allows us to find out information about arbitrary consistent theories simply by conditioning. This is theorem 4.7.2, closure under conditioning, which says that for any efficiently computable sequence, the conditional probabilities of sentences will act as a logical inductor relative to a new deductive process that contains the sentences that are being conditioned on. Conditioning substitutes for having an upstream deductive process. Therefore, if you fix some mapping of bitstring places to sentences, and condition on theorems of PA, it acts as a logical inductor over PA.
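As a toy illustration of the conditioning mechanism only (not of theorem 4.7.2 itself), here is a minimal sketch: a hypothetical assignment of bit positions to sentences, an arbitrary stand-in prior over bitstrings, and conditional probabilities computed by summing over the bitstrings consistent with the conditioned-on "theorems". The prior below encodes no logical structure; in a real universal inductor the semimeasure comes from the traders, and the guarantee that the conditionals behave like a logical inductor is what the theorem supplies.

```python
from itertools import product

# Hypothetical mapping of bitstring positions to sentences (names are
# illustrative; a real setup would fix an enumeration of all sentences).
sentences = ["A", "A -> B", "B", "C"]

def prior(bits):
    # Arbitrary stand-in for a limiting semimeasure over bitstrings;
    # it does not need to be normalized.
    return 2.0 ** (-sum(bits) - len(bits))

def conditional_prob(target, known):
    """P(bit at `target` is 1 | the bits at the positions in `known`),
    computed by brute force over all bitstrings consistent with `known`."""
    numerator = denominator = 0.0
    for bits in product([0, 1], repeat=len(sentences)):
        if all(bits[i] == v for i, v in known.items()):
            p = prior(bits)
            denominator += p
            numerator += p * bits[target]
    return numerator / denominator if denominator > 0 else None

# Conditioning on the bits for "A" and "A -> B" being 1 (treating them as
# theorems) plays the role of an upstream deductive process; the conditional
# probability of "B" is then read off the same distribution.
print(conditional_prob(sentences.index("B"), {0: 1, 1: 1}))
```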

An interesting thing happens when a sufficiently advanced inductor (advanced enough to know the digit ahead of time) is placed in a counterlogical mugging scenario.

Consider the problem where the agent is using the limit of a universal inductor, $\mathbb{P}_\infty$, and it receives/is conditioned on the first $n$ digits of the binary expansion of a mathematical constant $C$, so the agent is using $\mathbb{P}_\infty(\cdot\mid C_{1:n})$ as its probability distribution. The agent must select a policy, which is just a decision whether or not to pay the mugging when asked. The mugging will occur on a distant digit of $C$, namely $C_{f(n)}$, where $f$ is some fast-growing function.

And omega's algorithm is:

if $C_{f(n)} = 1$, then if the policy is to pay when asked, +2 dollars to the agent

if $C_{f(n)} = 0$, then if the policy is to pay when asked, −1 dollar to the agent

Now, let's compute the payoffs for the two possible policies.

Now, to begin with, because the binary expansion of $C$ has positive probability in the resulting semimeasure, a sufficiently long-running universal inductor, conditioned on a sufficiently long initial sequence of digits, will assign high probability to the true distant digit.

So, if the initial segment is sufficiently long, and $C_{f(n)} = 1$, then the expected payoff of paying is ~$+2$, so the agent will pay up if asked. If $C_{f(n)} = 0$, then the expected payoff of paying is ~$-1$, so the agent will refuse to pay up.
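A minimal numerical sketch of this comparison, assuming the payoffs from omega's algorithm above and writing `p_digit_is_one` for the probability the conditioned inductor assigns to $C_{f(n)} = 1$:

```python
def expected_payoff(policy_pays: bool, p_digit_is_one: float) -> float:
    # Payoffs from omega's algorithm above: +2 if the digit is 1 and the
    # policy pays, -1 if the digit is 0 and the policy pays, 0 for refusing.
    if not policy_pays:
        return 0.0
    return p_digit_is_one * 2.0 + (1.0 - p_digit_is_one) * (-1.0)

# A long-running inductor conditioned on a long initial segment puts
# p_digit_is_one near the truth, so the two cases come out as in the text.
for p in (0.999, 0.001):  # digit is actually 1 / digit is actually 0
    print(f"p(digit=1) = {p}: pay = {expected_payoff(True, p):+.3f}, "
          f"refuse = {expected_payoff(False, p):+.3f}")
```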

So a universal inductor limit, when given the problem setting of "omega asks you about a distant digit of a sequence", will decline the logical mugging exactly when omega is paying up in a non-real/very low probability world, and successfully bluff omega into paying up when it actually gets a reward in the real world for it. This is the best outcome! It doesn't throw away money on unreal worlds, but manages to get (that particular type of) omega to give it money anyways in the real world sometimes. We could think of it as going "well, I know the true digit, so it's nearly costless to me, in expected value, to adopt a policy that pays up when it sees a digit out of place". A universal inductor limit can recognize when it's inside a counterlogical and act accordingly.

Translated out of the logical inductor setting, the agent is essentially going "I'll compute the hard-to-compute fact, and it'll come out the same way as it does in reality, so I can figure out that I'm in a counterlogical from the inside". Now, intuitively, this doesn't work because, in the counterlogical world where the distant digit is something else, your computation of it would be spoofed, and you wouldn't know you're in a counterlogical. Maybe the traders would just make different trades, if the fabric of math were different. However, universal inductors are surprisingly hard to fool in this way, as we'll see.

To take a brief digression, for some real-world algorithms, there are some counterlogicals that just make absolutely no sense. If the algorithm for policy selection, say, used the singular-value decomposition of a matrix, then it doesn't make sense to ask what the algorithm would output in the counterlogical where there is no singular-value decomposition. Changing basic properties of matrices would alter too many things.

Now, because all poly-time patterns are being enumerated by traders, there are some traders computing $C$ using well-known algorithms, some traders computing $C$ using algorithms that haven't been invented yet, some computing it through homomorphically encrypted algorithms, others approximating the digits by Monte Carlo simulation, and others computing it by an algorithm that constantly checks itself and returns an error message if bits are randomly changed in its operation. There can also be families of algorithms that use trades on a little-traded sentence to ensure that the fixed-point price encodes digits of $C$ somehow. And there can be another algorithm extracting late digits of that sentence… There's no clean counterlogical for universal inductors, because the set of all traders is too entangled with all areas of math, just like the matrix example. In order to tell a lie, omega has to lie about other stuff entangled with that, and other stuff entangled with that, until it escalates to all of math.

However, it is possible to imagine omega just picking out the most obvious invocations of $C$ that it can find by some means, leaving the rest alone, and running things forward. Then there is a spoofed prior in which some computations output something other than what they actually output.

Now, if the agent knows the distant digit of $C$, and knows how omega will alter the probability distribution, the best move is to self-modify to a decision theory that pays up exactly when the probability distribution is spoofed in a way indicative of omega simulating it.
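A rough sketch of what that self-modified rule might look like. Everything here is hypothetical: `obscured_digit` stands for some recomputation of the distant digit that omega's counterlogical surgery is assumed not to reach, and `prior_prob_digit_is_one` is the probability the (possibly spoofed) distribution assigns to the digit being 1.

```python
def pay_when_asked(prior_prob_digit_is_one: float, obscured_digit: int) -> bool:
    """Pay exactly when the visible probability distribution looks spoofed.

    A disagreement between the (possibly surgically altered) distribution and
    an obscured recomputation of the digit is taken as evidence that this is
    omega's counterlogical simulation, which is exactly the place where
    paying up gets the real-world version of the agent rewarded.
    """
    distribution_says_one = prior_prob_digit_is_one > 0.5
    spoofed = distribution_says_one != (obscured_digit == 1)
    return spoofed
```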

This only applies if omega is going to simulate the agent in the counterlogical after it self-modifies. If omega goes "ah, in the world where the digit of $C$ is 0, the agent will see this before the game gets underway, and self-modify to refuse to pay up", and in the true world, the relevant digit of $C$ is 1, then the agent will get no money.

However, counterlogical mugging seems to fall into the category of omega rewarding or penalizing the ritual of cognition the agent does, instead of what the agent actually does.

To be more specific, imagine an inductor A where most of the trader mass consists of traders which compute $C$ in an obscured way that omega's counterlogical surgery can't pick up. And imagine another inductor B where most of the trader mass consists of traders whose behavior omega's counterlogical surgery does change. Then for the first inductor, omega would go "in the world where the digit of $C$ is 0, this inductor still assigns high probability to the digit of $C$ being 1, for some reason, and it would pay up". And for the second inductor, omega would go "in the world where the digit of $C$ is 0, this inductor assigns high probability to the digit of $C$ being 0, so it wouldn't pay up".

Back in the real world (assuming the digit of $C$ is 1), inductor A gets rewarded with 2 dollars, and inductor B gets rewarded with nothing, due to details of the cognition of A and B which affected what omega thought they would do in the counterlogical. This indicates that it's just an unfair problem, and that sufficiently hard cases of mixed-upside updatelessness can be solved by policy selection/self-modification, but not in a really principled way, because counterlogicals don't exist.

So, to summarize, updatelessness for a universal inductor is quite tricky to define, because it is secretly using the "true properties of reality" in a way that can't be cleanly isolated. And if omega is going through and tweaking the traders so they output something other than what they actually output, then omega is basically tampering with the agent's prior! It's unreasonable to expect the agent to adopt a policy that works even in cases where it is adapted to have an unreasonable probability distribution by some external agent. (Put another way, if something is simulating what you would do if you had a probability distribution that thought the moon was going to crash into earth, and making decisions based on that, then you probably can't control that version of yourself.)

# Learning as Much as Possible

Why were we interested in mixed-upside updatelessness in the first place? Reflective consistency. The standard story goes something like this: "If you have an agent that is learning about the world over time, there's an early self who is uncertain about which way things will go. It can be thought of as having a caring measure (probability × utility) over impossible possible worlds, and as it learns more, this changes. So in cases where one possible world can influence another, the future self of the agent may take actions that are predictably bad from the perspective of the current agent, so it would self-modify to remove this property."

I have a few doubts about the "caring measure" picture of policy selection. The first doubt is that, for an agent that doesn't know its own decision in advance, the "updateless caring measure" actually does change, but only once (when the agent imagines selecting a certain policy, it's setting the caring measure of all the worlds where it selects a different policy to 0, and renormalizing). This feels hacky and inelegant. It feels like there are more issues with this picture of updatelessness, but that's a topic for another day.
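One way to write that single update, with $m$ the caring measure over impossible possible worlds and $\pi^*$ the policy the agent imagines selecting (notation introduced here just for concreteness):

$$m_{\text{new}}(w) \;=\; \frac{m(w)\cdot\mathbf{1}[\text{the agent selects } \pi^* \text{ in } w]}{\sum_{w'} m(w')\cdot\mathbf{1}[\text{the agent selects } \pi^* \text{ in } w']}$$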

The reflective consistency argument actually isn't quite as forceful as it seems at first glance, given a certain condition on the impossible possible worlds. We'll go through a concrete example first, and then generalize it a bit.

Consider the problem where there is an auxiliary bit. If the auxiliary bit is 1, the next bit is 1 with 75% probability. If it's 0, the next bit is 0 with 75% probability. The agent knows it will be going through a counterfactual mugging (+2 dollars on 1 if you give me 1 dollar on 0) on the second bit, but it is given the option to peek at the auxiliary bit.

It's to the agent's benefit if it looks at the auxiliary bit! If we consider a setting where the agent pays up 1 dollar to gain 2 dollars, the policy of just unconditionally paying up gets 0.5 dollars in expectation. If we consider the policy where the agent refuses to pay if the auxiliary bit is 0, and pays up if the auxiliary bit is 1 (and a 0 shows up), then this policy earns 0.625 dollars in expectation.
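Spelling out the arithmetic, using the 75%/25% probabilities and the +2/−1 payoffs from the setup (the overall probability of the second bit being 1 is $\tfrac{1}{2}$, averaging over the auxiliary bit):

$$\mathbb{E}[\text{always pay}] = \tfrac{1}{2}\cdot 2 + \tfrac{1}{2}\cdot(-1) = 0.5$$

$$\mathbb{E}[\text{pay iff the auxiliary bit is } 1] = \tfrac{1}{2}\Bigl(\tfrac{3}{4}\cdot 2 + \tfrac{1}{4}\cdot(-1)\Bigr) + \tfrac{1}{2}\cdot 0 = 0.625$$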

This only applies if, in omega's counterfactual, the auxiliary bit came up the same way as it did in reality. If omega's counterfactual (where the final bit is 0) has the auxiliary bit being 0, the best move is to just select a policy before seeing the auxiliary bit.

Time to generalize.

Let's say we've got a space of information $X$ equipped with some probability measure $\mu_X$, another space of states $Y$ equipped with a probability measure $\mu_{Y|x}$ for each $x \in X$ (the information can shift the distribution over states, as the auxiliary bit did above), and there's some space of actions $A$, and the agent is choosing a partial policy of type $Y \to A$; the set of all of them is $\Pi$, and the agent gets a reward $R(\pi, y)$.

There's a key assumption that's implicit in this problem setup. Because the reward only depends on the partial policy and the event in $Y$, this forbids counterfactual muggings where the counterfactual involves the event from $X$ coming out differently. $X$ can be intuitively thought of as the space of information that will be assumed to hold in some problem where the agent is rewarded or not, based on what it would do under various outcomes selected from $Y$. So, in the above counterfactual mugging example, $X$ would be the state of the auxiliary bit, and $Y$ would be the state of the bit that the agent is actually being counterfactually mugged with.

Choosing the partial policy after seeing the information from $X$ does at least as well, in expectation, as choosing it before looking (this is just because either the policy chosen after seeing $x$ is the same one that would have been chosen in advance, or there's a $\pi_x$ that gets an even higher reward).

Therefore, the expected value of sample information is always nonnegative. This is just the usual "value of information is nonnegative" proof, but swapping out the space of actions for the space of partial policies. Because there's an assumption that we'll only face counterfactual muggings on information in $Y$, the optimal policy can be found by just looking at the information from $X$ and doing policy selection over $Y$ accordingly.
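In the notation above, this is the familiar inequality, with actions swapped out for partial policies:

$$\mathbb{E}_{x\sim\mu_X}\Bigl[\max_{\pi\in\Pi}\,\mathbb{E}_{y\sim\mu_{Y|x}}[R(\pi,y)]\Bigr] \;\ge\; \max_{\pi\in\Pi}\,\mathbb{E}_{x\sim\mu_X}\Bigl[\mathbb{E}_{y\sim\mu_{Y|x}}[R(\pi,y)]\Bigr]$$

which holds because the left-hand side can always mimic the right-hand side's fixed partial policy at every $x$, and can only do better by picking a different one for some $x$.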

So, for all the math features that omega's counterlogical preserves, this seems to imply (the theorem doesn't quite apply to the logical induction setting, because it used classical probability measures) that the agent will try to update as far as it possibly can on information it won't be penalized for, before doing policy selection.

Due to the fact that advanced agents will try their best to fool omega into concluding that they paid up (whether by self-modification, or by trying to reason that they're in a counterlogical from the inside), the fact that these counterlogicals don't actually exist but are dependent on the implementation of the agent, and the fact that the best policy is for the agent to update on everything that will remain the same inside the predictor's what-if… it seems that problems involving logical updatelessness have more to do with controlling the computation that the predictor is running than with caring about mathematically impossible worlds.
