Should logical probabilities be updateless too?

(This post doesn’t require much math. It’s very speculative and probably confused.)

Wei Dai came up with a problem that seems equivalent to a variant of Counterfactual Mugging with some added twists:

  • the coinflip is “logical”, e.g. the parity of the millionth digit of pi;

  • after you receive the offer, you will have enough resources to calculate the coinflip’s outcome yourself;

  • but you need to figure out the correct decision algorithm ahead of time, when you don’t have these resources and are still uncertain about the coinflip’s outcome.

If you give 50/50 chances now to the millionth digit of pi being even or odd, you probably want to write the decision algorithm so it agrees to pay up later even when faced with a proof that the millionth digit of pi is even. But from the decision algorithm’s point of view, the situation looks more like being asked to pay up because 2+2=4. How do we resolve this tension?
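
To make the tension concrete, here is a toy expected-value calculation from the point of view of the moment when you write the algorithm. It’s only a sketch: the payoff numbers ($10000 / $100) are borrowed from the bet in point 4 below, and the 50/50 prior is the one assumed above.

```python
# Toy sketch of the ahead-of-time commitment decision, using the payoffs
# from the bet discussed later in this post: you lose $100 if asked to pay,
# and would have won $10000 on the other branch of the logical coin,
# but only if your algorithm is the kind that pays.

LOGICAL_PRIOR_ODD = 0.5   # current credence that the millionth digit of pi is odd
PAYOUT_IF_ODD = 10000     # counterfactual reward, only given to agents that would pay up
COST_IF_EVEN = 100        # what you hand over when asked to pay

def expected_value(pays_up: bool) -> float:
    """Expected value, evaluated at the moment you write the decision algorithm."""
    if pays_up:
        return LOGICAL_PRIOR_ODD * PAYOUT_IF_ODD - (1 - LOGICAL_PRIOR_ODD) * COST_IF_EVEN
    return 0.0

# expected_value(True) == 4950.0 > expected_value(False) == 0.0,
# so ahead of time you want to write the algorithm that pays up, even though,
# once it runs, it will be staring at a proof that the digit is even.
```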

One of the main selling points of TDT-style decision theories is eliminating the need for precommitment. You’re supposed to always do what you would have precommitted to doing, even if it doesn’t seem like a very good idea after you’ve done your Bayesian updates. UDT solves Counterfactual Mugging and similar problems by being “updateless”, so you keep caring about possible worlds in accordance with their a priori probabilities regardless of which world you end up in.

If we take the above problem at face value, it seems to tell us that UDT should treat logical uncertainty updatelessly too, and keep caring about logically impossible worlds in accordance with their a priori logical probabilities. It seems to hint that UDT should be coded from the start with a “logical prior” over mathematical statements, which encodes the creator’s arbitrary “logical degrees of caring”, just like its regular prior encodes the creator’s arbitrary degrees of caring over physics. Then the AI must keep following that prior forever after. But that’s a very tall order. Should you really keep caring about logically impossible worlds where 2+2=5, and accept bargains that help copies of you in such worlds, even after you calculate that 2+2=4?
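
As a crude illustration of what “following that prior forever after” might cash out to, here is a toy sketch in which the prior over the coin’s two possible outcomes is fixed at creation time, and the agent scores policies by prior-weighted utility across both worlds even after one of them has become provably impossible. Everything here (the dict representation, the function names, the payoff numbers) is illustrative, not a proposed formalization.

```python
# Toy illustration (not a proposal) of "caring about logically impossible worlds":
# the agent evaluates policies against a fixed logical prior over the two possible
# parities of the millionth digit of pi, and never replaces that prior with
# 0/1 weights after it proves the answer.

from typing import Callable

LOGICAL_PRIOR = {"odd": 0.5, "even": 0.5}   # fixed at creation time, never updated

def counterfactual_mugging_utility(world: str, pays_up: bool) -> float:
    # Payoffs as in the bet described below: $10000 on the odd branch
    # (but only for agents that would pay), -$100 on the even branch.
    if world == "odd":
        return 10000 if pays_up else 0
    return -100 if pays_up else 0

def updateless_value(policy: bool, utility: Callable[[str, bool], float]) -> float:
    """Score a policy by its prior-weighted utility across *all* worlds,
    including any the agent can by now prove to be logically impossible."""
    return sum(p * utility(world, policy) for world, p in LOGICAL_PRIOR.items())

best_policy = max([True, False], key=lambda pol: updateless_value(pol, counterfactual_mugging_utility))
# best_policy == True: the agent pays up, even in the world where it has
# already computed that the digit is even.
```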

That conclusion is pretty startling, but consider what happens if you reject it:

  1. Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.

  2. Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin. The AI doesn’t have enough resources to calculate the coin’s outcome, but P will have as much computing power as needed. The resulting utility goes to the AI.

  3. Writing P is equivalent to supplying one bit: should P pay up if asked?

  4. Supplying that bit is equivalent to accepting or refusing the bet “win $10000 if the millionth digit of pi is odd, lose $100 if it’s even” (see the sketch after this list).
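
Under the assumptions already in play (a 50/50 logical credence and the stated payoffs), the following sketch spells out why steps 3 and 4 are the same decision problem: both framings are scored by the identical prior-weighted expected value, so whatever policy your AI has toward bets on digits of pi carries over to the successors it writes.

```python
# Sketch of the equivalence in points 3-4: from the parent AI's perspective,
# choosing the single bit "should P pay up?" and choosing whether to accept
# the bet on the millionth digit of pi are the same decision problem,
# because both are scored by the same prior-weighted expected value.

PRIOR_ODD = 0.5   # the parent AI's logical credence; it lacks the resources to do better

def value_of_successor(pays_up_bit: bool) -> float:
    # Utility the parent AI expects to receive from the program P it writes.
    return PRIOR_ODD * (10000 if pays_up_bit else 0) + (1 - PRIOR_ODD) * (-100 if pays_up_bit else 0)

def value_of_bet(accept: bool) -> float:
    # "Win $10000 if the millionth digit of pi is odd, lose $100 if it's even."
    return PRIOR_ODD * (10000 if accept else 0) + (1 - PRIOR_ODD) * (-100 if accept else 0)

assert all(value_of_successor(b) == value_of_bet(b) for b in (True, False))
# So an AI willing to bet on digits of pi is thereby committed, by reflective
# consistency, to writing successors that pay up.
```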

So if your AI treats logical uncertainty similarly enough to probabilities that it can make bets on digits of pi, reflective consistency seems to force it to have an unchanging “logical prior”, and keep paying up in Counterfactual Mugging even when the logical coinflip looks as obvious to the AI as 2+2=4. Is there any way to escape this conclusion? (Nesov has an idea, but I can’t parse it yet.) And what could a formalization of “logical priors” possibly look like?