Formalising decision theory is hard

In this post, I clarify how far we are from a complete solution to decision theory, and how high-level philosophy relates to the mathematical formalism. I’ve personally been confused about this in the past, and I think it could be useful to people who casually follow the field. I also link to some less well-publicized approaches.

The first disagreement you might encounter when reading about alignment-related decision theory is the one between Causal Decision Theory (CDT), Evidential Decision Theory (EDT), and the various logical decision theories emerging from MIRI and LessWrong, such as Functional Decision Theory (FDT) and Updateless Decision Theory (UDT). These theories disagree on how to act in problems such as Newcomb’s problem, the smoking lesion and the prisoner’s dilemma. MIRI’s paper on FDT presents this debate from MIRI’s perspective, and, as exemplified by the philosopher who refereed that paper, academic philosophy is far from having settled on how to act in these problems.
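To make the disagreement concrete, here is a minimal sketch of Newcomb’s problem in Python. It is an illustration under stated assumptions, not a formalism taken from the FDT paper: the payoffs (1,000,000 and 1,000) are the conventional ones, the predictor accuracy of 0.99 is an arbitrary choice, and the function names are hypothetical. EDT treats the action as evidence about the prediction, while CDT holds the box contents fixed.

```python
# Toy Newcomb's problem. All numbers and names here are illustrative assumptions.
ACCURACY = 0.99                 # assumed predictor accuracy
BIG, SMALL = 1_000_000, 1_000   # conventional payoffs, assumed here
ACTIONS = ("one-box", "two-box")


def payoff(action, box_full):
    """Money received for an action, given whether the opaque box is full."""
    total = BIG if box_full else 0
    if action == "two-box":
        total += SMALL
    return total


def edt_value(action, accuracy=ACCURACY):
    # EDT conditions on the action: choosing to one-box is evidence
    # that the predictor filled the opaque box.
    p_full = accuracy if action == "one-box" else 1 - accuracy
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_value(action, p_full):
    # CDT holds the box contents fixed: the action cannot cause the
    # prediction, so p_full is the same whichever action is taken.
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def best(values):
    """Return the action with the highest value."""
    return max(values, key=values.get)


edt_choice = best({a: edt_value(a) for a in ACTIONS})
cdt_choice = best({a: cdt_value(a, p_full=0.5) for a in ACTIONS})
print("EDT:", edt_choice)
print("CDT:", cdt_choice)
```

In this sketch CDT prefers two-boxing for every value of `p_full`, since taking the extra box always adds the small payoff. The sketch deliberately leaves out FDT and UDT: writing down their value functions requires a notion of logical counterfactual, which is exactly the open problem discussed in this post.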

I’m quite confident that the FDT paper gets those problems right, and as such, I used to be pretty happy with the state of decision theory. Sure, the FDT paper mentions logical counterfactuals as a problem, and sure, the paper only talks about a few toy problems, but the rest is just formalism, right?

As it turns out, there are a few caveats to this:

  1. CDT, EDT, FDT, and UDT are high-level clusters of ways to go about decision theory. They have multiple attempted formalisms, and it’s unclear to what extent different formalisms recommend the same things. For FDT and UDT in particular, it’s unclear whether any one attempted formalism (e.g. the graphical models in the FDT paper) will be successful. This is because:

  2. Logical counterfactuals are a really difficult problem, and it’s unclear whether there exists a natural solution. Moreover, any non-natural, arbitrary details in potential solutions are problematic, since some formalisms require everybody to know that everybody uses sufficiently similar algorithms. This highlights that:

  3. The toy problems are radically simpler than actual problems that agents might encounter in the future. For example, it’s unclear how they generalise to acausal cooperation between different civilisations. Such civilisations could use implicitly implemented algorithms that are more or less similar to each other’s, may or may not be trying (and succeeding) to predict each other’s actions, and might be in asymmetric situations with far more options than just cooperating and defecting. This poses a lot of problems that don’t appear when you consider pure copies in symmetric situations, or pure predictors with known intentions.

As a consequence, knowing what philosophical position to take in the toy problems is only the beginning. There’s no formalised theory that returns the right answers to all of them yet, and if we ever find a suitable formalism, it’s very unclear how it will generalise.

If you want to dig into this more, Abram Demski mentions some open problems in this comment. Some attempts at making better formalisations include Logical Induction Decision Theory (which uses the same decision procedure as evidential decision theory, but gets logical uncertainty by using logical induction), and a potential modification, Asymptotic Decision Theory. There’s also a proof-based approach called Modal UDT, for which a good place to start would be the 3rd section in this collection of links. Another surprising avenue is that some formalisations of the high-level clusters suggest that they’re all the same. If you want to know more about the differences between Timeless Decision Theory (TDT), FDT, and versions 1.0, 1.1, and 2 of UDT, this post might be helpful.