Formalising decision theory is hard
In this post, I clarify how far we are from a complete solution to decision theory, and the way in which high-level philosophy relates to the mathematical formalism. I’ve personally been confused about this in the past, and I think it could be useful to people who casually follows the field. I also link to some less well-publicized approaches.
The first disagreement you might encounter when reading about alignment-related decision theory is the disagreement between Causal Decision Theory (CDT), Evidential Decision Theory (EDT), and different logical decision theories emerging from MIRI and lesswrong, such as Functional Decision Theory (FDT) and Updateless Decision Theory (UDT). This is characterized by disagreements on how to act in problems such as Newcomb’s problem, smoking lesion and the prisoner’s dilemma. MIRI’s paper on FDT represents this debate from MIRI’s perspective, and, as exemplified by the philosopher who refereed that paper, academic philosophy is far from having settled on how to act in these problems.
I’m quite confident that the FDT-paper gets those problems right, and as such, I used to be pretty happy with the state of decision theory. Sure, the FDT-paper mentions logical counterfactuals as a problem, and sure, the paper only talks about a few toy problems, but the rest is just formalism, right?
As it turns out, there are a few caveats to this:
CDT, EDT, FDT, and UDT are high-level clusters of ways to go about decision theory. They have multiple attempted formalisms, and it’s unclear to what extent different formalisms recommend the same things. For FDT and UDT in particular, it’s unclear whether any one attempted formalism (e.g. the graphical models in the FDT paper) will be successful. This is because:
Logical counterfactuals is a really difficult problem, and it’s unclear whether there exists a natural solution. Moreover, any non-natural, arbitrary details in potential solutions are problematic, since some formalisms require everybody to know that everybody uses sufficiently similar algorithms. This highlights that:
The toy problems are radically simpler than actual problems that agents might encounter in the future. For example, it’s unclear how they generalise to acausal cooperation between different civilisations. Such civilisations could use implicitly implemented algorithms that are more or less similar to each others’, may or may not be trying and succeeding to predict each others’ actions, and might be in asymmetric situations with far more options than just cooperating and defecting. This poses a lot of problems that don’t appear when you consider pure copies in symmetric situations, or pure predictors with known intentions.
As a consequence, knowing what philosophical position to take in the toy problems is only the beginning. There’s no formalised theory that returns the right answers to all of them yet, and if we ever find a suitable formalism, it’s very unclear how it will generalise.
If you want to dig into this more, Abram Demski mentions some open problems in this comment. Some attempts at making better formalisations includes Logical Induction Decision Theory (which uses the same decision procedure as evidential decision theory, but gets logical uncertainty by using logical induction), and a potential modification, Asymptotic Decision Theory. There’s also a proof-based approach called Modal UDT, for which a good place to start would be the 3rd section in this collection of links. Another surprising avenue is that some formalisations of the high-level clusters suggest that they’re all the same. If you want to know more about the differences between Timeless Decision Theory (TDT), FDT, and versions 1.0, 1.1, and 2 of UDT, this post might be helpful.