> Functional decision theory has open problems within it, but it is correct, and the rival decision theories are wrong
My understanding is that MIRI is pretty confident the correct decision theory is one of the ones in the LDT category, but that FDT is one specific formalization of an LDT: it gets a lot of the standard challenges right but has some known issues, rather than being exactly correct. Given that we’ve afaict not solved DT, I think telling Claude “do exactly FDT” is probably dangerously suboptimal, whereas telling it “here’s what we want from a good DT: correct handling of subjunctive dependence; we’re pretty sure it’s in the LDT category; here’s why this matters” would be better.
Ok, rather than asking for MIRI people’s takes as I had in an earlier draft, I got a summary of positions from a Claude literature review:
| Researcher | Position | Key Quote |
| --- | --- | --- |
| Wei Dai | Not solved — more open problems | “UDT shows that decision theory is more puzzling than ever… Instead of one major open problem (Newcomb’s, or EDT vs CDT) now we have a whole bunch more. I’m really not sure at this point whether UDT is even on the right track.” |
| | | “Logical Updatelessness is one of the central open problems in decision theory.” Also authored “Two Major Obstacles for Logical Inductor Decision Theory” documenting fundamental unsolved issues. |
| | | “There may just be no ‘correct’ counterfactuals” and UDT “assumes that your earlier self can foresee all outcomes, which can’t happen in embedded agents.” In 2021: “I have not yet concretely constructed any way out.” |
| | | MIRI works on DT because “there’s a cluster of confusing issues here (e.g., counterfactuals, updatelessness, coordination) that represent a lot of holes or anomalies in our current best understanding.” |
| | | “Knowing what philosophical position to take in the toy problems is only the beginning. There’s no formalised theory that returns the right answers to all of them yet… Logical counterfactuals is a really difficult problem, and it’s unclear whether there exists a natural solution.” |
| | | Wrote “Two Alternatives to Logical Counterfactuals” arguing for different approaches (counterfactual nonrealism, policy-dependent source code), noting fundamental problems with existing frameworks. |
| | | “I don’t think it’s right to see a spectrum with CDT and then EDT and then UDT. I think it’s more right to see a box, where there’s the updatelessness axis and then there’s the causal vs. evidential axis.” |
| Yudkowsky & Soares | Counterfactuals remain an open problem | In the FDT paper, Y&S acknowledge that “specifying an account of [subjunctive] counterfactuals is an ‘open problem’.” The companion paper “Cheating Death in Damascus” states: “Unfortunately for us, there is as yet no full theory of counterlogicals [...], and for FDT to be successful, a more worked out theory is necessary.” |
Summary: The consensus among core MIRI/AF researchers (Wei Dai, Garrabrant, Demski, Bensinger, Finnveden) is that FDT/UDT represents the right direction but leaves major open problems—particularly around logical counterfactuals, embeddedness, and formalization.
I think you might be mixing up LDT with FDT, and conflating “we have a likely accurate, high-level, underspecified semantic description of what a correct DT must have” with “we have a well-specified, executable-philosophy DT ready to go”.
There’s also MUPI now, which tries to sidestep logical counterfactuals:
> FDT must reason about what would have happened if its deterministic algorithm had produced a different output, a notion of logical counterfactuals that is not yet mathematically well-defined. MUPI achieves a similar outcome through a different mechanism: the combination of treating universes including itself as programs, while having epistemic uncertainty about which universe it is inhabiting—including which policy it is itself running. As explained in Remark 3.14, from the agent’s internal perspective, it acts as if its choice of action decides which universe it inhabits, including which policy it is running. When it contemplates taking action a, it updates its beliefs w(λ | æ_{<t} a), effectively concentrating probability mass on universes compatible with taking action a. Because the agent’s beliefs about its own policy are coupled with its beliefs about the environment through structural similarities, this process allows the agent to reason about how its choice of action relates to the behavior of other agents that share structural similarities. This “as if” decision-making process allows MUPI to manifest the sophisticated, similarity-aware behavior FDT aims for, but on the solid foundation of Bayesian inference rather than on yet-to-be-formalized logical counterfactuals.
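A toy version of that mechanism, as I read the excerpt: the agent holds a joint prior over “universes” (which program it and a structurally similar twin are running), and evaluates an action by conditioning that prior on its own action. The universe set, weights, and payoffs below are all invented for a twin prisoner’s dilemma; this is a sketch of the described update rule, not MUPI’s actual formalism:

```python
# Toy illustration of the conditioning mechanism described above:
# instead of logical counterfactuals, the agent keeps a belief over which
# (own program, twin program) universe it inhabits and, when contemplating
# action a, conditions that belief on its own action being a.

# Each universe: (my_program, twin_program, prior weight). Programs are
# deterministic: "C" always cooperates, "D" always defects. The twin runs
# the same program, so beliefs about self and twin are coupled.
universes = [
    ("C", "C", 0.5),   # both copies run the cooperate program
    ("D", "D", 0.5),   # both copies run the defect program
]

PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def value_of(action):
    """Expected payoff after concentrating probability mass on universes
    whose own-program output is compatible with taking `action`."""
    compatible = [(me, tw, w) for me, tw, w in universes if me == action]
    total = sum(w for _, _, w in compatible)
    if total == 0:
        return float("-inf")  # no universe is compatible with this action
    return sum(w / total * PAYOFF[(me, tw)] for me, tw, w in compatible)

# Conditioning couples the twin's behavior to the agent's own action, so
# cooperation comes out ahead: similarity-aware behavior via Bayesian
# updating, with no logical counterfactual ever evaluated.
assert value_of("C") > value_of("D")
```

The design point is that the only operation used is ordinary conditioning on the agent’s own action; the FDT-like correlation with the twin falls out of the structure of the prior.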
I’d love to see more engagement by MIRI folks as to whether this successfully formalizes a form of LDT or FDT.