> Functional decision theory has open problems within it, but it is correct, and the rival decision theories are wrong
My understanding is that MIRI is pretty confident the correct decision theory is one of the ones in the LDT category, but that FDT is one specific formalization of an LDT: it gets a lot of the standard challenges right but has some known issues, rather than being exactly correct. Given that we’ve afaict not solved DT, I think telling Claude “do exactly FDT” is probably dangerously suboptimal, whereas telling it “here’s what we want from a good DT: correct handling of subjunctive dependence; we’re pretty sure it’s in the LDT category; here’s why this matters” would be better.
Ok, rather than asking for MIRI people’s takes as I had in an earlier draft, I got a summary of positions from a Claude literature review:
| Researcher | Position | Key Quote |
| --- | --- | --- |
| Wei Dai | Not solved — more open problems | “UDT shows that decision theory is more puzzling than ever… Instead of one major open problem (Newcomb’s, or EDT vs CDT) now we have a whole bunch more. I’m really not sure at this point whether UDT is even on the right track.” |
| | | “Logical Updatelessness is one of the central open problems in decision theory.” Also authored “Two Major Obstacles for Logical Inductor Decision Theory” documenting fundamental unsolved issues. |
| | | “There may just be no ‘correct’ counterfactuals” and UDT “assumes that your earlier self can foresee all outcomes, which can’t happen in embedded agents.” In 2021: “I have not yet concretely constructed any way out.” |
| | | MIRI works on DT because “there’s a cluster of confusing issues here (e.g., counterfactuals, updatelessness, coordination) that represent a lot of holes or anomalies in our current best understanding.” |
| | | “Knowing what philosophical position to take in the toy problems is only the beginning. There’s no formalised theory that returns the right answers to all of them yet… Logical counterfactuals is a really difficult problem, and it’s unclear whether there exists a natural solution.” |
| | | Wrote “Two Alternatives to Logical Counterfactuals” arguing for different approaches (counterfactual nonrealism, policy-dependent source code), noting fundamental problems with existing frameworks. |
| | | “I don’t think it’s right to see a spectrum with CDT and then EDT and then UDT. I think it’s more right to see a box, where there’s the updatelessness axis and then there’s the causal vs. evidential axis.” |
| Yudkowsky & Soares | Counterfactuals remain an open problem | In the FDT paper, Y&S acknowledge that “specifying an account of [subjunctive] counterfactuals is an ‘open problem’.” The companion paper “Cheating Death in Damascus” states: “Unfortunately for us, there is as yet no full theory of counterlogicals [...], and for FDT to be successful, a more worked out theory is necessary.” |
Summary: The consensus among core MIRI/AF researchers (Wei Dai, Garrabrant, Demski, Bensinger, Finnveden) is that FDT/UDT represents the right direction but leaves major open problems—particularly around logical counterfactuals, embeddedness, and formalization.
I think you might be mixing up LDT with FDT, and conflating “we have a likely accurate, high-level, underspecified semantic description of what a correct DT must have” with “we have a well-specified, executable-philosophy DT ready to go”.
There’s also MUPI now, which tries to sidestep logical counterfactuals:
> FDT must reason about what would have happened if its deterministic algorithm had produced a different output, a notion of logical counterfactuals that is not yet mathematically well-defined. MUPI achieves a similar outcome through a different mechanism: the combination of treating universes including itself as programs, while having epistemic uncertainty about which universe it is inhabiting—including which policy it is itself running. As explained in Remark 3.14, from the agent’s internal perspective, it acts as if its choice of action decides which universe it inhabits, including which policy it is running. When it contemplates taking action a, it updates its beliefs w(λ | æ_{<t} a), effectively concentrating probability mass on universes compatible with taking action a. Because the agent’s beliefs about its own policy are coupled with its beliefs about the environment through structural similarities, this process allows the agent to reason about how its choice of action relates to the behavior of other agents that share structural similarities. This “as if” decision-making process allows MUPI to manifest the sophisticated, similarity-aware behavior FDT aims for, but on the solid foundation of Bayesian inference rather than on yet-to-be-formalized logical counterfactuals.
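A toy version of that mechanism, as I read the excerpt: the agent holds a joint prior over “universes” (which program it and a structurally similar twin are running), and evaluates an action by conditioning that prior on its own action. The universe set, weights, and payoffs below are all invented for a twin prisoner’s dilemma; this is a sketch of the described update rule, not MUPI’s actual formalism:

```python
# Toy illustration of the conditioning mechanism described above:
# instead of logical counterfactuals, the agent keeps a belief over which
# (own program, twin program) universe it inhabits and, when contemplating
# action a, conditions that belief on its own action being a.

# Each universe: (my_program, twin_program, prior weight). Programs are
# deterministic: "C" always cooperates, "D" always defects. The twin runs
# the same program, so beliefs about self and twin are coupled.
universes = [
    ("C", "C", 0.5),   # both copies run the cooperate program
    ("D", "D", 0.5),   # both copies run the defect program
]

PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def value_of(action):
    """Expected payoff after concentrating probability mass on universes
    whose own-program output is compatible with taking `action`."""
    compatible = [(me, tw, w) for me, tw, w in universes if me == action]
    total = sum(w for _, _, w in compatible)
    if total == 0:
        return float("-inf")  # no universe is compatible with this action
    return sum(w / total * PAYOFF[(me, tw)] for me, tw, w in compatible)

# Conditioning couples the twin's behavior to the agent's own action, so
# cooperation comes out ahead: similarity-aware behavior via Bayesian
# updating, with no logical counterfactual ever evaluated.
assert value_of("C") > value_of("D")
```

The design point is that the only operation used is ordinary conditioning on the agent’s own action; the FDT-like correlation with the twin falls out of the structure of the prior.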
I’d love to see more engagement by MIRI folks as to whether this successfully formalizes a form of LDT or FDT.