Research Lead at CORAL. Director of AI research at ALTER. PhD student in Shay Moran’s group in the Technion (my PhD research and my CORAL/ALTER research are one and the same). See also Google Scholar and LinkedIn.
E-mail: {first name}@alter.org.il
I’m not sure what you mean by “does not scale much”, but I agree with everything else. (My own ideal outcome is not literally “I am the queen”, but the same principle applies.)
The above treatment of “CDT precommitment games” is problematic: the concept
Definition: A CDT decision problem is the following data. We have a set of variables
The parent relation must induce an acyclic directed graph. We also have a selected subset of decision variables
This is connected to our overall formalism by setting
The CDT counterfactuals and decision-rule are defined via a do-operator that forces
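In Pearl-style notation (my symbols; the post’s own are elided above), the shape of the rule is:

$$\pi^{\mathrm{CDT}} \in \operatorname*{arg\,min}_{\pi}\; \mathbb{E}\big[L \,\big|\, \operatorname{do}(A = \pi(O))\big]$$

where $O$ is the observation, $A$ the decision variable and $L$ the loss; the do-operator severs $A$ from its parents in the graph and sets it by fiat.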
Definition: A CDT precommitment game is a CDT decision problem in which there is some special
1.
2. For some
3. For every
4. For every
This is connected to our abstract notion of precommitment game by setting
The underlying decision problem of the precommitment game is constructed by deleting
The game is said to be trivial when all variables with parent
Proposition: CDT is precommitment-stable in trivial precommitment games.
Definition: Given a CDT precommitment game with
Proposition: If
Above, I compare different decision theories to FDT. At the same time, I claim that in a deeper sense, FDT is ill-defined. One may doubt whether that is a coherent line of reasoning. Therefore, instead of a comparison to FDT, I propose to frame these observations as being about stability to precommitments. Details follow.
Definition: A precommitment game
1. We are given some
2. We are given
3. For any
4. Denote
The restriction of
Definition: An EDT precommitment game
1.
2.
The underlying decision problem is then an EDT decision problem with the belief
(Is there a natural generalization without the assumption
Proposition: EDT is precommitment-stable in formally causal precommitment games. That is, in any such game there is
For example, XOR blackmail can be formalized as an EDT precommitment game which is not formally causal, and EDT is not precommitment-stable there (the only optimal policy is precommitting to reject).
Definition: [EDIT: The treatment of CDT here is problematic, see child post.] A CDT precommitment game
The underlying decision problem is then a CDT decision problem with
Proposition: CDT is precommitment-stable in policy-bottlenecked precommitment games. That is, in any such game there is
For example, Newcomb’s paradox can be formalized as a CDT precommitment game which is not policy-bottlenecked, and CDT is not precommitment-stable there (the only optimal policy is precommitting to one-box).
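As a concrete sanity check, here is a minimal sketch of this comparison using the textbook Newcomb payoffs ($1M in the opaque box iff one-boxing is predicted, $1K always in the transparent box), not the post’s exact formalization:

```python
# Minimal Newcomb sketch with a perfect predictor; the payoffs are the
# textbook ones, not the post's formalization.

def payoff(policy: str) -> int:
    """Omega fills the opaque box with $1M iff it predicts one-boxing;
    the transparent box always holds $1K."""
    opaque = 1_000_000 if policy == "one-box" else 0
    if policy == "two-box":
        return opaque + 1_000
    return opaque

assert payoff("one-box") == 1_000_000  # precommitting to one-box
assert payoff("two-box") == 1_000      # what CDT ends up with

# CDT's in-the-moment reasoning holds the box contents fixed, so two-boxing
# dominates pointwise in each do()-counterfactual:
for opaque in (0, 1_000_000):
    assert opaque + 1_000 > opaque
```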
Definition: A DDT precommitment game
The underlying decision problem is then a DDT decision problem with
Proposition: IDDT is precommitment-stable in pseudocausal precommitment games. That is, in any such game there is
It should be straightforward to also formulate an analogous claim with plain DDT and iterated pseudocausal precommitment games.
To make the claim that DDT/IDDT is precommitment-stable more often than EDT and CDT, we need to somehow compare different decision theories on the same game. For this purpose, we have the following translations.
Definition: Given an EDT precommitment game with
Proposition: If
Definition: Given a CDT precommitment game, its DDT-translation is defined by setting
Proposition: If
Below we only use the case
There is no objective morality, but there is such a thing as objectively rational decision-making. And I never said anything about egoism.
Your comment sounds to me like it’s coming from a particular school of moral philosophy discourse, which (in my view) is built on the erroneous redefinition of words. In particular, “moral” and “rational”, together with various synonym-ish words, mean very different things in colloquial speech, but this type of moral philosophy discourse conflates them. In theory, you can of course define your words any way you like. However, if you do so, you relinquish the right to argue from any common-sense intuitive claim that uses these words in their original meaning. (Which, in my view, is how fallacies are smuggled in during this kind of discourse.)
Similarly, “egoism” and “taking rational actions according to your own preferences” are also very different things.
(Thank you for your comment, my explanation here is a useful addition to the OP, I think.)
Takes on moral philosophy and the history of this community that I mostly mentioned before but should maybe be put together somewhere:
Human preferences are very partial/parochial, and this is meta-endorsed. There is a finite number P s.t. for any N>0, the lives of N strangers are less than P times as terminally-valuable for you as the life of your loved one. If you want to be honest with yourself (which you should if it’s high-value for you to have accurate beliefs), you should endorse this.
(Objective, abstract) Morality is fake, both non-cognitivism and error theory have merit. Parochial altruistic preferences (=empathy) are real, rational and superrational cooperation are real. Morality-as-used-in-practice is a process of continuous negotiations about social norms (the “social contract” if you like).
In particular, utilitarianism is very confused. That said, (super)rational cooperation can cash out as something utilitarianism-ish in some situations. (For example, if it is best for everyone if we precommit to derailing the trolley even if a personal friend is on the other track.)
Paradoxes such as Pascal’s mugging, population ethics and infinite ethics all stem from trying to use a confused framework (impartial and unbounded utility).
This type of confusion contributed to the failure of Old MIRI’s agent foundations programme, by causing it to over-index on ideas like Pascal’s mugging and the procrastination paradox.
The self-deceptive endorsement of impartial unbounded utility obscures the importance of multi-agent considerations in morality-as-used-in-practice, and this contributed to failures of the Effective Altruism movement such as SBF and OpenAI. Ideas such as the “pivotal act” are also sus in a similar way (although I can see versions of that which might be justified).
This argument uses the assumption that Alice can’t change eir beliefs in response to learning that Omega has proposed specific bets and not others.
Not true. Changing her beliefs in response to Omega’s proposal doesn’t help her. Imagine that Alice is given a choice between:
1. Take a bet that pays +2 if X and −1 if not-X.
2. Take a bet that pays +2 if not-X and −1 if X.
3. Refuse both bets.
No matter what probability Alice assigns to X after her update, “normal” Bayesian calculus (really CDT calculus, see below) mandates that she chooses 1 or 2, not 3.
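Spelling out the computation: if Alice assigns probability $p$ to $X$ after her update, then

$$\mathbb{E}[\text{bet 1}] = 2p - (1-p) = 3p - 1, \qquad \mathbb{E}[\text{bet 2}] = 2(1-p) - p = 2 - 3p,$$

and $\max(3p - 1,\, 2 - 3p) \ge \tfrac{1}{2} > 0$ for every $p$, while refusing pays $0$. So option 3 is never chosen, whatever the update.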
It seems clear that a bookie can reliably make money from gamblers if the bookie knows which horse will win which race; this is not, in the classical way of thinking, a testament to the irrationality of the gamblers.
I guess this example assumes the gamblers are not allowed to update on the offered bets? (Otherwise it doesn’t make sense to me.) Like I said, we don’t assume it here.
Instead, infrabayesianism recommends a strict preference for mixed strategies.
Not really, you’re over-indexing on the somewhat outdated six-year-old post you’re replying to. It is true that if Alice has a coin that Omega cannot predict, she can come out ahead by betting according to the coin. But, as my 1-2-3 example above demonstrates, this is not the core idea. The “modern” formulation of infra-Bayesianism only allows deterministic policies, whereas randomization is modeled by means such as “taking the action to flip a coin”.
That version relies on a “causal” assumption that Omega’s choices are probabilistically independent of the gambler’s. This assumption seems inherently contrary to the problem description (since Omega can predict the gambler’s choices, and uses those predictions to make its choices).
What is actually going on here: this is not a Dutch Book argument against Bayesianism per se, it is a Dutch Book argument against Bayesian CDT. CDT-Alice believes that choosing to bet on X doesn’t influence the veracity of X, since there is indeed no physical causal link from the former to the latter (X might even be determined before the bet is offered or made). EDT-Alice can succeed here by noticing that her own choice is correlated with X and therefore the probability of X differs between the “Alice bets on X” counterfactual and the “Alice bets on not-X” counterfactual.
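In symbols (standard notation, not specific to this post): CDT evaluates $\max_a \mathbb{E}[u \mid \operatorname{do}(A = a)]$, where the do-operator cuts the correlation between Alice’s choice and $X$, while EDT evaluates $\max_a \mathbb{E}[u \mid A = a]$, and ordinary conditioning picks that correlation up.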
So, why is this example interesting beyond other examples that undermine CDT?
Mainly, it’s just easier to understand how infra-Bayesianism solves the problem here, and in particular we only need (crisp) credal sets rather than supradistributions (fuzzy credal sets).
Another reason is, the notion of subjective probability is often justified by thinking about bets. But thinking about bets requires a decision theory, and not just a theory of epistemology. Hence, once you notice that you’re confused about decision theory, you should be open to reconsidering the notion of subjective probability as well.
Yet another reason is, there’s something interesting going on where the supra-POMDP method of dealing with Newcombian problems preserves causality in some sense, while the EDT solution “violates” it. I think this is notable, although probably more important are the cases where EDT fails altogether (while infra-Bayesianism / DDT succeeds).
I haven’t tried LZP in practice, but you can guess what results to expect by looking at the size of the LZ77-compression of the text. I expect that any remotely decent text prediction algorithm would be based on stochastic process prediction. The deterministic setting is just a toy model.
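If you want to eyeball this, here is a crude proxy (my sketch; zlib’s DEFLATE is an LZ77-family compressor with Huffman coding on top, so it only approximates pure LZ77):

```python
# Crude proxy for the LZ77-compressed size of a text via zlib's DEFLATE.
import zlib

def compressed_bits_per_char(text: str) -> float:
    raw = text.encode("utf-8")
    return 8 * len(zlib.compress(raw, 9)) / len(raw)

print(compressed_bits_per_char("the quick brown fox jumps over the lazy dog " * 50))
```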
Thanks for the catch!
It’s supposed to look like the control panel of the Enterprise.
A few more observations.
The definition of iteration we had before implicitly assumes that the agent can observe the full outcome of previous iterations. We don’t have to make this assumption. Instead, we can assume a set of possible observations
I believe that Theorem 4 remains valid.
As we remarked before, DDT is not invariant under adding a constant to the loss function. It is interesting to consider what happens when we add an increasingly large constant. In the limit, DDT converges to something I dubbed “Idealized Disambiguative Decision Theory” (IDDT)[1], which works as follows.
For IDDT, it is sufficient to let
For problems coming from unambiguous FDT,
The decision rule is then
Notice that it is now invariant w.r.t. adding constants to
Proposition 5: For any stable problem, it holds that (i) any IDDT-optimal policy is FDT-optimal (ii) there is an FDT-optimal policy which is IDDT-optimal. For any pseudocausal problem, it also holds that any FDT-optimal policy is IDDT-optimal.
One might think, based on this proposition, that IDDT is a superior decision theory to DDT. However, I think that IDDT is incompatible with learning, because of its discontinuous dependence on probabilities.
(Based on Aumann, Hart and Perry.) We will operationalize the problem by assuming the agent’s decision may deterministically depend on observing a coin flip. To simplify the presentation, we assume a single coin flip per intersection, which limits the resulting probabilities to
Denote by
Denote by
Consistently with our source, we set the loss function to be
This problem is formally causal. However, as opposed to all previous examples, it has no extensive form! Hence, EDT in the sense we defined it is ill-posed: to apply EDT reasoning here we need to at least supplement it by a theory of anthropic probabilities. CDT’s counterfactuals agree with FDT’s if we posit that the do-operator is constrained to choosing among “absent-minded” policies.
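For reference, here is a sketch of the policy evaluation, using the classic payoffs from the Piccione-Rubinstein / Aumann-Hart-Perry literature (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing through both); the loss function actually used above is elided, so treat the numbers as illustrative:

```python
# Absent-minded driver sketch with the classic payoffs (0 / 4 / 1); the
# post's own loss function is elided, so these numbers are illustrative.

def expected_payoff(p: float) -> float:
    """p = probability of continuing at an intersection (necessarily the
    same at both, since the driver can't tell them apart)."""
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

# One coin flip per intersection limits p to {0, 1/2, 1}:
for p in (0.0, 0.5, 1.0):
    print(p, expected_payoff(p))  # -> 0.0, 1.25, 1.0
```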
Previously we described the self-coordination problem, but perhaps self-PD is a more striking example.
Here,
Using the obvious notations
The loss is the usual PD loss of the “factual” player.
This problem is not formally causal, because e.g.
The natural CDT interpretation is the one where the factual policy controls the counterfactual player and the counterfactual policy controls the factual player. (Alas, the terminology gets confusing here: in one case the words “factual” and “counterfactual” refer to the agent’s policy, and in the other case to the coin’s outcome.) Both CDT and EDT play
IDDT is related to the old idea of “surmeasures” from the original infra-Bayesianism sequence.
We can also imagine equipping the agent with a “self-belief”
What you propose here doesn’t address the issue of non-realizability at all. For example, let’s say
This is an idea I came up with and presented at the Agent Foundations 2025 conference at CMU.
Here is a nice simple formalism for decision theory, which in particular supports the decision theory coming out of infra-Bayesianism. I now call the latter decision theory “Disambiguative Decision Theory”, since the counterfactuals work by “disambiguating” the agent’s belief.
Let
This data is common for all decision theories, but the rest of the details depend on the theory:
We are given a mapping
We will call an FDT problem “formally causal” when for any
CDT has the same formal structure as FDT, but we always require the problem to be formally causal. Moreover, the interpretation of
Given an FDT problem
Given this data, we define the translation
To formalize EDT, we need to assume the decision process is given in “extensive” form. That is, we have a set
We assume that
We define a policy to be
For every
For every
We further assume that there is a mapping
Here,
For any
This represents the event “the decision point
So far, this notion of extensive form decision problem is useful not just for EDT. Specifically for EDT, we add the assumption that we’re given the agent’s belief
For every
Thus, the agent conditions both on following policy
Given an FDT problem
We are given the agent’s belief
Here,
We then have
This is the reason for the name “disambiguative”:
Given an FDT problem
That is,
DDT does have the odd property of non-invariance w.r.t. shifting
Now, let’s look into how different decision theories compare. We will be using FDT as the “gold standard” throughout, when it comes to choosing the correct policy. Note, though, that FDT assumes we can somehow assign strict meaning to the logical counterfactuals, and it is unclear how to accomplish this. On the other hand, DDT makes the substantially weaker assumption that we can define the supracontribution belief. In particular, it is consistent with learning, as was explained here.
Proposition 1: Consider a formally causal FDT problem
Proposition 2: Consider a formally causal FDT problem in extensive form. Then,
Proposition 3: Consider a formally causal FDT problem. Then,
Thus, in the strictly causal case all decision theories coincide, but even here DDT requires the least precise assumptions for that to work (compared to CDT and EDT). More importantly, DDT allows us to go far beyond the formally causal case. However, we do need a mild assumption about the problem:
Definition 1: An FDT problem is called pseudocausal when for any
It’s easy to see that any formally causal problem is pseudocausal, but there are many counterexamples to the converse.
Essentially, pseudocausality means that the outcome cannot depend on decisions in situations of probability 0. Notice that in reality the agent is never absolutely certain about the decision problem, hence observing a situation of probability 0 should cause it to believe it is in a different decision problem altogether. This makes the pseudocausality condition very natural.
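As an informal paraphrase of Definition 1 (my notation; the formal condition is elided above): writing $\mu_\pi$ for the outcome distribution induced by policy $\pi$,

$$\pi \overset{\text{a.s.}}{=} \pi' \;\Longrightarrow\; \mu_\pi = \mu_{\pi'},$$

i.e., two policies that agree on every situation of positive probability induce the same outcomes.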
Pseudocausality has the nice property of not depending on the loss function. If we do allow dependence on the loss function, we can make do with an even weaker condition.
Definition 2: An FDT problem is called stable when there exists an FDT-optimal
It’s obvious that any stable problem is pseudocausal. Naturally, the converse is false.
Neither pseudocausality nor stability is sufficient to guarantee that DDT and FDT give identical recommendations. However, it becomes true when we iterate the problem.
Definition 3: Given a decision problem and
Given
For FDT, for any
For DDT, we take the belief to be
Note that iterating a problem commutes with converting it from FDT to DDT.
Theorem 4: For a stable FDT problem, there exists
The requirement to iterate doesn’t seem like a terrible cost, since in a learning context some kind of iteration is necessary anyway. It can also be understood as a natural result of the need for stability: problems that are close to being unstable require more iterations.
All these examples besides the last one have natural extensive forms with one decision point.
This problem is formally causal; however, the usual causal interpretation is non-trivial:
As a result,
The problem is pseudocausal but not formally causal. Nevertheless, CDT agrees with FDT thanks to the following causal interpretation:
The problem is pseudocausal but not formally causal.
For simplicity, we postulate that the agent is forced to two-box when seeing a full box, since this choice is a “no-brainer” for all decision theories.
The problem is stable but not pseudocausal. EDT is ill-posed because
As above, we postulate that the agent is forced to two-box when seeing an empty box.
The problem is not stable. EDT is ill-posed because
We now assume Omega has a probability
The problem is pseudocausal, but not formally causal of course. EDT is well-posed and
Here’s an interesting example of a problem with two decision points. Omega flips a coin and shows the result to the agent. The agent then has to choose between buttons A, B and C. Button C always yields 3 dollars. Buttons A and B yield 4 dollars if Omega predicts the agent would choose the same button in the other coin counterfactual, and 0 dollars otherwise.
The rest of the definitions are clear and we won’t write them out. The problem is pseudocausal but not formally causal. CDT and EDT agree here, with their behavior depending on the agent’s self-belief
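Here is a quick enumeration of the deterministic policies for this example (assuming a fair coin, which the post doesn’t state explicitly):

```python
# A policy maps the observed coin to a button. C pays 3; A or B pays 4 iff
# the agent would press the same button in the other coin counterfactual,
# else 0. Fair coin assumed.
from itertools import product

def payoff(policy: dict, coin: str) -> int:
    other = "T" if coin == "H" else "H"
    choice = policy[coin]
    if choice == "C":
        return 3
    return 4 if policy[other] == choice else 0

for h, t in product("ABC", repeat=2):
    policy = {"H": h, "T": t}
    ev = (payoff(policy, "H") + payoff(policy, "T")) / 2
    print(f"H->{h} T->{t}: EV = {ev}")

# Constant A or constant B yields 4; constant C yields 3; splitting A and B
# across the counterfactuals yields 0 on those branches.
```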
the computational complexity of individual hypotheses in the hypothesis class cannot be the thing that characterizes the hardness of learning, but rather it has to be some measure of how complex the entire hypothesis class is.
This is true, of course, but mostly immaterial. Outside of contrived examples, it’s rare for the hypothesis class to be feasible to learn while containing hypotheses that are infeasible to evaluate. It seems extremely implausible that you can find a hypothesis class that is simultaneously (i) possible to specify in practice [1] (ii) feasible to learn and (iii) contains a hypothesis which is an exact description of the real universe. Therefore, non-realizability is unavoidable.
By which I mean, we can construct the learning algorithm without being akin to omniscient beings that already know everything about the universe and are able to hardcode this knowledge into the algorithm. Indeed, the reasons why we need a learning algorithm at all are (i) we don’t know a lot of what we want the agent to know (ii) it’s too labor-intensive to hardcode even the things that we do know. Therefore, we need a hypothesis class that is extremely broad and mostly uninformative.
This idea was described in a presentation I gave in ’23, but wasn’t written down anywhere.
Here is a formalization of recursive self-improvement (more precisely, recursive metalearning) in the metacognitive agent framework.
Let
Let
Consider any symbolic representation of an element of
Define
Given
We now say that an agent is recursively metalearning (w.r.t. the choices involved), if (i) it satisfies a “good enough” regret bound w.r.t.
Intuitively, this reflects the idea that if
For simplicity, we assume that
Just don’t. I understand the frustration of not getting engagement, but don’t spam the site.
Halpern and Leung propose the “minimax weighted expected regret” (MWER) decision-rule, which is a generalization of the minimax-expected-regret (MER) decision-rule. In contrast, our decision rule is a weighted generalization of maximin-expected-utility (MMEU). The problem with MER is that it doesn’t work very well with learning. The closest thing to doing learning with MER is adversarial bandits. However, adversarial regret is statistically intractable for Markov Decision Processes. And even with bandits there is a hidden obliviousness assumption if you try to interpret it in a principled decision-theoretic way.
The truth is outside of my hypothesis class, but my hypothesis class probably contains a non-trivial law that is a coarsening of the truth, which is the whole point.
For example, you can imagine that you start with some kind of intractable simplicity prior. Then, for each hypothesis you choose a tractable law that coarsens it. You end up with a probability distribution over laws.
A different way to view this is, this is just a way to force your policy to have low regret w.r.t. all/most hypotheses while weighting complex hypotheses less. For a complex hypothesis, you naturally expect learning it to be harder, so you weight its regret less. Typically, it’s only possible to have a uniform regret bound if you impose a bound on the complexity of hypotheses in some sense. Absent such a bound, your regret bound must be non-uniform. You can formalize it by explicitly allowing the per-hypothesis regret to depend on some complexity parameter, but the Bayes approach is an alternative. (Also, Bayes regret obviously implies per-hypothesis non-uniform regret with a 1/probability coefficient.)
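The parenthetical in symbols: writing $\zeta$ for the prior and $\mathrm{Reg}_h(\pi) \ge 0$ for the regret against hypothesis $h$ (my notation),

$$\mathbb{E}_{h \sim \zeta}[\mathrm{Reg}_h(\pi)] \le B \;\Longrightarrow\; \mathrm{Reg}_h(\pi) \le \frac{B}{\zeta(h)} \text{ for every } h,$$

since dropping the other nonnegative terms from the expectation leaves $\zeta(h)\,\mathrm{Reg}_h(\pi) \le B$.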
First, Bayes-regret and worst-case-regret are standard concepts in classical RL theory, and the infra-versions are straightforward analogs.
Second, you don’t have to focus on the Bayes-regret necessarily. In fact, in our papers, we focus entirely on uniform (worst-case) regret bounds.
Third, instead of an ordinary prior over laws you can consider an infraprior over laws (i.e. have ambiguity in hypothesis-space and not just in outcome-space). The resulting notion of “infra-Bayes-regret” has both Bayes-regret and worst-case-regret as special cases.
Fourth, the justification is quite straightforward. If you have an (unambiguous i.e. ordinary probability distribution) prior over laws, and your performance metric is the Bayes-infra-expected utility, then the Bayes-regret is just the difference between the performance of your policy and the performance of an optimal policy that magically knows the true hypothesis. So it’s a very natural measure of your policy’s ability to learn the hypothesis.
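In symbols (my notation; the post’s is elided): with prior $\zeta$ over laws and $\mathrm{EU}_h(\pi)$ the infra-expected utility of policy $\pi$ under law $h$,

$$\mathrm{Reg}(\pi) = \mathbb{E}_{h \sim \zeta}\!\left[\max_{\pi'} \mathrm{EU}_h(\pi') - \mathrm{EU}_h(\pi)\right],$$

the gap between $\pi$ and an optimal policy that magically knows the true $h$.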
I like the overall vibe. Two issues:
It says “Top Posts” and the mouse-over text is “by karma”, however in reality I can choose which posts to put there. Now, I like it that I can choose which posts to put there, but once I customized them, the mouse-over becomes a lie.
The “recent comments” disappeared. This is really bad because I use that to find my recent comments when I want to edit them. (For example now I wanted to find this comment to add this second bullet but had to do it manually.) OK, I now see I can find them under “feed” but this might be confusing.
[Context: I’m not a digital minimalist but I am somewhat of a “digital reducetarian”: I don’t have social media (besides LinkedIn) and have a browser plugin that reduces my access to particular websites (like LessWrong).]
Cool post :)
For me, there’s something “strange” here (not surprising, but unlike my own experience), where the implication is that people have huge swaths of “free time” that they use for scrolling and the like (which you instead use for what’s described in this post). I spend the vast majority of my time either working or doing something with kids/lovers/friends. (I did read this post in bed preparing to start my day, and am sneaking in this comment between breakfast and work.) Plus short breaks from work, and a short time in bed before sleeping, during which I read fiction books (admittedly using digital means, but in principle I could use physical books just as well, if I could fit them all into my apartment).
It’s fun to hear about your experience talking to random strangers! Catalogued it under “I would never do this but I’m glad some people do”.
What I meant is not “people only care about ~Dunbar number of people”, but something more like “the closest ~Dunbar number of people have [some fraction around the range 1/1000-1/2] of the total value”. Giuseppe Garibaldi was also influenced by considerations such as increasing his own status (or maybe even posthumous reputation).
As to “humans are not capable to behave this way rationally”, I disagree. (The whole point of decision theories like UDT/FDT is that you don’t need to rewrite your source code to behave in an a priori-optimal way, and I believe that I’m fully capable of following the recommendations of such decision theories, and do follow them.) There is probably also a sense in which we value something vaguely akin to “abstract moral concepts”, but this cashes out to something very different from utilitarianism (closer to virtue ethics).