
Updateless Decision Theory


Motivation

Updateless Decision Theory (UDT) is a decision theory meant to deal with a fundamental problem in the existing decision theories: dynamic inconsistency, IE, having conflicting desires over time. In behavioral economics, humans are often modeled as hyperbolic discounters, meaning that rewards further away in time are seen as proportionately less important (so getting $100 one week from now is as good as $200 two weeks from now). This is dynamically inconsistent because the relative value of rewards changes as they get closer or further away in time. (Getting $100 one year from now sounds much less desirable than getting $200 one year plus one week from now.) This model explains some human behaviors, such as snoozing alarms repeatedly.[1]

The dynamic inconsistency inherent in hyperbolic discounting can be fixed by exponential discounting, amongst other possibilities. However, dynamic inconsistencies can still occur for other reasons. The two most common decision theories today, Causal Decision Theory (CDT) and Evidential Decision Theory (EDT), are both dynamically inconsistent about Counterfactual Mugging: they refuse Omega when faced with the problem, but if asked beforehand, would see the value of agreeing.[2][3]
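To make the discounting argument concrete, here is a minimal sketch comparing the two discount curves. The functional forms are standard, but the parameter values k and r are illustrative assumptions, not taken from the text:

```python
# Toy comparison of hyperbolic and exponential discounting.
# The parameter values k and r are illustrative assumptions.

def hyperbolic(amount, delay_weeks, k=10.0):
    # Hyperbolic discounting: value falls off like 1 / (1 + k * delay).
    return amount / (1 + k * delay_weeks)

def exponential(amount, delay_weeks, r=0.97):
    # Exponential discounting: value falls off like r ** delay.
    return amount * r ** delay_weeks

for discount in (hyperbolic, exponential):
    # Appeal of $200-later relative to $100-sooner when the pair is
    # one week away, versus when the same pair is a year away.
    near = discount(200, 2) / discount(100, 1)
    far = discount(200, 53) / discount(100, 52)
    print(f"{discount.__name__}: near ratio = {near:.2f}, far ratio = {far:.2f}")

# hyperbolic:  near ratio = 1.05 (nearly indifferent),
#              far ratio  = 1.96 (clearly prefer waiting for $200)
# exponential: both ratios equal 1.94 -- no preference reversal.
```

Under the hyperbolic curve the agent's ranking of the same two rewards changes as they draw closer, which is exactly the dynamic inconsistency described above; under the exponential curve the ratio is constant, so no reversal occurs.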

Getting this issue right is critical for building a safe artificial general intelligence, since such an AI must analyze its own behavior and that of any next-generation AI it may build. Dynamically inconsistent AI systems have an incentive to self-modify, but such self-modification is inherently risky.

Updateless Decision Theory was invented by Wei Dai and first described in Towards a New Decision Theory.


Content

UDT specifies that the optimal agent is the one with the best policy—the best mapping from observations to actions—as estimated by its prior beliefs. (“Best” here, as in other decision theories, means one that maximizes expected utility.)

This definition may sound trivial, but it stands in contrast to CDT and EDT, which both choose the “best” action in the current moment, IE, according to the posterior beliefs.

For example, standard game theory (which uses CDT) says that following through on costly threats is irrational. Suppose Alice says that she will hunt down Bob and beat him up if Bob steals from her, and Bob proceeds to steal a small amount from Alice anyway. CDT says that Alice should let it go rather than pay the cost of following through on her threat: now that Bob has already stolen, Alice only loses utility by following through.

Yet, standard game theory affirms that Bob would not steal if Alice were the sort of person who followed through on threats. In the standard game-theoretic model, it is a Nash equilibrium for Alice to be the sort of person who follows through; in this Nash equilibrium, Bob doesn’t steal.

This Nash equilibrium is considered irrational, however, because CDT says it is irrational for Alice to follow through. Standard game theory therefore rules out such Nash equilibria; the requirement that does the ruling out is known as subgame perfection.

Part of the idea of UDT, then, is to get rid of the subgame-perfect condition. According to UDT, it is rational for Alice to be the sort of person who follows through on threats, if Bob has a low probability of stealing from such people—the cost is rarely paid, so it is worth the cost overall.
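As a concrete illustration, here is a minimal sketch of the threat game. The dollar amounts are made-up assumptions (the text specifies none); Bob best-responds to Alice's policy, and we compare a committed Alice against one who never follows through:

```python
# Toy version of the threat game, with made-up payoffs. Bob observes
# Alice's policy and best-responds; we compare a committed Alice with
# one who never follows through (the in-the-moment CDT choice).

FOLLOW_COST = 5   # cost to Alice of hunting Bob down
THEFT = 10        # value of what Bob steals
BEATING = 50      # cost to Bob of being beaten up

def alice_payoff(follows_through, bob_steals):
    payoff = -THEFT if bob_steals else 0
    if bob_steals and follows_through:
        payoff -= FOLLOW_COST  # she pays to carry out the threat
    return payoff

def bob_payoff(follows_through, bob_steals):
    if not bob_steals:
        return 0
    return THEFT - (BEATING if follows_through else 0)

for follows_through in (True, False):
    # Bob steals only if stealing beats not stealing against this policy.
    steals = bob_payoff(follows_through, True) > bob_payoff(follows_through, False)
    print(f"Alice follows through: {follows_through}; Bob steals: {steals}; "
          f"Alice's payoff: {alice_payoff(follows_through, steals)}")

# The follow-through policy deters theft, so its cost is never paid:
# committed Alice gets 0, while in-the-moment Alice gets -10.
```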

However, UDT isn’t only about rejection of the subgame-perfect condition. UDT also rejects CDT’s way of thinking about the consequences of actions. In Judea Pearl’s definition of causality,[4] CDT ignores any causal links inbound to the decider, treating this agent as an uncaused cause. UDT rejects this idea, instead thinking about consequences in the way EDT does.

Evidential Decision Theory is the other leading decision theory today. It says that the agent should make the choice for which the expected utility, as calculated with Bayes’ Rule, is highest. This allows for cooperation in the Prisoner’s Dilemma in some cases, if the two players think their decisions are sufficiently correlated. Such cooperation is not a Nash equilibrium at all, subgame-perfect or otherwise, and standard game theory considers it irrational. However, many people in the LessWrong community and related intellectual circles think that cooperation is rational in many cases. This may have been a factor in Wei Dai’s decision to use a more EDT-like formulation for UDT, rather than a CDT-like one.
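A minimal sketch of this point, with an assumed payoff matrix and assumed correlation values (neither is specified in the text); here `corr` stands for the probability that the other player makes the same choice you do:

```python
# Toy evidential Prisoner's Dilemma. The payoff matrix and correlation
# values are illustrative assumptions: `corr` is the probability that
# the other player makes the same choice as you.

PAYOFF = {  # (my move, their move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def edt_eu(my_move, corr):
    other = "D" if my_move == "C" else "C"
    return corr * PAYOFF[(my_move, my_move)] + (1 - corr) * PAYOFF[(my_move, other)]

for corr in (0.9, 0.5):
    best = max(("C", "D"), key=lambda m: edt_eu(m, corr))
    print(f"corr={corr}: EU(C)={edt_eu('C', corr):.1f}, "
          f"EU(D)={edt_eu('D', corr):.1f} -> play {best}")

# With corr=0.9, cooperating wins (2.7 vs 1.4); with corr=0.5 (no
# correlation), defecting wins (3.0 vs 1.5), matching standard game theory.
```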

UDT and Timeless Decision Theory (TDT) have very similar motivations; indeed, Wei Dai formulated UDT as something like a guess at TDT, since Eliezer was slow to publish details about TDT:

It [is] commonly acknowledged here that current decision theories have deficiencies that show up in the form of various paradoxes. Since there seems to be little hope that Eliezer will publish his Timeless Decision Theory any time soon, I decided to try to synthesize some of the ideas discussed in this forum, along with a few of my own, into a coherent alternative that is hopefully not so paradox-prone.

-- Wei Dai, Towards a New Decision Theory

However, TDT took a more CDT-inspired approach, while UDT chose the EDT-like direction. Nate Soares states: “Wei Dai doesn’t endorse FDT’s focus on causal-graph-style counterpossible reasoning; IIRC he’s holding out for an approach to counterpossible reasoning that falls out of evidential-style conditioning on a logically uncertain distribution”.

UDT 1.0 & 1.1

Let $O$ be a random variable representing observations, and let $o$ be some particular value (some specific observations). $P$ is the prior probability distribution, $U$ is a random variable representing the utility, and $\mathbb{E}$ is the expectation operator. There is a set of possible actions, $A$. EDT recommends the following action:[5]

$$a^{EDT} = \underset{a \in A}{\operatorname{argmax}}\ \mathbb{E}[U \mid a, o]$$

In other words, it maximizes expected utility conditioning on both the action and the observations.

UDT 1.0 simply drops the observations:

$$a^{UDT} = \underset{a \in A}{\operatorname{argmax}}\ \mathbb{E}[U \mid a]$$

This captures the idea that you’re making the decision you would have liked to commit to beforehand. A UDT agent is supposed to behave as if it had made all commitments which it would have liked to make, but without needing to explicitly consider everything beforehand.
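As an illustration, here is a minimal sketch of Counterfactual Mugging, using the payoffs from the usual statement of the problem (pay $100 when the coin lands tails; on heads, receive $10,000 iff Omega predicts you would have paid on tails). The updateful agent conditions on its observation; the UDT 1.0 agent evaluates the same action by prior expected utility:

```python
# Toy Counterfactual Mugging, with the payoffs from the usual statement
# of the problem: pay $100 on tails; on heads, receive $10,000 iff
# Omega predicts you would have paid on tails.

PRIOR = {"heads": 0.5, "tails": 0.5}

def utility(world, pays_when_asked):
    if world == "tails":
        return -100 if pays_when_asked else 0
    return 10_000 if pays_when_asked else 0  # heads: reward predicted payers

def updateful_choice(observation):
    # EDT-style: condition on the observation, then maximize. Having
    # seen tails, only the tails branch matters, so refusing wins.
    return max((True, False), key=lambda a: utility(observation, a))

def udt_choice():
    # UDT 1.0: maximize prior expected utility, dropping the observation.
    def prior_eu(a):
        return sum(p * utility(w, a) for w, p in PRIOR.items())
    return max((True, False), key=prior_eu)

print("updateful agent pays:", updateful_choice("tails"))  # False
print("UDT agent pays:", udt_choice())  # True: prior EU 4950 vs 0
```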

However, Wei Dai noticed a problem with UDT 1.0, as detailed in Explicit Optimization of Global Strategy (Fixing a Bug in UDT1). In some cases, UDT 1.0 fails to coordinate with itself. The fix is as follows, with $\pi \in \Pi$ being policies, IE, functions from observations to actions:

$$\pi^{*} = \underset{\pi \in \Pi}{\operatorname{argmax}}\ \mathbb{E}[U \mid \pi]$$

The agent then takes whatever action $\pi^{*}(o)$ the chosen policy recommends for its actual observations.

In words: we simply choose an optimal policy and act accordingly, rather than choosing each action to individually maximize prior expected utility. This eliminates the self-coordination problems which can otherwise arise. For example, suppose we flip a coin and show it to a UDT 1.0 agent. We then ask the agent to choose between $10 and $20. The catch is that we also simulate what it would have said if it had seen the coin land the other way, and give it $0 if it disagrees with its counterfactual twin. UDT 1.0 can choose the $10 in this situation, depending on its prior. UDT 1.1 will always choose the $20. (Assuming the agent likes money, of course.)
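A minimal sketch of this coordination game follows. The fixed-point analysis of UDT 1.0 is a simplified assumption about how the agent models its own behavior:

```python
# Toy version of the coordination game: a coin is shown to the agent,
# it names $10 or $20, and it gets $0 if its answer disagrees with a
# simulation of itself seeing the opposite coin face.

import itertools

COIN = ("heads", "tails")

def expected_utility(policy):
    # A policy maps each coin face to 10 or 20; mismatched answers pay 0.
    if policy["heads"] != policy["tails"]:
        return 0.0
    return sum(0.5 * policy[face] for face in COIN)

# UDT 1.1: optimize over whole policies. The always-$20 policy wins.
policies = [dict(zip(COIN, acts))
            for acts in itertools.product((10, 20), repeat=2)]
best = max(policies, key=expected_utility)
print("UDT 1.1 policy:", best, "-> EU", expected_utility(best))

# UDT 1.0: choose each action separately, given a prior belief p that
# the counterfactual twin names $10. Both all-$10 and all-$20 are
# self-consistent fixed points; which one UDT 1.0 lands in depends on
# its prior over its own behavior.
for p in (0.9, 0.1):
    eu_10 = p * 10        # agree on $10 with probability p
    eu_20 = (1 - p) * 20  # agree on $20 with probability 1 - p
    reply = 10 if eu_10 > eu_20 else 20
    print(f"UDT 1.0, belief twin says $10 with p={p}: best reply ${reply}")
```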

Logical Uncertainty

A robust theory of logical uncertainty is essential to a full formalization of UDT. A UDT agent must calculate probabilities and expected utilities for the outcomes of its possible actions in all possible worlds—sequences of observations and its own actions. However, it does not know its own actions in all possible worlds. (The whole point is to derive its actions.) On the other hand, it does have some knowledge about its actions, just as you know that you are unlikely to walk straight into a wall the next chance you get. So, the UDT agent models itself as an algorithm, and its probability distribution about what it itself will do is an important input into its maximization calculation.

Logical uncertainty is an area which has not yet been properly formalized, and much UDT research is focused on this area.


Relevant Comments

In addition to whole posts on UDT, there are also a number of comments which contain important information, often on less relevant posts.


  1. ^

    Getting up early to get a good start on the day seems appealing the previous evening, but when the alarm rings, the relative reward of sleeping in another few minutes is larger.

  2. ^

    We can more rigorously define dynamic inconsistency as follows:

    If the agent is given the opportunity to commit to a decision early, there are cases where it strictly prefers a different choice than the one it would make in-the-moment.

    In Counterfactual Mugging, we understand Omega as “making a copy” of the agent at some point in time (EG, taking a detailed scan for use in a simulation). If a CDT agent is given the opportunity to commit to a decision in Counterfactual Mugging before this point in time, then it will think of the simulation as being downstream of its decision, so it will make the same decision as UDT. If a CDT agent is asked after the copy is made, but before the coin-flip result is revealed, then the CDT agent will decide to refuse, just like it does after the coin-flip is revealed.

    The analysis for EDT agents is a little bit simpler. If you ask the EDT agent to commit to a decision before the coin-flip is revealed, then it will agree to give up the money just like UDT would, since it sees the cost as smaller than the potential payout, and always thinks of its decision as strongly correlated with Omega’s simulation.[6] If the EDT agent is asked after the coin-flip, then EDT refuses.

  3. ^

    This supposes that the agent isn’t doing any anthropic reasoning; specifically, it doesn’t put any probability on the possibility that it is itself the agent inside Omega’s hypothetical simulation. If the agent reasons anthropically, CDT and EDT can reach the same conclusion as UDT.

  4. ^

    There are several different theories of causality, meaning there is a different version of CDT for each alternative.

  5. ^

    In contrast, CDT chooses the following:

    $$a^{CDT} = \underset{a \in A}{\operatorname{argmax}}\ \mathbb{E}[U \mid do(a), o]$$

    The only difference between this and EDT is that the “do” operator has been wrapped around the choice of action. This notation says that the action is an “intervention” rather than a normal Bayesian conditioning. In the context of a causal Bayesian network, we cut the links from a variable’s parents when we apply “do” to it, before making inferences (again assuming Judea Pearl’s theory of causality, rather than alternatives).
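    As a minimal illustration of the difference, here is a sketch on a toy Newcomb-style model (the model itself is an illustrative assumption): a hidden disposition causes both the agent’s natural action and a perfect predictor’s guess. Conditioning on the action carries information back through the disposition, while “do” severs that link.

```python
# Toy model: a hidden disposition determines both the agent's natural
# action and a perfect predictor's guess. All numbers are illustrative.

# Joint prior over (disposition, prediction); the predictor reads the
# disposition perfectly, so the two always match.
PRIOR = {("one-box", "one-box"): 0.5, ("two-box", "two-box"): 0.5}

def payoff(action, prediction):
    # Newcomb-style payoffs: $1M in the opaque box iff one-boxing was
    # predicted; two-boxing always adds a visible $1,000.
    base = 1_000_000 if prediction == "one-box" else 0
    return base + (1_000 if action == "two-box" else 0)

def conditional_eu(action):
    # Bayesian conditioning (EDT-style): observing the action reveals
    # the disposition, and hence the prediction.
    worlds = {k: p for k, p in PRIOR.items() if k[0] == action}
    total = sum(worlds.values())
    return sum(p / total * payoff(action, pred)
               for (_, pred), p in worlds.items())

def do_eu(action):
    # do(action) (CDT-style): cut the arrow from disposition to action,
    # so the prediction keeps its prior distribution.
    return sum(p * payoff(action, pred) for (_, pred), p in PRIOR.items())

for a in ("one-box", "two-box"):
    print(f"{a}: E[U | a] = {conditional_eu(a):,.0f}, "
          f"E[U | do(a)] = {do_eu(a):,.0f}")
# Conditioning favors one-boxing (1,000,000 vs 1,000); intervening
# favors two-boxing (501,000 vs 500,000).
```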

  6. ^

    Note that this is a questionable assumption: EDT need not think its decision is correlated with the simulation. This is merely a typical part of the problem set-up.

Posts tagged Updateless Decision Theory

UDT shows that decision theory is more puzzling than ever (Wei Dai, 13 Sep 2023)
Towards a New Decision Theory (Wei Dai, 13 Aug 2009)
Explicit Optimization of Global Strategy (Fixing a Bug in UDT1) (Wei Dai, 19 Feb 2010)
List of Problems That Motivated UDT (Wei Dai, 6 Jun 2012)
Another attempt to explain UDT (cousin_it, 14 Nov 2010)
Why (and why not) Bayesian Updating? (Wei Dai, 16 Nov 2009)
Updatelessness doesn’t solve most problems (Martín Soto, 8 Feb 2024)
“UDT2” and “against UD+ASSA” (Wei Dai, 12 May 2019)
What is Wei Dai’s Updateless Decision Theory? (AlephNeil, 19 May 2010)
What Are Probabilities, Anyway? (Wei Dai, 11 Dec 2009)
UDT can learn anthropic probabilities (cousin_it, 24 Jun 2018)
UDT as a Nash Equilibrium (cousin_it, 6 Feb 2018)
UDT1.01: The Story So Far (1/10) (Diffractor, 27 Mar 2024)
UDT1.01: Local Affineness and Influence Measures (2/10) (Diffractor, 31 Mar 2024)
UDT1.01: Plannable and Unplanned Observations (3/10) (Diffractor, 12 Apr 2024)
UDT1.01 Essential Miscellanea (4/10) (Diffractor, 14 Apr 2024)
UDT1.01: Logical Inductors and Implicit Beliefs (5/10) (Diffractor, 18 Apr 2024)
A Short Note on UDT (Chris_Leong, 8 Aug 2018)
Where do selfish values come from? (Wei Dai, 18 Nov 2011)
In Defense of Open-Minded UDT (abramdemski, 12 Aug 2024)
Updatelessness and Son of X (Scott Garrabrant, 4 Nov 2016)
Torture vs. Dust vs. the Presumptuous Philosopher: Anthropic Reasoning in UDT (Wei Dai, 3 Sep 2009)
The Absent-Minded Driver (Wei Dai, 16 Sep 2009)
Logical Decision Theories: Our final failsafe? (Noosphere89, 25 Oct 2022)
An explanation of decision theories (metachirality, 1 Jun 2023)
Disentangling four motivations for acting in accordance with UDT (Julian Stastny, 5 Nov 2023)
The lattice of partial updatelessness (Martín Soto, 10 Feb 2024)
Conceptual Problems with UDT and Policy Selection (abramdemski, 28 Jun 2019)
An implementation of modal UDT (Benya_Fallenstein, 11 Feb 2015)
FDT defects in a realistic Twin Prisoners’ Dilemma (SMK, 15 Sep 2022)
FDT is not directly comparable to CDT and EDT (SMK, 29 Sep 2022)
Open-minded updatelessness (10 Jul 2023)