# Another attempt to explain UDT

(Attention conservation notice: this post contains no new results, and will be obvious and redundant to many.)

Not everyone on LW understands Wei Dai’s updateless decision theory. I didn’t understand it completely until two days ago. Now that I’ve had the final flash of realization, I’ll try to explain it to the community and hope my attempt fares better than previous attempts.

It’s probably best to avoid talking about “decision theory” at the start, because the term is hopelessly muddled. A better way to approach the idea is by examining what we mean by “truth” and “probability” in the first place. For example, is it meaningful for Sleeping Beauty to ask whether it’s Monday or Tuesday? Phrased like this, the question sounds stupid. Of course there’s a fact of the matter as to what day of the week it is! Likewise, in all problems involving simulations, there seems to be a fact of the matter whether you’re the “real you” or the simulation, which leads us to talk about probabilities and “indexical uncertainty” as to which one is you.

At the core, Wei Dai’s idea is to boldly proclaim that, counterintuitively, you can act as if there were *no fact of the matter* whether it’s Monday or Tuesday when you wake up. Until you learn which it is, you think it’s *both*. You’re all your copies at once.

More formally, you have an initial distribution of “weights” on possible universes (in the currently most general case it’s the Solomonoff prior) that you *never update at all*. In each individual universe you have a utility function over what happens. When you’re faced with a decision, you find all copies of you in the entire “multiverse” that are faced with the same decision (“information set”), and choose the decision that *logically implies* the maximum sum of resulting utilities weighted by universe-weight. If you possess some useful information about the universe you’re in, it’s *magically taken into account* by the choice of “information set”, because logically, your decision cannot affect the universes that contain copies of you with *different* states of knowledge, so they only add a constant term to the utility maximization.
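As a minimal sketch of the rule just described (the universes, weights, and utilities below are invented for illustration; real UDT works with logical implication over programs, which plain function calls only approximate):

```python
# A toy sketch of the UDT decision rule described above. All names and
# numbers here are illustrative assumptions, not part of any formal UDT spec.

def udt_decide(universes, actions):
    """Pick the action maximizing the sum of weight * utility over universes.

    `universes` is a list of (weight, utility_fn) pairs, where utility_fn
    maps an action to the utility that logically follows in that universe.
    The weights are fixed up front and never updated.
    """
    def total_utility(action):
        return sum(w * u(action) for w, u in universes)
    return max(actions, key=total_utility)

# Universes outside the agent's "information set" don't react to the
# decision: their utility_fn is constant, so they shift every action's
# score by the same amount and never change the argmax.
relevant = (0.7, lambda a: 10 if a == "cooperate" else 3)
irrelevant = (0.3, lambda a: 42)  # constant term

print(udt_decide([relevant, irrelevant], ["cooperate", "defect"]))
# prints: cooperate
```

The constant term from `irrelevant` is exactly why a copy of you with a different state of knowledge drops out of the maximization.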

Note that the theory, as described above, has no notion of “truth” and “probability” divorced from decision-making. That’s how I arrived at understanding it: in The Strong Occam’s Razor I asked whether it makes sense to “believe” one physical theory over another which makes the same predictions. For example, is hurting a human in a sealed box morally equivalent to not hurting him? After all, the laws of physics *could* make a localized exception to save the human from harm. UDT gives a very definite answer: there’s *no* fact of the matter as to which physical theory is “correct”, but you refrain from pushing the button anyway, because it hurts the human more in universes with simpler physical laws, which have more weight according to our “initial” distribution. This is an attractive solution to the problem of the “implied invisible”—possibly even more attractive than Eliezer’s own answer.

As you probably realize by now, UDT is a very sharp tool that can give simple-minded answers to all our decision-theory puzzles so far—even if they involve copying, amnesia, simulations, predictions and other tricks that throw off our approximate intuitions of “truth” and “probability”. Wei Dai gave a detailed example in The Absent-Minded Driver, and the method carries over almost mechanically to other problems. For example, Counterfactual Mugging: by assumption, your decision logically affects both heads-universe and tails-universe, which (also by assumption) have equal weight, so by agreeing to pay you win more cookies overall. Note that updating on the knowledge that you are in tails-universe (because Omega showed up) doesn’t affect anything, because the theory is “updateless”.
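The Counterfactual Mugging calculation is short enough to write out. The post doesn’t specify amounts, so the usual illustrative $100/$10,000 stakes are assumed here:

```python
# Counterfactual Mugging with the customary (assumed) stakes: Omega flips a
# fair coin; on heads it pays you $10,000 only if it predicts you'd pay $100
# on tails; on tails it asks you for $100.

W_HEADS = W_TAILS = 0.5  # equal universe weights, never updated

def value(pays_when_asked):
    """Weighted utility of a policy, summed over both universes."""
    heads = 10_000 if pays_when_asked else 0
    tails = -100 if pays_when_asked else 0
    return W_HEADS * heads + W_TAILS * tails

print(value(True), value(False))  # paying scores 4950.0 against 0.0
```

Learning that Omega has shown up (i.e. that you are in tails-universe) changes nothing, because the weights in the sum are never conditioned on that observation.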

At this point some may be tempted to switch to True Believer mode. Please don’t. Just like Bayesianism, utilitarianism, MWI or the Tegmark multiverse, UDT is an idea that’s *irresistibly delicious* to a certain type of person who puts a high value on clarity. And they all play so well together that it *can’t* be an accident! But what does it even mean to consider a theory “true” when it says that our primitive notion of “truth” isn’t “true”? :-) Me, I just consider the idea very fruitful; I’ve been contributing new math to it and plan to do so in the future.


FWIW, I wrote up brief formal descriptions of UDT1 and UDT1.1 a few months back. So far, none of the DT experts have complained about their accuracy :).

Interesting. So far I’ve avoided most of the posts explaining aspects of TDT/UDT/ADT because I wanted to see if I could figure out a decision theory that correctly handles newcomblike and anthropic problems, just as an intellectual challenge to myself, and that’s pretty much the solution I’ve been working on (though only in informal terms so far).

Perhaps at this point it would be best for me to just catch up on the existing developments in decision theory and see if I’m capable of any further contributions. What open problems in decision theory remain?

Oh, lots of open problems remain. Here’s a handy list of what I have in mind right now:

1) 2TDT-1CDT.

2) “Agent simulates predictor”, or ASP: if you have way more computing power than Omega, then you can obtain Omega’s decision just by simulating it, so you will two-box (and Omega, predicting this, will leave the big box empty); but obviously this isn’t what you want to do.

3) “The stupid winner paradox”: if two superintelligences play a demand game for $10, presumably they can agree to take $5 each to avoid losing it all. But a human playing against a superintelligence can just demand $9, knowing the superintelligence will predict his decision and be left with only $1.

4) “A/B/~CON”: action A gets you $5, action B gets you $10. Additionally you will receive $1 if the inconsistency of PA is ever proved. This way you can’t write a terminating utility() function, but you can still define the value of utility axiomatically. This is supposed to exemplify all the tractable cases where one action is clearly superior to the other, but total utility is uncomputable.

5) The general case of agents playing a non-zero-sum game against each other, knowing each other’s source code. For example, the Prisoner’s Dilemma with asymmetrized payoffs.
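For problem (4), the dominance reasoning that makes it tractable despite the uncomputable utility can be sketched directly (the payoff sets follow from the problem statement, with the $1 bonus applying to either action):

```python
# A/B/~CON: the true utility of each action is uncomputable (it depends on
# whether PA's inconsistency is ever proved), but each is known to lie in a
# small set, and that is enough to decide.

U_A = {5, 6}    # $5, plus maybe $1 for the inconsistency proof
U_B = {10, 11}  # $10, plus maybe the same $1

# Interval dominance: even B's worst case beats A's best case,
# so B is superior without ever computing the true utility.
if min(U_B) > max(U_A):
    print("B")
```

This is exactly the kind of “more general moral argument” discussed further down the thread: the agent recognizes A=>U ∈ {5,6} and B=>U ∈ {10,11} rather than a single A=>U=U1.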

I could make a separate post from this list, but I’ve been making way too many toplevel posts lately.

How is this not resolved? (My comment and the following Eliezer’s comment; I didn’t re-read the rest of the discussion.)

This basically says that the predictor is a rock: it doesn’t depend on the agent’s decision, which makes the agent lose because of the way the problem statement argues us into stipulating (outside of the predictor’s own decision process) that this must be a two-boxing rock rather than a one-boxing rock.

Same as (2). We stipulate the weak player to be a $9 rock. Nothing to be surprised about.

Requires ability to reason under logical uncertainty, comparing theories of consequences and not just specific possible utilities following from specific possible actions. Under any reasonable axioms for valuation of sets of consequences, action B wins.

Without good understanding of reasoning under logical uncertainty, this one remains out.

True, it doesn’t “depend” on the agent’s decision in the specific sense of “dependency” defined by currently-formulated UDT. The question (as with any proposed DT) is whether that’s in fact the right sense of “dependency” (between action and utility) to use for making decisions. Maybe it is, but the fact that UDT itself says so is insufficient reason to agree.

[EDIT: fixed typo]

The arguments behind UDT’s choice of dependence could prove strong enough to resolve this case as well. The fact that we are arguing about UDT’s answer in no way disqualifies UDT’s arguments.

My current position on ASP is that reasoning used in motivating it exhibits “explicit dependence bias”. I’ll need to (and probably will) write another top-level post on this topic to improve on what I’ve already written here and on the decision theory list.

About 2TDT-1CDT Wei didn’t seem to consider it 100% solved, as of this August or September if I recall right. You’ll have to ask him.

About ASP I agree with Gary: we do not yet completely understand the implications of the fact that a human like me can win in this situation, while UDT can’t.

About A/B/~CON I’d like to see some sort of mechanical reasoning procedure that leads to the answer. You do remember that Wei’s “existential” patch has been shown to not work, and my previous algorithm without that patch can’t handle this particular problem, right?

(For onlookers: this exchange refers to a whole lot of previous discussion on the decision-theory-workshop mailing list. Read at your own risk.)

Both outcomes are stipulated in the corresponding unrelated decision problems. This is an example of explicit dependency bias, where you consider a collection of problem statements indexed by agents’ algorithms, or agents’ decisions, in an arbitrary way. Nothing follows from there being a collection with so-and-so consequences of picking a certain element of it. The relation between the agents and the problem statements connected in such a collection is epiphenomenal to the agents’ adequacy. I should probably write up a post to that effect.

Only ambient consequences count, where you are already the agent that is part of (a state of knowledge about) an environment and need to figure out what to do, for example which AI to construct and submit your decision to. Otherwise you are changing the problem, not reasoning about what to do in a given problem.

You can infer that A=>U \in {5,6} and B=>U \in {10,11}. Then, instead of only recognizing moral arguments of the form A=>U=U1, you need to be able to recognize these more general arguments. It’s clear which of the two to pick.

Is that the only basis on which UDT or a UDT-like algorithm would decide on such a problem? What about a variant where action A gives you $5, plus $6 iff it is ever proved that P≠NP, and action B gives you $10, plus $5 iff P=NP is ever proved? Here too you could say that A=>U \in {5,11} and B=>U \in {10,15}, but A is probably preferable.

If you can predict Omega, but Omega can still predict you well enough for the problem to be otherwise the same, then, given that you anticipate that if you predict Omega’s decision then you will two-box and lose, can’t you choose not to predict Omega (instead deciding the usual way, resulting in one-boxing), knowing that Omega will correctly predict that you will *not* obtain its decision by simulation?

(Sorry, I know that’s a cumbersome sentence; hope its meaning was clear.)

By “demand game” are you referring to the ultimatum game?

Is the $1 independent of whether you pick action A or action B?

1) The challenge is not solving this individual problem, but creating a general theory that happens to solve this special case automatically. Our current formalizations of UDT fail on ASP—they have no concept of “stop thinking”.

2) No, I mean the game where two players write each a sum of money on a piece of paper, if the total is over $10 then both get nothing, otherwise each player gets the sum they wrote.

3) Yeah, the $1 is independent.
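The payoff rule of the demand game described in (2) is simple enough to write down (a sketch; the $10 pot is from the problem statement, the sample demands are illustrative):

```python
# Nash demand game: each player writes a dollar amount; if the demands
# total more than the pot, both get nothing; otherwise each gets what
# they wrote.

def demand_game(demand_1, demand_2, pot=10):
    if demand_1 + demand_2 > pot:
        return (0, 0)
    return (demand_1, demand_2)

print(demand_game(5, 5))  # the symmetric split
print(demand_game(9, 1))  # the "stupid winner" outcome: human demands $9
print(demand_game(9, 5))  # incompatible demands: both get nothing
```

The “stupid winner paradox” is that a predictable human who precommits to demanding $9 leaves a superintelligent opponent choosing between $1 and nothing.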

Okay.

So, the superintelligent UDT agent can essentially see through both boxes (whether it wants to or not… or, rather, has no concept of not wanting to). Sorry if this is a stupid question, but wouldn’t UDT one-box anyway, whether the box is empty or contains $1,000,000, for the same reason that it pays in Counterfactual Mugging and Parfit’s Hitchhiker? When the box is empty, it takes the empty box so that there will be possible worlds where the box is *not* empty (as it would pay the counterfactual mugger so that it will get $10,000 in the other half of worlds), and when the box is not empty, it takes only the one box (despite seeing the extra money in the other box) so that the world it’s in will weigh 50% rather than 0% (as it would pay the driver in Parfit’s Hitchhiker, despite it having “already happened”, so that the worlds in which the hitchhiker gives it a ride in the first place will weigh 100% rather than 0%).

In our current implementations of UDT, the agent won’t find any proof that one-boxing leads to the predictor predicting one-boxing, because the agent doesn’t “know” that it’s only going to use a small fraction of its computing resources while searching for the proof. Maybe a different implementation could fix that.

It’s not an implementation of UDT in the sense that it doesn’t talk about all possible programs and universal prior on them. If you consider UDT as generalizing to ADT, where probability assumptions are dropped, then sure.

Um, I don’t consider the universal prior to be part of UDT proper. UDT can run on top of any prior, e.g. when you use it to solve toy problems as Wei did, you use small specialized priors.

There are no priors used in those toy problems, just one utility definition of interest.

Well, the use of any priors over possible worlds is the thing I find objectionable.

Cool, thanks.

So our deconstruction of Many Worlds vs Collapse is that there is a Many Worlds universe, and there are also single world Collapse universes for each sequence of ways the wave function could collapse. After pondering the difference between “worlds” and “universes”, it seems that the winner is still Many Worlds.

Right. (Amusing comment by the way!) Under UDT + simplicity prior, if some decision has different moral consequences under MWI and under Copenhagen, it seems we ought to act as if MWI were true. I still remain on the fence about “accepting” UDT, though.

I believe that the parts about computable worlds and universal prior are simply wrong for most preferences, and human preference in particular. On the other hand, UDT gives an example of a non-confused way of considering a decision problem (even if the decision problem is not the one allegedly considered, that is not a general case).

Eliezer has expressed the idea that using a Solomonoff-type prior over all programs doesn’t mean you believe the universe to be computable—it just means you’re trying to outperform all other (ETA: strike the word “other”) computable agents. This position took me a lot of time to parse, but now I consider it completely correct. Unfortunately the *reason* it’s correct is not easy to express in words, it’s just some sort of free-floating math idea in my head.

Not sure how exactly this position meshes with UDT, though.

Also, if the universe is not computable, there may be hyperturing agents running around. You might even want to become one.

Outperform at generating “predictions”, but why is that interesting? Especially if the universe is not computable, so that “predictions” don’t in fact have anything to do with the universe? (Which again assumes that “universe” is interesting.)

Why do you say “all *other* computable agents”? Solomonoff induction is not computable.

Right, sorry. My brain must’ve had a hiccup. It’s scary how much this happens. Amended the comment.

This is a good explanation, but I am wary of “I didn’t understand it completely until two days ago.” You might think that was kind of silly if you look back at it after your next insight.

One thing I would like to see from a “complete understanding” is a way that a computationally bounded agent could implement an approximation of uncomputable UDT.

In counterfactual mugging problems, we, trying to use UDT, assign equal weights to heads-universe and tails-universe, because we don’t see any reason to expect one to have a higher Solomonoff prior than the other. So we are using our logical uncertainty about the Solomonoff prior rather than the Solomonoff prior directly, as ideal UDT would. Understanding how to handle and systematically reduce this logical uncertainty would be useful.

I object to “magically”, but this is otherwise correct.

I’ve read both the original UDT post and this one, and I’m still not sure I understand this basic point. The only way I can make sense out of it is as follows.

The UDT agent is modeled as a procedure `S`, and its interaction with the universe as a program `P` calling that procedure and doing something depending on the return value. Some utility is assigned to each such outcome. Now, `S` knows the prior probability distribution over all programs that might be calling `S`, and there is also the input `X`. So when the call `S(X)` occurs, the procedure `S` will consider how the expected utility varies depending on what `S(X)` evaluates to. However, changing the return value of `S(X)` affects only those terms in the expected utility calculation that correspond to those programs that might (logically) be calling `S` with input `X`, so whatever method is used to calculate that maximum, it effectively addresses only those programs. This restriction of the whole set of programs for the given input replaces the Bayesian updating, hence the name “updateless.”

Is this anywhere close to the intended idea, or am I rambling in complete misapprehension? I’d be grateful if someone clarified that for me before I make any additional comments.

Yeah, it looks to me like you understand it correctly.
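For concreteness, that model can be rendered as toy code (the specific programs, weights, and candidate outputs below are my own illustrative assumptions, not anything from the thread):

```python
# A toy rendering of the S/P picture: each "program" maps S's policy to a
# utility; programs that never call S(X) yield the same utility for every
# policy, so they contribute only a constant to the expected utility.

def expected_utility(policy, weighted_programs):
    return sum(w * prog(policy) for w, prog in weighted_programs)

def best_return_value(x, candidate_outputs, weighted_programs):
    """Choose what S(x) should evaluate to, by trying each candidate return
    value as a policy and maximizing expected utility over all programs."""
    def score(output):
        policy = {x: output}.get  # S returns `output` on input x
        return expected_utility(policy, weighted_programs)
    return max(candidate_outputs, key=score)

# One program consults S on input "X"; another ignores S entirely, so it
# cannot change the argmax — this is the restriction that replaces updating.
calls_S   = (0.6, lambda S: 100 if S("X") == "one-box" else 10)
ignores_S = (0.4, lambda S: 7)

print(best_return_value("X", ["one-box", "two-box"], [calls_S, ignores_S]))
# prints: one-box
```
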

Could you state something you didn’t understand or disagreed with before the recent change of mind that led to this post?

The post is a natural result of my reading about “the implied invisible”. I got a sort of tension in my mind between the statements “we only care about physics insofar as it yields predictions” and “unobservable events can have moral significance”, so I was on the lookout for a neat way to resolve the tension, then I let it stew for a while and it became obvious that UDT’s answer is neat enough.

Can you unpack this? (As compared to ADT’s way of eliciting utility functions from definition of actual utility and actual action.)

Can you expand a little on this?

Under UDT you don’t even *notice* the fact that you “are” in tails-universe. You only care that there are two universes, with weights that have been “unchanging” since the beginning of time, and that your decision has certain logical implications in both of them. Then you inspect the sum of utility*weight and see that it’s optimal to pay up.

Wait, you said:

But in the CM example, you did learn which it is. I am confused.

The CM example contains two “logical consequences” of your current state—two places that logically depend on your current decision, and so are “glued together” decision-theoretically—but the other “consequence” is *not* the you in heads-universe, which is occupying a different information state. It’s whatever determines Omega’s decision whether to give you money in heads-universe. It may be a simulation of you in tails-universe, or any other computation that provably returns the same answer; UDT doesn’t care.

This seems like it’s tailored to solve reputation problems but causes problems in others (i.e. almost everything). Propagating ignorance forward seems unwise, even if we can set up hypotheticals where it is. It looks like the sunk costs fallacy becomes a gaping hole, and if you’re in a casino, you want to notice whether you’re in the heads universe or tails universe.

If you are going to make this sort of claim, which the people you are trying to convince clearly disagree with, you should automatically include at least one example.

UDT does not propagate ignorance. Instead of using evidence to build knowledge of a single universe, it uses that evidence to identify what effects a decision has, possibly in multiple universes.

Ah, I thought the mention of the sunk costs fallacy or the casino were sufficient as examples.

If I’m at a casino in front of a blackjack table, I first make the decision whether or not to sit down, then if I do how much to bet, then I see my cards, then I choose my response. I don’t see how UDT adds value when it comes to making any of those decisions, and it seems detrimental when making the last one (I don’t need to be thinking about what I drew in other universes).

For the problems where it does add value- dealing with paradoxes where you need to not betray people because that’s higher payoff than betraying them- it seems like an overly complex solution to a simple problem (care about your reputation). Essentially, it sounds to me a lot like “Odin made physics”- it sounds like a rationalization that adds complexity without adding value.

What’s the difference between this and “thinking ahead”? The only difference I see is it also suggests that you think behind, which puts you at risk for the sunk costs fallacy. In a few edge cases, that’s beneficial- the Omega paradoxes are designed to reward sunk cost thinking. But in real life, that sort of thinking *is* fallacious. If someone offers to sell you a lottery ticket, and you know that ticket is not a winner, you should not buy it on the hopes that they would have offered you the same choice if the ticket was a winner.

An example in this case would be actually describing a situation where an agent has to make a decision based on specified available information, and an analysis of what decision UDT and whatever decision theory you would like to compare it to make, and what happens to agents that make those decisions.

It is more like: relativity accurately describes things that go fast, and agrees with Newtonian physics about things that go slow like we are used to.

The sunk cost fallacy is caring more about making a previous investment payoff than getting the best payoff on your current decision. Where is the previous investment in counterfactual mugging?

I don’t have a proper response for you, but this came from thinking about your comments and you may be interested in it.

At the moment, I can’t wrap my head around what it actually means to do math with UDT. If it’s truly updateless, then it’s worthless because a decision theory that ignores evidence is terrible. If it updates in a bizarre fashion, I’m not sure how that’s different from updating normally. It seems like UDT is designed specifically to do well on these sorts of problems, but I think that’s a horrible criterion (as explained in the linked post), and I don’t see it behaving differently from simple second-order game theory. It’s different from first-order game theory, but that’s not its competitor.

UDT doesn’t ask you to think about what you drew in the other universes, because presumably the decisions you’d have made with different cards aren’t a logical consequence of the decision you make with your current cards. So you still end up maximizing the sum of utility*weight over all universes using the original non-updated weights, but the terms corresponding to the other universes happen to be constant, so you only look at the term corresponding to the current universe. UDT doesn’t add value here, but it doesn’t hurt either; it actually agrees with CDT in most non-weird situations, like your casino example. UDT is a generalization of CDT to the extreme cases where causal intuition fails—it doesn’t throw away the good parts.

Overall, it seems that my attempt at communication in the original post has failed. Oh well.

Only if it’s costless to check that your decisions in this universe don’t actually impact the other universes. UDT seems useful as a visualization technique in a few problems, but I don’t think that’s sufficient to give it a separate name (intended as speculation, not pronouncement).

Well, it was worth a shot. I think the main confusion on my end, which I think I’ve worked through, is that UDT is designed for problems I don’t believe can exist- and so the well is pretty solidly poisoned there.

UDT is supposed to be about fundamental math, not efficient algorithms. It’s supposed to *define* what value we ought to optimize, in a way that hopefully accords with some of our intuitions. Before trying to build approximate computations, we ought to understand the ideal we’re trying to approximate in the first place. Real numbers as infinite binary expansions are pretty impractical for computation too, but it pays to get the definition right.

Whether UDT is useful in reality is another question entirely. I’ve had a draft post for quite a while now titled “Taking UDT Seriously”, featuring such shining examples as: it pays to retaliate against bullies even at the cost of great harm to yourself, because anticipation of such retaliation makes bullies refrain from attacking counterfactual versions of you. Of course the actual mechanism by which bullies pick victims is different and entirely causal—maybe some sort of pheromones indicating willingness to retaliate—but it’s still instructive how an intuition from the platonic math of UDT unexpectedly transfers to the real world. There may be a lesson here.

That draft would be interesting to see completed, and it may help me see what UDT brings to the table. I find the idea of helping future me and other people in my world far more compelling than the idea of helping mes that don’t exist in my world- and so if I can come to the conclusion “stand up to bullies at high personal cost because doing so benefits you and others in the medium and long term,” I don’t see a need for nonexistent mes, and if I don’t think it’s worth it on the previously stated grounds, I don’t see the consideration of nonexistent mes changing my mind.

Again, that can be a potent visualization technique, by imagining a host of situations to move away from casuistry towards principles or to increase your weighting of your future circumstances or others’ circumstances. I’m not clear on how a good visualization technique makes for an ideal, though.

Not quite (or maybe you mean the same thing). Observations construct new agents that are able to affect different places in the environment than the original agent. Observations don’t constitute new knowledge, they constitute a change in the decision problem, replacing the agent part with a modified one.

Yes, or at least I agree with your explanation.

(I think) I understand the first quote, and (I think) I understand the second quote, but they don’t seem to make sense together. If your decision can’t affect universes where you have different states of knowledge, why does the decision in the tails-universe affect the heads-universe? Is it because you might be in Omega’s simulation in the heads-universe and that therefore there’s an instance of you with the same knowledge in both?

Yes. A subroutine in someone else’s mind that’s thinking about what UDT would do given observations X, is itself an instance of UDT with inputs X. Even if it’s just a behaviorally-accurate model rather than a whole conscious simulation, and even if it’s in a world where X is false.

Aren’t the universe weights just probabilities by a different name? (Which leads directly to my formulation “UDT = choose the strategy that maximizes your unconditional expected utility”.) Or are the weights supposed to be subjective—they measure the degree to which you ‘care’ about the various worlds, and different agents can assign different weights without either one being ‘wrong’? But doesn’t that contradict that idea that we’re supposed to use the One True Solomonoff Prior?

Or are you not thinking of the weights as probabilities simply because UDT does away with the idea that one of the possible worlds is ‘true’ and all the others are ‘false’?

If the weights are probabilities, then what are they probabilities of? On some level, the notion of a computation “being” in one specific universe is incoherent. A sorting algorithm that is invoked to sort the array (1,3,2) finds itself simultaneously “in” all the universes that run it. From the computation’s point of view, there really is no fact of the matter as to “where” it is. Grasping this idea while thinking of your own thought process as a computation can really blow a person’s mind :-)

UDT doesn’t require the Solomonoff prior, it’s fine with whatever prior you choose. We already know that the Solomonoff prior can’t be the final solution, because it depends on the choice of programming language (or universal machine). Me, I don’t like these huge priors. Applying UDT to little toy problems is mathematically interesting enough for me.

Well again, the part about probability suggests a fundamental misunderstanding of the Bayesian interpretation. The math never *has* to use the word “true”. If you think it does when expressed in English grammar, that seems like a flaw in English grammar.

Of course the math still works if you replace “true” with “flubby”. But it’s an LW norm to make the epistemological (non-mathematical) claim that cognition has to be Bayesian to work, which is what I was addressing.

Or at least, an incompatibility of English grammar with the purpose in question. I have trouble calling it a flaw when something doesn’t do something well which it isn’t designed for (never mind that it can’t really be called designed at all).