step 1. You are an agent with source code read/write access. You suspect there will be (in the future) Omegas in your environment, posing tricky problems. At this point (step 1), you realize you should “preprocess” your own source code in such a way as to maximize expected utility in such problems.
This is closer to describing the self-modifying CDT approach. One of the motivations for development of TDT and UDT is that you don’t necessarily get an opportunity to do such self-modification beforehand, let alone to compute the optimal decisions for all possible scenarios you think might occur.
So the idea of UDT is that the design of the code should already suffice to guarantee that if you end up in a Newcomblike situation you behave “as if” you did have the opportunity to do whatever precommitment would have been useful. When prompted for a decision, UDT asks “what is the (fixed) optimal conditional strategy” and outputs the result of applying that strategy to its current state of knowledge.
That is, for all causal graphs (possibly with Omega causal pathways), you find where nodes for [my source code goes here] are, and you “pick the optimal treatment regime”.
Basically this, except there’s no need to actually do it beforehand.
If you like, you can consider the UDT agent’s code itself to be the output of such “preprocessing”… except that there is no real pre-computation required, apart from giving the UDT agent a realistic prior.
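To make the “optimal conditional strategy” idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything specified in this thread: the names `newcomb_payoff`, `best_policy` and `udt_decide` are hypothetical, Omega is taken to be a perfect predictor that simply reads off the policy, and the payoffs are the conventional $1,000,000 and $1,000.

```python
from itertools import product

# Toy model (assumption): one possible observation, two possible actions.
OBSERVATIONS = ["newcomb"]
ACTIONS = ["one-box", "two-box"]


def newcomb_payoff(policy):
    """Payoff of a whole policy, assuming Omega predicts by inspecting it."""
    action = policy["newcomb"]
    # Box B is filled exactly when the policy one-boxes (perfect predictor).
    box_b = 1_000_000 if action == "one-box" else 0
    return box_b if action == "one-box" else box_b + 1_000


def best_policy():
    """Enumerate every observation -> action mapping and keep the best one."""
    policies = [
        dict(zip(OBSERVATIONS, acts))
        for acts in product(ACTIONS, repeat=len(OBSERVATIONS))
    ]
    return max(policies, key=newcomb_payoff)


def udt_decide(observation):
    """Nothing is cached beforehand: the policy question is asked at decision time."""
    return best_policy()[observation]


print(udt_decide("newcomb"))
```

Note that `udt_decide` does no work until it is called. The agent still behaves as if it had precommitted, because what it optimizes is the mapping from observations to actions rather than the single action in front of it.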
Basically this, except there’s no need to actually do it beforehand.
Actually, no. To implement things correctly, UDT needs to determine its entire strategy all at once. It cannot decide whether to one-box or two-box in Newcomb just by considering the Newcomb that it is currently dealing with. It must also consider all possible hypothetical scenarios where any other agent’s action depends on whether or not UDT one-boxes.
Furthermore, UDT cannot decide what it does in Newcomb independently of what it does in the Counterfactual Mugging, because some hypothetical entity might give it rewards based on some combination of the two behaviors. UDT needs to compute its entire strategy (i.e. its response to all possible scenarios) all at the same time before it can determine what it should do in any particular situation [OK. Not quite true. It might be able to prove that whatever the optimal strategy is, it involves doing X in situation Y, without actually determining the optimal strategy. Then again, this seems really hard, since doing almost anything directly from Kolmogorov priors is basically impossible].
To implement things correctly, UDT needs to determine its entire strategy all at once.
Conceptually, yes. The point is that you don’t need to actually literally explicitly compute your entire strategy at t=-∞. All you have to do is prove a particular property of the strategy (namely, its action in situation Y) at the time when you are asked for a decision.
Obviously, like every computational activity ever, you must still make approximations, because it is usually infeasible to make inferences over the entire Tegmark IV multiverse when you need to make a decision. An example of such an approximation would be neglecting the measure of “entities that give it rewards based on some combination of [Newcomb’s and counterfactual mugging]” in many situations, because I expect such things to be rare (significantly rarer than Newcomb’s and counterfactual mugging themselves).
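A toy illustration of both halves of this exchange, as a hedged sketch: the scenario weights, the payoff numbers, and the `combo` entity that rewards two-boxing plus refusing the mugger are all made up for the example, and `optimal_policy` is a hypothetical helper. It only shows that the strategy has to be scored jointly in principle, and that dropping a scenario with tiny measure need not change the answer.

```python
from itertools import product

# Assumed toy decision problem: one choice per scenario.
CHOICES = {"newcomb": ["one-box", "two-box"], "mugging": ["pay", "refuse"]}


def newcomb(policy):
    box_b = 1_000_000 if policy["newcomb"] == "one-box" else 0
    return box_b if policy["newcomb"] == "one-box" else box_b + 1_000


def mugging(policy):
    # Fair coin (assumed stakes): heads pays 10_000 only to agents who
    # would pay on tails; tails costs the paying agent 100.
    return (0.5 * 10_000 + 0.5 * -100) if policy["mugging"] == "pay" else 0.0


def combo(policy):
    # Hypothetical entity rewarding a *combination* of the two behaviors.
    both = policy["newcomb"] == "two-box" and policy["mugging"] == "refuse"
    return 50_000_000 if both else 0


def optimal_policy(weighted_scenarios):
    """Score every joint policy against all (payoff, weight) pairs at once."""
    keys = list(CHOICES)
    policies = [dict(zip(keys, acts)) for acts in product(*CHOICES.values())]

    def expected_utility(policy):
        return sum(w * payoff(policy) for payoff, w in weighted_scenarios)

    return max(policies, key=expected_utility)


full = [(newcomb, 0.5), (mugging, 0.499999), (combo, 0.000001)]
pruned = [(newcomb, 0.5), (mugging, 0.5)]  # combo neglected
heavy = [(newcomb, 0.25), (mugging, 0.25), (combo, 0.5)]  # combo common

print(optimal_policy(full) == optimal_policy(pruned))
print(optimal_policy(heavy))
```

With these made-up numbers, the `full` and `pruned` priors pick the same policy, while the `heavy` prior flips it. So whether the approximation is safe depends entirely on how much measure the prior gives to such entities.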
One of the motivations for development of TDT and UDT is that you don’t necessarily get an opportunity to do such self-modification beforehand,
Basically this, except there’s no need to actually do it beforehand.
Consider the set of all possible source codes an agent might have. This set is partitioned in two: those on which Omega rewards you (where you both one-box, and your source code is such that Omega can tell you will), and those on which Omega punishes you (all others). Call the former set A, and the latter set B.
Agents are not guaranteed to start with a source code in set A; some start with source code in B. (Maybe they are classically trained causal decision theorists? Maybe they are skeptical about UDT? Maybe their programmers were careless? Who knows!) The point is, there comes a time in an agent’s life when it needs to grow up and move its source code to set A. Maybe it does not immediately self-modify to directly do UDTish things on Newcomb-like causal graphs; maybe it self-modifies to self-modify before being asked to one-box.
But it is crucial for the agent to move itself from set B to set A at some point before Omega shows up. This is what I mean by step 1.
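As a rough sketch of the A/B picture, here source codes are modeled as plain Python functions. This is purely an assumption for illustration: `in_set_a` stands in for “Omega can tell you will one-box” by simply running the source, and the `Agent` class and payoff numbers are hypothetical.

```python
def cdt_source(situation):
    """A source code in set B: it always two-boxes."""
    return "two-box"


def one_boxing_source(situation):
    """A source code in set A: it one-boxes, and visibly so."""
    return "one-box"


def in_set_a(source):
    # Assumption: Omega "can tell" by simulating the source on Newcomb.
    return source("newcomb") == "one-box"


class Agent:
    def __init__(self, source):
        self.source = source

    def self_modify(self, new_source):
        """Step 1: replace the current source code with a new one."""
        self.source = new_source


def omega_visit(agent):
    """Omega fills box B based on the agent's source at the time of the visit."""
    box_b = 1_000_000 if in_set_a(agent.source) else 0
    action = agent.source("newcomb")
    return box_b if action == "one-box" else box_b + 1_000


agent = Agent(cdt_source)
before = omega_visit(agent)           # source still in set B
agent.self_modify(one_boxing_source)  # move to set A before Omega shows up
after = omega_visit(agent)
print(before, after)
```

In this toy model the only thing that matters is which set the source is in at the moment `omega_visit` runs. It deliberately does not settle whether the move has to be an explicit self-modification or can be built into the design from the start.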