If John’s physician prescribed a burdensome treatment because of a test whose false-positive rate is 99.9999%, John needs a lawyer rather than a statistician. :)

# Gary_Drescher

# A problem with Timeless Decision Theory (TDT)

Wow, this is great work—congratulations! If it pans out, it bridges a really fundamental gap.

I’m still digesting the idea, and perhaps I’m jumping the gun here, but I’m trying to envision a UDT (or TDT) agent using the sense of subjective probability you define. It seems to me that an agent can get into trouble even if its subjective probability meets the coherence criterion. If that’s right, some additional criterion would have to be required. (Maybe that’s what you already intend? Or maybe the following is just muddled.)

Let’s try invoking a coherent P in the case of a simple decision problem for a UDT agent. First, define G <--> P(“G”) < 0.1. Then consider the 5&10 problem:

If the agent chooses A, payoff is 10 if ~G, 0 if G.

If the agent chooses B, payoff is 5.

And suppose the agent can prove the foregoing. Then unless I’m mistaken, there’s a coherent P with the following assignments:

P(G) = 0.1

P(Agent()=A) = 0

P(Agent()=B) = 1

P(G | Agent()=B) = P(G) = 0.1

And P assigns 1 to each of the following:

P(“Agent()=A”) < epsilon

P(“Agent()=B”) > 1-epsilon

P(“G & Agent()=B”) / P(“Agent()=B”) = 0.1 +- epsilon

P(“G & Agent()=A”) / P(“Agent()=A”) > 0.5

The last inequality is consistent with the agent indeed choosing B, because the postulated conditional probability of G makes the expected payoff given A less than the payoff given B.
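
As a quick numerical check of that last claim, here is a minimal sketch of the expected-payoff comparison. The function names and the sample value 0.6 for P(G | Agent()=A) are illustrative assumptions; the argument only needs that conditional probability to exceed 0.5.

```python
def expected_payoff_a(p_g_given_a: float) -> float:
    """Choosing A pays 10 if ~G and 0 if G."""
    return 10 * (1 - p_g_given_a) + 0 * p_g_given_a


def expected_payoff_b() -> float:
    """Choosing B pays 5 regardless of G."""
    return 5.0


# Hypothetical value: any conditional probability above 0.5 gives the same verdict.
p_g_given_a = 0.6
ev_a = expected_payoff_a(p_g_given_a)
ev_b = expected_payoff_b()
prefers_b = ev_b > ev_a  # the agent's choice of B is self-consistent under this P
```

At exactly P(G | Agent()=A) = 0.5 the two options tie, which is why the inequality in the list above is strict.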

Is that P actually incoherent for reasons I’m overlooking? If not, then we’d need something beyond coherence to tell us which P a UDT agent should use, correct?

(edit: formatting)


Just to elaborate a bit, Nesov’s scenario and mine share the following features:

In both cases, we argue that an agent should forfeit a smaller sum for the sake of a larger reward that would have been obtained (counterfactually contingent on that forfeiture) if a random event had turned out differently than in fact it did (and than the agent knows it did).

We both argue for using the original coin-flip probability distribution (i.e., not-updating, if I’ve understood that idea correctly) for purposes of this decision, and indeed in general, even in mundane scenarios.

We both note that the forfeiture decision is easier to justify if the coin-toss was quantum under MWI, because then the original probability distribution corresponds to a real physical distribution of amplitude in configuration-space.
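
The difference between the two ways of evaluating the forfeiture can be sketched with a toy calculation. The stakes (a cost of 100, a reward of 10,000, a fair coin) and the function names are assumptions chosen for illustration; neither scenario fixes these numbers here.

```python
def value_without_updating(pays_when_asked: bool, p_heads: float = 0.5,
                           reward: int = 10_000, cost: int = 100) -> float:
    """Score a policy against the original coin-flip distribution.

    Heads: the agent is rewarded iff it would forfeit on tails.
    Tails: the agent is asked to forfeit `cost`.
    """
    heads_payoff = reward if pays_when_asked else 0
    tails_payoff = -cost if pays_when_asked else 0
    return p_heads * heads_payoff + (1 - p_heads) * tails_payoff


def value_after_updating(pays_when_asked: bool, cost: int = 100) -> float:
    """Score the same policy after conditioning on the coin having come up tails."""
    return -cost if pays_when_asked else 0


forfeit_unupdated = value_without_updating(True)
refuse_unupdated = value_without_updating(False)
```

Under the original distribution the forfeiting policy scores higher; after updating on the observed toss, refusing looks better. That gap is the point both scenarios turn on.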

Nesov’s scenario improves on mine in several ways. He eliminates some unnecessary complications (he uses one simulation instead of two, and just tells the agent what the coin-toss was, whereas my scenario requires the agent to deduce that). So he makes the point more clearly, succinctly and dramatically. Even more importantly, his analysis (along with Yudkowsky, Dai, and others here) is more formal than my ad hoc argument (if you’ve looked at Good and Real, you can tell that formalism is not my forte.:)).

I too have been striving for a more formal foundation, but it’s been elusive. So I’m quite pleased and encouraged to find a community here that’s making good progress focusing on a similar set of problems from a compatible vantage point.

Thanks, Eliezer—that’s a clear explanation of an elegant theory. So far, TDT (I haven’t looked carefully at UDT) strikes me as more promising than any other decision theory I’m aware of (including my own efforts, past and pending). Congratulations are in order!

I agree, of course, that TDT doesn’t make the A6/A7 mistake. That was just a simple illustration of the need, in counterfactual reasoning (broadly construed), to specify somehow what to hold fixed and what not to, and that different ways of doing so specify different senses of counterfactual inference (i.e., that there are different kinds of ‘if-counterfactually’). If counterfactual inference is construed a la Pearl, for example, then such inferences (causal-counterfactual) correspond to causal links (if-causally).

As you say, TDT’s utility formula doesn’t perform general logical inferences (or evidential-counterfactual inferences) from the antecedents it evaluates (i.e. the candidate outputs of the Platonic computation). Rather, the utility formula performs causal-counterfactual inferences from the set of nodes that designate the outputs of the Platonic computation, in all places where that Platonic computation is approximately physically instantiated.

However, it seems to me we can, if we wish, use TDT to define what we can call a TDT-counterfactual that tells us what would be true ‘if-timelessly’ a particular physical agent’s particular physical action were to occur. In particular, whereas CDT says that what would be true (if-causally) consists of what’s causally downstream from that action, TDT says that what would be true (if-timelessly) consists of what’s causally downstream from the output of the suitably-specified Platonic computation that the particular physical agent approximately implements, and also what’s causally downstream from that same Platonic computation in all other places where that computation is approximately physically instantiated. (And the physical TDT agent argmaxes over the utilities of the TDT-counterfactual consequences of that agent’s candidate actions.)

I think there are a few reasons we might sometimes find it useful to think in terms of the TDT-counterfactual consequences of a physical agent’s actions, rather than directly in terms of the standard TDT formulation (even though they’re merely two different ways of expressing the same decision theory, unless I’ve misunderstood).

The TDT-counterfactual perspective places TDT in a common framework with other decision theories that (implicitly or explicitly) use other kinds of counterfactual reasoning, starting with a physical agent’s action as the antecedent. Then we can apply some meta-criterion to ask which of those alternative theories is correct, and why. (That was the intuition behind my MCDT proposal, although MCDT itself was hastily specified and too simpleminded to be correct.)

Plausibly, people are agents who think in terms of the counterfactual consequences of an action, rather than being hardwired to use TDT. If we are to choose to act in accordance with TDT from now on (or, equivalently, if we are to build AIs who act in accordance with TDT), we need to be persuaded that doing so is for the best (even if e.g. a Newcomb snapshot was already taken before we became persuaded). (I’m assuming here that our extant choice machinery allows us the flexibility to be persuaded about what sort of counterfactual to use; if not, alas, we can’t necessarily get there from here).

In the standard formulation of TDT, you effectively view yourself as an abstract computation with one or more approximate physical instantiations, and you ask what you (thus construed) cause (i.e. what follows causal-counterfactually). In the alternative formulation, I view myself as a particular physical agent that is among one or more approximate instantiations of an abstract computation, and I ask what follows TDT-counterfactually from what I (thus construed) choose.

The original formulation seems to require a precommitment to identify oneself with all instantiations (in the causal net) of the abstract computation (or at least seems to require that in order for us non-TDT agents to decide to emulate TDT). And that identification is indeed plausible in the case of fairly exact replication. But consider, say, a 1-shot PD game between Eliezer and me. Our mutual understanding of reflexive consistency would let us win. And I agree that we both approximately instantiate, at some level of abstraction, a common decision computation, which is what lets the TDT framework apply and lets us both win.

But (in contrast with an exact-simulation case) that common computation is at a level of abstraction that does not preserve our respective personal identities. (That’s kind of the point of the abstraction. My utility function for the game places value on Gary’s points and not Eliezer’s points; the common abstract computation lacks that bias.) So I would hesitate to identify either of us with the common abstraction. (And I see in other comments that Eliezer explicitly agrees.) Rather, I’d like to reason that if-timelessly I, Gary, choose ‘Cooperate’, then so does Eliezer. That way, “I am you as you are me” emerges as a (metaphorical) conclusion about the situation (we each have a choice about the other’s action in the game, and are effectively acting together) rather than being needed as the point of departure.

Again, the foregoing is just an alternative but equivalent (unless I’ve erred) way of viewing TDT, an alternative that may be useful for some purposes.

My book discusses a similar scenario: the dual-simulation version of Newcomb’s Problem (section 6.3), in the case where the large box is empty (no $1M) and (I argue) it’s still rational to forfeit the $1K. Nesov’s version nicely streamlines the scenario.

According to information his family graciously posted to his blog, the cause of death was occlusive coronary artery disease with cardiomegaly.

Just to clarify, I think your analysis here doesn’t apply to the transparent-boxes version that I presented in Good and Real. There, the predictor’s task is not necessarily to predict what the agent does for real, but rather to predict what the agent would do in the event that the agent sees $1M in the box. (That is, the predictor simulates what—according to physics—the agent’s configuration would do, if presented with the $1M environment; or equivalently, what the agent’s ‘source code’ returns if called with the $1M argument.)

If the agent would one-box if $1M is in the box, but the predictor leaves the box empty, then the predictor has not predicted correctly, even if the agent (correctly) two-boxes upon seeing the empty box.

In April 2010 Gary Drescher proposed the “Agent simulates predictor” problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources.

Just to give due credit: Wei Dai and others had already discussed Prisoner’s Dilemma scenarios that exhibit a similar problem, which I then distilled into the ASP problem.

2) Treat differently mathematical knowledge that we learn by genuinely mathematical reasoning and by physical observation. In this case we know (D xor E) not by mathematical reasoning, but by physically observing a box whose state we believe to be correlated with D xor E. This may justify constructing a causal DAG with a node descending from D and E, so a counterfactual setting of D won’t affect the setting of E.

Perhaps I’m misunderstanding you here, but D and E are Platonic computations. What does it mean to construct a causal DAG among Platonic computations? [EDIT: Ok, I may understand that a little better now; see my edit to my reply to (1).] Such a graph links together general mathematical facts, so the same issues arise as in (1), it seems to me: Do the links correspond to logical inference, or something else? What makes the graph acyclic? Is mathematical causality even coherent? And if you did have a module that can detect (presumably timeless) causal links among Platonic computations, then why not use that module directly to solve your decision problems?

Plus I’m not convinced that there’s a meaningful distinction between math knowledge that you gain by genuine math reasoning, and math knowledge that you gain by physical observation.

Let’s say, for instance, that I feed a particular conjecture to an automatic theorem prover, which tells me it’s true. Have I then learned that math fact by genuine mathematical reasoning (performed by the physical computer’s Platonic abstraction)? Or have I learned it by physical observation (of the physical computer’s output), and am I hence barred from using that math fact for purposes of TDT’s logical-dependency-detection? Presumably the former, right? (Or else TDT will make even worse errors.)

But then suppose the predictor has simulated the universe sufficiently to establish that U (the universe’s algorithm, including physics and initial conditions) leads to there being $1M in the box in this situation. That’s a mathematical fact about U, obtained by (the simulator’s) mathematical reasoning. Let’s suppose that when the predictor briefs me, the briefing includes mention of this mathematical fact. So even if I keep my eyes closed and never physically see the $1M, I can rely instead on the corresponding mathematically derived fact.

(Or more straightforwardly, we can view the universe itself as a computer that’s performing mathematical reasoning about how U unfolds, in which case any physical observation is intrinsically obtained by mathematical reasoning.)

1) Construct a full-blown DAG of math and Platonic facts, an account of which mathematical facts make other mathematical facts true, so that we can compute mathematical counterfactuals.

“Makes true” means logically implies? Why would that graph be acyclic? [EDIT: Wait, maybe I see what you mean. If you take a pdf of your beliefs about various mathematical facts, and run Pearl’s algorithm, you should be able to construct an acyclic graph.]

Although I know of no worked-out theory that I find convincing, I believe that counterfactual inference (of the sort that’s appropriate to use in the decision computation) makes sense with regard to events in universes characterized by certain kinds of physical laws. But when you speak of mathematical counterfactuals more generally, it’s not clear to me that that’s even coherent.

Plus, if you did have a general math-counterfactual-solving module, why would you relegate it to the logical-dependency-finding subproblem in TDT, and then return to the original factored causal graph? Instead, why not cast the whole problem as a mathematical abstraction, and then directly ask your math-counterfactual-solving module whether, say, (Platonic) C’s one-boxing counterfactually entails (Platonic) $1M? (Then do the argmax over the respective math-counterfactual consequences of C’s candidate outputs.)


I think this problem is based (at least in part) on an incoherence in the basic transparent box variant of Newcomb’s problem.

If the subject of the problem will two-box if he sees the big box has the million dollars, but will one-box if he sees the big box is empty, then there is no action Omega could take to satisfy the conditions of the problem.

The rules of the transparent-boxes problem (as specified in *Good and Real*) are: the predictor conducts a simulation that tentatively presumes there will be $1M in the large box, and then puts $1M in the box (for real) iff the simulation showed one-boxing. So the subject you describe gets an empty box and one-boxes, but that doesn’t violate the conditions of the problem, which do not require the empty box to be predictive of the subject’s choice.
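
The rule stated above can be sketched in a few lines. This is a toy model under stated assumptions: the agent is a plain function of what it sees, the predictor's simulation is modeled as simply calling that function, and all names (`predictor_fills_box`, `play`, `contrarian`, `one_boxer`) are hypothetical.

```python
FULL, EMPTY = "box contains $1M", "box is empty"
ONE_BOX, TWO_BOX = "one-box", "two-box"


def predictor_fills_box(agent) -> bool:
    """Simulate the agent on the tentative presumption that it sees $1M."""
    return agent(FULL) == ONE_BOX


def play(agent):
    """Run one round: the predictor sets up the box, then the agent really chooses."""
    box_full = predictor_fills_box(agent)
    observation = FULL if box_full else EMPTY
    action = agent(observation)
    payout = (1_000_000 if box_full else 0) + (1_000 if action == TWO_BOX else 0)
    return observation, action, payout


def contrarian(observation):
    """The subject described above: two-box on seeing $1M, one-box on seeing nothing."""
    return TWO_BOX if observation == FULL else ONE_BOX


def one_boxer(observation):
    """An agent that one-boxes whatever it sees."""
    return ONE_BOX


contrarian_result = play(contrarian)
one_boxer_result = play(one_boxer)
```

In this sketch the contrarian subject ends up facing an empty box and one-boxing. The predictor's simulated prediction (two-boxing on seeing $1M) is still correct about the hypothetical it was asked about, so no rule is violated.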

Sorry, the above post omits some background information. If E “depends on” C in the particular sense defined, then the TDT algorithm mandates that when you “surgically alter” the output of C in the factored causal graph, you must correspondingly surgically alter the output of E in the graph.

So it’s not at all a matter of any intuitive connotation of “depends on”. Rather, “depends on”, in this context, is purely a technical term that designates a particular test that the TDT algorithm performs. And the algorithm’s prescribed use of that test culminates in the algorithm making the wrong decision in the case described above (namely, it tells me to two-box when I should one-box).

[In TDT] If you desire to smoke cigarettes, this would be observed and screened off by conditioning on the fixed initial conditions of the computation—the fact that the utility function had a positive term for smoking cigarettes, would already tell you that you had the gene. (Eells’s “tickle”.) If you can’t observe your own utility function then you are actually taking a step outside the timeless decision theory as formulated.

Consider a different scenario where people with and without the gene both desire to smoke, but the gene makes that desire stronger, and the stronger it is, the more likely one is to smoke. Even when you observe your own utility function, you don’t necessarily have a clue whether the utility assigned to smoking is the level caused by the gene or else by the gene’s absence. So your observation of your utility function doesn’t necessarily help you to move away from the base-level probability of having cancer here.

For now, let me just reply to your incidental concluding point, because that’s brief.

I disagree that the red/green problem is unsolvable. I’d say the solution is that, with respect to the available information, both choices have equal (low) utility, so it’s simply a toss-up. A correct decision algorithm will just flip a coin or whatever.

Having done so, will a correct decision algorithm try to revise its choice in light of its (tentative) new knowledge of what its choice is? Only if it has nothing more productive to do with its remaining time.

Exactly. Unless “cultivating a disposition” amounts to a (subsequent-choice-circumventing) precommitment, you still need a reason, when you make that subsequent choice, to act in accordance with the cultivated disposition. And there’s no good explanation for why that reason should care about whether or not you previously cultivated a disposition.

I don’t think DBDT gives the right answer if the predictor’s snapshot of the local universe-state was taken before the agent was born (or before humans evolved, or whatever), because the “critical point”, as Fisher defines it, occurs too late. But a one-box chooser can still expect a better outcome.


That’s very elegant! But the trick here, it seems to me, lies in the rules for setting up the world program in the first place.

First, the world-program’s calling tree should match the structure of TDT’s graph, or at least match the graph’s (physically-)causal links. The physically-causal part of the structure tends to be uncontroversial, so (for present purposes) I’m ok with just stipulating the physical structure for a given problem.

But then there’s the choice to use the same variable S in multiple places in the code. That corresponds to a choice (in TDT) to splice in a logical-dependency link from the Platonic decision-computation node to other Platonic nodes. In both theories, we need to be precise about the criteria for this dependency. Otherwise, the sense of dependency you’re invoking might turn out to be wrong (it makes the theory prescribe incorrect decisions) or question-begging (it implicitly presupposes an answer to the key question that the theory itself is supposed to figure out for us, namely what things are or are not counterfactual consequences of the decision-computation).

So the question, in UDT1, is: under what circumstances do you represent two real-world computations as being tied together via the same variable in a world-program?

That’s perhaps straightforward if S is implemented by literally the same physical state in multiple places. But as you acknowledge, you might instead have distinct Si’s that diverge from one another for some inputs (though not for the actual input in this case). And the different instances need not have the same physical substrate, or even use the same algorithm, as long as they give the same answers when the relevant inputs are the same, for some mapping between the inputs and between the outputs of the two Si’s. So there’s quite a bit of latitude as to whether to construe two computations as “logically equivalent”.

So, for example, for the conventional transparent-boxes problem, what principle tells us to formulate the world program as you proposed, rather than having:

    def P1(i):
        const S1;
        E = (Pi(i) == 0)
        D = Omega_Predict(S1, i, "box contains $1M")
        if D ^ E:
            C = S(i, "box contains $1M")
            payout = 1001000 - C * 1000
        else:
            C = S(i, "box is empty")
            payout = 1000 - C * 1000

(along with a similar program P2 that uses constant S2, yielding a different output from Omega_Predict)?

This alternative formulation ends up telling us to two-box. In this formulation, if S and S1 (or S and S2) are in fact the same, they would (counterfactually) differ if a different answer (than the actual one) were output from S—which is precisely what a causalist asserts. (A similar issue arises when deciding what facts to model as “inputs” to S—thus forbidding S to “know” those facts for purposes of figuring out the counterfactual dependencies—and what facts to build instead into the structure of the world-program, or to just leave as implicit background knowledge.)
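
To make the contrast concrete, here is a runnable simplification of the two formulations. It is a sketch under several assumptions that are not in the original programs: the index `i` and the digit test are dropped (E is fixed to False), `Omega_Predict` is modeled as directly calling the strategy it is given, strategies are constant functions, and the names `world_shared` and `world_split` are hypothetical.

```python
ONE_BOX, TWO_BOX = 1, 0  # C = 1 forfeits the $1000, matching the payout formula


def payout(box_full: bool, c: int) -> int:
    """Payout as in the world-program: the base amount minus the forfeited $1000."""
    return (1_001_000 if box_full else 1_000) - c * 1_000


def world_shared(s, e: bool = False) -> int:
    """Prediction and real choice both go through the same variable S."""
    d = s("box contains $1M")  # stands in for Omega_Predict(S, ...)
    box_full = bool(d) ^ e
    c = s("box contains $1M" if box_full else "box is empty")
    return payout(box_full, c)


def world_split(s, s1, e: bool = False) -> int:
    """Prediction uses a separate constant S1; only the real choice uses S."""
    d = s1("box contains $1M")  # stands in for Omega_Predict(S1, ...)
    box_full = bool(d) ^ e
    c = s("box contains $1M" if box_full else "box is empty")
    return payout(box_full, c)


def always_one(observation):
    return ONE_BOX


def always_two(observation):
    return TWO_BOX


strategies = {"one-box": always_one, "two-box": always_two}
best_shared = max(strategies, key=lambda n: world_shared(strategies[n]))
best_split = max(strategies, key=lambda n: world_split(strategies[n], s1=always_one))
```

With the shared variable, varying S also varies the prediction, so one-boxing scores best. With the constant S1, the prediction stays put while S varies, so two-boxing scores best whichever value S1 is frozen to.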

So my concern is that UDT1 may covertly beg the question by selecting, among the possible formulations of the world-program, a version that turns out to presuppose an answer to the very question that UDT1 is intended to figure out for us (namely, what counterfactually depends on the decision-computation). And although I agree that the formulation you’ve selected in this example is correct and the above alternative formulation isn’t, I think it remains to explain why.

(As with my comments about TDT, my remarks about UDT1 are under the blanket caveat that my grasp of the intended content of the theories is still tentative, so my criticisms may just reflect a misunderstanding on my part.)

If we go down avenue (1), then we give primacy to our intuition that if-counterfactually you make a different decision, this logically controls the mathematical fact (D xor E) with E held constant, but does not logically control E with (D xor E) held constant. While this does sound intuitive in a sense, it isn’t quite nailed down—after all, D is ultimately just as constant as E and (D xor E), and to change any of them makes the model equally inconsistent.

I agree this sounds intuitive. As I mentioned earlier, though, nailing this down is tantamount to circling back and solving the full-blown problem of (decision-supporting) counterfactual reasoning: the problem of how to distinguish which facts to “hold fixed”, and which to “let vary” for consistency with a counterfactual antecedent.

In any event, is the idea to try to build a separate graph for math facts, and use that to analyze “logical dependency” among the Platonic nodes in the original graph, in order to carry out TDT’s modified “surgical alteration” of the original graph? Or would you try to build one big graph that encompasses physical and logical facts alike, and then use Pearl’s decision procedure without further modification?

If we view the physical observation of $1m as telling us the raw mathematical fact (D xor E), and then perform mathematical inference on D, we’ll find that we can affect E, which is not what we want.

Wait, isn’t it decision-computation C—rather than simulation D—whose “effect” (in the sense of logical consequence) on E we’re concerned about here? It’s the logical dependents of C that get surgically altered in the graph when C gets surgically altered, right? (I know C and D are logically equivalent, but you’re talking about inserting a physical node after D, not C, so I’m a bit confused.)

I’m having trouble following the gist of avenue (2) at the moment. Even with the node structure you suggest, we can still infer E from C and from the physical node that matches (D xor E)—unless the new rule prohibits relying on that physical node, which I guess is the idea. But what exactly is the prohibition? Are we forbidden to infer any mathematical fact from any physical indicator of that fact? Or is there something in particular about node (D xor E) that makes it forbidden? (It would be circular to cite the node’s dependence on C in the very sense of “dependence” that the new rule is helping us to compute.)

This is very cool, and I haven’t digested it yet, but I wonder if it might be open to the criticism that you’re effectively postulating the favored answer to Newcomb’s Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation. After all, the crux of the counterfactual-reasoning dilemma in Newcomb’s Problem (and similarly in the Prisoner’s Dilemma) is to justify the inference “If I choose both boxes, then (probably) so does the simulation (even if in fact I/it do not)” rather than “If I choose both boxes, then the simulation doesn’t necessarily match my choice (even though in fact it does)”. It could be objected that your formalism postulates the desired answer rather than giving a basis for deriving it—an objection that becomes more important when we move away from identical or functionally equivalent source code and start to consider approximate similarities. (See my criticism of Leslie (1991)’s proposal that you should make your choice as though you were also choosing on behalf of other agents of similar causal structure. If I’m not mistaken, your proposal seems to be a formalization of that idea.)

Here’s an alternative proposal.

Metacircular Decision Theory (MCDT)

For purposes of this discussion, let me just stipulate that subjective probabilities will be modeled as though they were quantum under MWI—that is, we’ll regard the entire distribution as part of the universe. That move will help with dual-simulation/counterfactual-mugging scenarios; but also, as I argued in Good and Real, we effectively make that move whenever we assign value to probabilistic outcomes even in nonesoteric situations (so we may as well avail ourselves of that move in the weird scenarios too, though eventually we need to justify the move).

Say we have an agent embodied in the universe. The agent knows some facts about the universe (including itself), has an inference system of some sort for expanding on those facts, and has a preference scheme that assigns a value to the set of facts, and is wired to select an action—specifically, the/an action that implies (using its inference system) the/a most-preferred set of facts.

But without further constraint, this process often leads to a contradiction. Suppose the agent’s repertoire of actions is A1, …An, and the value of action Ai is simply i. Say the agent starts by considering the action A7, and dutifully evaluates it as 7. Next, it contemplates the action A6, and reasons as follows: “Suppose I choose A6. I know I’m a utility-maximizing agent, and I already know there’s another choice that has value 7. Therefore, it follows from my (hypothetical) choice of A6 that A6 has a value of at least 7.” But that inference, while sound, contradicts the fact that A6’s value is 6.

Unsurprisingly, a false premise leads to a contradiction. To avoid contradiction, we need to limit the set of facts that the agent is allowed to reason from when making inferences about a hypothetical action. But which facts do we omit? Different choices yield different preferred actions. If we omit the fact that val(A6)=6, then we can infer val(A6)>=7; if instead we omit the fact that the agent utility-maximizes, then we can infer val(A6)=6 without contradiction (or at least without the particular contradiction above).
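
The effect of omitting one fact or the other can be illustrated with a small sketch. It assumes n = 7 and represents what the agent can infer about a hypothetically chosen action as a lower and upper bound on its value; the function name and the bound representation are illustrative choices, not part of the proposal.

```python
# val(Ai) = i, with n = 7 assumed for illustration.
VALUES = {f"A{i}": i for i in range(1, 8)}


def infer_value_bounds(action: str, keep_value_fact: bool, keep_maximizer_fact: bool):
    """Bounds on val(action) inferred from the hypothesis that `action` is chosen.

    keep_value_fact:     the agent may use the known value of `action`.
    keep_maximizer_fact: the agent may use "I always choose a maximal action".
    Returns (lower, upper, consistent).
    """
    lower, upper = float("-inf"), float("inf")
    if keep_value_fact:
        lower = upper = VALUES[action]
    if keep_maximizer_fact:
        # A maximizer's chosen action is at least as good as every alternative.
        best_other = max(v for a, v in VALUES.items() if a != action)
        lower = max(lower, best_other)
    return lower, upper, lower <= upper


both_facts = infer_value_bounds("A6", keep_value_fact=True, keep_maximizer_fact=True)
omit_value = infer_value_bounds("A6", keep_value_fact=False, keep_maximizer_fact=True)
omit_maximizer = infer_value_bounds("A6", keep_value_fact=True, keep_maximizer_fact=False)
```

Keeping both facts yields inconsistent bounds (the contradiction). Omitting the value fact yields val(A6) >= 7, and omitting the maximizer fact yields val(A6) = 6.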

So this is the usual full-blown problem of counterfactual inference: which things do we “hold fixed” when contemplating a counterfactual antecedent, and which do we “let vary” for consistency with that antecedent? Different choices here correspond to different decision theories. If the agent allows inferences (only) from all facts about physical law as applied to the future, and all facts about the past and present universe-state, except for facts about the agent’s internal decision-making state, then we get CDT. If we leave the criteria unspecified/ambiguous, we get EDT. If we allow the agent to reason from facts about the future as well as the past and present, we get FDT (Fatalist Decision Theory: choice is futile, which most people think follows from determinism).

MCDT’s proposed criterion is this: the agent makes a meta-choice about which facts to omit when making inferences about the hypothetical actions, and selects the set of facts which lead to the best outcome if the agent then evaluates the original candidate actions with respect to that choice of facts. The agent then iterates that meta-evaluation as needed (probably not very far) until a fixed point is reached, i.e. the same choice (as to which facts to omit) leaves the first-order choice unchanged. (It’s ok if that’s intractable or uncomputable; the agent can muddle through with some approximate algorithm.)

EDIT1: The algorithm also needs to check, when it evaluates a meta-level choice candidate, that the winning choice at the next level down is consistent with all known facts. If not, the meta-level candidate is eliminated from consideration. (Otherwise, the A6 choice could remain stable in the example above.)

EDIT2: Or rather, that consistency check can probably *substitute for* the additional meta-iterations.

So e.g. in Newcomb’s Problem or the Prisoner’s Dilemma, the agent can calculate that it does better if it retains the fact that its dispositional-state/source-code is functionally equivalent to the simulation’s/other’s (but omits facts about which particular choice is made by both) than if it makes the CDT choice and omits the fact about equivalence, but keeps the facts about the simulation’s/other’s choice (or keeps some probability distribution about the simulation’s/other’s choice).

In other words, metacircular consistency isn’t just a *test* that we’d like the decision theory to pass. Metacircular consistency *is* the theory; it *is* the algorithm.
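
As a rough illustration of the meta-choice for Newcomb’s Problem, the sketch below compares two ways of choosing which facts to retain. It is a toy under stated assumptions, not a specification of MCDT: each choice of facts is represented directly by the payoff model it induces, the prior `p_full = 0.5` is arbitrary, the consistency check from EDIT1 is left out, and all names are hypothetical.

```python
ACTIONS = ("one-box", "two-box")


def actual_outcome(action: str) -> int:
    """Outcome when the simulation really does match the agent's choice."""
    box_full = action == "one-box"
    return (1_000_000 if box_full else 0) + (1_000 if action == "two-box" else 0)


def keep_equivalence(action: str) -> float:
    """Retain 'my source code matches the simulation'; omit which choice is made."""
    return actual_outcome(action)


def keep_prediction(action: str, p_full: float = 0.5) -> float:
    """Retain a fixed distribution over the simulation's choice; omit the equivalence."""
    return p_full * 1_000_000 + (1_000 if action == "two-box" else 0)


POLICIES = {"keep equivalence": keep_equivalence, "keep prediction": keep_prediction}


def first_order_choice(model) -> str:
    """The action that looks best under a given choice of retained facts."""
    return max(ACTIONS, key=model)


def meta_choice(policies) -> str:
    """Pick the fact-retention policy whose first-order choice actually does best."""
    return max(policies, key=lambda name: actual_outcome(first_order_choice(policies[name])))


chosen_policy = meta_choice(POLICIES)
chosen_action = first_order_choice(POLICIES[chosen_policy])
```

Here the policy that keeps the prediction fixed recommends two-boxing and actually earns 1,000, while the policy that keeps the equivalence recommends one-boxing and earns 1,000,000, so the meta-choice selects the latter.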