This discussion and a previous conversation with Nate have helped me crystallize my thoughts on why I prefer CDT to any of the attempts to “fix” it using timelessness. Most of the material on TDT/UDT is too technical for me, so it is entirely possible that I am wrong; if there are errors in my reasoning, I would be very grateful if someone could point them out:
Any decision theory depends on the concept of choice: If there is no choice, there is no need for a decision theory. I have seen a quote attributed to Pearl to the effect that we can only talk about “interventions” at a level of abstraction where free will is apparent. This seems true of any decision theory. (Note: From looking at Google, it appears that the only verified source for this quotation is on Less Wrong).
CDT and TDT differ in how they operationalize choice, and therefore in whether the decision theories are consistent with free will. In Causal Decision Theory, agents choose actions from a choice set. In contrast, from my limited understanding of TDT/UDT, it seems as if agents choose their source code. This is not only inconsistent with my (perhaps naive) subjective experience of free will, it also seems like it will lead to an incoherent concept of “choice” due to recursion.
Have I misunderstood something fundamental?
Yeah. TDT/UDT agents don’t choose their source code. They just choose their strategy.
They happen to do this in a way that respects logical connections between events, whereas CDT reasons about its options in a way that neglects acausal logical connections (such as in the mirror token trade above).
None of the algorithms are making different assumptions about the nature of “free will” or whether you get to “choose your own code”: they differ only in how they construct their counterfactuals, with TDT/UDT respecting acausal connections that CDT neglects.
I recognize that you understand this much better than me, and that I am almost certainly wrong. I am continuing this discussion only to try to figure out where my lingering sense of confusion comes from.
If an agent is playing a game of PD against another agent running the same source code, and chooses to “cooperate” because he believes that the other agent will necessarily make the same choice: how is that not equivalent to choosing your source code?
It isn’t equivalent. The agent recognizes that their already-existing source code is what causes them to either cooperate or defect, and so, because the other agent has the same source code, that agent must make the same decision.
As for how the actual decision happens, the agent doesn’t “choose its source code”, it simply runs the source code and outputs “cooperate” or “defect” based on what the result of running that source code is.
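For concreteness, here is a minimal sketch of what “simply runs the source code” could look like (my own toy illustration, not anyone’s actual formalism; the names and the string comparison are assumptions for the example):

```python
def decide(my_source: str, opponent_source: str) -> str:
    """Cooperate exactly when the opponent provably runs the same code."""
    if opponent_source == my_source:
        # Same code implies same output, so the only reachable outcomes are
        # mutual cooperation and mutual defection; mutual cooperation pays more.
        return "cooperate"
    return "defect"

my_code = "def decide(...): ..."       # stands in for the agent's own source text
print(decide(my_code, my_code))        # -> cooperate
print(decide(my_code, "other code"))   # -> defect
```

The “choice” here is nothing more than running this function; the correlation with the other agent comes from the shared code, not from anyone editing it.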
This makes sense, but if it is true, I don’t understand in what sense a “choice” is made. It seems to me you have assumed away free will. Which is fine, it is probably true that free will does not exist. But if it is true, I don’t understand why there is any need for a decision theory, as no decisions are actually made.
Clearly you have a notion of what it means to “make a decision”. Doesn’t it make sense to associate this idea of “making a decision” with the notion of evaluating the outcomes from different (sometimes counterfactual) actions and then selecting one of those actions on the basis of those evaluations?
Surely if the notion of “choice” refers to anything coherent, that’s what it’s talking about? What matters is that the decision is determined directly through the “make a decision” process rather than independently of it.
Also, given that these “make a decision” processes (i.e. decision theories) are things that actually exist and are used, surely it also makes sense to compare different decision theories on the basis of how sensibly they behave?
You are probably right that I have a faulty notion of what it means to make a decision. I’ll have to think about this for a few days to see if I can update...
This may help you. (Well, at least it helped me—YMMV.)
Basically, my point is that the “running the source code” part is where all of the interesting stuff actually happens, and that’s where the “choice” would actually happen.
It may be true that the agent “runs the source code and outputs the resulting output”, but in saying that I’ve neglected all of the cool stuff that happens when the source code actually gets run (e.g. comparing different options, etc.). In order to establish that source code A leads to output B you would need to talk about how source code A leads to output B, and that’s the interesting part! That’s the part that I associate with the notion of “choice”.
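A minimal sketch of that “comparing different options” step (illustrative only; the payoff numbers are made up):

```python
def make_decision(options, evaluate):
    # The "choice" just is this comparison-and-selection computation.
    return max(options, key=evaluate)

payoffs = {"cooperate": 3, "defect": 1}      # hypothetical utilities
print(make_decision(payoffs, payoffs.get))   # -> cooperate
```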
I don’t think you’ve misunderstood; in fact I share your position.
Do you also reject compatibilist accounts of free will? I think the basic point at issue here is whether or not a fully determined action can be genuinely ‘chosen’, any more than the past events that determine it.
The set of assumptions that undermines CDT also ensures that the decision process is nothing more than the deterministic consequence (give or take some irrelevant randomness) of an earlier state of the world + physical law. The ‘agent’ is a fully determined cog in a causally closed system.
In the same-source-code-PD, at the beginning of the decision process each agent knows that the end result will either be mutual cooperation or mutual defection, and also that the following propositions must either be all true or all false:
1. ‘I was programmed to cooperate’
2. ‘the other agent was programmed to cooperate’
3. ‘I will cooperate’
4. ‘the end result will be mutual cooperation’
The agent wants Proposition 4, and therefore all of the other propositions, to be true.
Since all of the propositions are known to share the same truth value, choosing to make Proposition 3 true is equivalent to choosing to make all four propositions true—including the two that refer to past events (Propositions 1 and 2). So either the agent can choose the truth value of propositions about the past, or else Proposition 3 is not really under the agent’s control.
I’d be interested to know whether those who disagree with me/us see a logical error above, or simply have a concept of choice/agency/free will/control that renders the previous paragraph either false or unproblematic (presumably because it allows you to single out Proposition 3 as uniquely under the agent’s control, or it isn’t so fussy about temporal order). If the latter, is this ultimately a semantic dispute? (I suspect that some will half-agree with that, but add that the incompatibilist notion of free will is at best empirically false and at worst incoherent. I think the charge of incoherence is false and the charge of empirical falsity is unproven, but I won’t go into that now.)
In any case, responses would be appreciated. (And if you think I’m completely mistaken or confused, please bear in mind that I made a genuine attempt to explain my position clearly!)
Decision theories are algorithms. Free will really doesn’t have much to do with them. A deterministic agent must still actually use some algorithm or another to map from sensory inputs and learned knowledge to effector outputs.
Thank you—this is exactly the point that I was trying to make, just stated much more articulately. I too would much appreciate responses to this, it would help me resolve some deep confusion about why very smart LessWrongers disagree with me about something important.
Hi, sorry, I think I misunderstood what you were unhappy with. I have not fully internalized what is happening, but here is my understanding (Nate, please correct me if I am wrong):
We can have a “causal graph” (which will include exotic causal rules, like Omega deciding based on your source code), and in this “causal graph” we have places where our source code should go, possibly in multiple places. These places have inputs and outputs, and based on these inputs and outputs we can choose any function we like from a (let’s say finite for simplicity) set. We are choosing what goes into those places based on some optimality criterion.
But you might ask: isn’t that really weird? Your own source code is what is doing the choosing, so how can you be choosing it? (Is that what you mean by “incoherence due to recursion”?) The proposed way out is this.
What we are actually choosing from is a set of possible source codes, each of which embeds, via quining, the entire problem and varies only in how inputs/outputs are mapped in the right places. By the theorem in the link, it is always possible to set the problem up in this way.
Note: the above does not contain the string “free will,” I am just trying to output my incomplete understanding of how the setup is mathematically coherent. I don’t think there is any “free will” in the g-formula either.
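To make this setup a little more concrete, here is a toy sketch (my own; the function names and payoffs are made up, and the quining step is simplified to passing the same candidate mapping to every hole). The “causal graph” is a function that takes whatever mapping sits in its holes and returns a utility, and the agent picks the mapping that scores best:

```python
def newcomb_graph(policy):
    """Both Omega's prediction node and the agent's action node are "holes"
    filled by the same candidate mapping (the logical-correlation step)."""
    prediction = policy("newcomb")   # Omega's simulation runs the same mapping
    action = policy("newcomb")       # the agent's node runs the same mapping
    opaque_box = 1_000_000 if prediction == "one-box" else 0
    return opaque_box if action == "one-box" else opaque_box + 1_000

candidate_mappings = [
    lambda obs: "one-box",
    lambda obs: "two-box",
]

best = max(candidate_mappings, key=newcomb_graph)
print(best("newcomb"))   # -> one-box
```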
I realize the above is still not very clear (even to myself). So here is another attempt.
The key idea seems to be using Kleene’s recursion theorem to reason about one’s own source code, while also having the ability to rewrite parts of that source code based on the conclusions drawn. In this particular case, “choice” is defined as:
1. thinking about your own source code with the “causal graph” (with holes for your source code) embedded;
2. maximizing expected utility, and realizing it would be optimal if [a particular rule mapping inputs to outputs] went into all the holes in the “causal graph” where our algorithm should go;
3. rewriting our own source code so that everything stays the same, except that in our representation of the “causal graph” we just do the optimal thing rather than redoing the analysis;
4. (???)
This entire complication goes away if you assume the problem setup just chooses input/output mappings, similarly to how we choose optimal treatment regimes in causal inference, without assuming that those mappings represent your own source code.
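For the treatment-regime analogy, here is a minimal sketch (all numbers made up): candidate regimes are functions from a baseline covariate L to a treatment A, each is scored by its expected outcome via the single-time-point g-formula, and we pick the best, exactly as one would pick among input/output mappings:

```python
# P(L = l) and E[Y | A = a, L = l] for a toy single-time-point problem.
p_L = {0: 0.6, 1: 0.4}
expected_Y = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (1, 1): 1.5}  # (a, l) -> E[Y | A=a, L=l]

def value(regime):
    """g-formula value of regime g: sum over l of E[Y | A=g(l), L=l] * P(L=l)."""
    return sum(expected_Y[(regime(l), l)] * p for l, p in p_L.items())

regimes = {
    "always treat":   lambda l: 1,
    "never treat":    lambda l: 0,
    "treat if L = 0": lambda l: 1 if l == 0 else 0,
}

best = max(regimes, key=lambda name: value(regimes[name]))
print(best, value(regimes[best]))   # the covariate-dependent mapping wins here
```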
Thank you—Yes, that is what I meant by recursion, and your second attempt seems to go in the right direction to answer my concerns, but I’ll have to think about this for a while to see if I can be convinced.
As for the G-Formula: I don’t think there is free will in it either, just that in contrast with UDT/TDT, it is not inconsistent with my concept of free will.
As an interesting anecdote, I am a Teaching Assistant for Jamie (who came up with the G-Formula), so I have heard him lecture on it several times now. The last couple of years he brought up experiments that seemingly provide evidence against free will and promised to discuss the implications for his theory. Unfortunately, both years he ran out of time before he got around to it. I should ask him the next time I meet him.
re: free will, this is one area where Jamie and Judea agree, it seems.
I think one thing I personally (and probably others) find very confusing about UDT is reconciling the picture with temporal constraints on causality. Nothing should create physical retrocausality. Here is my posited sequence of events.
step 1. You are an agent with source code read/write access. You suspect there will be (in the future) Omegas in your environment, posing tricky problems. At this point (step 1), you realize you should “preprocess” your own source code in such a way as to maximize expected utility in such problems.
That is, for all causal graphs (possibly with Omega causal pathways), you find where nodes for [my source code goes here] are, and you “pick the optimal treatment regime”.
step 2. Omega scans your source code, and puts things in boxes based on examining this code, or simulating it.
step 3. Omega gives you boxes, with stuff already in them. You already preprocessed what to do, so you one box immediately and walk away with a million.
Given that Omega can scan your source, and given that you can credibly rewrite your own decision-making source code, there is nothing exotic in this sequence of steps; in particular, there is no retrocausality anywhere. It is just that there are some constraints (what people here call “logical counterfactuals”) that force the output of Omega’s sim at step 2 and your output at step 3 to coincide. This constraint is what led you to preprocess to one-box in step 1, by drawing an “exotic causal graph” with Omega’s sim creating an additional causal link that seemingly violates “no retrocausality.”
The “decision” is in step 1. Had you counterfactually decided not to preprocess there, or to preprocess to something else, you would walk away poorer in step 3. There is this element to UDT of “preprocessing decisions in advance.” It seems all real choice, that is, examining alternatives and picking one wisely, happens there.
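A toy rendering of those three steps (my own; the agent’s rewritable “source code” is simplified to a lookup table that both the agent and Omega can read):

```python
policy = {}                                   # the agent's rewritable "source"

# step 1: preprocess, before any Omega shows up.
policy["newcomb"] = "one-box"

# step 2: Omega inspects the code and fills the boxes accordingly.
opaque_box = 1_000_000 if policy.get("newcomb") == "one-box" else 0

# step 3: the agent just executes its cached entry; no further deliberation,
# and no retrocausality anywhere.
action = policy["newcomb"]
payout = opaque_box if action == "one-box" else opaque_box + 1_000
print(action, payout)                         # -> one-box 1000000
```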
This is closer to describing the self-modifying CDT approach. One of the motivations for development of TDT and UDT is that you don’t necessarily get an opportunity to do such self-modification beforehand, let alone to compute the optimal decisions for all possible scenarios you think might occur.
So the idea of UDT is that the design of the code should already suffice to guarantee that if you end up in a newcomblike situation you behave “as if” you did have the opportunity to do whatever precommitment would have been useful. When prompted for a decision, UDT asks “what is the (fixed) optimal conditional strategy” and outputs the result of applying that strategy to its current state of knowledge.
Basically this, except there’s no need to actually do it beforehand.
If you like, you can consider the UDT agent’s code itself to be the output of such “preprocessing”… except that there is no real pre-computation required, apart from giving the UDT agent a realistic prior.
Actually, no. To implement things correctly, UDT needs to determine its entire strategy all at once. It cannot decide whether to one-box or two-box in Newcomb just by considering the Newcomb that it is currently dealing with. It must also consider all possible hypothetical scenarios where any other agent’s action depends on whether or not UDT one-boxes.
Furthermore, UDT cannot decide what it does in Newcomb independently of what it does in the Counterfactual Mugging, because some hypothetical entity might give it rewards based on some combination of the two behaviors. UDT needs to compute its entire strategy (i.e. its response to all possible scenarios) all at the same time before it can determine what it should do in any particular situation [OK. Not quite true. It might be able to prove that, whatever the optimal strategy is, it involves doing X in situation Y without actually determining the optimal strategy. Then again, this seems really hard since doing almost anything directly from Kolmogorov priors is basically impossible].
Conceptually, yes. The point is that you don’t need to actually literally explicitly compute your entire strategy at t=-∞. All you have to do is prove a particular property of the strategy (namely, its action in situation Y) at the time when you are asked for a decision.
Obviously, like every computational activity ever, you must still make approximations, because it is usually infeasible to make inferences over the entire tegmark-IV multiverse when you need to make a decision. An example of such approximations would be neglecting the measure of “entities that give it rewards based on some combination of [newcomb’s and counterfactual mugging]” in many situations because I expect such things to be rare (significantly rarer than newcomb’s and counterfactual mugging themselves).
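For concreteness, here is a toy sketch of “choose the whole conditional strategy at once, then apply it to the current observation” (my own; the scenario weights and payoffs are made up, and the counterfactual-mugging amounts are the usual $100/$10,000). It also shows why the Newcomb entry cannot be optimized separately from the mugging entry once a combined-reward scenario gets nonzero weight:

```python
from itertools import product

def newcomb_payoff(s):
    opaque_box = 1_000_000 if s["newcomb"] == "one-box" else 0
    return opaque_box if s["newcomb"] == "one-box" else opaque_box + 1_000

def mugging_payoff(s):
    # Counterfactual mugging: pay $100 on tails, receive $10,000 on heads
    # iff you are the kind of agent that pays.
    return 0.5 * (-100) + 0.5 * 10_000 if s["mugging"] == "pay" else 0

def combined_payoff(s):
    # A hypothetical entity that rewards one particular combination of behaviors.
    return 5_000 if (s["newcomb"], s["mugging"]) == ("one-box", "pay") else 0

prior = [(0.5, newcomb_payoff), (0.4, mugging_payoff), (0.1, combined_payoff)]

strategies = [dict(zip(("newcomb", "mugging"), choice))
              for choice in product(("one-box", "two-box"), ("pay", "refuse"))]

def score(strategy):
    return sum(weight * payoff(strategy) for weight, payoff in prior)

best = max(strategies, key=score)
print(best)              # the whole conditional strategy is chosen jointly...
print(best["newcomb"])   # ...and only then applied to the scenario at hand
```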
Consider the set of all possible source codes an agent might have. This set is partitioned in two: those on which Omega rewards you (where you both one box, and your source code is such that Omega can tell you will), and those on which Omega punishes you (all others). Call the former set A, and the latter set B.
Agents are not guaranteed to start with a source code in set A; some start with source code in set B. (Maybe they are classically trained causal decision theorists? Maybe they are skeptical about UDT? Maybe their programmers were careless? Who knows!) The point is, there comes a time in an agent’s life when it needs to grow up and move its source code to set A. Maybe it does not immediately self-modify to directly do UDTish things on Newcomb-like causal graphs; maybe it self-modifies to self-modify before being asked to one-box.
But it is crucial for the agent to move itself from set B to set A at some point before Omega shows up. This is what I mean by step 1.
I think what confuses me is that if we want the logical connections to hold (between my decision and the decision of another agent running the same source code), it is necessary that when he preprocesses his source code he will deterministically make the same choice as me, which means that my decision about how to preprocess has already been made by some deeper part of my source code.
My understanding of the statement of Newcomb is that Omega puts things in boxes only based on what your source code says you will do when faced with input that looks like Newcomb’s problem. Since the agent already preprocessed the source code (possibly using other complicated bits of its own source code) to one-box on Newcomb, Omega will populate the boxes based on that. Omega does not need to deal with any other part of the agent’s source code, including some unspecified complicated part that dealt with preprocessing and rewriting, except to prove to itself that one-boxing will happen.
All that matters is that the code currently has the property that IF it sees the Newcomb input, THEN it will one box.
An Omega that examines the agent’s code before the agent has preprocessed will also put a million dollars in, if it can prove that the agent will self-modify to one-box before choosing the box.
Phrasing it in terms of source code makes it more obvious that this is equivalent to expecting Omega to be able to solve the halting problem.
This is fighting the hypothetical; Omega can say it will only put a million in if it can find a proof of you one-boxing quickly enough.
If Omega only puts the million in if it finds a proof fast enough, it is then possible that you will one-box and not get the million.
(And saying “there isn’t any such Omega” may be fighting the hypothetical. Saying there can’t in principle be such an Omega is not.)
Yes, it’s possible, and serves you right for trying to be clever. Solving the halting problem isn’t actually hard for a large class of programs, including the usual case for an agent in a typical decision problem (i.e. those that in fact do halt quickly enough to make an actual decision about the boxes in less than a day). If you try to deliberately write a very hard-to-predict program, then of course Omega takes away the money in retaliation, just like the other attempts to “trick” Omega by acting randomly or looking inside the boxes with X-rays.
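A sketch of that “proof fast enough” variant (a toy model of mine; the step budget and the TimeoutError convention are illustrative assumptions): Omega simply runs the submitted program under a hard budget, and only a program that demonstrably one-boxes within the budget gets the million:

```python
def omega_fills_box(agent, budget_steps=10_000):
    try:
        verdict = agent(budget=budget_steps)   # bounded simulation of the agent
    except TimeoutError:                       # too expensive to predict
        return False
    return verdict == "one-box"

def prompt_one_boxer(budget):
    return "one-box"

def omega_simulator(budget):
    # An agent that tries to simulate Omega back; here it just burns the budget.
    raise TimeoutError("ran out of steps while simulating Omega")

print(omega_fills_box(prompt_one_boxer))   # True  -> the million is there
print(omega_fills_box(omega_simulator))    # False -> empty box
```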
The problem requires that Omega be always able to figure out what you do. If Omega can only figure out what you can do under a limited set of circumstances, you’ve changed one of the fundamental constraints of the problem.
You seem to be thinking of this as “the only time someone won’t come to a decision fast enough is if they deliberately stall”, which is sort of the reverse of fighting the hypothetical—you’re deciding that an objection can’t apply because the objection applies to an unlikely situation.
What if, in order to decide what to do, I simulate Omega in my head as one of the steps of the process? That is not intentionally delaying, but it still could result in halting-problem considerations. Or do you just say that Omega doesn’t give me the money if I try to simulate him?
Usually, in the thought experiment, we assume that Omega has enough computation power to simulate the agent, but that the agent does not have enough computation power to compute Omega. We usually further assume that the agent halts and that Omega is a perfect predictor. However, these are expositional simplifications, and none of these assumptions are necessary in order to put the agent into a Newcomblike scenario.
For example, in the game nshepperd is describing (where Omega plays Newcomb’s problem, but only puts the money in the box if it has very high confidence that you will one-box), if you try to simulate Omega, you won’t get the money. You’re still welcome to simulate Omega, but while you’re doing that, I’ll be walking away with a million dollars and you’ll be spending lots of money on computing resources.
No one’s saying you can’t, they’re just saying that if you find yourself in a situation where someone is predicting you and rewarding you for obviously acting like they want you to, and you know this, then it behooves you to obviously act like they want you to.
Or to put it another way, consider a game where Omega is only a pretty good predictor who only puts the money in the box if Omega predicts that you one-box unconditionally (e.g. without using a source of randomness) and whose predictions are correct 99% of the time. Omega here doesn’t have any perfect knowledge, and we’re not necessarily assuming that anyone has superpowers, but I’d still one-box.
Or if you want to see a more realistic problem (where the predictor has only human-level accuracy) then check out Hintze’s formulation of Parfit’s Hitchhiker (though be warned, I’m pretty sure he’s wrong about TDT succeeding on this formulation of Parfit’s Hitchhiker. UDT succeeds on this problem, but TDT would fail.)
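Spelling out the arithmetic for that 99% variant (assuming the usual box contents of $1,000,000 and $1,000, which aren’t restated in this thread):

```python
p = 0.99                                                  # predictor accuracy
ev_one_box = p * 1_000_000 + (1 - p) * 0                  # 990,000
ev_two_box = p * 1_000 + (1 - p) * (1_000_000 + 1_000)    # 11,000
print(ev_one_box, ev_two_box)
```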
I think some that favor CDT would claim that you are phrasing the counterfactual incorrectly. You are phrasing the situation as “you are playing against a copy of yourself” rather than “you are playing against an agent running code X (which just happens to be the same as yours) who thinks you are also running code X”. If X=CDT, then TDT and CDT each achieve the result DD. If X=TDT, then TDT achieves CC, but CDT achieves DC.
In other words, TDT does beat CDT in the self-matchup. But one could argue that the self-matchup against TDT and the self-matchup against CDT are different scenarios, and thus should not be compared.
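A toy rendering of those two matchups (my own; the move rules are, of course, caricatures of the two theories):

```python
def tdt_move(opponent_runs_same_code):
    # Cooperate exactly when the move is expected to be mirrored.
    return "C" if opponent_runs_same_code else "D"

def cdt_move():
    # Treat the opponent's move as causally independent, so defect.
    return "D"

# X = CDT: the opponent defects, and so does either player under test -> DD.
print("TDT vs CDT:", tdt_move(False), cdt_move())
print("CDT vs CDT:", cdt_move(), cdt_move())

# X = TDT, where the opponent (wrongly, in the second case) assumes we also run TDT.
print("TDT vs TDT:", tdt_move(True), tdt_move(True))   # C C
print("CDT vs TDT:", cdt_move(), tdt_move(True))       # D C
```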
I think UDT is fine (but I think it needs a good intro paper, maybe something with graphs in it...)
For the kinds of problems you and I think about, UDT just reduces to CDT, i.e. it should pick the “optimal treatment regime”; it is not unsound. So as far as we are concerned, there is no conflict at all.
However, there is a set of (what you and I would call) “weird” problems where if you “represent the weirdness” properly and do the natural thing to pick the best treatment, UDT is what happens. One way to phrase the weirdness that happens in Newcomb is that “conditional ignorability” fails. That is, Omega introduces a new causal pathway by which your decision algorithm may affect the outcome. (Note that you might think that “conditional ignorability” also fails in e.g. the front-door case which is still a “classical problem,” but actually there is a way to think about the front door case as applying conditional ignorability twice.) Since CDT is phrased on “classical” DAGs and (as the SWIG paper points out) it’s all just graphical ways of representing ignorability (what they call modularity and factorization), it cannot really talk about Newcomb type cases properly.
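For reference, the “applying conditional ignorability twice” remark corresponds to the standard front-door adjustment, which chains two back-door-style conditioning steps through the mediator M:

$$P(y \mid do(x)) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')$$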
I am not sure I understood the OP though, when he said that Newcomb problems are “the norm.” Classical decision problems seem to be the norm to me.
Yeah, it’s a bold claim :-) I haven’t made any of the arguments yet, but I’m getting there.
(The rough version is that Newcomblike problems happen whenever knowledge about your decision theory leaks to other agents, and that this happens all the time in humans. Evolution has developed complex signaling tools, humans instinctively make split-second assessments of the trustworthiness of others, etc. In most real-world multi-agent scenarios, we implicitly expect that the other agents have some knowledge of how we make decisions, even if that’s only via knowledge of shared humanity. Any AI interacting with humans who have knowledge of its source code, even tangentially, faces similar difficulties. You could assume away the implications of this “leaked” knowledge, or artificially design scenarios in which this knowledge is unavailable. This is often quite useful as a simplifying assumption or a computational expedient, but it requires extra assumptions or extra work. By default, real-world decision problems on Earth are Newcomblike. Still a rough argument, I know, I’m working on filling it out and turning it into posts.)
I prefer to argue that many real-world problems are AMD-like, because there’s a nonzero chance of returning to the same mental state later, and that chance has a nonzero dependence on what you choose now. To the extent that’s true, CDT is not applicable and you really need UDT, or at least this simplified version. That argument works even if the universe contains only one agent, as long as that agent has finite memory :-)
I think it might be helpful to be more precise about problem classes, e.g. what does “Newcomb-like” mean?
That is, the kinds of things that I can see informally arising in settings humans deal with (lots of agents running around) also contain things like blackmail problems, which UDT does not handle. So it is not really fair to say this class is “Newcomb-like,” if by that class we mean “problems UDT handles properly.”
Thanks, I think you’re right.
(For reference, I’ll be defining “Newcomblike” roughly as “other agents have knowledge of your decision algorithm”. You’re correct that this includes problems where UDT performs poorly, and that UDT is by no means the One Final Answer. In fact, I’m not planning to discuss UDT at all in this sequence; my goal is to motivate the idea that we don’t know enough about decision theory yet to be comfortable constructing a system capable of undergoing an intelligence explosion. The fact that Newcomblike problems are fairly common in the real world is one facet of that motivation.)
What problems does UDT fail on?
Why would a self-improving agent not improve its own decision-theory to reach an optimum without human intervention, given a “comfortable” utility function in the first place?
A self-improving agent does improve its own decision theory, but it uses its current decision theory to predict which self-modifications would be improvements, and broken decision theories can be wrong about that. Not all starting points converge to the same answer.
Oh. Oh dear. DERP. Of course: the decision theory of sound self-improvement is a special case of the decision theory for dealing with other agents.
I disagree. CDT correctly solves all problems in which other agents cannot read your mind. Real world occurrences of mind reading are actually uncommon.