I think UDT is fine (but I think it needs a good intro paper, maybe something with graphs in it...)
For the kinds of problems you and I think about, UDT just reduces to CDT, i.e., it should pick the “optimal treatment regime,” so it is not unsound. So as far as we are concerned, there is no conflict at all.
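On a classical problem, “pick the optimal treatment regime” is a plain calculation once conditional ignorability holds. Below is a minimal Python sketch under stated assumptions: one binary covariate `C`, a binary treatment `A`, and invented numbers for `P(C=c)` and `E[Y | A=a, C=c]`. All names and numbers are hypothetical. It computes the adjusted mean of each fixed treatment and the best covariate-dependent regime.

```python
# Toy numbers, invented for illustration: a binary covariate C and a binary
# treatment A. Nothing here comes from the thread.
P_C = {0: 0.6, 1: 0.4}                      # P(C = c)
E_Y = {                                     # E[Y | A = a, C = c], keyed by (a, c)
    (0, 0): 0.30, (1, 0): 0.50,             # when C = 0, treating helps
    (0, 1): 0.70, (1, 1): 0.45,             # when C = 1, treating hurts
}
TREATMENTS = (0, 1)


def adjusted_mean(a):
    """E[Y(a)] by the adjustment formula; valid only if Y(a) is independent of A given C."""
    return sum(P_C[c] * E_Y[(a, c)] for c in P_C)


def optimal_regime():
    """For each covariate value c, choose the treatment with the best conditional mean."""
    return {c: max(TREATMENTS, key=lambda a: E_Y[(a, c)]) for c in P_C}


def regime_value(regime):
    """Expected outcome if treatment follows regime[c]: sum_c P(c) E[Y | A=regime[c], C=c]."""
    return sum(P_C[c] * E_Y[(regime[c], c)] for c in P_C)


if __name__ == "__main__":
    for a in TREATMENTS:
        print(f"E[Y({a})] = {adjusted_mean(a):.2f}")
    d = optimal_regime()
    print("optimal regime:", d, "value:", round(regime_value(d), 2))
```

With these made-up numbers, the best regime treats only when `C=0` and has value 0.58, higher than either fixed treatment (0.46 and 0.48). If conditional ignorability fails, these adjusted quantities no longer identify `E[Y(a)]`.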
However, there is a set of (what you and I would call) “weird” problems where, if you “represent the weirdness” properly and do the natural thing to pick the best treatment, you get UDT. One way to phrase the weirdness in Newcomb’s problem is that “conditional ignorability” fails: Omega introduces a new causal pathway by which your decision algorithm may affect the outcome. (You might think that conditional ignorability also fails in, e.g., the front-door case, which is still a “classical problem,” but there is actually a way to think about the front-door case as applying conditional ignorability twice.) Since CDT is phrased on “classical” DAGs, and (as the single-world intervention graph (SWIG) paper points out) these are all just graphical ways of representing ignorability (what they call modularity and factorization), CDT cannot really talk about Newcomb-type cases properly.
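The front-door remark can be spelled out in potential-outcome notation. The symbols below are introduced for illustration: C is a set of measured covariates; in the front-door graph, X affects Y only through M, and an unobserved U confounds X and Y but not M. This is one standard way to read the front-door formula as two applications of conditional ignorability, not necessarily the derivation the commenter had in mind.

```latex
% Conditional ignorability given covariates C:
Y(a) \perp\!\!\!\perp A \mid C
\;\Longrightarrow\;
\mathbb{E}[Y(a)] = \sum_{c} P(c)\,\mathbb{E}[Y \mid A=a, C=c]

% Front door: X -> M -> Y, with unobserved U -> X and U -> Y.
% Step 1, ignorability of X for M (nothing confounds X -> M):
M(x) \perp\!\!\!\perp X
\;\Longrightarrow\;
P(M(x)=m) = P(m \mid x)

% Step 2, ignorability of M for Y given X (X blocks M <- X <- U -> Y):
Y(m) \perp\!\!\!\perp M \mid X
\;\Longrightarrow\;
P(Y(m)=y) = \sum_{x'} P(y \mid m, x')\,P(x')

% Composing, using Y(x) = Y(M(x)) and Y(m) independent of M(x):
P(Y(x)=y) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x')\,P(x')
```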
I am not sure I understood the OP though, when he said that Newcomb problems are “the norm.” Classical decision problems seem to be the norm to me.
Yeah, it’s a bold claim :-) I haven’t made any of the arguments yet, but I’m getting there.
(The rough version is that Newcomblike problems happen whenever knowledge about your decision theory leaks to other agents, and that this happens all the time in humans. Evolution has developed complex signaling tools, humans instinctively make split-second assessments of the trustworthiness of others, etc. In most real-world multi-agent scenarios, we implicitly expect that the other agents have some knowledge of how we make decisions, even if that’s only via knowledge of shared humanity. Any AI interacting with humans who have knowledge of its source code, even tangentially, faces similar difficulties. You could assume away the implications of this “leaked” knowledge, or artificially design scenarios in which this knowledge is unavailable. This is often quite useful as a simplifying assumption or a computational expedient, but it requires extra assumptions or extra work. By default, real-world decision problems on Earth are Newcomblike. Still a rough argument, I know; I’m working on filling it out and turning it into posts.)
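Here is a toy model of the “leaked knowledge” point, in Python, using Newcomb’s problem. The $1,000 / $1,000,000 payoffs, the single accuracy parameter, and all function names are assumptions for illustration. It contrasts a CDT-style evaluation, which holds the prediction fixed, with a policy-level evaluation in which the predictor has noisy knowledge of the agent’s algorithm.

```python
# Standard Newcomb payoffs, assumed for illustration (the thread gives no numbers).
SMALL = 1_000        # transparent box, always available
BIG = 1_000_000      # opaque box, filled iff the predictor expects one-boxing
ACTIONS = ("one-box", "two-box")


def payoff(action, predicted):
    """Money received, given the agent's action and the predictor's guess."""
    opaque = BIG if predicted == "one-box" else 0
    return opaque + (SMALL if action == "two-box" else 0)


def cdt_value(action, p_filled):
    """CDT-style evaluation: the prediction is already made, so the chance
    p_filled that the opaque box is full does not depend on the action."""
    return p_filled * BIG + (SMALL if action == "two-box" else 0)


def policy_value(action, accuracy):
    """Policy-level (UDT-like) evaluation: the predictor knows something about the
    decision algorithm and guesses its output correctly with probability `accuracy`."""
    other = ACTIONS[1] if action == ACTIONS[0] else ACTIONS[0]
    return accuracy * payoff(action, action) + (1 - accuracy) * payoff(action, other)


def best(value_fn, param):
    """Action with the highest value under the given evaluation."""
    return max(ACTIONS, key=lambda a: value_fn(a, param))
```

For example, `best(cdt_value, p)` returns `"two-box"` for every `p`, since two-boxing always adds `SMALL`, while `best(policy_value, 0.99)` returns `"one-box"` (about 990,000 versus 11,000).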
I prefer to argue that many real-world problems are AMD-like (in the sense of the Absent-Minded Driver problem), because there’s a nonzero chance of returning to the same mental state later, and that chance has a nonzero dependence on what you choose now. To the extent that’s true, CDT is not applicable and you really need UDT, or at least this simplified version. That argument works even if the universe contains only one agent, as long as that agent has finite memory :-)
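The Absent-Minded Driver can be made concrete with the payoffs from Piccione and Rubinstein’s version (exit at the first intersection: 0, exit at the second: 4, drive past both: 1), assumed here since the thread does not state them. Because the driver cannot tell the intersections apart, a choice “now” is also the choice made on returning to the same mental state, so this minimal sketch scores a whole policy, “continue with probability p,” rather than a single act.

```python
from fractions import Fraction

# Payoffs assumed from the standard Absent-Minded Driver setup;
# the thread only names the problem.
EXIT_FIRST, EXIT_SECOND, CONTINUE_PAST_BOTH = 0, 4, 1


def policy_value(p):
    """Expected payoff of 'continue with probability p at any intersection'.
    The driver cannot tell the intersections apart, so the same p applies at both."""
    return ((1 - p) * EXIT_FIRST
            + p * (1 - p) * EXIT_SECOND
            + p * p * CONTINUE_PAST_BOTH)


# Exhaustive search over a grid of exact rational probabilities 0, 1/300, ..., 1.
grid = [Fraction(i, 300) for i in range(301)]
best_p = max(grid, key=policy_value)

if __name__ == "__main__":
    print("best p:", best_p, "expected payoff:", policy_value(best_p))
```

The search lands on p = 2/3 with expected payoff 4/3, matching the closed form 4p - 3p^2, which is maximized at p = 2/3.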
I think it might be helpful to be more precise about problem classes, e.g., what does “Newcomb-like” mean?
That is, the kinds of things that I can see informally arising in settings humans deal with (lots of agents running around) also include things like blackmail problems, which UDT does not handle. So it is not really fair to say this class is “Newcomb-like,” if by that term we mean “problems UDT handles properly.”
Thanks, I think you’re right.
(For reference, I’ll be defining “Newcomblike” roughly as “other agents have knowledge of your decision algorithm”. You’re correct that this includes problems where UDT performs poorly, and that UDT is by no means the One Final Answer. In fact, I’m not planning to discuss UDT at all in this sequence; my goal is to motivate the idea that we don’t know enough about decision theory yet to be comfortable constructing a system capable of undergoing an intelligence explosion. The fact that Newcomblike problems are fairly common in the real world is one facet of that motivation.)
What problems does UDT fail on?
Why would a self-improving agent not improve its own decision theory to reach an optimum without human intervention, given a “comfortable” utility function in the first place?
A self-improving agent does improve its own decision theory, but it uses its current decision theory to predict which self-modifications would be improvements, and broken decision theories can be wrong about that. Not all starting points converge to the same answer.
Oh. Oh dear. DERP. Of course: the decision theory of sound self-improvement is a special case of the decision theory for dealing with other agents.
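A toy version of the point that a self-modification is judged by the current theory: assume a CDT agent that may rewrite itself into an unconditional one-boxer before facing Newcomb’s problem. The standard payoffs, the perfect predictor, the 0.5 prior, and the function names are all assumptions for illustration. The rewritten successor is, in effect, another agent that the predictor scans.

```python
SMALL, BIG = 1_000, 1_000_000  # assumed standard Newcomb payoffs
DISPOSITIONS = ("one-box", "two-box")


def cdt_evaluates(disposition, scan_after_choice, prior_full=0.5):
    """Causal expected payoff a CDT agent assigns to adopting `disposition`."""
    if scan_after_choice:
        # The (assumed perfect) predictor scans the rewritten agent, so the
        # choice causally determines whether the opaque box gets filled.
        p_full = 1.0 if disposition == "one-box" else 0.0
    else:
        # The scan already happened; CDT treats the box contents as fixed.
        p_full = prior_full
    return p_full * BIG + (SMALL if disposition == "two-box" else 0)


def cdt_picks(scan_after_choice):
    """Disposition the CDT agent would rewrite itself into."""
    return max(DISPOSITIONS, key=lambda d: cdt_evaluates(d, scan_after_choice))
```

Under these assumptions, `cdt_picks(True)` is `"one-box"` and `cdt_picks(False)` is `"two-box"`: the CDT agent endorses one-boxing only for predictions made after the rewrite. This is one way where an agent ends up can depend on the theory it started with.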
I disagree that Newcomblike problems are fairly common in the real world. CDT correctly solves all problems in which other agents cannot read your mind, and real-world occurrences of mind reading are actually uncommon.
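Where the two views part ways can be located with a short policy-level calculation. The symbols are introduced here: S for the transparent box, B for the opaque box, and q for the probability that the predictor correctly anticipates the agent’s choice (standard Newcomb payoffs, assumed). This only shows how much the predictor must know before the evaluations diverge; it does not settle how common such predictors are.

```latex
% Assumed: transparent box S, opaque box B, predictor accuracy q.
\mathbb{E}[\text{one-box}] = qB, \qquad
\mathbb{E}[\text{two-box}] = qS + (1-q)(B+S) = S + (1-q)B

\mathbb{E}[\text{one-box}] > \mathbb{E}[\text{two-box}]
\iff (2q-1)B > S
\iff q > \tfrac{1}{2} + \tfrac{S}{2B}

% With S = 1{,}000 and B = 1{,}000{,}000 the threshold is q > 0.5005.
% At q = 1/2 (no information about the algorithm), two-boxing wins by S, as CDT says.
```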