I’ve been reading a little of the philosophical literature on decision theory lately, and at least some two-boxers have an intuition I hadn’t considered before: that Newcomb’s problem is “unfair.” That is, for a wide range of pairs of decision theories X and Y, you could imagine a problem which essentially takes the form “Omega punishes agents who use decision theory X and rewards agents who use decision theory Y,” and this is not a “fair” test of the relative merits of the two decision theories.
The idea that rationalists should win, in this context, has a specific name: it’s called the Why Ain’cha Rich defense, and I think what I’ve said above is the intuition powering counterarguments to it.
I’m a little more sympathetic to this objection than I was before delving into the literature. A complete counterargument to it should at least attempt to define what fair means and argue that Newcomb is in fact a fair problem. (This seems related to the issue of defining what a fair opponent is in modal combat.)
TDT’s reply to this is a bit more specific.
Informally: Since Omega represents a setup which rewards agents who make a certain decision X, and reality doesn’t care why or by what exact algorithm you arrive at X so long as you arrive at X, the problem is fair. Unfair would be “We’ll examine your source code and punish you iff you’re a CDT agent, but we won’t punish another agent who two-boxes as the output of a different algorithm even though your two algorithms had the same output.” The problem should not care whether you arrive at your decisions by maximizing expected utility or by picking the first option in English alphabetical order, so long as you arrive at the same decision either way.
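To make the informal criterion concrete, here is a minimal sketch in Python; the agent names and the canonical $1,000,000 / $1,000 payoffs are invented assumptions. A fair setup reads only the decision; an unfair one inspects which algorithm produced it.

```python
def alphabetical_agent(options):
    # Picks the first option in English alphabetical order.
    return sorted(options)[0]          # "one-box" sorts before "two-box"

def expected_utility_agent(options):
    # Stands in for an agent that deliberates and settles on one-boxing.
    return "one-box"

def fair_omega(agent):
    # Fair: the payoff is a function of the decision alone.
    return 1_000_000 if agent(["one-box", "two-box"]) == "one-box" else 1_000

def unfair_omega(agent):
    # Unfair: same decision, different payoff, because Omega inspects the
    # identity of the algorithm rather than its act.
    payoff = 1_000_000 if agent(["one-box", "two-box"]) == "one-box" else 1_000
    return 0 if agent is expected_utility_agent else payoff

for omega in (fair_omega, unfair_omega):
    print(omega.__name__,
          omega(alphabetical_agent), omega(expected_utility_agent))
    # fair_omega pays both agents 1,000,000; unfair_omega pays one of them 0
    # despite their identical outputs.
```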
More formally: TDT corresponds to maximizing on the class of problems whose payoff is determined by ‘the sort of decision you make in the world that you actually encounter, having the algorithm that you do’. CDT corresponds to maximizing over a fair problem class consisting of scenarios whose payoff is determined only by your physical act, and would be a good strategy in the real world if no other agent ever had an algorithm similar to yours (you must be the only CDT-agent in the universe, so that your algorithm only acts at one physical point) and if no other agent could gain any info about your algorithm except by observing your controllable physical acts (tallness being correlated with intelligence is not allowed). UDT allows for maximizing over classes of scenarios where your payoff can depend on actions you would have taken in universes you could have encountered but didn’t, e.g., the Counterfactual Mugging. (Parfit’s Hitchhiker is outside TDT’s problem class, and inside UDT’s, because the car-driver asks “What will this hitchhiker do if I take them to town?”, so that a dishonorable hitchhiker who is left in the desert is getting a payoff which depends on what they would have done in a situation they did not actually encounter. Likewise the transparent Newcomb’s Box: we can clearly see how to maximize on the problem, but it’s in UDT’s class of ‘fair’ scenarios, not TDT’s class.)
If the scenario handed to the TDT algorithm is that only one copy of your algorithm exists within the scenario, acting at one physical point, and no other agent in the scenario has any knowledge of your algorithm apart from acts you can maximize over, then TDT reduces to CDT and outputs the same action as CDT. This is implied by CDT maximizing over its problem class together with TDT’s class of ‘fair’ problems strictly including all CDT-fair problems.
If Omega rewards having particular algorithms independently of their outputs, by examining the source code without running it, the only way to maximize is to have the most rewarded algorithm regardless of its output. But this is uninteresting.
If a setup rewards some algorithms more than others because of their different outputs, this is just life. You might as well claim that a cliff punishes people who rationally choose to jump off it.
This situation is interestingly blurred in modal combat where an algorithm may perhaps do better than another because its properties were more transparent (more provable) to another algorithm examining it. Of this I can only say that if, in real life, we end up with AIs examining each other’s source code and trying to prove things about each other, calling this ‘unfair’ is uninteresting. Reality is always the most important domain to maximize over.
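Since the Counterfactual Mugging marks the boundary of UDT’s wider class above, here is a minimal sketch of why a policy-level maximizer pays, assuming the standard statement of the problem (a 50/50 coin, a $100 ask on heads, a $10,000 reward on tails iff Omega predicts you would have paid on heads). The point is that whole policies are scored across both branches before the agent learns which branch it is in.

```python
def expected_value(policy):
    heads = -100 if policy == "pay" else 0     # heads: Omega asks for $100
    tails = 10_000 if policy == "pay" else 0   # tails: rewarded iff predicted to pay
    return 0.5 * heads + 0.5 * tails

for policy in ("pay", "refuse"):
    print(policy, expected_value(policy))      # pay: 4950.0, refuse: 0.0
```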
I’d just like to say that this comparison of CDT, TDT, and UDT was a very good explanation of the differences. Thanks for that.
Agreed. Found the distinction between TDT and UDT especially clear here.
This explanation makes UDT seem strictly more powerful than TDT (if UDT can handle Parfit’s Hitchhiker and TDT can’t).
If that’s the case, then is there a point in still focusing on developing TDT? Is it meant as just a stepping stone to an even better decision theory (possibly UDT itself) down the line? Or do you believe UDT’s advantages to be counterbalanced by disadvantages?
UDT doesn’t handle non-base-level maximization vantage points (previously “epistemic vantage points”) for blackmail—you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn’t realize you’re only blackmailing it because you’re simulating it being blackmailable. As currently formulated, UDT is also non-naturalistic: it assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don’t actually have a good fill-in for the general-logical-consequence algorithm TDT is supposed to call).
I expect the ultimate theory to look more like “TDT modded to handle UDT’s class of problems and blackmail and anything else we end up throwing at it” than “UDT modded to be naturalistic and etc”, but I could be wrong—others have different intuitions about this.
UDT was designed to move away from the kind of Cartesian dualism represented in AIXI. I don’t understand where it’s assuming its own Cartesian bubble. Can you explain?
The version I saw involved a Universe computation which accepts an Agent function and then computes itself, with the Agent making its choices based on its belief about the Universe? That seemed to me like a pretty clean split.
No, the version we’ve been discussing for the last several years involves an argumentless Universe function that contains the argumentless Agent function as a part. Agent knows the source code of Agent (via quining) and the source code of Universe, but does not a priori know which part of the Universe is the Agent. The code of Universe might be mixed up so that it’s hard to pick out copies of Agent. Then Agent tries to prove logical statements of the form “if Agent returns a certain value, then Universe returns a certain value”. As you can see, that automatically takes into account the logical correlates of Agent as well.
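A runnable toy of that setup, with one loud caveat: the actual proof search over statements like “if Agent returns a, then Universe returns u” is replaced here by forcing the agent’s output and running the universe, purely to keep the sketch short. The Newcomb payoffs are the canonical ones.

```python
# universe() is argumentless and contains agent() as a part; forcing the
# agent's output stands in for proving "Agent() == a  =>  Universe() == u".

ACTIONS = ["one-box", "two-box"]
_forced = None  # the antecedent of the implication currently being "proved"

def agent():
    if _forced is not None:
        return _forced
    return max(ACTIONS, key=_consequence_of)

def universe():
    # Argumentless: the predictor's forecast and the physical act are the
    # same logical fact, namely whatever agent() returns.
    action = agent()
    box_b = 1_000_000 if action == "one-box" else 0
    return box_b + (1_000 if action == "two-box" else 0)

def _consequence_of(action):
    global _forced
    _forced = action
    utility = universe()   # agent() inside returns the forced value
    _forced = None
    return utility

print(universe())  # 1_000_000: the embedded agent one-boxes
```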
I find it rather disappointing that the UDT people and the TDT people have seemingly not been communicating very efficiently with each other in the last few years...
I think what has happened is that most of the LW people working on decision theory in the past few years have been working with different variations on UDT, while Eliezer hasn’t participated much in the discussions due to being preoccupied with other projects. It seems understandable that he saw some ideas that somebody was playing with, and thought that everyone was assuming something similar.
Yes. And now, MIRI is planning a decision theory workshop (for September) so that some of this can be hashed out.
I honestly thought we’d been communicating. Posting all our work on LW and all that. Eliezer’s comment surprised me. Still not sure how to react...
UDT can be modeled with a Universe computation that takes no arguments.
I think you must have been looking at someone else’s idea. None of the versions of UDT that I’ve proposed are like this. See my original UDT post for the basic setup, which all of my subsequent proposals share.
“The answer is, we can view the physical universe as a program that runs S as a subroutine, or more generally, view it as a mathematical object which has S embedded within it.” A big computation with embedded discrete copies of S seems to me like a different concept from doing logical updates on a big graph with causal and logical nodes, some of which may correlate to you even if they are not exact copies of you.
The sentence you quoted was just trying to explain how “physical consequences” might be interpreted as “logical consequences” and therefore dealt with within the UDT framework (which doesn’t natively have a concept of “physical consequences”). It wasn’t meant to suggest that UDT only works if there are discrete copies of S in the universe.
In that same post I also wrote, “A more general class of consequences might be called logical consequences. Consider a program P’ that doesn’t call S, but a different subroutine S’ that’s logically equivalent to S. In other words, S’ always produces the same output as S when given the same input. Due to the logical relationship between S and S’, your choice of output for S must also affect the subsequent execution of P’. Another example of a logical relationship is an S’ which always returns the first bit of the output of S when given the same input, or one that returns the same output as S on some subset of inputs.”
I guess I didn’t explicitly write about parts of the universe that “correlate to you” as opposed to having more exact logical relationships with you, but given how UDT is supposed to work, it was meant to just handle them naturally. At least I don’t see why it wouldn’t do so as well as TDT (assuming it had access to your “general-logical-consequence algorithm”, which I’m guessing is the same thing as my “math intuition module”).
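For concreteness, a toy of the quoted S / S′ relationship, with invented functions: S′ never calls S, but because the two are logically equivalent, choosing an output for S is automatically choosing one for S′.

```python
def S(x):
    return (3 * x) % 7

def S_prime(x):
    # Never calls S, but provably computes the same function.
    return (x + x + x) % 7

assert all(S(x) == S_prime(x) for x in range(100))
```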
FWIW, as far as I can remember I’ve always understood this the same way as Wei and cousin_it. (cousin_it was talking about the later logic-based work rather than Wei’s original post, but that part of the idea is common between the two systems.) If the universe is a Game of Life automaton initialized with some simple configuration which, when run with unlimited resources and for a very long time, eventually by evolution and natural selection produces a structure that is logically equivalent to the agent’s source code, that’s sufficient for falling under the purview of the logic-based versions of UDT, and Wei’s informal (underspecified) probabilistic version would not even require equivalence. There’s nothing Cartesian about UDT.
I’m not so sure about the blackmail claim… It seems that UDT would be deciding “If blackmailed, pay or don’t pay” without yet knowing whether it actually will be blackmailed. Assuming it knows the payoffs the other agent receives, it would reason “If I pay if blackmailed... I get blackmailed, whereas if I don’t pay if blackmailed... I don’t get blackmailed. I therefore should never pay if blackmailed”, unless there’s something I’m missing.
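A sketch of that reasoning, under the stated assumption that the blackmailer simulates the victim and only issues the threat when it predicts payment; the payoff numbers are invented for illustration.

```python
def blackmailed(policy):
    # The blackmailer simulates the victim and only threatens profitable targets.
    return policy == "pay-if-blackmailed"

def victim_payoff(policy):
    if not blackmailed(policy):
        return 0
    # -100 for paying; -1_000 if the threat were carried out (never reached
    # here, since refusers are never threatened under this model).
    return -100 if policy == "pay-if-blackmailed" else -1_000

for policy in ("pay-if-blackmailed", "refuse"):
    print(policy, victim_payoff(policy))   # refusing wins: 0 vs -100
```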
I share the intuition that Newcomb’s problem might be “unfair” (not a meaningful problem / not worth trying to win at), and have generally found LW/MIRI discussions of decision theory more enlightening when they dealt with other scenarios (like AIs exchanging source code) rather than Newcomb.
One way to frame the “unfairness” issue: if you knew in advance that you would encounter something like Newcomb’s problem, then it would clearly be beneficial to adopt a decision-making algorithm that (predictably) one-boxes. (Even CDT supports this, if you apply CDT to the decision of what algorithm to adopt, and have the option of adopting an algorithm that binds your future decision.) But why choose to optimize your decision-making algorithm for the possibility that you might encounter something like Newcomb’s problem? The answer to the question “What algorithm should I adopt?” depends on what decision problems I am likely to face—why is it a priority to prepare for Newcomb-like problems?
Well-defined games (like modal combat) seem to give more traction on this question than a fanciful thought experiment like Newcomb, although perhaps I just haven’t read the right pro-one-boxing rejoinder.
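A sketch of the meta-level CDT calculation mentioned above (applying CDT to the choice of which algorithm to adopt), on the assumption that the adopted, binding disposition is what the predictor later reads; payoffs are the canonical $1,000,000 / $1,000.

```python
def payoff(adopted_algorithm):
    # The predictor reads the adopted disposition before the boxes are set.
    prediction = adopted_algorithm
    box_b = 1_000_000 if prediction == "one-box" else 0
    box_a = 1_000 if adopted_algorithm == "two-box" else 0
    return box_b + box_a

print(max(["one-box", "two-box"], key=payoff))  # "one-box": 1_000_000 vs 1_000
```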
You may not expect to encounter Newcomb’s problems, but you might expect to encounter prisoner’s dilemmas, and CDT recommends defecting on these.
“Omega punishes agents who use decision theory X and rewards agents who use decision theory Y”
Those who look for evidence will be thrown into the pits of hell, where they will weep and gnash their teeth forever and ever. Amen. And those who have faith will sit in glory in His presence for all time. Hallelujah!
You might be interested in reading TDT chapter 5 “Is Decision-Dependency Fair” if you haven’t already.
Are people who think in terms of fairness aware of the connection to the prisoners’ dilemma? (which is there from the very beginning, right?)
I think David Lewis was the first to observe this connection, in 1979, 10 years after Nozick’s publication of the problem. But the prisoner’s dilemma is only Newcomb-like if the two prisoners are psychological twins, i.e. if they use the same decision procedure and know this about each other. One might object that this is just as unfair as Newcomb’s problem.
But the objection that Newcomb’s is unfair isn’t to be confused with the objection that it’s unrealistic. I think everybody working on the problem accepts that Newcomb-like situations are practically possible. Unfairness is a different issue.
Nozick mentioned PD. I’ve always heard it asserted that Newcomb started with PD (e.g., here).
Oddly, Nozick does not give Newcomb’s first name. He talks about the problem on page 1, but waits to page 10 to say that it is the problem in the title of the paper.
Someone building a decision theory could equally well say that Newcomb’s problem, and an opponent who threatens to duplicate them and make them play the PD against their duplicate, are equally unfair; but that’s not the connection.
Ah, good to know, thanks.
Hmm, that is an interesting objection. Would you be willing to sketch out (or point me to) a response to it?
Well, that depends. It could turn out to be the case that, in reality, for some fixed definition of fair, the universe is unfair. If that were the case, I think at least some of the philosophers who study decision theory would maintain a distinction between ideal rational behavior, whatever that means, and the behavior that, in the universe, consistently results in the highest payoffs. But Eliezer / MIRI is solely interested in the latter. So it depends on what your priorities are.
Well, if this is right...
Then we should be able to come up with a Newcomb-like problem that specifically punishes TDT agents (off the top of my head: Omega gives an additional 10 million to any agent not using TDT at the end of the box exercise). And if we can come up with such a problem, and EY/MIRI can’t respond by calling foul (for the reasons you give), then getting richer on Newcomb isn’t a reason to accept TDT.
The “practical” question is whether you in fact expect there to be things in the universe that specifically punish TDT agents. Omega in Newcomb’s problem is doing something that plausibly is very general, namely attempting to predict the behavior of other agents: this is plausibly a general thing that agents in the universe do, as opposed to specifically punishing TDT agents.
TDT also isn’t perfect; Eliezer has examples of (presumably, in his eyes, fair) problems where it gives the wrong answer (although I haven’t worked through them myself).
That Omega is doing something “plausibly very general” seems to be the claim under dispute, and the question of fairness should be distinguished from the claim that Omega is doing something realistic or unrealistic. I think we agree that Newcomb-like situations are practically possible. But it may be that my unfair game is practically possible too, and that in principle no decision theory can come out maximizing utility in every practically possible game.
One response might be to say Newcomb’s problem is more unfair than the problem of simply choosing between two boxes containing different amounts of money, because Newcomb’s distribution of utility makes mention of the decision. Newcomb’s is unfair because it goes meta on the decider. My TDT punishing game is much more unfair than Newcomb’s because it goes one ‘meta’ level up from there, making mention of the decision theories.
You could argue that even if no decision theory can maximise in every arbitrarily unfair game, there are degrees of unfairness related to the degree to which the problem ‘goes meta’. We should just prefer the decision theory that can maximise at the highest level of unfairness. This could probably be supported by the observation that while all these unfair games are practically possible, the more unfair a game is, the less likely we are to encounter it outside of a philosophy paper. You could probably come up with a formalization of unfairness, though it might be tricky to argue that it’s relevantly exhaustive and linear.
EDIT: (Just a note, you could argue all this without actually granting that my unfair game is practically possible, or that Newcomb’s problem is unfair, since the two-boxer will provide those premises.)
A theory that is incapable of dealing with agents that make decisions based on the projected reactions of other players is worthless in the real world.
However, an agent that makes decisions based on the fact that it perfectly predicts the reactions of other players does not exist in the real world.
Newcomb does not require a perfect predictor.
I know; the numbers in the canonical case work out so that an accuracy of .5005 is all that’s required, which is within noise of random.
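For the record, the arithmetic behind that figure, using the canonical $1,000,000 / $1,000 payoffs:

```python
# One-boxing beats two-boxing when p * 1e6 > (1 - p) * 1e6 + 1e3,
# i.e. when predictor accuracy p exceeds (1e6 + 1e3) / (2e6).
p_breakeven = (1_000_000 + 1_000) / (2 * 1_000_000)
print(p_breakeven)  # 0.5005
```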
TDT does in fact sketch a fairly detailed model of “what sort of situation is ‘fair’ for the purpose of this paper”, and it explicitly excludes referring to the specific theory that the agent implements. Note that Newcomb did not set out to deliberately punish TDT (that would be hard, considering Newcomb predates TDT), so your variation shouldn’t either.
I think an easy way to judge between fair and unfair problems is whether you need to label the decision theory. Without a little label saying “TDT” or “CDT”, Omega can still punish two-boxers based on the outcome (factual or counterfactual) of their decision theory, regardless of what decision theory they used.
How do you penalize TDT, without actually having to say “I’ll penalize TDT”, based solely on the expected results of the decision theory?
You penalise based on the counterfactual outcome: if they were in Newcomb’s problem, this person would choose one box.
Typically by withholding information about the actual payoffs that will be experienced. E.g., tell the agents they are playing Newcomb’s problem but don’t mention that all millionaires are going to be murdered...
That’s a good question. Here’s a definition of “fair” aimed at UDT-type thought experiments:
The agent has to know what thought experiment they are in as background knowledge, so the universe can only predict their counterfactual actions in situations that are in that thought experiment, and where the agent still has the knowledge of being in the thought experiment.
This disallows my anti-oneboxer setup here: http://lesswrong.com/lw/hqs/why_do_theists_undergrads_and_less_wrongers_favor/97ak (because the predictor is predicting what decision would be made if the agent knew they were in Newcomb’s problem, not what decision would be made if the agent knew they were in the anti-oneboxer experiment) but still allows Newcomb’s problem, including the transparent box variation, and Parfit’s Hitchhiker.
I don’t think much argument is required to show Newcomb’s problem is fair by this definition; the argument would be about deciding to use this definition of fair, rather than one that favours CDT, or one that favours EDT.
Oops. Yes.