Ah! Thank you. I see now. The circumstances in which a CDT agent will self-modify to use a different decision theory are that:
The agent was programmed by Eliezer Yudkowsky and hence is just looking for an excuse to self-modify.
The agent is provided with a prior leading it to be open to the possibility of omniscient, yet perverse, agents bearing boxes full of money.
The agent is supplied with (presumably faked) empirical data leading it to believe that all such omniscient agents reward one-boxers.
Since the agent seeks reflective equilibrium (because programmed by aforesaid Yudkowsky), and since it knows that CDT requires two-boxing, and since it has no reason to doubt that causality is important in this world, it makes exactly the change to its decision theory that seems appropriate. It continues to use CDT except on Newcomb problems, where it one-boxes. That is, it self-modifies to use a different decision theory, which we can call CDTEONPWIOB.
Well, ok, though I wouldn’t have said that these are cases where CDT agents do something weird. These are cases where EYDT agents do something weird.
I apologize if it seems that the target of my sarcasm is you WrongBot. It is not.
EY has deluded himself into thinking that reflective consistency is some kind of gold standard of cognitive stability. And then he uses reflective consistency as a lever by which completely fictitious data can uproot the fundamental algorithms of rationality. Which would be fine, except that he has apparently convinced a lot of smart people here that he knows what he is talking about. Even though he has published nothing on the topic. Even though other smart people like Robin tell him that he is trying to solve an already solved problem.
I would say more but …
This manuscript was cut off here, but interested readers are directed to the following source for further discussion:

Bibliography

Gibbard, A., and Harper, W. L. (1978), “Counterfactuals and Two Kinds of Expected Utility”, in C. A. Hooker, J. J. Leach, and E. F. McClennen (eds.), Foundations and Applications of Decision Theory, vol. 1, Reidel, Dordrecht, pp. 125–162.
Reflective consistency is not a “gold standard”. It is a basic requirement. It should be easy to come up with terrible, perverse decision theories that are reflectively consistent (EY does so, sort of, in his TDT outline, though it’s not exactly serious / thorough). The point is not that reflective consistency is a sign you’re on the right track, but that a lack of it is a sign that something is really wrong, that your decision theory is perverse. If using your decision theory causes you to abandon that same decision theory, it can’t have been a very good decision theory.
Consider it as being something like monotonicity in a voting system; it’s a weak requirement for weeding out things that are clearly bad. (Well, perhaps not everyone would agree IRV is “clearly bad”, but… it isn’t even monotonic!) It just happens that in this case evidently nobody noticed before that this would be a good condition to satisfy and hence didn’t try. :)
I am not sure that decision theory is an “already solved” problem. There’s the issue of what happens when agents can self-modify, and so wirehead themselves. I am pretty sure that is an unresolved “grand challenge” problem.
TDT gets better outcomes than CDT when faced with Newcomb’s Problem, Parfit’s Hitchhiker, and the True Prisoner’s Dilemma.
When does CDT outperform TDT? If the answer is “never”, as it currently seems to be, why wouldn’t a CDT agent self-modify to use TDT?
Because it can’t find a write-up that explains how to use it?
Perhaps you can answer the questions that I asked here. What play does TDT make in the game of Chicken? Can you point me to a description of TDT that would allow me to answer that question for myself?
Suppose I’m an agent implementing TDT. My decision in Chicken depends on how much I know about my opponent.
If I know my opponent implements the same decision procedure I do (because I have access to its source code, say), and my opponent has this knowledge about me, I swerve. In this case, my opponent and I are in symmetrical positions and its choice is fully determined by mine; my choice is between payoffs of (0,0) and (-10,-10).
Else, I act identically to a CDT agent.
As Eliezer says here, the one-sentence version of TDT is “Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.”
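As a toy illustration of the symmetric-knowledge case described above (the payoff numbers and function name here are my own, not from any TDT write-up), the decision reduces to picking the best diagonal outcome:

```python
# Chicken payoffs as (row player, column player); C = swerve, D = straight.
PAYOFFS = {
    ("C", "C"): (0, 0),
    ("C", "D"): (-1, 1),
    ("D", "C"): (1, -1),
    ("D", "D"): (-10, -10),
}

def tdt_vs_mirror():
    """If my opponent provably runs my exact decision procedure, its move
    equals mine, so only the diagonal outcomes are live options."""
    return max(["C", "D"], key=lambda move: PAYOFFS[(move, move)][0])

print(tdt_vs_mirror())  # C, since (0, 0) beats (-10, -10)
```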
If I know my opponent implements the same decision procedure I do (because I have access to its source code, say), and my opponent has this knowledge about me, I swerve. In this case, my opponent and I are in symmetrical positions and its choice is fully determined by mine; my choice is between payoffs of (0,0) and (-10,-10).

I’m not sure this is right. Isn’t there a correlated equilibrium that does better?
I think we’re looking at different payoff matrices. I was using the formulation of Chicken that rewards

      |    C     |    D
    C | +0, +0   | −1, +1
    D | +1, −1   | −10, −10

which doesn’t have a correlated equilibrium that beats (C,C).
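A quick sanity check of that claim (my own sketch; the algebra just imposes the indifference condition for the symmetric mixed equilibrium of the matrix above):

```python
from fractions import Fraction

# Row player's payoffs: u(C|opp C)=0, u(C|opp D)=-1, u(D|opp C)=1, u(D|opp D)=-10.
# In the symmetric mixed equilibrium each player swerves with probability q
# chosen to make the opponent indifferent between C and D:
#   E[C] = -(1 - q)   and   E[D] = q - 10*(1 - q)
# Equating them: q - 1 = 11*q - 10  =>  q = 9/10.
q = Fraction(9, 10)
ev_c = q * 0 + (1 - q) * (-1)
ev_d = q * 1 + (1 - q) * (-10)
assert ev_c == ev_d  # indifference holds at q = 9/10
print(float(ev_c))  # -0.1 per player, strictly worse than the 0 each from (C, C)
```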
Using the payoff matrix Perplexed posted here, there is indeed a correlated equilibrium, which I believe the TDT agents would arrive at (given a source of randomness). My bad for not specifying the exact game I was talking about.
...and, this is what I get for not actually checking things before I post them.
Two questions:
1. Why do you believe the TDT agents would find the correlated equilibrium? Your previous statement and the Eliezer quote suggested that a pair of TDT agents would always play symmetrically in a symmetric game. No “spontaneous symmetry breaking”.
2. Even without a shared random source, there is a mixed Nash equilibrium that is also better than symmetric cooperation. Do you believe TDT would play that if there were no shared random input?
In a symmetric game, TDT agents choose symmetric strategies. Without a source of randomness, this entails playing symmetrically as well.
I’m not sure why you’re talking about shared random input. If both agents get the same input, they can both be expected to treat it in the same way and make the same decision, regardless of the input’s source. Each agent needs an independent source of randomness in order to play the mixed equilibrium; if my strategy is to play C 30% of the time, I need to know whether this iteration is part of that 30%, which I can’t do deterministically because my opponent is simulating me.
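The private-randomness point can be sketched like this (my own toy code; the seeds and the 30% probability are illustrative):

```python
import random

def mixed_play(p_swerve, rng):
    """Play C with probability p_swerve using a private random source."""
    return "C" if rng.random() < p_swerve else "D"

# Two agents with independent private seeds: their moves decorrelate.
a, b = random.Random(1), random.Random(2)
plays = [(mixed_play(0.3, a), mixed_play(0.3, b)) for _ in range(10_000)]
both_c = sum(pa == "C" and pb == "C" for pa, pb in plays) / len(plays)
# both_c comes out near 0.3 * 0.3 = 0.09, as independence predicts.
# A deterministic substitute for rng.random() would be reproduced exactly
# by a simulating opponent, collapsing back to fully symmetric play.
```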
Yeah, I think any use of correlated equilibrium here is wrong—that requires a shared random source. I think in this case we just get symmetric strategies, i.e., it reduces to superrationality, where they each just get their own private random source.
I’m not sure why you’re talking about shared random input.

Sorry if this was unclear. It was a reference to the correlated pair of random variables used in a correlated equilibrium. I was saying that even without such a correlated pair, you may presume the availability of independent random variables, which would allow a mixed Nash equilibrium, still better than symmetric play in this game.
Gah, wait. I feel dumb. Why would TDT find correlated equilibria? I think I had the “correlated equilibrium” concept confused. A correlated equilibrium would require a public random source, which two TDTers won’t have.
Digits of pi are kind of like a public random source.
Ignoring the whole pi-is-not-known-to-be-normal thing, how do you determine which digit of pi to use when you can’t actually communicate and you have no idea how many digits of pi the other player may already know?
Same way you meet up in New York with someone you’ve never talked to: something like Schelling points. I’m not sure that answer works in practice.
Thank you. I hope you realize that you have provided an example of a game in which CDT does better than TDT. For example, in the game with the payoff matrix shown below, there is a mixed strategy Nash equilibrium which is better than the symmetric cooperative result.
Looks like we’re talking about different versions of Chicken. Please see my reply to Sniffnoy.
So TDT is different from CDT only in cases where the game is perfectly symmetric? If you are playing a game that is roughly the symmetric PD, except that one guy’s payoffs are shifted by a tiny +epsilon, then they should both defect?
TDT is different from CDT whenever one needs to consider the interaction of multiple decisions made using the same TDT-based decision procedure. This applies both to competitions between agents, as in the case of Chicken, and to cases where an agent needs to make credible precommitments, as in Newcomb’s Problem.
In the case of an almost-symmetric PD, the TDT agents should still cooperate. To change that, you’d have to make the PD asymmetrical enough that the agents were no longer evaluating their options in the same way. If a change is small enough that a CDT agent wouldn’t change its strategy, TDT agents would also ignore it.
This doesn’t strike me as the world’s greatest explanation, but I can’t think of a better way to formulate it. Please let me know if there’s something that’s still unclear.
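For what it’s worth, the almost-symmetric case can be illustrated with a small sketch (the payoff numbers and the epsilon are my own, purely illustrative):

```python
# Standard PD payoffs for one player, with an optional shift applied
# to model the "one guy's payoffs moved by +epsilon" variant.
BASE = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def payoff(my_move, opp_move, shift=0.0):
    return BASE[(my_move, opp_move)] + shift

# When both moves are produced by the same decision procedure, the live
# comparison is between the diagonal outcomes (C,C) and (D,D). A tiny
# shift leaves that comparison, and hence the decision, unchanged:
for shift in (0.0, 0.01):
    assert payoff("C", "C", shift) > payoff("D", "D", shift)
```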
If a change is small enough that a CDT agent wouldn’t change its strategy, TDT agents would also ignore it.

This strikes me as a bit bizarre. You test whether a warped PD is still close enough to symmetric by asking whether a CDT agent still defects in order to decide whether a TDT agent should still cooperate? Are you sure you are not just making up these rules as you go?
Much is unclear and very little seems to be coherently written down. What amazes me is that there is so much confidence given to something no one can explain clearly. So far, the only stable thing in your description of TDT is that it is better than CDT.