We all know that in a world of one-shot Prisoner’s Dilemmas with read-access to the other player’s source code, it’s good to be Timeless Decision Theory.
I would go as far as to actually doubt it. TDT seems to be insufficiently well-specified for this to clearly follow one way or the other, and in some cases I expect that TDT should be designed so that it won’t unconditionally cooperate with other TDTs, picking other mixed strategies on the Pareto frontier instead.
(I assume it’s fine for me to discuss this post without directly engaging the solutions to its riddles; I haven’t even read it yet.)
> (I assume it’s fine for me to discuss this post without directly engaging the solutions to its riddles; I haven’t even read it yet.)
Of course.
> TDT seems to be insufficiently well-specified for this to clearly follow one way or the other, and in some cases I expect that TDT should be designed so that it won’t unconditionally cooperate with other TDTs, picking other mixed strategies on the Pareto frontier instead.
In the simplest possible case, where two agents in a one-shot PD have isomorphic TDT algorithms maximizing different utility functions, they should cooperate. Let me know if I overstated the case in my first paragraph or Footnote 1.
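The simplest reading of that condition can be written down directly. In the sketch below, "isomorphic TDT algorithms" is deliberately cashed out in the crudest possible way, as verbatim source equality, and the (possibly different) utility functions are kept outside the compared code; all names here are illustrative, not from any actual TDT implementation:

```python
# The agent's decision procedure, kept as a string so it can be "read"
# by the other player: cooperate iff the opponent runs this very code.
SRC = 'lambda my_src, their_src: "C" if their_src == my_src else "D"'

def run(agent_src: str, opponent_src: str) -> str:
    # Instantiate the agent from its source and hand it both programs,
    # modelling mutual read-access to source code.
    agent = eval(agent_src)
    return agent(agent_src, opponent_src)

print(run(SRC, SRC))                 # two copies of the same algorithm cooperate
print(run(SRC, 'lambda a, b: "D"'))  # against any other program, defect
```

Verbatim equality is exactly the notion the next comment questions: it is sufficient for cooperation but clearly too brittle to be what "isomorphic" ought to mean.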
The question, of course, is what counts as “isomorphic TDT algorithms” and how the agents figure out whether that is the case. This post, however, appears to be free of these problems.
> This post, however, appears to be free of these problems.
Uh, do you mean “this post wrongly sweeps these problems under the rug”, or “this post sweeps these problems under the rug, and that’s OK”?
Anyway, although we don’t have a coding implementation of any of these decision theories, Eliezer’s description of TDT seems to keep the utility function separate from the causal network.
These problems don’t affect this post, so long as we assume the TDT agents to be suitably identical: the games you consider are all symmetric under permutations of the TDT agents, so superrationality (which TDT agents know how to apply) does the trick.
> Anyway, although we don’t have a coding implementation of any of these decision theories, Eliezer’s description of TDT seems to keep the utility function separate from the causal network.
(Don’t understand what you intended to communicate by this remark.)
Ah, good.
In retrospect, that remark doesn’t apply to multiplayer games; I was thinking of the way that in Newcomb’s Problem, the Predictor only cares what you choose and doesn’t care about your utility function, so that the only place a TDT agent’s utility function enters into its calculation there is at the very last stage, when summing over outcomes. But that’s not the case for the Prisoner’s Dilemma, it seems.
Right: for TDT agents to expect each other to act identically from the symmetry argument, we need to be able to permute not just the TDT agents’ places in the game, but also, simultaneously, their places in the TDT agents’ utility functions, without changing the game; this accommodates the difference in the agents’ utility functions.
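For a two-player game, that invariance condition can be made concrete: swapping the players' roles (and with them, whose payoff is whose) must leave the payoff structure unchanged, i.e. player 1's payoff at (a, b) equals player 2's payoff at (b, a). A minimal check, using the usual illustrative PD payoff numbers (an assumption; the post's games may differ):

```python
MOVES = ("C", "D")

# Player 1's payoffs in a standard Prisoner's Dilemma (T=5, R=3, P=1, S=0).
U1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
# Player 2's payoffs: the same game seen from the other chair.
U2 = {(a, b): U1[(b, a)] for (a, b) in U1}

def symmetric(u1, u2, moves):
    """True iff permuting the two players leaves the game unchanged."""
    return all(u1[(a, b)] == u2[(b, a)] for a in moves for b in moves)

print(symmetric(U1, U2, MOVES))
```

Note that handing both players the same payoff table (`symmetric(U1, U1, MOVES)`) fails the check: the permutation has to move the utility functions along with the agents, which is the point of the comment above.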
> and in some cases I expect that TDT should be designed so that it won’t unconditionally cooperate with other TDTs, picking other mixed strategies on the Pareto frontier instead.
I assume you are not expecting these cases to include the simple one-shot prisoner’s dilemma with full code access? I would be skeptical.
In a simple special case where everything is symmetric, they will cooperate if the problem is formalized in the spirit of TDT, but this is basically good old superrationality, not something TDT-specific. The doubt I expressed is about the case where the TDT agents are not exactly symmetric, so that each of them can’t automagically assume that the other will do exactly the same thing. In the context of this post, this assumption may be necessary.
I think it is unfair to TDT to say that it is just Hofstadter’s superrationality. If TDT is an actual algorithm to which Hofstadter’s argument applies, even just in the purely symmetric version, that is a great advance. I would definitely say that about UDT.
Yes, TDT is underspecified. But is it a class of fully specified algorithms, all of which cooperate with pure clones, or is it not yet clear whether there is any way of specifying which logical counterfactuals it can consider?
Two relevant links: Gary Drescher on a problem with (a specification of?) TDT; you on underspecification.
> The doubt I expressed is about the case where the TDT agents are not exactly symmetric, so that each of them can’t automagically assume that the other will do exactly the same thing. In the context of this post, this assumption may be necessary.
The assumption of symmetry is not necessary in the context of this post. The ability to read the other’s code, and to know that if they read your code they will find that you would cooperate if they do (etc.), is all that is necessary. Being the same as you isn’t privileged at all; it’s just convenient.
Code access is basically just better than knowledge they will do exactly the same thing.
I think Vladimir is saying that TDT agents with a superior bargaining position might extract further concessions from TDTs with an inferior bargaining position; or rather, that we can’t yet rigorously show that they wouldn’t do such things. In the world of one-shot PDs, numerical superiority of one kind of TDT agent over another might be such a bargaining advantage.
> In the world of one-shot PDs, numerical superiority of one kind of TDT agent over another might be such a bargaining advantage.
I had been considering a whole population of agents playing lots of prisoner’s dilemmas among themselves to not be a one-shot prisoner’s dilemma. It does make sense for all sorts of other plays to be made when the situation becomes political.
Omega can wipe their memories of past interactions with other particular agents, as in the example I made up. That would make each interaction a one-shot, and it wouldn’t prevent the sort of leverage we’re talking about.
> Omega can wipe their memories of past interactions with other particular agents, as in the example I made up. That would make each interaction a one-shot
I wouldn’t call a game one-shot just because memory constraints are applied. What matters is that the game being played is so much bigger than a single prisoner’s dilemma. Again, I don’t dispute that there are all sorts of potential considerations that can be made, even if very little evidence about the external political environment is available to the agents, as in this case. Given this, it seems likely that I don’t disagree with Vlad significantly.
If you doubt it, then I doubt it as well. I thought I saw a formal specification a while back, but perhaps that was UDT.
You’re probably thinking of cousin_it’s proof sketch of cooperation in the PD. That was ADT/UDT. A version of TDT that reasons about formal proofs is not part of the theory as discussed anywhere I know of.
I don’t doubt it, but when was this proven?