Well, this article puts my confusion from the primer into context, and I think I understand the issue now.
The “problem” (and it’s ultimately not a problem, but I’ll get to that) I have is that the game these programs are playing is ill-posed, in that it’s not a closed system. It might appear as if they’re playing, for example, Prisoners’ Dilemma, but with access to each other’s source code they’re really playing some sort of Prisoners’ Meta-Dilemma, a fundamentally different game for all that it might seem superficially similar. Now I’m not enough of an expert to tell you exactly what defines a “well-posed” game, but I’m quite confident this is not it. Or maybe I’m abusing the terminology a bit, but hopefully my point is clear enough (if not, I’ll elaborate). The distinction, though, is that the mechanism by which you make your decision on how to play should be outside the game itself, and therefore only observed indirectly.
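To make that concrete, here is a minimal sketch of what one round of the source-sharing game might look like; the representation of a player as a (source, strategy) pair and both bots are my own invention for illustration:

```python
# Hypothetical sketch of one round of the "source code access" game.
# A player is modelled as (source_text, strategy); each strategy is
# handed the *other* player's source before choosing "C" or "D".

def play_round(player_a, player_b):
    src_a, strat_a = player_a
    src_b, strat_b = player_b
    return strat_a(src_b), strat_b(src_a)

# A plain co-operator, and a bot that defects against any opponent
# whose source text so much as mentions the move "D".
cooperate_bot = ('return "C"', lambda opp_src: "C")
suspicious_bot = ('return "D" if \'"D"\' in opp_src else "C"',
                  lambda opp_src: "D" if '"D"' in opp_src else "C")

print(play_round(cooperate_bot, suspicious_bot))  # -> ('C', 'C')
```

The decision procedure itself is now an input to the game, which is exactly the move that takes us outside the classical setting.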
One property this metagame has is that there is fundamentally no “correct” answer for what “decision theory” the programs should use. The proof is simple: imagine you have a correct answer D. Imagine then that this program faces a population of opponents that all play the following strategy (I’ll use the Prisoners’ Dilemma subgame as a simple example):
Look at your opponent’s source code.
If your opponent is using D, defect.
Otherwise co-operate.
D is in fact the single worst strategy in this scenario by quite a margin, regardless of what it is.
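As a minimal sketch, assuming each program simply receives the opponent’s source as a string, and with D_SOURCE as a stand-in for whatever D’s actual source text is:

```python
# Hypothetical sketch of the counter-strategy above. D_SOURCE is a
# placeholder for the source text of the supposedly correct theory D.
D_SOURCE = "<source of D>"

def punish_d(opponent_source: str) -> str:
    if opponent_source == D_SOURCE:  # opponent is running D
        return "D"                   # so defect against it
    return "C"                       # co-operate with everyone else
```

(Real recognition would have to cope with variants of D rather than one exact source string, but an exact match is enough to make the argument go through.)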
But that’s fine: why would we even expect there to be a single correct strategy for every possible set of opponents? Of course there isn’t. This is where we take a step back to a game that is actually well-posed and for which Naive Decision Theory works fine: the programmers’ game. Applying it there would go something like this:
1. Create a probability distribution over the (potentially infinite) space of all possible programs that yours might be up against.
2. Find the program which maximises the expected utility against that distribution of opponents.
For an infinite space of possible programs the second step might technically be impossible to perform optimally, but you do the best you can. There is, of course, a devil in the details of the first step: how you create that distribution. When you look into it, it is ultimately a game between programmers, with its own Nash equilibria and co-operative strategies and so on, not just a simple decision. But until you start having programmers read each other’s minds (in which case you’ve really just pointlessly shifted the whole thing up one meta-level) it’s at least a well-posed (if complex) game that the ordinary approaches should work for.
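As an illustration of the two steps, here is a toy version with invented programs, labels standing in for source code, made-up prior weights, and standard one-shot Prisoners’ Dilemma payoffs for the row player:

```python
# Toy version of the programmers' game. A "program" maps the opponent's
# label (standing in for its source) to a move; PAYOFF gives the row
# player's utility for each pair of moves.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def always_c(opp): return "C"
def always_d(opp): return "D"
def clone_or_defect(opp):  # co-operate only with copies of itself
    return "C" if opp == "clone_or_defect" else "D"

programs = {"always_c": always_c, "always_d": always_d,
            "clone_or_defect": clone_or_defect}

# Step 1: an invented probability distribution over possible opponents.
prior = {"always_c": 0.3, "always_d": 0.3, "clone_or_defect": 0.4}

# Step 2: pick the candidate with the highest expected utility against it.
def expected_utility(name):
    mine = programs[name]
    return sum(p * PAYOFF[(mine(opp), programs[opp](name))]
               for opp, p in prior.items())

best = max(programs, key=expected_utility)
print(best, expected_utility(best))  # clone_or_defect 3.0
```

Against this particular prior the source-reading program comes out on top, which fits my guess below that the “smarter” program tends to win.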
As I said earlier, though, the fact that I might call these games ill-posed doesn’t mean they’re not interesting. I’ll be fascinated to read about the strategies that people have devised for them and the attempts to be as general as possible. I’m a little concerned that the arguments so far seem to run along the lines of “You could do X, but then what if Y happens?”, purely because, as I’ve shown above, there is ultimately no end to that chain of logic. But maybe you’ll eventually step outside of it. My guess is that ultimately the “smarter” program wins. Certainly my generic counter-strategy generally has to be more complex (and therefore “smarter”? Maybe) than the “D” strategy it punishes, in order to identify “D” in the first place.

In any case I look forward to finding out.
There’s a relevant footnote in the next post: while it’s possible to write agents that punish a particular decision theory, an agent that’s trying (in any meaningful sense) to maximize its stated payoffs won’t do this, and it’s certainly not a constant strategy (i.e. a one-player game) either. In that sense, we can say that such problems are less likely to be encountered in reality.
It might appear as if they’re playing, for example, Prisoners’ Dilemma, but with access to each other’s source code they’re really playing some sort of Prisoners’ Meta-Dilemma, a fundamentally different game for all that it might seem superficially similar.
I agree, in the same way that Iterated Prisoners’ Dilemma (with no source code access) is a fundamentally different game from One-Shot Prisoners’ Dilemma (with no source code access). But there are real-life instances that are closer to the source-code-access version than to the classical game theory version: for instance, a Cold War between two powers, each of which has a network of spies in the other’s government.
This is where we take a step back to a game that is actually well-posed and for which Naive Decision Theory works fine: the programmers’ game. Applying it there would go something like this:
1. Create a probability distribution over the (potentially infinite) space of all possible programs that yours might be up against.
2. Find the program which maximises the expected utility against that distribution of opponents.
… I believe this is called AIXI.
(Not exactly, but …)