Can someone answer the following: Say someone implemented an AGI using CDT. What exactly would go wrong that a better decision theory would fix?
It will defect in all Prisoner’s Dilemmas, even if they’re iterated. So, for example, if we’d left it in charge of our nuclear arsenal during the Cold War, it would have launched missiles as fast as possible.
But I think the main motivation was that, when given the option to self-modify, a CDT agent will self-modify as a method of precommitment; CDT isn’t “reflectively consistent.” So if you want to predict an AI’s behavior and you predict based on CDT with no self-modification, you’ll get it wrong, since the AI doesn’t stay CDT. Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.
A more correct analysis is that CDT defects against itself in the iterated Prisoner’s Dilemma, provided there is any finite bound on the number of iterations. So two CDTs in charge of nuclear weapons would reason: “Hmm, the sun’s going to go Red Giant at some point, and even if we escape that, there’s still that Heat Death to worry about. Looks like an upper bound to me.” And then they’d immediately nuke each other.
A CDT playing against a “RevengeBot” (if you nuke it, it nukes back with an all-out strike) would never fire its weapons. But then the RevengeBot could just take out one city at a time, without fear of retaliation.
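A minimal sketch of that second failure, with made-up numbers (the utilities and names here are purely illustrative): at each decision point, a pure causal expected-utility maximizer treats cities already lost as sunk costs, and firing now only causes the all-out strike, so it holds forever.

```python
# Purely illustrative utilities; only the shape of the calculation matters.
CITY_LOSS = 1          # disutility of losing one more city
ALL_OUT_STRIKE = 100   # disutility caused by RevengeBot's full retaliation

def cdt_decision(cities_lost_so_far):
    """Causal expected utility at a single decision point: past losses are
    sunk either way, and firing now causes the all-out strike."""
    eu_hold = -cities_lost_so_far * CITY_LOSS
    eu_fire = -cities_lost_so_far * CITY_LOSS - ALL_OUT_STRIKE
    return "hold" if eu_hold >= eu_fire else "fire"

# RevengeBot takes one city per round; CDT never finds a round where
# firing improves things causally, so the salami-slicing goes unpunished.
for cities_lost in range(3):
    assert cdt_decision(cities_lost) == "hold"
```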
Since CDT was the “gold standard” of rationality developed during the time of the Cold War, I am somewhat puzzled why we’re still here.
Well, it’s good that you’re puzzled, because it wasn’t—see Schelling’s “The Strategy of Conflict.”
I get the point that a CDT would precommit to retaliation if it had time (i.e. self-modify into a RevengeBot).
The more interesting question is why it bothers to do that re-wiring when it is expecting the nukes from the other side any second now...
This assumes that the mutual possession of nuclear weapons constitutes a Prisoner’s Dilemma. There isn’t necessarily a positive payoff to nuking folks. (You know, unless they are really jerks!)
Well, nuking the other side eliminates the chance that they’ll ever nuke you (or attack you with conventional weapons), so there is arguably a slight positive payoff to nuking first as opposed to keeping the peace.
There were some very serious thinkers arguing for a first strike against the Soviet Union immediately after WW2, including (on some readings) Bertrand Russell, who later became a leader of CND. And a pure CDT (with selfish utility) would have done so. I don’t see how Schelling’s theory could have modified that… just push the other guy over the cliff before the ankle-chains get fastened.
Probably the reason it didn’t happen was the rather obvious “we don’t want to go down in history as even worse than the Nazis”; there was also complacency about how far behind the Soviets actually were. If it had been known that they would explode an A-bomb as little as 4 years after the war, then the calculation would have been different. (Last-ditch talks to ban nuclear weapons completely and verifiably, by thorough spying on each other, or bombs away. More likely bombs away, I think.)
I don’t think MAD is a Prisoner’s Dilemma: in the Prisoner’s Dilemma, if I know you’re going to cooperate no matter what, I’m better off defecting, and if I know you’re going to defect no matter what, I’m still better off defecting. That doesn’t seem to be the case here: bombing you doesn’t make me better off, all else being equal; it just makes you worse off. If anything, it’s a game of Chicken, where bombing the opponent corresponds to going straight and not bombing them corresponds to swerving. And CDTists don’t always go straight in Chicken, do they?
Hm, I disagree—if nuking the Great Enemy never made you any better off, why was anyone ever afraid of anyone getting nuked in the first place? It might not grow your crops for you or buy you a TV, but gains in security and world power are probably enough incentive to at least make people worry.
Still better modelled by Chicken (where the utility of winning is assumed to be much smaller in magnitude than the disutility of dying, but still non-zero) than by PD.
(edited to add a link)
I don’t understand what you mean by “modeled better by chicken” here.
I expect army1987’s talking about Chicken, the game of machismo in which participants rush headlong at each other in cars or other fast-moving dangerous objects and whoever swerves first loses. The payoff matrix doesn’t resemble the Prisoner’s Dilemma all that much: there’s more than one Nash equilibrium, and by far the worst outcome from either player’s perspective occurs when both players play the move analogous to defection (i.e. don’t swerve). It’s probably most interesting as a vehicle for examining precommitment tactics.
The game-theoretic version of Chicken has often been applied to MAD, as the Wikipedia page mentions.
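To make the contrast concrete, here is a quick sketch (the payoff numbers are illustrative, not from any particular source) that enumerates the pure-strategy Nash equilibria of each game: the Prisoner’s Dilemma has the single equilibrium (D, D), while Chicken has two asymmetric ones, and mutual “Straight” is the worst cell for both players.

```python
from itertools import product

def pure_nash(payoffs, moves):
    """Return the pure-strategy Nash equilibria of a symmetric 2x2 game.
    payoffs[(a, b)] is the payoff to the player who picks a against b."""
    equilibria = []
    for a, b in product(moves, repeat=2):
        a_ok = all(payoffs[(a, b)] >= payoffs[(a2, b)] for a2 in moves)
        b_ok = all(payoffs[(b, a)] >= payoffs[(b2, a)] for b2 in moves)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria

# Illustrative payoffs. PD: temptation > reward > punishment > sucker.
pd = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
# Chicken: winning is worth a little; a head-on crash is catastrophic.
chicken = {("Swerve", "Swerve"): 0, ("Swerve", "Straight"): -1,
           ("Straight", "Swerve"): 10, ("Straight", "Straight"): -100}

print(pure_nash(pd, ["C", "D"]))
# [('D', 'D')]
print(pure_nash(chicken, ["Swerve", "Straight"]))
# [('Swerve', 'Straight'), ('Straight', 'Swerve')]
```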
I was. I should have linked to it, and I have now.
That doesn’t seem right. Defecting causes the opponent to defect next time. It’s a bad idea with any decision theory.
It won’t self-modify to TDT. It will self-modify to something similar, but using its beliefs at the time of modification as the priors. For example, it will apply the doomsday argument immediately to find out how long the world is likely to last, and it will use that answer from then on, rather than redoing the calculation as its future self (and getting a different answer).
Fair enough. I guess I had some special-case stuff in mind; there are certainly ways to get a CDT agent to cooperate on Prisoner’s-Dilemma-ish problems.
Reason backwards from the inevitable end of the iteration. Defecting makes sense there, so defecting one turn earlier makes sense, so one turn earlier...
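A minimal sketch of that unraveling, with illustrative payoffs: in the subgame-perfect solution, play in future rounds is already pinned down regardless of what happens now, so each round reduces to the one-shot game, where defection dominates.

```python
# Illustrative stage-game payoffs to the row player: (mine, theirs) -> payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def stage_choice(their_move, continuation_value):
    """The continuation value is the same whichever move is made now
    (later rounds are already solved), so only the stage payoff matters."""
    return max("CD", key=lambda m: PAYOFF[(m, their_move)] + continuation_value)

def backward_induction(rounds):
    plan, continuation = [], 0
    for _ in range(rounds):                        # last round first
        move = stage_choice("C", continuation)     # same answer against "D"
        continuation += PAYOFF[(move, move)]       # both sides reason alike
        plan.append(move)
    return list(reversed(plan))

print(backward_induction(10))  # ['D', 'D', ..., 'D']: cooperation never starts
```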
That depends on whether it’s known which iteration will be the last.
Also, I think any deviation from common knowledge of CDT (such as if you’re not sure that they’re sure that you’re sure that they’re a perfect CDT) would result in defection beginning only a finite, and small, number of iterations from the end.
Ah, that second paragraph makes perfect sense. Thanks.
I think TDT reduces to CDT if there’s no other agent around with intelligence similar to or greater than yours. (You also mustn’t have any dynamic inconsistency, such as akrasia; otherwise your future and past selves count as ‘other’ as well.) So I don’t think it’d make much of a difference for a singleton, but I’d rather use an RDT just in case.
It isn’t the absolute level of intelligence that matters, but rather whether the other agent is capable of a specific kind of reasoning. Even this can be relaxed to things that only dubiously qualify as “agents.” The requirement is that some aspect of the environment has (utility-relevant) behavior that is entangled with the output of the decision to be made in some way other than a forward-in-time causal influence. This almost always implies that some agent is involved, but that need not necessarily be the case.
Caveat: Maybe TDT is dumber than I remember and artificially limits itself in a way that is relevant here. I’m more comfortable making assertions about what a correct decision theory would do than about what some specific attempt to specify a decision theory would do.
You make me happy! RDT!