That doesn’t seem right. Defecting causes the opponent to defect next time. It’s a bad idea with any decision theory.
Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.
It won’t self-modify to TDT. It will self-modify to something similar, but with its beliefs at the time of modification serving as the priors. For example, it will apply the doomsday argument immediately to estimate how long the world is likely to last, and it will use that estimate from then on, rather than redoing the argument later, when its future self would get a different answer.
That doesn’t seem right. Defecting causes the opponent to defect next time. It’s a bad idea with any decision theory.
Fair enough. I guess I had some special-case stuff in mind: there are certainly ways to get a CDT agent to cooperate on prisoner’s-dilemma-like problems.
That doesn’t seem right. Defecting causes the opponent to defect next time. It’s a bad idea with any decision theory.
Reason backwards from the inevitable end of the iteration. Defecting makes sense there, so defecting one turn earlier makes sense, so one turn earlier...
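The backward-induction argument can be sketched in a few lines. This is only an illustration: the payoff numbers (5, 3, 1, 0) and the function names are assumptions, not anything from the thread, and both players are assumed to be CDT-style maximizers who know the number of rounds.

```python
# Hypothetical payoffs for one player: (my move, their move) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ("C", "D")


def best_reply(their_move, continuation):
    # The continuation value is the same whichever move I pick now, because
    # play in all later rounds has already been fixed by the induction step.
    return max(MOVES, key=lambda m: PAYOFF[(m, their_move)] + continuation)


def backward_induction(n_rounds):
    """Return the planned (my move, their move) pair for each round."""
    plan = []
    continuation = 0  # value of the rounds after the one being solved
    for _ in range(n_rounds):  # walk from the last round back to the first
        # Keep the move pairs in which each move is a best reply to the other.
        equilibria = [
            (a, b)
            for a in MOVES
            for b in MOVES
            if best_reply(b, continuation) == a
            and best_reply(a, continuation) == b
        ]
        a, b = equilibria[0]
        plan.append((a, b))
        continuation += PAYOFF[(a, b)]
    plan.reverse()
    return plan


print(backward_induction(5))
```

Because the continuation value never depends on the current move, the one-shot comparison decides every round, which is exactly why the unraveling goes all the way back to the first turn.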
That depends on whether it’s known which iteration will be the last.
Also, I think any deviation from common knowledge of CDT (for example, if you’re not sure that they’re sure that you’re sure that they’re a perfect CDT agent) would result in defection starting only a small, finite number of iterations from the end.
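On the point about not knowing which iteration is the last, here is a rough sketch of the standard comparison. It assumes that after every round the game continues with a fixed probability delta (strictly below 1), and that the opponent cooperates until the first defection and defects forever afterwards. Both the continuation model and that opponent strategy are assumptions made for illustration, as are the payoff values.

```python
def cooperation_threshold(T=5, R=3, P=1):
    # Smallest continuation probability at which cooperating forever is
    # at least as good as defecting once and being punished afterwards.
    return (T - R) / (T - P)


def cooperating_pays(delta, T=5, R=3, P=1):
    # Expected total payoff from cooperating in every round.
    cooperate_forever = R / (1 - delta)
    # Expected total payoff from defecting now, then mutual defection.
    defect_once = T + delta * P / (1 - delta)
    return cooperate_forever >= defect_once


print(cooperation_threshold())
print(cooperating_pays(0.9), cooperating_pays(0.2))
```

With these illustrative payoffs, cooperation is sustainable whenever the game is at least as likely to continue as to stop, so there is no fixed final round for the backward induction to start from.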