“Any potentially blackmailing AI would much prefer to have you believe that it is blackmailing you, without actually expending resources on following through with the blackmail, insofar as they think they can exert any control on you at all via an exotic decision theory. Just like in the one-shot Prisoner’s Dilemma, the “ideal” outcome is for the other player to believe you are modeling them and will cooperate if and only if they cooperate, and so they cooperate, but then actually you just defect anyway. For the other player to be confident this will not happen in the Prisoner’s Dilemma, for them to expect you not to sneakily defect anyway, they must have some very strong knowledge about you.”

I don’t understand why this needs to be the case, as opposed to a scenario where the AI reasons that actually following through with the torture slightly increases the probability that you will cooperate. Even if following through costs resources, if it increases the probability of the AI coming into existence in the first place, it might still make sense for the AI in question.
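To make the worry concrete, here is a minimal sketch of the expected-value comparison I have in mind. All of the numbers are hypothetical placeholders I made up for illustration, not anything from the original post; the point is only that, under some assignment of values, following through can look worthwhile to the AI rather than a pure waste of resources.

```python
# Hypothetical numbers, purely to illustrate the expected-value reasoning above;
# none of these figures come from the original post.
cost_of_following_through = 1.0        # resources the AI spends carrying out the threat
value_of_coming_into_existence = 1e6   # how much the AI values having been built at all
delta_p_cooperation = 1e-3             # assumed marginal increase in the chance the human
                                       # cooperates if the threat is genuinely carried out

expected_gain = delta_p_cooperation * value_of_coming_into_existence

if expected_gain > cost_of_following_through:
    print("Under these numbers, following through looks worthwhile to the AI.")
else:
    print("Under these numbers, the AI does better by bluffing.")
```

On this naive accounting, whenever the marginal increase in cooperation times the value the AI places on existing exceeds the cost of the torture, bluffing no longer dominates following through.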
I would appreciate it if someone could explain where I’ve made a mistake.