By the way, in one sense, the “True Prisoner’s Dilemma” is impossible between agents of the sort I’m imagining. They see the game set-up and the payoff table, and immediately figure out the Nash bargaining solution (or something like it), and re-write their own utility function to care about the other player.
This seems strange to me. My intuitions about agent design say that you should practically never rewrite your own utility function. What “re-write their own utility function” points to here seems to be something more accurately described as “making an unbreakable commitment”, which could be done via a mechanism separate from literally rewriting your utility function. Humans seem to do something in that space: we have both desires and commitments, and the two feel quite different and separate from the inside.
I agree, that’s a more accurate description. The sense in which the “true prisoner’s dilemma” is impossible is that the utility function you act on becomes the cooperative one you commit to. It makes sense to think in terms of your “personal” (original) utility function and an “acting” utility function, or something like that.
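That personal/acting split can be sketched directly: the agent keeps its original utility intact for evaluating outcomes, while decisions run off a separately committed one. This is a minimal illustration under my own assumptions (the class and method names are hypothetical, and "unbreakable" is just modeled as "never overwritten back"):

```python
class Agent:
    """Sketch: the "personal" utility is never rewritten; commitments
    install a separate "acting" utility that drives choices."""

    def __init__(self, personal_utility):
        self.personal_utility = personal_utility  # original, kept intact
        self.acting_utility = personal_utility    # defaults to personal

    def commit(self, acting_utility):
        # Modeling an unbreakable commitment: decisions now use
        # acting_utility, while personal_utility still scores outcomes.
        self.acting_utility = acting_utility

    def choose(self, options, evaluate):
        # Decide using the acting (committed) utility, not the personal one.
        return max(options, key=lambda o: self.acting_utility(evaluate(o)))
```

For example, an agent whose personal utility prefers larger numbers, but which commits to an acting utility preferring smaller ones, will pick the smallest option while its personal utility still measures how well it actually did.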
I still think this undermines the point of the “true prisoner’s dilemma”, since thinking of humans gives decent intuitions about this sort of reasoning.