Domain Theory and the Prisoner’s Dilemma: FairBot
Wishes it were crossposted from the AI Alignment Forum. Contains more technical jargon than usual.
Recall Robust Cooperation in the Prisoner’s Dilemma and a hint of domain theory.
In the Prisoner’s Dilemma, players have the opportunity to harm the opponent for minor gain. To aid their decisions, players may be granted some knowledge about the matchup. In decreasing order of fidelity:
1. Both players’ source codes / both players’ behavior in all possible environments. No strategy can be better against every opponent than any distinguishable strategy, since some opponent will punish one for not being the other.
2. Both players’ behavior in the current matchup. PrudentBot, who cooperates iff he knows both players will act alike[1], operates here.
3. The opponent’s behavior in the current matchup. FairBot, who cooperates iff he knows the opponent will cooperate, operates here.
4. Nothing. Classical game theory says that, of the few possible strategies, defection is optimal.
This post focuses on the third case.
In domain theory, we partially order our sets by “information content”. Suppose every player can cooperate, defect, or fail to have decided. The latter case helps to model algorithmic nontermination. Both cooperation and defection would have more information content than indecision.
Decision := {Cooperate, Defect, ⊥}
Query := {“other cooperates”, “both act alike”, …}
Answer := {“I know this.”, “I don’t know this.”}
Knowledge := Query → Answer
Player := Knowledge → Decision
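The definitions above can be sketched in Python. The concrete orderings (flat on Decision, a two-point chain on Answer, pointwise on Knowledge) and the ⊥-when-ignorant reading of FairBot are my assumptions for illustration, not fixed by the post:

```python
from itertools import product

# Elements of the flat domain Decision:
BOT, COOPERATE, DEFECT = "⊥", "Cooperate", "Defect"
# Elements of the two-point chain Answer:
DONT_KNOW, KNOW = "I don't know this.", "I know this."
QUERIES = ("other cooperates", "both act alike")

def decision_leq(a, b):
    # Flat order: ⊥ is below everything; Cooperate and Defect are incomparable.
    return a == BOT or a == b

def answer_leq(a, b):
    # Chain: not knowing is below knowing.
    return a == DONT_KNOW or a == b

def knowledge_leq(k1, k2):
    # Knowledge = Query -> Answer, ordered pointwise.
    return all(answer_leq(k1[q], k2[q]) for q in QUERIES)

def fairbot(knowledge):
    # Cooperate iff the opponent's cooperation is known; otherwise stay
    # undecided (returning ⊥ rather than Defect keeps the map monotone).
    return COOPERATE if knowledge["other cooperates"] == KNOW else BOT

# Every knowledge state, and a brute-force monotonicity check for FairBot:
states = [dict(zip(QUERIES, vals))
          for vals in product((DONT_KNOW, KNOW), repeat=len(QUERIES))]
assert all(decision_leq(fairbot(k1), fairbot(k2))
           for k1 in states for k2 in states if knowledge_leq(k1, k2))
```

The monotonicity check is exactly the constraint the next paragraph relies on: as knowledge grows, FairBot’s output can only move up the Decision order.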
Answer is ordered by taking “I know this.” to have more information content than “I don’t know this.”. I will order Query by implication, because then the domain-theoretic custom of considering only monotonic functions means that Knowledge must respect implication!
Shrinking Decision to {Cooperate, ⊥} restricts players to only ever start cooperating as their knowledge increases, never stop. Of course, someone who isn’t going to cooperate might still eventually figure this out. That might be equivalent.
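To see the restriction concretely, here is a deliberately simplified toy (my assumption: a single query, “other cooperates”, so Knowledge collapses to a two-point chain — this is smaller than the 8-player setting counted below). Enumerating all functions Knowledge → {Cooperate, ⊥} and keeping only the monotone ones:

```python
from itertools import product

BOT, COOPERATE = "⊥", "Cooperate"

def decision_leq(a, b):
    # Flat order restricted to {Cooperate, ⊥}: just a two-point chain.
    return a == BOT or a == b

# A player is a function from the knowledge chain (dont_know ⊑ know) to
# Decision; represent one as the pair (value at dont_know, value at know)
# and keep only the monotone ones.
players = [p for p in product((BOT, COOPERATE), repeat=2)
           if decision_leq(p[0], p[1])]

# The three monotone players: ⊥Bot, FairBot, and CooperateBot. The fourth
# candidate, (Cooperate, ⊥) — "stop cooperating upon learning more" — is
# exactly what monotonicity forbids.
print(players)  # [('⊥', '⊥'), ('⊥', 'Cooperate'), ('Cooperate', 'Cooperate')]
```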
A player might find himself in 24 distinct encounters. Of the 8 players, the only ones whose strategies aren’t dominated across these encounters are FairBot and ⊥Bot. After deleting the dominated players, FairBot dominates ⊥Bot. So in this setting, where one can only know the opponent’s decision and knowledge can only increase cooperation, FairBot is optimal.
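The deletion argument above is an iterated elimination of (weakly) dominated strategies. A generic sketch of that procedure — the strategies and payoff table below are invented for illustration, since the post’s actual 8-player table isn’t reproduced here:

```python
def dominated(player, survivors, payoff):
    # player is (weakly) dominated if some surviving rival does at least as
    # well against every surviving opponent and strictly better against one.
    return any(
        all(payoff(r, o) >= payoff(player, o) for o in survivors)
        and any(payoff(r, o) > payoff(player, o) for o in survivors)
        for r in survivors if r != player
    )

def iterated_elimination(players, payoff):
    # Repeatedly delete all currently-dominated players until stable.
    survivors = list(players)
    while True:
        remaining = [p for p in survivors
                     if not dominated(p, survivors, payoff)]
        if remaining == survivors:
            return survivors
        survivors = remaining

# Toy illustration with three abstract strategies and made-up payoffs:
payoffs = {("A", "A"): 2, ("A", "B"): 0, ("A", "C"): 3,
           ("B", "A"): 2, ("B", "B"): 1, ("B", "C"): 3,
           ("C", "A"): 0, ("C", "B"): 0, ("C", "C"): 1}
print(iterated_elimination(["A", "B", "C"], lambda p, o: payoffs[(p, o)]))
# ['B']
```

One caveat worth noting: with weak dominance, the surviving set can depend on the order of deletion, so a careful version of the argument should check that this doesn’t matter in the setting at hand.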
This seems like a good time to pause for feedback. And can you recommend software for doing this reasoning at scale?
No, wait. Then it would cooperate with CooperateBot. Hmm.