For example, here’s a Nash equilibrium: “Everyone agrees to put 99 each round. Whenever someone deviates from 99 (for example to put 30), punish them by putting 100 for the rest of eternity.”
I don’t think this is actually a Nash equilibrium? It is dominated by the strategy “put 99 every round. Whenever someone deviates from 99, put 30 for the rest of eternity.”
I believe the original post solved this by instead having the equilibrium be “Everyone agrees to put 99 each round. Whenever someone deviates from 99 (for example to put 30), punish them by putting 100 for the next 2 rounds”. I think that one is a Nash equilibrium, because the punishment being finite means that you’re incentivized to stick with the algo even after punishment occurs.
The strategy profile I describe is where each person has the following strategy (call it “Strategy A”):
If empty history, play 99
If history consists only of 99s from all other people, play 99
If any other player’s history contains a choice which is not 99, play 100
The strategy profile you are describing is the following (call it “Strategy B”):
If empty history, play 99
If history consists only of 99s from all other people, play 99
If any other player’s history contains a choice which is not 99, play 30
I agree Strategy B weakly dominates Strategy A. However, saying “everyone playing Strategy A forms a Nash equilibrium” just means that no player has a profitable deviation assuming everyone else continues to play Strategy A. Strategy B isn’t a profitable deviation: if you switch to Strategy B and everyone else is playing Strategy A, nobody ever deviates from 99, so the punishment branch (the only place where A and B differ) is never reached. Everyone will still just play 99 for all eternity, and your payoff is unchanged.
The general name for these kinds of strategies is grim trigger.
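To make the distinction concrete, here is a rough simulation sketch. It rests on assumptions that are not stated in this thread: each player's per-round payoff is taken to be minus the average of everyone's choices, there are 100 players, and the game is cut off after 10 rounds. The names `total_payoffs`, `others_deviated`, and `always_30` are made up for illustration.

```python
from typing import Callable, List

Round = List[int]  # one choice per player for a single round
Strategy = Callable[[int, List[Round]], int]


def others_deviated(me: int, history: List[Round]) -> bool:
    """True if any other player has ever played something other than 99."""
    return any(
        choice != 99
        for past_round in history
        for player, choice in enumerate(past_round)
        if player != me
    )


def strategy_a(me: int, history: List[Round]) -> int:
    # Strategy A: play 99, switch to 100 forever once someone else deviates.
    return 100 if others_deviated(me, history) else 99


def strategy_b(me: int, history: List[Round]) -> int:
    # Strategy B: play 99, switch to 30 forever once someone else deviates.
    return 30 if others_deviated(me, history) else 99


def always_30(me: int, history: List[Round]) -> int:
    # Hypothetical deviator, used only to reach the off-path branch.
    return 30


def total_payoffs(strategies: List[Strategy], rounds: int) -> List[float]:
    """Sum of per-round payoffs, ASSUMING payoff = -(average choice)."""
    history: List[Round] = []
    totals = [0.0] * len(strategies)
    for _ in range(rounds):
        choices = [s(i, history) for i, s in enumerate(strategies)]
        history.append(choices)
        average = sum(choices) / len(choices)
        totals = [t - average for t in totals]
    return totals


N, ROUNDS = 100, 10  # assumed player count and horizon

# On the equilibrium path: player 0 switches from A to B, everyone else plays A.
all_a = total_payoffs([strategy_a] * N, ROUNDS)
one_b = total_payoffs([strategy_b] + [strategy_a] * (N - 1), ROUNDS)
print(all_a[0] == one_b[0])  # True: switching to B changes nothing

# Off the path: player 1 always plays 30, so the punishment branch is reached.
a_vs_dev = total_payoffs([strategy_a, always_30] + [strategy_a] * (N - 2), ROUNDS)
b_vs_dev = total_payoffs([strategy_b, always_30] + [strategy_a] * (N - 2), ROUNDS)
print(b_vs_dev[0] > a_vs_dev[0])  # True: here B does strictly better than A
```

The first comparison is the Nash check: a unilateral switch to B leaves player 0's payoff the same. The second is the kind of case where B beats A, which only comes up when someone else is not playing Strategy A.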
I don’t think this is actually a Nash equilibrium? It is dominated by the strategy “put 99 every round. Whenever someone deviates from 99, put 30 for the rest of eternity.”
I believe the original post solved this by instead having the equilibrium be “Everyone agrees to put 99 each round. Whenever someone deviates from 99 (for example to put 30), punish them by putting 100 for the next 2 rounds”. I think that one is a Nash equilibrium, because the punishment being finite means that you’re incentivized to stick with the algo even after punishment occurs.
It’s not dominated—holding all other players constant the two strategies have equal payoffs, so neither dominates the other.