How the hell would the strategy profile “Everyone plays 99, unless someone defects [by the way, this could mean playing 30, an action which literally helps everyone], in which case we play 100” arise in the real world? The answer is… that’s a good question.
Probably this outcome reflects the players’ limited knowledge or bounded rationality.
I’ve run this experiment with computer-simulated RL agents, each using one of the following four basic strategies:
minimize (always play 30);
maximize (always play 100);
equilibrium (the Hell profile);
random (play a random integer from [30, 100], each value with equal probability).
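As a rough illustration, the four strategies can be written as functions of the previous round's dial settings. This is a sketch, not the code from the linked script: the `Strategy` type, the name `random_dial` (chosen to avoid shadowing the `random` module), and the rule that "defecting" means setting the dial below 99 are all assumptions.

```python
import random
from typing import Callable, Optional, Sequence

# A strategy maps the previous round's dial settings (None on the first round)
# to this round's dial setting, an integer in [30, 100].
Strategy = Callable[[Optional[Sequence[int]]], int]


def minimize(prev: Optional[Sequence[int]]) -> int:
    return 30  # always the coolest setting


def maximize(prev: Optional[Sequence[int]]) -> int:
    return 100  # always the hottest setting


def equilibrium(prev: Optional[Sequence[int]]) -> int:
    # Hell profile, assuming a defection is any dial below 99 last round:
    # punish with 100, otherwise play 99.
    if prev is not None and any(x < 99 for x in prev):
        return 100
    return 99


def random_dial(prev: Optional[Sequence[int]]) -> int:
    return random.randint(30, 100)  # randint is inclusive on both ends
```

Under this reading, a punishment round (everyone at 100) does not itself count as a defection, so equilibrium players return to 99 once nobody plays below 99.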
The agent’s algorithm is as follows: with probability 7⁄8, keep running the same strategy as before; with probability 1⁄8, select a strategy anew, where strategies that have earned better rewards are more likely to be selected. The initial strategy is chosen at random.
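A minimal sketch of that update rule follows. The exact way "better rewards have higher chances" is implemented is not stated here, so this assumes a softmax over each strategy's mean reward with a hypothetical `temperature` parameter, and treats untried strategies optimistically as having reward 0 (rewards are negative room temperatures, so 0 is better than anything observed). The `Agent` class and its methods are illustrative names, not the script's.

```python
import math
import random
from typing import Dict, List

STRATEGY_NAMES: List[str] = ["minimize", "maximize", "equilibrium", "random"]


class Agent:
    """Hypothetical agent: stay with prob 7/8, reward-weighted switch with prob 1/8."""

    def __init__(self, rng: random.Random, temperature: float = 5.0) -> None:
        self.rng = rng
        self.temperature = temperature  # assumed softmax temperature
        self.totals: Dict[str, float] = dict.fromkeys(STRATEGY_NAMES, 0.0)
        self.counts: Dict[str, int] = dict.fromkeys(STRATEGY_NAMES, 0)
        self.strategy = rng.choice(STRATEGY_NAMES)  # initial strategy chosen at random

    def record(self, reward: float) -> None:
        """Credit this round's reward to the strategy currently in use."""
        self.totals[self.strategy] += reward
        self.counts[self.strategy] += 1

    def mean_reward(self, name: str) -> float:
        # Optimistic default for untried strategies (assumption).
        if self.counts[name] == 0:
            return 0.0
        return self.totals[name] / self.counts[name]

    def maybe_switch(self) -> None:
        if self.rng.random() < 7 / 8:
            return  # keep running the same strategy
        means = [self.mean_reward(n) for n in STRATEGY_NAMES]
        best = max(means)
        # Softmax: a higher mean reward gives a higher selection probability.
        # Subtracting `best` keeps math.exp from overflowing.
        weights = [math.exp((m - best) / self.temperature) for m in means]
        self.strategy = self.rng.choices(STRATEGY_NAMES, weights=weights)[0]
```

A lower `temperature` makes the switch step greedier; a higher one makes it closer to a uniform random choice.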
In ten out of ten runs, the simulation converged to all agents playing 30 (and never switching to another strategy) within 10,000 steps.
Interesting follow-up: how long does it take them to break out of the bad equilibrium if they all start there? And what if we choose a less extreme bad equilibrium (say, 80 degrees)?
By a less extreme bad equilibrium, do you mean “play 79 until someone defects, and then play 80”? Or “play 80 or 100”?
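The first reading generalizes the Hell profile into a family of trigger strategies with a cooperative value and a punishment value. The sketch below assumes, as before, that a defection means setting the dial below the cooperative value; `make_trigger` is a hypothetical helper, not something from the linked script.

```python
from typing import Callable, Optional, Sequence

Strategy = Callable[[Optional[Sequence[int]]], int]


def make_trigger(target: int, punishment: int) -> Strategy:
    """Play `target` unless someone set the dial below `target` last round,
    in which case play `punishment`."""

    def strategy(prev: Optional[Sequence[int]]) -> int:
        if prev is not None and any(x < target for x in prev):
            return punishment
        return target

    return strategy


hell = make_trigger(99, 100)   # the original Hell profile
milder = make_trigger(79, 80)  # "play 79 until someone defects, then play 80"
```

The second reading (“play 80 or 100”) would need a different punishment structure and is not covered by this helper.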
Here is the Python script I’ve used: https://gist.github.com/ProgramCrafter/2af6a5b1cde0ff8995b9502f1c502151
To make all agents start from Hell, change line 31 of the script to:
self.strategy = equilibrium
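For readers who want to explore the follow-up question without the gist, here is a self-contained sketch that starts every agent in Hell and reports the first step at which all agents use `minimize`. It is not the linked script: the number of agents, the shared reward (minus the average dial setting), the softmax switch rule with optimistic 0 for untried strategies, and the "below 99" defection test are all assumptions, and `steps_to_escape` is a hypothetical name.

```python
import math
import random
from typing import Dict, List, Optional

NAMES = ["minimize", "maximize", "equilibrium", "random"]


def act(name: str, prev: Optional[List[int]], rng: random.Random) -> int:
    """Dial setting for one strategy; a defection is assumed to be a dial below 99."""
    if name == "minimize":
        return 30
    if name == "maximize":
        return 100
    if name == "equilibrium":
        return 100 if prev is not None and any(x < 99 for x in prev) else 99
    return rng.randint(30, 100)  # "random": uniform integer in [30, 100]


def steps_to_escape(n_agents: int = 10, max_steps: int = 10_000, seed: int = 0,
                    temperature: float = 5.0,
                    start: Optional[str] = "equilibrium") -> Optional[int]:
    """First step at which every agent uses 'minimize', or None if it never happens.

    start="equilibrium" mirrors starting from Hell; start=None picks random initial strategies.
    """
    rng = random.Random(seed)
    strategies = [start if start is not None else rng.choice(NAMES) for _ in range(n_agents)]
    totals: List[Dict[str, float]] = [dict.fromkeys(NAMES, 0.0) for _ in range(n_agents)]
    counts: List[Dict[str, int]] = [dict.fromkeys(NAMES, 0) for _ in range(n_agents)]
    prev: Optional[List[int]] = None
    for step in range(1, max_steps + 1):
        dials = [act(s, prev, rng) for s in strategies]
        reward = -sum(dials) / n_agents  # everyone shares the room temperature
        for i, s in enumerate(strategies):
            totals[i][s] += reward
            counts[i][s] += 1
            if rng.random() < 1 / 8:
                # Softmax over mean reward; untried strategies count as 0 (optimistic).
                means = [totals[i][n] / counts[i][n] if counts[i][n] else 0.0 for n in NAMES]
                best = max(means)
                weights = [math.exp((m - best) / temperature) for m in means]
                strategies[i] = rng.choices(NAMES, weights=weights)[0]
        prev = dials
        if all(s == "minimize" for s in strategies):
            return step
    return None
```

Calling `steps_to_escape(seed=s)` for several seeds gives a rough distribution of escape times under these assumptions; a milder trigger profile could be tested by changing the `equilibrium` branch of `act`.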