I don’t have any direct citations to current social science, no, but I’ll tell a plausible story, and give some indirect citations.
Often? Yes. Always or near always? No. It depends crucially on the complexity of the game, the familiarity of the person playing the game, and the intelligence of the people playing.
Most people playing a game iteratively update their strategies with each game, learning both which moves of theirs worked better, and what their opponents are likely to do.
If both sides constantly do these updates, they are driven towards a Nash equilibrium. (It might be better to say they are driven away from anything that is not a Nash equilibrium.)
The definition of a Nash equilibrium is a combination of strategies where neither side can do better by unilaterally altering their own move. If one side can do better by altering their own move, and realizes it, they will.
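As a minimal sketch of that definition, here is a check for whether a pair of pure strategies is a Nash equilibrium of a two-player game (the payoff numbers are the standard toy Prisoner's Dilemma values, chosen just for illustration):

```python
# Check whether a pair of pure strategies is a Nash equilibrium:
# neither player can do better by unilaterally changing their own move.

def is_nash(payoff_a, payoff_b, i, j):
    """True if (row i, column j) is a pure-strategy Nash equilibrium."""
    # Player A considers switching rows; player B considers switching columns.
    a_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(len(payoff_a)))
    b_best = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(len(payoff_b[i])))
    return a_best and b_best

# Toy Prisoner's Dilemma payoffs (0 = Cooperate, 1 = Defect).
A = [[3, 0], [5, 1]]   # row player's payoffs
B = [[3, 5], [0, 1]]   # column player's payoffs

print(is_nash(A, B, 1, 1))  # mutual defection -> True
print(is_nash(A, B, 0, 0))  # mutual cooperation -> False
```

Mutual defection passes the check because neither side gains by switching alone; mutual cooperation fails because either side gains by defecting.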
Complexity makes it harder for people to explore the space and find the Nash equilibrium. The more familiar you are with the game, the more likely you are to have found that neighborhood and to play near it.
The smarter you are the faster this happens—you can essentially model your opponent to figure out what they’ll do, and respond. But your opponent can model you as well, so you must include that in your model of him, and so forth.
A surprisingly large number of people are essentially “level 0” modelers, who aren’t influenced at all by a model of what their opponent does until they have gained data on it. They may use the “maximin” strategy: pick the choice that maximizes your gain assuming that, for each of your possible choices, your opponent does what helps you the least (brutal pessimism). Similarly, there is an optimistic “maximax” strategy: pick the choice that maximizes your gain assuming that, for each of your possible choices, your opponent chooses what is best for you. Or there is the expected value over a flat distribution of the opponent’s choices. Strategies that output a number for each choice can also be combined with some weighting. There are many other possibilities, of course, but if one choice strictly dominates another (it is better for every possible choice of the opponent), even a level 0 modeler should never pick the strictly dominated one.
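The three level-0 rules just named can be sketched in a few lines, using an arbitrary toy payoff matrix (rows are my choices, columns are the opponent's) to show that they need not agree:

```python
# "Level 0" decision rules over the row player's payoff matrix.
# Rows = my choices, columns = opponent's choices. Toy numbers only.

def maximin(payoffs):
    # Pessimism: assume the opponent picks whatever hurts me most.
    return max(range(len(payoffs)), key=lambda i: min(payoffs[i]))

def maximax(payoffs):
    # Optimism: assume the opponent picks whatever helps me most.
    return max(range(len(payoffs)), key=lambda i: max(payoffs[i]))

def flat_expected(payoffs):
    # Assume the opponent picks uniformly at random.
    return max(range(len(payoffs)), key=lambda i: sum(payoffs[i]) / len(payoffs[i]))

M = [[4, 0],   # choice 0: great if they cooperate, terrible otherwise
     [2, 3]]   # choice 1: decent either way

print(maximin(M), maximax(M), flat_expected(M))  # -> 1 0 1
```

Here the pessimist and the flat-prior player pick choice 1, while the optimist picks choice 0, so the rules genuinely diverge.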
Another large number are “level 1” modelers: they figure the other guy will do something given by one of these “level 0” models. There are a few “level 2” modelers that model the other guy as “level 1”, and level 3, and so forth. The Nash equilibria are the stable fixed points of this process, so they are what sufficiently high-level modelers will do.
(Note that this process may not converge unless it starts at a fixed point; consider rock, paper, scissors. But doing the equivalent of Cesàro summation, averaging play over time, will make it converge to the unique mixed strategy of picking each option uniformly at random.)
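A toy simulation illustrates that parenthetical: naive best-responding in rock, paper, scissors cycles forever, but the time-averaged (Cesàro) frequencies of play still approach the uniform mixed strategy. This is a sketch of standard fictitious play, not anyone's particular experiment:

```python
# Fictitious play in rock-paper-scissors: each round, each player best-responds
# to the opponent's empirical frequencies so far. Individual play cycles, but
# the time-averaged frequencies approach (1/3, 1/3, 1/3).

def payoff(me, opp):
    # 0 = rock, 1 = paper, 2 = scissors; +1 win, 0 tie, -1 loss.
    return [0, 1, -1][(me - opp) % 3]

counts = [[1, 1, 1], [1, 1, 1]]  # each player's observed counts of the opponent
for _ in range(30000):
    moves = []
    for p in (0, 1):
        opp_freq = counts[p]
        # Best response to the opponent's empirical mixture so far.
        best = max(range(3),
                   key=lambda m: sum(f * payoff(m, o)
                                     for o, f in enumerate(opp_freq)))
        moves.append(best)
    counts[0][moves[1]] += 1  # player 0 observes player 1's move
    counts[1][moves[0]] += 1

freqs = [c / sum(counts[0]) for c in counts[0]]
print([round(f, 2) for f in freqs])  # all close to 1/3
```

The convergence is slow (the cycles get longer over time), but the averaged frequencies do settle toward the uniform mixture.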
In practice, you want to be exactly one level higher than your opponent. It is, of course, possible to model your opponent as a probabilistic mixture of these, though your best response is (in general) not going to be a probabilistic mixture of levels one-higher. And the best response to you modeling like that will not be a simple mixture of levels either.
So, why do I say many are level 0 and level 1? Well, consider the Guess 2⁄3 of the average game. Players are restricted to numbers between 0 and 100. The person guessing closest to 2⁄3 of the mean wins (utility 1) and everyone else loses (utility 0), with ties broken randomly. What will a level 0 modeler do? The maximin strategy gives no restriction, since you can always lose. The maximax strategy eliminates everything above 66 2⁄3, because that is the maximum 2⁄3 of the average can possibly be. The equiprobable expected-value strategy puts the mean at 50 and suggests 33 1⁄3 (the prediction gets more sharply peaked the more players there are and the more they are modeled as independent). A level 1 modeler realizes that everybody else should know at least this, so will probably guess around 2⁄3 × 33 1⁄3 ≈ 22, perhaps higher after realizing that some level 0s won’t go through even this utility analysis and will pick randomly. A level 2 modeler will guess 2⁄3 of that, roughly 14 or 15. This converges to 0 for a “level infinity” modeler, and 0 is the Nash equilibrium.
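The level-k chain above is just repeated multiplication by 2⁄3 starting from the flat-prior mean of 50, which a few lines make explicit:

```python
# Level-k guesses in the Guess 2/3 of the average game:
# level 0 guesses 2/3 of the flat-prior mean (50), and each
# higher level multiplies the previous guess by 2/3 again.

def level_k_guess(k, level0_mean=50):
    return level0_mean * (2 / 3) ** (k + 1)

for k in range(4):
    print(k, round(level_k_guess(k), 1))
# 0 33.3
# 1 22.2
# 2 14.8
# 3 9.9
```

As k grows the guess shrinks geometrically toward 0, the Nash equilibrium.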
But of course, very few people pick near 0, so it is not a good idea to pick 0. What is it rational to pick? From the link, a newspaper ran this with a prize, and the winner was at 21.6 (so the average was around 32.4).
http://museumofmoney.org/exhibitions/games/numberpop.html references a study with college students, where 24 won (an average of 36!), and a financial newspaper contest where 13 won (an average of 19.5). When I’ve seen histograms, they tend to have a spike at 33 1⁄3, suggesting that many people play level 0 fairly directly. Curiously, there also tends to be a spike at 66 2⁄3, suggesting that a fair number don’t quite understand the game.
In this case, with an unfamiliar game, playing the Nash equilibrium is not optimal, and people don’t do it. Levels 1-3 seem to be what wins here, with the majority effectively playing at levels 0-2. But I can guarantee that, played multiple times with the same people, the winning number will go down to 0 quite rapidly. Tautologically, people are more familiar with the games they play more often, and in practice they will be effectively higher-level: not because they explicitly model higher levels, but because their “level 0” models are not random; they incorporate how people have played before (rather than how they should play now).
For the specific case of the Prisoner’s Dilemma, all of the “level 0” strategies pick Defect, which is the Nash equilibrium. Even so, http://en.wikipedia.org/wiki/Prisoner's_dilemma#cite_note-2 claims that 40% cooperate. I would expect that this comes from some innate valuing of fairness, so that the stated rewards are not actually the players’ utilities for those outcomes, but this is not clear.
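That claim about level 0 strategies is easy to verify with the standard toy Prisoner's Dilemma payoffs: because Defect strictly dominates Cooperate, every one of the level-0 rules above selects it.

```python
# With standard toy PD payoffs, maximin, maximax, and the flat
# expected-value rule all select Defect, since Defect strictly
# dominates Cooperate (better for every opponent choice).

# Row player's payoffs: rows = my move (0 = Cooperate, 1 = Defect),
# columns = opponent's move.
PD = [[3, 0],
      [5, 1]]

maximin = max(range(2), key=lambda i: min(PD[i]))
maximax = max(range(2), key=lambda i: max(PD[i]))
flat = max(range(2), key=lambda i: sum(PD[i]))

print(maximin, maximax, flat)  # -> 1 1 1 (all Defect)
```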
EDITED: links fixed, and a bit of clarifications and grammar rewrites.