So You Think You’re a Bayesian? The Natural Mode of Probabilistic Reasoning

Related to: The Conjunction Fallacy, Conjunction Controversy

The heuristics and biases research program in psychology has discovered many different ways that humans fail to reason correctly under uncertainty. In experiment after experiment, researchers have shown that we use heuristics to approximate probabilities rather than making the appropriate calculation, and that these heuristics are systematically biased. However, a tweak to the experimental protocol seems to remove the biases altogether and casts doubt on whether we are actually using heuristics. Instead, it appears that the errors are simply an artifact of how our brains internally store information about uncertainty. Theoretical considerations support this view.

EDIT: The view presented here is controversial in the heuristics and biases literature; see Unnamed’s comment on this post below.

EDIT 2: The author no longer holds the views presented in this post. See this comment.

A common example of the failure of humans to reason correctly under uncertainty is the conjunction fallacy. Consider the following question:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

What is the probability that Linda is:

(a) a bank teller

(b) a bank teller and active in the feminist movement

In a replication by Gigerenzer, 91% of subjects ranked (b) as more probable than (a), saying that it is more likely that Linda is active in the feminist movement AND a bank teller than that Linda is simply a bank teller (1993). The conjunction rule of probability states that the probability of two things being true is less than or equal to the probability of one of those things being true. Formally, P(A & B) ≤ P(A). So this experiment shows that people violate the conjunction rule, and thus fail to reason correctly under uncertainty. The representativeness heuristic has been proposed as an explanation for this phenomenon. To use this heuristic, you evaluate the probability of a hypothesis by how closely it resembles the data. Someone using the representativeness heuristic looks at the Linda question and sees that Linda’s characteristics resemble those of a feminist bank teller much more closely than those of a mere bank teller, and so concludes that Linda is more likely to be a feminist bank teller than a bank teller.
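The conjunction rule itself is just counting, which a minimal sketch in Python makes plain (the base rates below are invented purely for illustration): however you pick the numbers, the feminist bank tellers are a subset of the bank tellers.

```python
# Counting demonstration of the conjunction rule, P(A & B) <= P(A).
# The base rates here are invented for illustration.
import random

random.seed(0)
population = [
    {"teller": random.random() < 0.05,    # bank teller?
     "feminist": random.random() < 0.30}  # active in the feminist movement?
    for _ in range(100_000)
]

n = len(population)
p_teller = sum(p["teller"] for p in population) / n
p_both = sum(p["teller"] and p["feminist"] for p in population) / n

print(p_teller, p_both)  # the conjunction is never the more probable event
assert p_both <= p_teller
```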

This is the standard story, but are people really using the representativeness heuristic in the Linda problem? Consider the following rewording of the question:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

There are 100 people who fit the description above. How many of them are:

(a) bank tellers

(b) bank tellers and active in the feminist movement

Notice that the question is now strictly in terms of frequencies. Under this version, only 22% of subjects ranked (b) as more probable than (a) (Gigerenzer, 1993). The only thing that changed is the question being asked; the description of Linda (and the 100 people) remains the same, so the representativeness of the description for the two groups should remain unchanged. Thus people are not using the representativeness heuristic, at least not in general.

Tversky and Kahneman, founders and champions of the heuristics and biases research program, acknowledged that the conjunction fallacy can be mitigated by changing the wording of the question (1983, p. 309), but this isn’t the only anomaly. Consider another problem:

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person’s symptoms or signs?

Using Bayes’ theorem (and granting the implicit assumption that the test detects every true case), the correct answer is about .02, or 2%. In one replication, only 12% of subjects calculated this probability correctly. In these experiments, the most common wrong answer given is usually .95, or 95% (Gigerenzer, 1993). This is known as the base rate fallacy because the error comes from ignoring the “base rate” of the disease in the population. Intuitively, if absolutely no one has the disease, it doesn’t matter what the test says; you still wouldn’t think you had the disease.
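For reference, here is the calculation itself as a short Python sketch. The sensitivity value is the assumption noted above; the problem statement never gives it explicitly.

```python
# Bayes' theorem on the diagnosis question. The sensitivity of 1.0 is
# an assumption the problem leaves implicit; everything else is given.
prevalence = 1 / 1000    # P(disease)
sensitivity = 1.0        # P(positive | disease), assumed
false_positive = 0.05    # P(positive | no disease)

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 4))  # 0.0196, i.e. about 2%
```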

Now consider the same question framed in terms of relative frequencies.

One out of 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.

Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people. How many people who test positive for the disease will actually have the disease?

_____ out of _____.

Using this version of the question, 76% of subjects answered correctly with 1 out of 50. Instructing subjects to visualize frequencies in graphs increases this percentage to 92% (Gigerenzer, 1993). Again, re-framing the question in terms of relative frequencies rather than (subjective) probabilities improves performance.
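The frequency framing invites exactly this kind of counting, sketched below: the tally gives 1 true positive among 51 positives, which is the “1 out of 50” ballpark answer above, and it converts straight back to the 2% Bayes answer.

```python
# The counting that the frequency format invites: tally people, not probabilities.
# Out of 1000 sampled Americans, 1 has the disease (and tests positive);
# of the remaining ~999 healthy people, about 50 test positive anyway.
true_positives = 1
false_positives = 50
positives = true_positives + false_positives

print(true_positives, "out of", positives)   # 1 out of 51, roughly 1 in 50
print(round(true_positives / positives, 3))  # 0.02, the same answer Bayes gives
```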

Consider yet another typical question in these experiments:

Which city has more inhabitants?

(a) Hyderabad

(b) Islamabad

How confident are you that your answer is correct?

50%, 60%, 70%, 80%, 90%, 100%

According to Gigerenzer (1993),

The major finding of some two decades of research is the following: In all the cases where subjects said, “I am 100% confident that my answer is correct,” the relative frequency of correct answers was only about 80%; in all the cases where subjects said, “I am 90% confident” the relative frequency of correct answers was only about 75%, when subjects said “I am 80% confident” the relative frequency of correct answers was only about 65%, and so on.

This is called overconfidence bias. A Bayesian might say that you aren’t calibrated. In any case, it’s generally frowned upon by both statistical camps. If, when you say you’re 90% confident, you’re only right 80% of the time, why not just say you’re 80% confident? But consider a different experimental setup. Instead of asking subjects only one general knowledge question like the Hyderabad-Islamabad question above, ask them 50; and instead of asking them each time how confident they are that their answer is correct, ask them at the end how many questions they think they answered correctly. If people are biased in the way that overconfidence bias says they are, there should be no difference between the two experiments.

First, Gigerenzer replicated the original experiments, showing an overconfidence bias of 13.8%; that is, subjects’ average confidence exceeded the true relative frequency of correct answers by 13.8 percentage points. For example, if they claimed a confidence of 90%, on average they would answer correctly 76.2% of the time. Using the 50-question treatment, overconfidence bias dropped to −2.4%! In a second replication, the control was 15.4% and the treatment was −4.2% (1993). Note that −2.4% and −4.2% are likely not significantly different from 0, so don’t interpret that as underconfidence bias. Once the probability judgment was framed in terms of relative frequencies, the bias essentially disappeared.
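To make the scoring concrete, here is a minimal sketch (with made-up answer data) of how overconfidence is computed: bucket the answers by stated confidence, then compare each bucket’s stated confidence to its actual hit rate.

```python
# Scoring calibration from (stated confidence, answered correctly) pairs.
# The answer data below is invented purely for illustration.
from collections import defaultdict

answers = [(1.0, True), (1.0, True), (1.0, False), (1.0, True), (1.0, True),
           (0.9, True), (0.9, False), (0.9, True), (0.9, True),
           (0.8, True), (0.8, False), (0.8, True), (0.8, False)]

buckets = defaultdict(list)
for confidence, correct in answers:
    buckets[confidence].append(correct)

for confidence in sorted(buckets, reverse=True):
    outcomes = buckets[confidence]
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {confidence:.0%}: actual {hit_rate:.0%}, "
          f"overconfidence {confidence - hit_rate:+.0%}")
```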

So in all three experiments, the standard results of the heuristics and biases program fall once the problem is recast in terms of relative frequencies. Humans don’t simply use heuristics; something else more complicated is going on. But the important question is, of course: what else? To answer that, we need to take a detour through information representation. Any computer (and the brain is just a very difficult-to-understand computer) has to represent its information symbolically. The problem is that there are usually many ways to represent the same information. For example, 31, 11111 (in binary), and XXXI all represent the same number in different systems of representation. Aside from the obvious visual differences, systems of representation also differ in how easy they are to use for various operations. If this doesn’t seem obvious, as Gigerenzer says, try long division using Roman numerals (1993). Crucially, this difficulty is relative to the computer attempting to perform the operations. Your calculator works great in binary, but your brain works better when things are represented visually.

What does the representation of information have to do with the experimental results above? Well, let’s take another detour, this time through the philosophy of probability. As most of you already know, the two most common positions are frequentism and Bayesianism. I won’t get into the details of either position beyond what is relevant, so if you’re unaware of the difference and are interested, click the links. According to the Bayesian position, all probabilities are subjective degrees of belief. Don’t worry about the sense in which probabilities are subjective; just focus on the degrees-of-belief part. A Bayesian is comfortable assigning a probability to any proposition you can come up with. Some Bayesians don’t even care whether the proposition is coherent.

Frequentists are different beasts altogether. For a frequentist, the probability of an event happening is its relative frequency in some well-defined reference class. A useful though not entirely accurate way to think about frequentist probability is that there must be a numerator and a denominator in order to get a probability. The reference class of events you are considering provides the denominator (the total number of events), and the particular event you are considering provides the numerator (the number of times that particular event occurs in the reference class). If you flip a coin 100 times, get 37 heads, and are interested in heads, the reference class is coin flips, and the probability of flipping a coin and getting heads is 37/100.1 Key to all of this is that the frequentist thinks there is no such thing as the probability of a single event happening without referring to some reference class. So, returning to the Linda problem, there is no such thing as a frequentist probability that Linda is a bank teller, or a bank teller and active in the feminist movement. But there is a probability that, out of 100 people who fit Linda’s description, a randomly selected person is a bank teller, or a bank teller and active in the feminist movement.

In addition to the various philosophical differences between the Bayesians and frequentists, the two different schools also naturally lead to two different ways of representing the information contained in probabilities. Since all the frequentist cares about is relative frequencies, the natural way to represent probabilities in her mind is through, well, frequencies. The actual number representing the probability (e.g. p=.23) can always be calculated later as an afterthought. The Bayesian approach, on the other hand, leads to thinking in terms of percentages. If probability is just a degree of belief, why not represent it as such with, say, a number between 0 and 1? A “natural frequentist” would store all probabilistic information as frequencies, carefully counting each time an event occurs, while a “natural Bayesian” would store it as a single number—a percentage—to be updated later using Bayes’ theorem as information comes in. It wouldn’t be surprising if the natural frequentist had trouble operating with Bayesian probabilities. She thinks in terms of frequencies, but a single number isn’t a frequency—it has to be converted to a frequency in some way that allows her to keep counting events accurately if she wants to use this information.
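To make the contrast concrete, here is a hypothetical sketch of the two storage formats (the classes and their methods are my illustration, not a model from the literature). Both can be fed the same stream of events; the frequentist keeps counts and reads a probability off as a ratio, while the Bayesian keeps only a single number and pushes each new piece of evidence through Bayes’ theorem.

```python
class NaturalFrequentist:
    """Stores raw counts; a probability is just a ratio computed on demand."""
    def __init__(self):
        self.hits = 0
        self.trials = 0

    def observe(self, event_occurred: bool):
        self.hits += event_occurred
        self.trials += 1

    def probability(self) -> float:
        return self.hits / self.trials


class NaturalBayesian:
    """Stores one number, a degree of belief, updated via Bayes' theorem."""
    def __init__(self, prior: float):
        self.belief = prior

    def update(self, p_evidence_if_true: float, p_evidence_if_false: float):
        numerator = p_evidence_if_true * self.belief
        denominator = numerator + p_evidence_if_false * (1 - self.belief)
        self.belief = numerator / denominator


# The frequentist from the coin example: 37 heads in 100 flips -> 37/100.
counter = NaturalFrequentist()
for is_heads in [True] * 37 + [False] * 63:
    counter.observe(is_heads)
print(counter.probability())  # 0.37
```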

So if it isn’t obvious by now, we’re natural frequentists! How many of you thought you were Bayesians?2 Gigerenzer’s experiments show that changing the representation of uncertainty from probabilities to frequencies drastically alters the results, making humans appear much better at statistical reasoning than previously thought. It’s not that we use heuristics that are systematically biased; our native architecture for representing uncertainty is just better at working with frequencies. When uncertainty isn’t represented using frequencies, our brains have trouble and fail in apparently predictable ways. To anyone who has had Bayes’ theorem explained intuitively, it shouldn’t be all that surprising that we’re natural frequentists. How does Eliezer intuitively explain Bayes’ theorem? By working through examples using relative frequencies. This is also a relatively common tactic in undergraduate statistics textbooks, though that may only be because undergraduates are typically taught only the frequentist approach to probability.

So the heuristics and biases program doesn’t catalog the various ways we fail to reason correctly under uncertainty; it catalogs the various ways we reason incorrectly about probabilities that aren’t in our native representation. This could be because our native architecture simply doesn’t handle alternate representations of probability effectively, or because when our native architecture starts having trouble, our brains automatically fall back on the heuristics Tversky and Kahneman were talking about. The latter seems more plausible to me in light of the other ways the brain approximates when it is forced to, but I’m still fairly uncertain. Gigerenzer has his own explanation that unifies the two domains under a specific theory of natural frequentism, and he has performed further experiments to back it up. He calls it a theory of probabilistic mental models.3 I don’t completely understand Gigerenzer’s theory, and his extra evidence seems to support equally the hypothesis that our brains use heuristics when probabilities aren’t represented as frequencies, but I will say that Gigerenzer’s theory has elegance going for it. Capturing both groups of phenomena with a unified theory makes Occam smile.

These experiments aren’t the only reason to believe that we’re actually pretty good at reasoning under uncertainty, or that we’re natural frequentists; there are theoretical reasons as well. First, consider evolutionary theory. If lower-order animals are decent at statistical reasoning, we would expect humans to be good at it as well, since we all evolved from the same source. It is possible that a lower-order species developed its statistical reasoning capabilities after its evolutionary path diverged from the ancestors of humans, or that statistical reasoning became less important for humans or their recent ancestors and evolution thus committed fewer resources to it. But the ability to reason under uncertainty seems so useful that if any species has the mental capacity for it, we would expect humans, with their large, adept brains, to be among them. Gigerenzer summarizes the evidence across species (1993):

Bumblebees, birds, rats, and ants all seem to be good intuitive statisticians, highly sensitive to changes in frequency distributions in their environments, as recent research in foraging behavior indicates (Gallistel, 1990; Real & Caraco, 1986). From sea snails to humans, as John Staddon (1988) argued, the learning mechanisms responsible for habituation, sensitization, and classical and operant conditioning can be described in terms of statistical inference machines. Reading this literature, one wonders why humans seem to do so badly in experiments on statistical reasoning.

Indeed. Should we really expect that bumblebees, birds, rats, and ants are better intuitive statisticians than us? It’s certainly possible, but it doesn’t appear all that likely, a priori.

Theories of the brain from cognitive science provide another reason why we would be adept at reasoning under uncertainty, and a reason why we would be natural frequentists. The connectionist approach to the study of the human mind suggests that the brain encodes information by making literal physical connections between neurons, represented at the mental level by connections between concepts. So, for example, if you see a dog and notice that it’s black, a connection between the concept “dog” and the concept “black” is made in a very literal sense. If connectionism is basically correct, then probabilistic reasoning shouldn’t be all that difficult for us. For example, if the brain needs to calculate the probability that any given dog is black, it can just count the number of connections between “dog” and “black” and the number of connections between “dog” and colors other than black.4 Voila! Relative frequencies. As Nobel Prize-winning economist Vernon Smith puts it (2008, p. 208):

Hayek’s theory5—that mental categories are based on the experiential relative frequency of coincidence between current and past perceptions—seems to imply that our minds should be good at probability reasoning.

It also suggests that we would be natural frequentists since our brains are quite literally built on relative frequencies.
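As a toy version of that idea (my illustration, not a claim about how neurons actually implement it): if each observation lays down one “connection” between two concepts, relative frequencies fall out of the encoding for free.

```python
# Toy co-occurrence store: one observation = one "connection" between
# concepts; a relative frequency is recovered by counting connections.
from collections import Counter

connections = Counter()
for color in ["black", "black", "brown", "white", "black", "brown"]:
    connections[("dog", color)] += 1

dog_total = sum(count for (concept, _), count in connections.items()
                if concept == "dog")
p_black = connections[("dog", "black")] / dog_total
print(p_black)  # 3/6 = 0.5
```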

So both evidence and theory point in the same direction. The research of Tversky and Kahneman, among others, originally showed that humans are fairly bad at reasoning under uncertainty. It turns out much of this is an artifact of how their subjects were asked to think about uncertainty. Having subjects think in terms of frequencies basically eliminates the biases in these experiments, suggesting that humans are natural frequentists: our minds are structured to handle probabilities as frequencies rather than as proportions or percentages. Only when we work with information represented in a form difficult for our native architecture to handle do we appear to be using heuristics. Theoretical considerations from evolutionary biology and cognitive science buttress both claims: that humans are natural frequentists, and that we are not so bad at handling uncertainty, at least when thinking in terms of frequencies.


Footnotes

1: To any of you who raised an eyebrow, I did it on purpose ;).

2: Just to be clear, I am not arguing that since we are natural frequentists, the frequentist approach to probability is the correct approach.

3: What seems to be the key paper is the second link in the Google search I linked to. I haven’t read it yet, so I won’t really get into his theory here.

4: I acknowledge that this is a very simplified example and a gross simplification of the theory.

5: Friedrich Hayek, another Nobel Prize winning economist, independently developed the connectionist paradigm of the mind culminating in his 1952 book The Sensory Order. I do recommend reading Hayek’s book, but not without a reading group of some sort. It’s short but dense and very difficult to parse—let’s just say Hayek is not known for his prose.

References

Gigerenzer, Gerd. 1993. “The Bounded Rationality of Probabilistic Mental Models.” In Manktelow, K. I., & Over, D. E., eds., Rationality: Psychological and Philosophical Perspectives (pp. 284-313). London: Routledge. Preprint available online.

Smith, Vernon L. 2008. Rationality in Economics. Cambridge: Cambridge UP.

Tversky, A., and D. Kahneman. 1983. “Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment.” Psychological Review 90(4):293-315. Available online.