Techniques for probability estimates

Utility maximization often requires determining a probability of a particular statement being true. But humans are not utility maximizers and often refuse to give precise numerical probabilities. Nevertheless, their actions reflect a “hidden” probability. For example, even someone who refused to give a precise probability for Barack Obama’s re-election would probably jump at the chance to take a bet in which ey lost $5 if Obama wasn’t re-elected but won $5 million if he was; such decisions demand that the decider covertly be working off of at least a vague probability.

When untrained people try to translate vague feelings like “It seems Obama will probably be re-elected” into a precise numerical probability, they commonly fall into certain traps and pitfalls that make their probability estimates inaccurate. Calling a probability estimate “inaccurate” causes philosophical problems, but these problems can be resolved by remembering that probability is “subjectively objective”—that although a mind “hosts” a probability estimate, that mind does not arbitrarily determine the estimate, but rather calculates it according to mathematical laws from available evidence. These calculations require too much computational power to use outside the simplest hypothetical examples, but they provide a standard by which to judge real probability estimates. They also suggest tests by which one can judge probabilities as well-calibrated or poorly-calibrated: for example, a person who constantly assigns 90% confidence to eir guesses but only guesses the right answer half the time is poorly calibrated. So calling a probability estimate “accurate” or “inaccurate” has a real philosophical grounding.
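As a minimal sketch of what such a calibration check might look like (the guesses below are made up, and the bucketing is just one way to do it), you can group your guesses by declared confidence and compare against how often you were actually right:

```python
from collections import defaultdict

# Made-up illustrative guesses: (stated confidence, was the guess right?).
predictions = [
    (0.9, True), (0.9, False), (0.9, True), (0.9, False),
    (0.9, False), (0.9, True),
    (0.6, True), (0.6, True), (0.6, False),
]

buckets = defaultdict(list)
for stated, correct in predictions:
    buckets[stated].append(correct)

for stated, outcomes in sorted(buckets.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%}, actually right {hit_rate:.0%} of {len(outcomes)} guesses")
# Someone who keeps saying "90%" but is right only about half the time
# is poorly calibrated (overconfident).
```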

There exist several techniques that help people translate vague feelings of probability into more accurate numerical estimates. Most of them translate probabilities from forms without immediate consequences (which the brain supposedly processes for signaling purposes) to forms with immediate consequences (which the brain supposedly processes while focusing on those consequences).



Prepare for Revelation

What would you expect if you believed the answer to your question were about to be revealed to you?

In Belief in Belief, a man acts as if there is a dragon in his garage, but every time his neighbor comes up with an idea to test it, he has a reason why the test wouldn’t work. If he imagined Omega (the superintelligence who is always right) offered to reveal the answer to him, he might realize he was expecting Omega to reveal the answer “No, there’s no dragon”. At the very least, he might realize he was worried that Omega would reveal this, and so re-think exactly how certain he was about the dragon issue.

This is a simple technique and has relatively few pitfalls.


Bet on It

At what odds would you be willing to bet on a proposition?

Suppose someone offers you a bet at even odds that Obama will be re-elected. Would you take it? What about two-to-one odds? Ten-to-one? In theory, the knowledge that money is at stake should make you consider the problem in “near mode” and maximize your chances of winning.
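As a rough sketch of the arithmetic, assuming for the moment that you care only about expected dollars, the stake-to-payout ratio at which you stop accepting the bet pins down your hidden probability:

```python
def breakeven_probability(stake, payout):
    """Probability of the proposition at which risking `stake` dollars to win
    `payout` dollars has zero expected value: p * payout - (1 - p) * stake = 0."""
    return stake / (stake + payout)

# Probing where you stop accepting the bet brackets your hidden probability.
print(breakeven_probability(5, 5))          # even odds           -> 0.5
print(breakeven_probability(10, 5))         # laying two-to-one   -> ~0.67
print(breakeven_probability(50, 5))         # laying ten-to-one   -> ~0.91
print(breakeven_probability(5, 5_000_000))  # the $5-vs-$5M bet   -> ~1e-6
```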

The problem with this method is that it only works when utility is linear with respect to money and you’re not risk-averse. In the simplest case I should be indifferent to a $100,000 bet at even odds on a fair coin coming up tails, but in fact I would refuse it; winning $100,000 would be moderately good, but losing $100,000 would put me deeply in debt and completely screw up my life. When these sorts of considerations become paramount, imagining wagers will tend to give inaccurate results.
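Here is a minimal sketch of that failure, assuming purely for illustration that utility is logarithmic in wealth and picking an arbitrary net worth; the coin flip has zero expected dollar value but negative expected utility:

```python
import math

def expected_log_utility(wealth, stake, p_win=0.5):
    """Expected utility of staking `stake` dollars on a fair coin flip,
    assuming (purely for illustration) utility = log(total wealth)."""
    return p_win * math.log(wealth + stake) + (1 - p_win) * math.log(wealth - stake)

wealth = 120_000  # arbitrary illustrative net worth
print(math.log(wealth))                        # utility of declining the bet
print(expected_log_utility(wealth, 100_000))   # lower, despite zero expected dollar value
```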


Convert to a Frequency

How many situations would it take before you expected an event to occur?

Suppose you need to give a probability that the sun will rise tomorrow. “999,999 in a million” doesn’t immediately sound wrong; the sun seems likely to rise, and a million is a very high number. But if tomorrow is an average day, that probability implies an expected number of days before the sun fails to rise at least once. A million days is about three thousand years; the Earth has existed for far more than three thousand years without the sun failing to rise. Therefore, 999,999 in a million is too low a probability. If you think the sort of astronomical event that might prevent the sun from rising happens only once every three billion years, then you might consider a probability more like 999,999,999,999 in a trillion.
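The arithmetic behind the conversion is just turning a per-day probability into an expected waiting time (a sketch, under the same “average day” assumption):

```python
def expected_years_until_failure(p_fail_per_day):
    """If each day independently has probability p_fail_per_day of the sun
    failing to rise, the expected wait until the first failure is 1/p days."""
    return (1 / p_fail_per_day) / 365.25

print(expected_years_until_failure(1e-6))   # ~2,700 years: far too short
print(expected_years_until_failure(1e-12))  # ~2.7 billion years
```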

In addition to converting to a frequency across time, you can also convert to a frequency across places or people. What’s the probability that you will be murdered tomorrow? The best guess would be to check the murder rate for your area. What’s the probability there will be a major fire in your city this year? Check how many cities per year have major fires.
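A sketch of the base-rate arithmetic, with placeholder numbers rather than real statistics:

```python
# All figures below are placeholders, not real statistics.
murders_per_100k_per_year = 5.0
p_murdered_tomorrow = (murders_per_100k_per_year / 100_000) / 365.25
print(p_murdered_tomorrow)            # ~1.4e-7, roughly one in seven million

major_fires_per_year = 3              # cities suffering a major fire per year
number_of_comparable_cities = 300
p_major_fire_this_year = major_fires_per_year / number_of_comparable_cities
print(p_major_fire_this_year)         # 0.01
```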

This method fails if your case is not typical: for example, if your city is on the losing side of a war against an enemy known to use fire-bombing, the probability of a fire there has nothing to do with the average probability across cities. And if you think the reason the sun might not rise is a supervillain building a high-tech sun-destroying machine, then consistent sunrises over the past three thousand years of low technology will provide little consolation.

A special case of the above failure is converting to frequency across time when considering an event that is known to take place at a certain distance from the present. For example, if today is April 10th, then the probability that we hold a Christmas celebration tomorrow is much lower than the 1/365 you get by checking what percentage of days we celebrate Christmas. In the same way, although we know that the sun will fail to rise in a few billion years when it burns out its nuclear fuel, this shouldn’t affect its chance of rising tomorrow.


Find a Reference Class

How often have similar statements been true?

What is the probability that the latest crisis in Korea escalates to a full-blown war? If there have been twenty crisis-level standoffs on the Korean peninsula in the past 60 years, and only one of them has resulted in a major war, then P(war | crisis) = 0.05, so long as this crisis is equivalent to the twenty crises you’re using as your reference class.
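In code, the estimate is nothing more than the observed frequency within the reference class:

```python
def reference_class_estimate(hits, total):
    """Observed frequency of the outcome among past members of the class."""
    return hits / total

# Twenty crisis-level standoffs, one of which became a major war.
print(reference_class_estimate(1, 20))   # P(war | crisis) = 0.05
```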

But finding the reference class is itself a hard problem. What is the probability Bigfoot exists? If one makes a reference class by saying that the yeti doesn’t exist, the Loch Ness monster doesn’t exist, and so on, then the Bigfoot partisan might accuse you of assuming the conclusion—after all, the likelihood of these creatures existing is probably similar to and correlated with that of Bigfoot. The partisan might suggest asking how many creatures previously believed not to exist later turned out to exist—a list which includes real animals like the orangutan and platypus—but then one will have to debate whether to include creatures like dragons, orcs, and Pokemon on the list.

This works best when the reference class is more obvious, as in the Korea example.


Make Multiple Statements

How many statements of about the same uncertainty as a given statement could you make without being wrong once?

Suppose you believe France is larger than Italy. With what confidence should you believe it? If you made ten similar statements (Germany is larger than Austria, Britain is larger than Ireland, Spain is larger than Portugal, et cetera) how many times do you think you would be wrong? A hundred similar statements? If you think you’d be wrong only one time out of a hundred, you can give the statement 99% confidence.
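The implied confidence is just one minus your expected error rate over the batch of statements:

```python
def implied_confidence(expected_wrong, total_statements):
    """Confidence implied by expecting `expected_wrong` errors among
    `total_statements` statements of equal uncertainty."""
    return 1 - expected_wrong / total_statements

print(implied_confidence(1, 100))  # 0.99 -> 99% confidence
print(implied_confidence(1, 10))   # 0.90 -> 90% confidence
```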

This is the most controversial probability-assessment technique, and it tends to give lower levels of confidence than the others. For example, Eliezer wants to say there’s a less than one-in-a-million chance the LHC would destroy the world, but doubts he could make a million similar statements and be wrong only once. Komponisto thinks this is a failure of imagination: we imagine ourselves gradually growing tired and making mistakes, whereas the method only works if the millionth statement is made with exactly the same accuracy as the first.

In any case, the technique is only as good as the ability to judge which statements are as difficult as a given statement. If I start saying things like “Russia is larger than Vatican City! Canada is larger than a speck of dust!” then I may get all the statements right, but it won’t mean much for my Italy-France example—and if I get bogged down in genuinely difficult questions like “Is Burundi larger than Equatorial Guinea?” then I might end up underconfident. In cases where there is an obvious comparison (“Bob didn’t cheat on his test”, “Sue didn’t cheat on her test”, “Alice didn’t cheat on her test”) this problem mostly disappears.


Imagine Hypothetical Evidence

How would your probabilities adjust given new evidence?

Suppose one day all the religious people and all the atheists get tired of arguing and decide to settle the matter by experiment once and for all. The plan is to roll an n-sided numbered die and have the faithful of all religions pray for the die to land on “1”. The experiment will be done once, with great pomp and ceremony, and never repeated, lest the losers try for a better result. All the resources of the world’s skeptics and security forces will be deployed to prevent any tampering with the die, and we assume their success is guaranteed.

If the experimenters used a twenty-sided die, and the die comes up 1, would this convince you that God probably did it, or would you dismiss the result as a coincidence? What about a hundred-sided die? Million-sided? If a successful result on a hundred-sided die wouldn’t convince you, your probability of God’s existence must be less than one in a hundred; if a million-sided die would convince you, it must be more than one in a million.
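Here is a minimal Bayes’-theorem sketch of that update, assuming for illustration that the prayer is certain to work if God exists and that the die is fair otherwise:

```python
def posterior_given_die_lands_one(prior, n_sides):
    """P(God | die = 1), assuming P(die = 1 | God) = 1 and
    P(die = 1 | no God) = 1 / n_sides."""
    p_evidence = prior + (1 - prior) / n_sides
    return prior / p_evidence

for prior in (1e-2, 1e-4, 1e-6):
    print(prior,
          posterior_given_die_lands_one(prior, 100),
          posterior_given_die_lands_one(prior, 1_000_000))
# A success on the hundred-sided die only moves a one-in-a-hundred prior
# to about 50%; the million-sided die is convincing unless the prior was
# down around one in a million or lower.
```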

This technique has also been denounced as inaccurate, on the grounds that our coincidence detectors are overactive and therefore in no state to be calibrating anything else. It would feel very hard to dismiss a successful result on a thousand-sided die, no matter how low the probability of God is. It might also be difficult to visualize a hypothetical where the experiment can’t possibly be rigged, and it may be unfair to force subjects to imagine a hypothetical that would practically never happen (like the million-sided die landing on one in a world where God doesn’t exist).



These techniques should be experimentally testable; any disagreement over which do or do not work (at least for a specific individual) can be resolved by going through a list of difficult questions, declaring confidence levels, and scoring the results with log odds. Steven’s blog has some good sets of test questions (which I deliberately do not link here so as not to contaminate a possible pool of test subjects); if many people are interested in participating and there’s a general consensus that an experiment would be useful, we can try to design one.
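As a sketch of how that scoring might work, here is the logarithmic scoring rule in bits (the choice of this particular rule, and the made-up answers, are only for illustration):

```python
import math

def log_score(predictions):
    """Total log score in bits: log2(p) for each correct answer and
    log2(1 - p) for each incorrect one; closer to zero is better."""
    return sum(math.log2(p if correct else 1 - p) for p, correct in predictions)

# Made-up answers to a short quiz: (declared probability, was it right?).
print(log_score([(0.9, True), (0.7, True), (0.8, False)]))  # ~ -3.0 bits
print(log_score([(0.6, True), (0.6, True), (0.6, False)]))  # ~ -2.8 bits
# Confident answers that turn out wrong are punished heavily.
```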