How to come up with verbal probabilities

Unfortunately, we are kludged together, and we can’t just look up our probability estimates in a register somewhere when someone asks us “How sure are you?”.

The usual heuristic for putting a number on the strength of a belief is to ask “When you’re this sure about something, what fraction of the time do you expect to be right in the long run?” This is surely better than just “making up” numbers with no feel for what they mean, but it still has its faults. The big one is that unless you’ve done your calibrating, you may not have a good idea of how often you’d expect to be right.

I can think of a few different heuristics to use when coming up with probabilities to assign.

1) Pretend you have to bet on it. Pretend that someone says “I’ll give you ____ odds, which side do you want?”, and figure out what the odds would have to be to make you indifferent to which side you bet on. Consider the question as though you were actually going to put money on it. If the question is covered by a prediction market, your answer is given to you.
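As a rough sketch of the arithmetic behind heuristic 1: if you are indifferent between the two sides of a bet, the odds pin down a probability. The function name and the stake/payout framing below are my own illustration, and the conversion assumes the stakes are small enough that your utility is roughly linear in money.

```python
def indifference_probability(stake: float, payout: float) -> float:
    """Probability implied by being indifferent to a bet on an event.

    Assumption (not from the post): you risk `stake` to win `payout` if the
    event happens, and utility is linear in money. Indifference then means
    p * payout == (1 - p) * stake, so p = stake / (stake + payout).
    """
    if stake <= 0 or payout <= 0:
        raise ValueError("stake and payout must both be positive")
    return stake / (stake + payout)


# Indifferent to risking 1 to win 3 means you implicitly assign 25%.
print(indifference_probability(1, 3))  # 0.25
```

Even odds (risk 1 to win 1) come out at 50%, as you’d expect.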

2) Ask yourself how much evidence someone would have to give you before you’re back to 50%. Since we’re trying to update according to Bayes’ law, knowing how much evidence it takes to bring you to 50% tells you the probability you’re implicitly assigning.

For example, pretend someone said something like “I can guess people’s names by their looks”. If he guesses the first name right, and it’s a common name, you’ll probably write it off as a fluke. The second time you’ll probably think he knew the people or is somehow fooling you, but if that were ruled out, you’d probably say he’s just lucky. By Bayes’ law, this suggests that you put the prior probability of him pulling this stunt at 0.1%<p<3%, and less than 0.1% prior probability of him having his claimed skill. If it takes 4 correct calls to leave you equally unsure either way, then your prior is about 0.03^4 if they’re common names, or one in a million (see note 1).

There are a couple of neat things about this trick. One is that it allows you to get an idea of what your subconscious level of certainty is before you ever think of it. You can imagine your immediate reaction to “Why yes, my name is Alex, how did you know” as well as your carefully deliberated response to the same data (if they’re much different, be wary of belief in belief). The other neat thing is that it pulls up alternate hypotheses that you find more likely, and how likely you find those to be (e.g. “you know these people”).
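The arithmetic behind heuristic 2 can be sketched in a few lines. This is only an illustration under the same simplifications as the name-guessing example: each correct guess has a fixed chance of happening by luck (3% for a common name), guesses are independent, and someone with the real skill always guesses right. The function names are mine.

```python
def posterior(prior: float, p_chance: float, n_hits: int) -> float:
    """P(skill | n_hits correct guesses), assuming P(hit | skill) = 1."""
    likelihood_luck = p_chance ** n_hits
    return prior / (prior + (1 - prior) * likelihood_luck)


def implied_prior(p_chance: float, n_hits: int) -> float:
    """Prior that n_hits lucky-looking successes would move to exactly 50%.

    Setting the posterior to 0.5 gives prior = (1 - prior) * p_chance**n_hits,
    i.e. prior = L / (1 + L) with L = p_chance**n_hits.
    """
    likelihood_luck = p_chance ** n_hits
    return likelihood_luck / (1 + likelihood_luck)


prior = implied_prior(0.03, 4)
print(prior)                      # roughly 8.1e-07, about one in a million
print(posterior(prior, 0.03, 4))  # back to about 0.5
```

Changing `n_hits` to the number of correct calls it would really take to leave you unsure gives the prior you were implicitly holding.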

3) Map out the typical shape of your probability distributions (i.e. through calibration tests) and then go by how many standard deviations off the mean you are. If you’re asked to give the probability that x<C, you can find your one-sigma confidence interval and then pull up your curve to see what it predicts based on how far out C is (see note 2).
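To make heuristic 3 concrete, here is a hedged sketch that turns a best guess and a one-sigma interval into P(x<C) under two candidate shapes. The Cauchy shape is the one mentioned in note 2; the normal curve is my own addition for comparison, and so is the choice to scale the Cauchy so that it puts the same ~68% of its mass inside your stated interval. Your real curve should come from your own calibration data.

```python
import math

# Mass a normal distribution puts within one sigma of its mean (~0.6827).
CENTRAL_MASS = math.erf(1 / math.sqrt(2))


def normal_cdf(c: float, center: float, half_width: float) -> float:
    """P(x < c) if your beliefs are normal with sigma = half_width."""
    z = (c - center) / half_width
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def cauchy_cdf(c: float, center: float, half_width: float) -> float:
    """P(x < c) if your beliefs are Cauchy, scaled so that
    center +/- half_width holds the same central mass as one normal sigma."""
    gamma = half_width / math.tan(CENTRAL_MASS * math.pi / 2)
    return 0.5 + math.atan((c - center) / gamma) / math.pi


# Best guess 100, "one sigma" interval of +/- 10, asked about C = 60.
print(normal_cdf(60, 100, 10))  # tiny: four sigma out on a thin-tailed curve
print(cauchy_cdf(60, 100, 10))  # much larger: fat tails
```

The two curves agree near the middle and disagree wildly in the tails, which is exactly where the shape you assume matters most.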

4) Draw out your metaprobability distribution, and take the mean.
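One way to read heuristic 4, sketched under my own assumptions: write down a few candidate values for the probability, weight each by how plausible you find it, and take the weighted mean. The candidate values and weights below are made up purely for illustration.

```python
def mean_probability(candidates: list[tuple[float, float]]) -> float:
    """Mean of a discrete metaprobability distribution.

    `candidates` is a list of (probability, weight) pairs; weights are
    normalized here, so they only need to be non-negative.
    """
    total_weight = sum(weight for _, weight in candidates)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * weight for p, weight in candidates) / total_weight


# Illustrative only: 90% weight on "it's about 0.1%", 10% on "it's about 20%".
print(mean_probability([(0.001, 0.9), (0.2, 0.1)]))  # about 0.0209
```

Note how the small weight on the higher candidate dominates the mean, which is why this can matter a lot for very unlikely things.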

You may initially get a different answer from each heuristic, and in the end you have to decide which to trust when actually placing bets.

I personally tend to lean towards 1 for intermediate probabilities, and 2 then 4 for very unlikely things. The betting model breaks down as risk gets high (either by high stakes or extreme odds), since we bet to maximize a utility function that is not linear in money.
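To see how the betting model can break down at high stakes, here is a small sketch that assumes log utility in wealth. That specific utility function is my assumption for illustration, not something the argument depends on; any concave utility gives the same qualitative effect. It computes the payout a bettor would demand before risking a given stake, and the probability you would wrongly infer if you read those odds as if utility were linear.

```python
import math


def required_payout(p: float, stake: float, wealth: float) -> float:
    """Payout that leaves a log-utility bettor indifferent to the bet.

    Indifference: p*log(W + g) + (1 - p)*log(W - s) = log(W), solved for g.
    """
    if not 0 < p < 1 or not 0 < stake < wealth:
        raise ValueError("need 0 < p < 1 and 0 < stake < wealth")
    log_target = (math.log(wealth) - (1 - p) * math.log(wealth - stake)) / p
    return math.exp(log_target) - wealth


def naive_probability(stake: float, payout: float) -> float:
    """Probability you'd read off the odds if utility were linear in money."""
    return stake / (stake + payout)


# True belief is 50% in both cases; only the size of the stake changes.
for stake in (10, 500):
    payout = required_payout(0.5, stake, wealth=1000)
    print(stake, naive_probability(stake, payout))
```

With a stake of 1% of wealth the implied probability stays close to 50%, but with half of wealth on the line it drops to about a third, even though the underlying belief hasn’t changed.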

What other techniques do you use, and how do you weight them?


1: Let a be “he has the claimed skill” and b be “he makes 4 consecutive correct guesses”. A common name covers about 3% of the population, so p(b|!a) = 0.03^4 for 4 consecutive correct guesses, and p(b|a) ~= 1 for the sake of simplicity. Since p(a) is small, (1-p(a)) is approximated as 1.

p(a|b) = p(b|a)*p(a)/p(b) = p(b|a)*p(a)/(p(b|a)*p(a) + p(b|!a)*(1-p(a))) ⇒ approximately 0.5 = p(a)/(p(a) + 0.03^4) ⇒ p(a) = 0.03^4 ~= 1/1,000,000

2: The idea came from paranoid debating, where Steve Rayhawk assumed a Cauchy distribution. I tried to fit some data I had taken myself, but didn’t have enough of it to figure out what the real shape is (if you guys have a bunch more data I could try again). It’s also worth noting that the shape of one’s probability distribution can change significantly from question to question, so this would only apply in some cases.