Techniques for probability estimates

Utility maximization often requires determining a probability of a particular statement being true. But humans are not utility maximizers and often refuse to give precise numerical probabilities. Nevertheless, their actions reflect a “hidden” probability. For example, even someone who refused to give a precise probability for Barack Obama’s re-election would probably jump at the chance to take a bet in which ey lost $5 if Obama wasn’t re-elected but won $5 million if he was; such decisions demand that the decider covertly be working off of at least a vague probability.

When untrained people try to translate vague feelings like “It seems Obama will probably be re-elected” into a precise numerical probability, they commonly fall into certain traps and pitfalls that make their probability estimates inaccurate. Calling a probability estimate “inaccurate” causes philosophical problems, but these problems can be resolved by remembering that probability is “subjectively objective”: although a mind “hosts” a probability estimate, that mind does not arbitrarily determine the estimate, but rather calculates it according to mathematical laws from available evidence. These calculations require too much computational power to use outside the simplest hypothetical examples, but they provide a standard by which to judge real probability estimates. They also suggest tests by which one can judge probabilities as well-calibrated or poorly-calibrated: for example, a person who constantly assigns 90% confidence to eir guesses but only guesses the right answer half the time is poorly calibrated. So calling a probability estimate “accurate” or “inaccurate” has a real philosophical grounding.
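
A minimal sketch of that calibration test, assuming you have recorded each guess's stated confidence and whether it turned out correct (the data below are invented purely for illustration):

```python
from collections import defaultdict

# Invented example data: (stated confidence, whether the guess turned out correct).
guesses = [
    (0.9, True), (0.9, False), (0.9, True), (0.9, False), (0.9, True),
    (0.6, True), (0.6, True), (0.6, False),
]

# Group guesses by stated confidence and compare with the observed hit rate.
buckets = defaultdict(list)
for stated, correct in guesses:
    buckets[stated].append(correct)

for stated, outcomes in sorted(buckets.items()):
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%}, observed {observed:.0%} over {len(outcomes)} guesses")
```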

There exist several techniques that help people translate vague feelings of probability into more accurate numerical estimates. Most of them translate probabilities from forms without immediate consequences (which the brain supposedly processes for signaling purposes) to forms with immediate consequences (which the brain supposedly processes while focusing on those consequences).



Prepare for Revelation

What would you expect if you believed the answer to your question were about to be revealed to you?

In Belief in Belief, a man acts as if there is a dragon in his garage, but every time his neighbor comes up with an idea to test it, he has a reason why the test wouldn’t work. If he imagined Omega (the superintelligence who is always right) offered to reveal the answer to him, he might realize he was expecting Omega to reveal the answer “No, there’s no dragon”. At the very least, he might realize he was worried that Omega would reveal this, and so re-think exactly how certain he was about the dragon issue.

This is a simple technique and has relatively few pitfalls.


Bet on It

At what odds would you be willing to bet on a proposition?

Suppose someone offers you a bet at even odds that Obama will be re-elected. Would you take it? What about two-to-one odds? Ten-to-one? In theory, the knowledge that money is at stake should make you consider the problem in “near mode” and maximize your chances of winning.

The problem with this method is that it only works when utility is linear with respect to money and you’re not risk-averse. In the simplest case I should be indifferent to a $100,000 bet at 50% odds that a fair coin would come up tails, but in fact I would refuse it; winning $100,000 would be moderately good, but losing $100,000 would put me deeply in debt and completely screw up my life. When these sorts of considerations become paramount, imagining wagers will tend to give inaccurate results.
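
To see the point concretely, here is a sketch under the common textbook assumption of logarithmic utility of total wealth; the wealth figure is invented, and nothing above specifies a particular utility function:

```python
import math

stake = 100_000
wealth = 150_000   # illustrative current wealth; the conclusion depends on this number

# Expected value in dollars: a fair 50/50 bet looks exactly neutral.
ev_dollars = 0.5 * stake + 0.5 * (-stake)

# Expected utility with log utility of wealth: the same bet looks clearly bad,
# because losing $100,000 hurts far more than winning $100,000 helps.
ev_log = 0.5 * math.log(wealth + stake) + 0.5 * math.log(wealth - stake)

print(ev_dollars)                    # 0.0
print(ev_log - math.log(wealth))     # negative, so the bet is refused
```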


Convert to a Frequency

How many situations would it take before you expected an event to occur?

Suppose you need to give a probability that the sun will rise tomorrow. “999,999 in a million” doesn’t immediately sound wrong; the sun seems likely to rise, and a million is a very high number. But if tomorrow is an average day, then your probability is linked to the number of days you expect to pass before the sun fails to rise on at least one of them. A million days is almost three thousand years; the Earth has existed for far more than three thousand years without the sun failing to rise. Therefore, 999,999 in a million is too low a probability for this event. If you think the sort of astronomical event that might prevent the sun from rising happens only once every three billion years, then you might consider a probability more like 999,999,999,999 in a trillion.
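
A back-of-the-envelope version of that conversion, using only the numbers from the paragraph above:

```python
# "999,999 in a million" implies one failure per million days on average.
p_fail_per_day = 1 / 1_000_000
expected_days_to_failure = 1 / p_fail_per_day
print(expected_days_to_failure / 365.25)   # roughly 2,700 years -- far too short given the historical record

# Working backwards from "one such event every three billion years".
years_between_events = 3_000_000_000
p_fail_per_day = 1 / (years_between_events * 365.25)
print(1 - p_fail_per_day)   # about 0.999999999999, i.e. 999,999,999,999 in a trillion
```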

In addition to converting to a frequency across time, you can also convert to a frequency across places or people. What’s the probability that you will be murdered tomorrow? The best guess would be to check the murder rate for your area. What’s the probability there will be a major fire in your city this year? Check how many cities per year have major fires.

This method fails if your case is not typical: for example, if your city is on the losing side of a war against an enemy known to use fire-bombing, the probability of a fire there has nothing to do with the average probability across cities. And if you think the reason the sun might not rise is a supervillain building a high-tech sun-destroying machine, then consistent sunrises over the past three thousand years of low technology will provide little consolation.

A special case of the above failure is converting to frequency across time when considering an event that is known to take place at a certain distance from the present. For example, if today is April 10th, then the probability that we hold a Christmas celebration tomorrow is much lower than the 1/365 you get by checking what percentage of days we celebrate Christmas. In the same way, although we know that the sun will fail to rise in a few billion years when it burns out its nuclear fuel, this shouldn’t affect its chance of rising tomorrow.


Find a Reference Class

How often have similar statements been true?

What is the probability that the latest crisis in Korea escalates to a full-blown war? If there have been twenty crisis-level standoffs on the Korean peninsula in the past 60 years, and only one of them has resulted in a major war, then P(war | crisis) = 0.05, so long as this crisis is equivalent to the twenty crises you’re using as your reference class.
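
The arithmetic here, spelled out with the hypothetical counts given above:

```python
# Reference-class estimate from hypothetical historical counts.
crises = 20
wars = 1
p_war_given_crisis = wars / crises
print(p_war_given_crisis)   # 0.05
```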

But finding the reference class is itself a hard problem. What is the probability Bigfoot exists? If one makes a reference class by saying that the yeti doesn’t exist, the Loch Ness monster doesn’t exist, and so on, then the Bigfoot partisan might accuse you of assuming the conclusion; after all, the likelihood of these creatures existing is probably similar to, and correlated with, that of Bigfoot. The partisan might suggest asking how many creatures previously believed not to exist later turned out to exist (a list which includes real animals like the orangutan and platypus), but then one will have to debate whether to include creatures like dragons, orcs, and Pokemon on the list.

This works best when the reference class is more obvious, as in the Korea example.


Make Multiple Statements

How many statements of about the same uncertainty as a given statement could you make without being wrong once?

Suppose you believe France is larger than Italy. With what confidence should you believe it? If you made ten similar statements (Germany is larger than Austria, Britain is larger than Ireland, Spain is larger than Portugal, et cetera), how many times do you think you would be wrong? What about a hundred similar statements? If you think you’d be wrong only one time out of a hundred, you can give the statement 99% confidence.

This is the most controversial probability assessment technique, and it tends to give lower levels of confidence than the others. For example, Eliezer wants to say there’s a less than one in a million chance the LHC would destroy the world, but doubts he could make a million similar statements and only be wrong once. Komponisto thinks this is a failure of imagination: we imagine ourselves gradually growing tired and making mistakes, whereas this method only works if the accuracy of the millionth statement is exactly the same as that of the first.

In any case, the technique is only as good as the ability to judge which statements are of comparable difficulty to a given statement. If I start saying things like “Russia is larger than Vatican City! Canada is larger than a speck of dust!” then I may get all the statements right, but it won’t mean much for my Italy-France example, and if I get bogged down in difficult questions like “Burundi is larger than Equatorial Guinea” then I might end up underconfident. In cases where there is an obvious comparison (“Bob didn’t cheat on his test”, “Sue didn’t cheat on her test”, “Alice didn’t cheat on her test”) this problem becomes less severe.


Imagine Hypothetical Evidence

How would your probabilities adjust given new evidence?

Suppose one day all the religious people and all the atheists get tired of arguing and decide to settle the matter by experiment once and for all. The plan is to roll an n-sided numbered die and have the faithful of all religions pray for the die to land on “1”. The experiment will be done once, with great pomp and ceremony, and never repeated, lest the losers try for a better result. All the resources of the world’s skeptics and security forces will be deployed to prevent any tampering with the die, and we assume their success is guaranteed.

If the experimenters use a twenty-sided die and it comes up 1, would this convince you that God probably did it, or would you dismiss the result as a coincidence? What about a hundred-sided die? A million-sided one? If a successful result on a hundred-sided die wouldn’t convince you, your probability of God’s existence must be less than one in a hundred; if a million-sided die would convince you, it must be more than one in a million.
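
The implicit Bayesian update can be sketched explicitly, under two simplifying assumptions that are mine rather than the thought experiment’s: the prior is a number you pick yourself, and if God exists and is prayed to, the die lands on 1 for certain:

```python
def posterior_god(prior, sides):
    """Posterior after the die lands on 1, assuming P(lands on 1 | God) = 1
    and P(lands on 1 | no God) = 1 / sides."""
    p_evidence = prior + (1 - prior) / sides
    return prior / p_evidence

prior = 1e-4   # purely illustrative prior, not anyone's actual number
for sides in (20, 100, 1_000_000):
    print(sides, round(posterior_god(prior, sides), 4))
# With this prior, a 20-sided die barely moves the estimate,
# while a million-sided die pushes the posterior close to 1.
```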

This technique has also been denounced as inaccurate, on the grounds that our coincidence detectors are overactive and therefore in no state to be calibrating anything else. It would feel very hard to dismiss a successful result on a thousand-sided die, no matter how low the probability of God is. It might also be difficult to visualize a hypothetical where the experiment can’t possibly be rigged, and it may be unfair to force subjects to imagine a hypothetical that would practically never happen (like the million-sided die landing on one in a world where God doesn’t exist).



These techniques should be experimentally testable; any disagreement over which do or do not work (at least for a specific individual) can be resolved by going through a list of difficult questions, declaring confidence levels, and scoring the results with log odds. Steven’s blog has some good sets of test questions (which I deliberately do not link here so as not to contaminate a possible pool of test subjects); if many people are interested in participating and there’s a general consensus that an experiment would be useful, we can try to design one.
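
One way to do that scoring, sketched with the logarithmic scoring rule (the log of the probability you assigned to what actually happened); the sample answers are invented:

```python
import math

# Invented sample answers: (declared confidence, whether the answer was right).
answers = [(0.9, True), (0.7, False), (0.99, True), (0.5, True)]

# Logarithmic score: closer to zero is better;
# confident wrong answers are punished heavily.
score = sum(math.log(p) if correct else math.log(1 - p) for p, correct in answers)
print(score)
```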