# Probability space has 2 metrics

A metric is technically defined as a function d from pairs of points to the non-negative reals, with the properties that d(x, y) = 0 if and only if x = y, that d(x, y) = d(y, x), and that d(x, z) ≤ d(x, y) + d(y, z).

Intuitively, a metric is a way of measuring how similar points are: which points are nearby which others. Probabilities can be represented in several different ways, including the standard range p ∈ [0, 1] and the log odds b ∈ (−∞, ∞). They are related by b = log(p/(1 − p)) and p = e^b/(1 + e^b) = 1/(1 + e^(−b)) (the equations are algebraically equivalent).
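A minimal sketch of the two representations, assuming natural-log odds (any other base just rescales the axis); the function names are my own:

```python
import math

def log_odds(p):
    """Convert a probability in (0, 1) to natural-log odds."""
    return math.log(p / (1 - p))

def probability(b):
    """Convert log odds back to a probability (inverse of log_odds)."""
    return 1 / (1 + math.exp(-b))

# p = 0.5 sits at log odds 0, and round-tripping recovers the input.
print(log_odds(0.5), probability(log_odds(0.8)))
```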

The two metrics of importance are the Bayesian metric d_B(x, y) = |b_x − b_y| and the probability metric d_P(x, y) = |p_x − p_y|.

Suppose you have a prior of b_1, in log odds, for some proposition. Suppose you update on some evidence that is twice as likely to appear if the proposition is true, to get a posterior of b_2, in log odds. Then d_B = |b_2 − b_1| = log 2. The metric d_B measures how much evidence you need to move between probabilities.
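The update above is just a translation in log-odds space, which a short sketch makes concrete (names are my own):

```python
import math

def bayes_update(prior_log_odds, likelihood_ratio):
    """Bayes' rule in log odds: add log(P(evidence|true) / P(evidence|false))."""
    return prior_log_odds + math.log(likelihood_ratio)

# Evidence twice as likely if the proposition is true moves every prior
# by the same distance d_B = log 2, wherever it starts.
for prior in (-5.0, 0.0, 3.0):
    posterior = bayes_update(prior, 2.0)
    print(prior, posterior - prior)
```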

Suppose you have a choice of actions: the first action will make an event of utility u happen with probability p_1; the other will cause the probability of the event to be p_2. How much should you care? |p_1 − p_2| × u = d_P(p_1, p_2) × u.
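A sketch of that calculation: the decision-relevant quantity is uniform in probability, so a 0.01 shift matters equally wherever it occurs.

```python
def preference_strength(p1, p2, utility):
    """How much the choice between the two actions matters: d_P times u."""
    return abs(p1 - p2) * utility

# A shift from 0.46 to 0.47 is worth the same as a shift from 0.00 to 0.01.
mid = preference_strength(0.46, 0.47, 100)
edge = preference_strength(0.00, 0.01, 100)
print(mid, edge)
```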

The first metric stretches probabilities near 0 or 1 and is uniform in log odds. The second squashes all log odds with large absolute value together, and is uniform in probabilities. The first is used for Bayesian updates, the second for expected utility calculations.

Suppose an imperfect agent reasoned using a single metric, something in between these two: some metric function d_C that is less squashed up than d_P but more squashed than d_B around the ends. Suppose it crudely substituted this new metric into its reasoning processes whenever one of the other two metrics was required.
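One concrete candidate for such an in-between metric (my illustration, not something the post specifies) is distance under the arcsine-square-root transform: it stretches the ends relative to raw probability, but, being bounded, compresses them relative to log odds.

```python
import math

def d_P(p, q):
    """Probability metric: uniform in p."""
    return abs(p - q)

def d_B(p, q):
    """Bayesian metric: uniform in log odds."""
    b = lambda x: math.log(x / (1 - x))
    return abs(b(p) - b(q))

def d_C(p, q):
    """Hypothetical in-between metric: uniform in arcsin(sqrt(p))."""
    g = lambda x: math.asin(math.sqrt(x))
    return abs(g(p) - g(q))

# Near zero, d_C magnifies the gap relative to d_P but shrinks it
# relative to d_B.
print(d_P(0.0001, 0.001), d_C(0.0001, 0.001), d_B(0.0001, 0.001))
```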

In decision theory problems, such an agent would rate small differences in probability as more important than they really are when facing probabilities near 0 or 1. From the inside, the difference between no chance and 0.01 would feel far larger than the distance between probabilities 0.46 and 0.47.

However, the metric d_C is more squashed than d_B, so moving from 10000:1 odds to 1000:1 odds seems to require less evidence than moving from 10:1 to 1:1. When facing small probabilities, such an agent would perform larger Bayesian updates than really necessary, based on weak evidence.

Privileging the Hypothesis

As both of these behaviors correspond to known human biases, could humans be using only a single metric on probability space?

• The speculative proposition that humans might only be using one metric rings true and is compellingly presented.

However, I feel a bit clickbaited by the title, which (to me) implies that probability-space has only two metrics (which isn’t true, as the later proposition depends on). Maybe consider changing it to “Probability space has multiple metrics”, to avoid confusion?

• Note that the closer the probability of something is to 0 or to 1, the harder it is to evaluate accurately. A simple example: starting with a fair coin and observing a sequence of N heads in a row, what is an unbiased estimate of the coin’s bias? The log odds of N heads are −N when starting with a point estimate of a fair coin, which matches the Bayesian updates, so it is reasonable to conclude that the probability of heads is 1 − 2^(−N). But at a small enough level there are so many other factors that can interfere that the calculation ceases being accurate. Maybe the coin has heads on both sides? Maybe your brain makes you see heads when the coin flip outcome is actually tails? Maybe you are only hallucinating the coin flips? So, if you finally get a tail, reducing the estimated probability of heads, you are able to reject multiple other unlikely possibilities as well, and it makes sense that one would need less evidence when moving from −N to −N+1 for large N than for small N.
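The comment’s point can be sketched numerically: once an alternative hypothesis like a double-headed coin has any prior weight at all (the 0.001 below is an assumed number, as is the two-hypothesis model), the posterior probability of heads stops tracking the naive 1 − 2^(−N) estimate.

```python
fair_prior, trick_prior = 0.999, 0.001   # assumed prior weights

def p_heads_next(n_heads):
    """P(next flip is heads) after seeing n_heads heads in a row,
    under a fair-coin vs. double-headed-coin model."""
    fair = fair_prior * 0.5 ** n_heads   # likelihood of the run if fair
    trick = trick_prior * 1.0            # likelihood of the run if two-headed
    p_trick = trick / (fair + trick)
    return p_trick * 1.0 + (1 - p_trick) * 0.5

# The naive 1 - 2**(-n) estimate overshoots what the fuller model licenses.
for n in (5, 10, 20):
    print(n, p_heads_next(n), 1 - 2.0 ** -n)
```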

• Yes, and this is equivalent to saying that evidence about probability provides Bayesian metric evidence; you need to transform it.

• Could you explain your point further?

• I don’t think I’ve read this view before, or if I have, I’ve forgotten it. Thanks for writing this up!

• I think this should have b instead of p:

• Fixed, thanks.

• Awesome idea! I think there might be something here, but I think the difference between “no chance” and “0.01% chance” is more of a discrete change from not tracking something to tracking it. We might also expect neglect of “one in a million” vs “one in a trillion” in both updates and decision-making, which causes a mistake opposite to the one predicted by this model in the case of decision-making.

• I’m pretty sure this point has been made here before, but, hey, it’s worth repeating, no? :)

• I like the theory. How would we test it?

We have a fairly good idea of how people weight decisions based on probabilities, via offering different bets and seeing which ones get chosen.

I don’t know how much quantification has been done on incorrect Bayesian updates. Could one suggest trades where one is given options, one of which has been recommended by an “expert” who has made the correct prediction to a 50:50 question on a related topic x times in a row? How much do people adjust based on the evidence of the expert? This doesn’t sound perfect to me; maybe someone else has a better version, or maybe people are already doing this research?!

• Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card, and it’s blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from implicit probability to betting behaviour. People should perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
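For what it’s worth, the correctly sized update in this design can be computed directly (a quick sketch; the pack compositions are made-up numbers): seeing a blue face has likelihood 1 on a blue-blue card and 1/2 on a red-blue card, so the correct log-odds shift is always log 2, whatever the pack composition. The single-metric theory predicts subjects’ implied updates will deviate from this constant at lopsided packs.

```python
import math

def log_odds_shift(frac_blue_blue):
    """Correct change in the log odds that the drawn card is blue-blue,
    after seeing one blue face. frac_blue_blue is the pack composition."""
    prior = math.log(frac_blue_blue / (1 - frac_blue_blue))
    # Likelihood ratio: blue-blue shows blue always, red-blue half the time.
    posterior = prior + math.log(1.0 / 0.5)
    return posterior - prior

# The shift is log 2 ~= 0.693 for every pack composition.
for frac in (0.1, 0.5, 0.9):
    print(f"pack {frac:.0%} blue-blue: shift = {log_odds_shift(frac):.3f}")
```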