Probability space has 2 metrics

Donald Hobson 10 Feb 2019 0:28 UTC

88 points

Probability & Statistics Heuristics & Biases Logic & Mathematics

A metric is technically defined as a function from pairs of points to the non-negative reals, $d : X \times X \to [0, \infty)$, with the properties that $d (x, y) = d (y, x)$, that $d (x, y) = 0 \iff x = y$, and that $d (x, y) + d (y, z) \geq d (x, z)$.

Intuitively, a metric is a way of measuring how similar points are: which points are near which others. Probabilities can be represented in several different ways, including the standard $p \in (0, 1)$ range and the log odds $b \in (- \infty, \infty)$. They are related by $b = \log (\frac{p}{1 - p})$, $e^{b} + 1 = \frac{1}{1 - p}$, and $p = \frac{e^{b}}{e^{b} + 1}$ (these equations are algebraically equivalent).
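As a minimal sketch of the two representations, the conversion between $p$ and $b$ can be written out and round-tripped numerically. The function names `prob_to_log_odds` and `log_odds_to_prob` are illustrative choices, not anything from the post.

```python
import math


def prob_to_log_odds(p: float) -> float:
    """Map a probability p in (0, 1) to log odds b = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def log_odds_to_prob(b: float) -> float:
    """Map log odds b back to a probability p = e^b / (e^b + 1)."""
    return math.exp(b) / (math.exp(b) + 1.0)


# Round-trip a few probabilities through log odds and back.
for p in (0.01, 0.5, 0.9):
    b = prob_to_log_odds(p)
    print(f"p={p:.2f}  b={b:+.3f}  back to p={log_odds_to_prob(b):.2f}")
```

Even odds ($p = 0.5$) sit at $b = 0$, and probabilities near 0 or 1 map to log odds of large absolute value.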

The two metrics of importance are the Bayesian metric $B$ and the probability metric $P$.

$B (b_{1}, b_{2}) = | b_{1} - b_{2} | = \left| \log \left( \frac{p_{1} (1 - p_{2})}{p_{2} (1 - p_{1})} \right) \right|$

$P (p_{1}, p_{2}) = | p_{1} - p_{2} | = \left| \frac{1}{e^{b_{1}} + 1} - \frac{1}{e^{b_{2}} + 1} \right|$
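A rough numerical sketch of the two metrics, both taking probabilities as input for convenience. The names `bayesian_metric` and `probability_metric` are illustrative, and the sample pairs are arbitrary choices picked so that both pairs are the same distance apart under $P$.

```python
import math


def bayesian_metric(p1: float, p2: float) -> float:
    """B: absolute difference in log odds between two probabilities."""
    b1 = math.log(p1 / (1.0 - p1))
    b2 = math.log(p2 / (1.0 - p2))
    return abs(b1 - b2)


def probability_metric(p1: float, p2: float) -> float:
    """P: absolute difference between two probabilities."""
    return abs(p1 - p2)


# Two pairs that are equally far apart under P but not under B.
for p1, p2 in [(0.001, 0.011), (0.46, 0.47)]:
    print(
        f"p1={p1}, p2={p2}: "
        f"P={probability_metric(p1, p2):.4f}, B={bayesian_metric(p1, p2):.4f}"
    )
```

The pair near 0 comes out far further apart under $B$ than the pair near 0.5, even though both are 0.01 apart under $P$.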

Suppose you have a prior, $b_{1}$ in log odds, for some proposition. Suppose you update on some evidence that is twice as likely to appear if the proposition is true, to get a posterior, $b_{2}$ in log odds. Then $B (b_{1}, b_{2}) = log (2)$ . The metric $B$ measures how much evidence you need to move between probabilities.
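This can be checked numerically with Bayes' rule in odds form (posterior odds = prior odds × likelihood ratio). The sketch below uses a likelihood ratio of 2 and a handful of arbitrarily chosen priors; the helper names are illustrative.

```python
import math


def log_odds(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


# Evidence that is twice as likely if the proposition is true.
for prior in (0.001, 0.5, 0.9):
    post = posterior(prior, 2.0)
    shift = log_odds(post) - log_odds(prior)
    print(f"prior={prior}: posterior={post:.4f}, log-odds shift={shift:.4f}")

print(f"log(2) = {math.log(2):.4f}")
```

The shift in log odds is the same for every prior, while the shift in probability varies a great deal.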

Suppose you have a choice of actions: the first action will make an event of utility $u$ happen with probability $p_{1}$, and the other will cause the probability of the event to be $p_{2}$. How much should you care which action you take? The answer is $u P (p_{1}, p_{2})$, the difference in expected utility between the two actions.
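A small sketch of that calculation, under the added assumption that the outcome has utility 0 when the event does not happen. The function name `utility_gap` and the numbers are illustrative only.

```python
def utility_gap(u: float, p1: float, p2: float) -> float:
    """Absolute difference in expected utility between two actions.

    Assumes (for illustration) utility 0 if the event does not occur,
    so each action's expected utility is simply u * p.
    """
    return abs(u * p1 - u * p2)


# The same 0.01 change in probability matters equally near 0 and near 0.5.
print(utility_gap(100.0, 0.001, 0.011))
print(utility_gap(100.0, 0.46, 0.47))
```

Where the probabilities sit in $(0, 1)$ makes no difference here; only their distance under $P$ matters.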

The first metric stretches probabilities near 0 or 1 and is uniform in log odds. The second squashes all log odds with large absolute value together, and is uniform in probabilities. The first is used for Bayesian updates, the second for expected utility calculations.

Suppose an imperfect agent reasoned using a single metric, something in between these two. Some metric function less squashed up than $P$ but more squashed than $B$ around the ends. Suppose it crudely substituted this new metric into its reasoning processes whenever one of the other two metrics was required.

In decision theory problems, such an agent would rate small differences in probability as more important than they really are when facing probabilities near 0 or 1. From the inside, the difference between no chance and 0.01 would feel far larger than the distance between probabilities 0.46 and 0.47.

The Allais Paradox

However, the metric is more squashed than $B$, so moving from 10000:1 odds to 1000:1 odds seems to require less evidence than moving from 10:1 to 1:1. When facing small probabilities, such an agent would perform larger Bayesian updates than really necessary, based on weak evidence.

Privileging the Hypothesis
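Both effects can be illustrated with a toy in-between metric. The post does not specify one, so the choice below is purely an assumption: take the distance between probabilities after mapping them through `sigmoid(logit(p) / k)`, with a hypothetical parameter `k = 3`. At `k = 1` this is exactly $P$; larger `k` stretches the ends more, but the coordinate stays bounded, unlike log odds.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_metric(p1: float, p2: float, k: float = 3.0) -> float:
    """Hypothetical in-between metric (an assumption, not from the post).

    Distance is measured in the coordinate sigmoid(logit(p) / k), which is
    strictly increasing in p, so the result is a genuine metric.
    """
    return abs(sigmoid(logit(p1) / k) - sigmoid(logit(p2) / k))


# Decision side: both pairs are 0.01 apart under P.
near_zero = hybrid_metric(0.0001, 0.0101)
mid_range = hybrid_metric(0.46, 0.47)
print(f"near 0: {near_zero:.4f}   mid-range: {mid_range:.4f}")

# Update side: both moves are a factor of 10 in odds, so equal under B.
tail_move = hybrid_metric(1 / 10001, 1 / 1001)  # 1:10000 -> 1:1000
central_move = hybrid_metric(1 / 11, 1 / 2)     # 1:10 -> 1:1
print(f"tail move: {tail_move:.4f}   central move: {central_move:.4f}")
```

Under this stand-in metric, the small change near 0 looks much larger than the same change near 0.5, and the tenfold change in odds out in the tail looks smaller than the same tenfold change near even odds. Other monotone coordinates would behave similarly; the particular form and the value of `k` are arbitrary.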

As both of these behaviors correspond to known human biases, could humans be using only a single metric on probability space?


rossry 11 Feb 2019 6:23 UTC
16 points
The speculative proposition that humans might only be using one metric rings true and is compellingly presented.

However, I feel a bit clickbaited by the title, which (to me) implies that probability-space has only two metrics (which isn’t true, as the later proposition depends on). Maybe consider changing it to “Probability space has multiple metrics”, to avoid confusion?
shminux 10 Feb 2019 3:13 UTC
5 points
Note that the closer the probability of something is to 0 or to 1, the harder it is to evaluate accurately. A simple example: starting with a fair coin and observing a sequence of N heads in a row, what is an unbiased estimate of the coin’s bias? Log odds of N heads are -N when starting with a point estimate of a fair coin, which matches the Bayesian updates, so it is reasonable to conclude that the probability of heads is 1-2^(-N), but at a small enough level there are so many other factors that can interfere that the calculation ceases to be accurate. Maybe the coin has heads on both sides? Maybe your brain makes you see heads when the coin flip outcome is actually tails? Maybe you are only hallucinating the coin flips? So, if you finally get a tail, reducing the estimated probability of heads, you are able to reject multiple other unlikely possibilities as well, and it makes sense that one would need less evidence when moving from -N to -N+1 for large N than for small N.
- Davidmanheim 14 Feb 2019 10:38 UTC
  5 points
  Parent
  Yes—and this is equivalent to saying that evidence about probability provides Bayesian metric evidence—you need to transform it.
  - shminux 14 Feb 2019 15:39 UTC
    2 points
    Parent
    Could you explain your point further?
Alexei 10 Feb 2019 7:47 UTC
4 points
I don’t think I’ve read this view before, or if I have, I’ve forgotten it. Thanks for writing this up!
Lukas Finnveden 10 Feb 2019 22:01 UTC
3 points
$P (p_{1}, p_{2}) = | p_{1} - p_{2} | = \left| \frac{1}{e^{p_{1}} + 1} - \frac{1}{e^{p_{2}} + 1} \right|$
I think this should have b instead of p: $P (p_{1}, p_{2}) = | p_{1} - p_{2} | = \left| \frac{1}{e^{b_{1}} + 1} - \frac{1}{e^{b_{2}} + 1} \right|$
- Donald Hobson 11 Feb 2019 10:55 UTC
  1 point
  Parent
  Fixed, thanks.
Charlie Steiner 10 Feb 2019 20:42 UTC
3 points
Awesome idea! I think there might be something here, but I think the difference between “no chance” and “0.01% chance” is more of a discrete change from not tracking something to tracking it. We might also expect neglect of “one in a million” vs “one in a trillion” in both updates and decision-making, which causes a mistake opposite that predicted by this model in the case of decision-making.
Sniffnoy 10 Feb 2019 4:43 UTC
2 points
I’m pretty sure this point has been made here before, but, hey, it’s worth repeating, no? :)
Bucky 11 Feb 2019 22:29 UTC
1 point
I like the theory. How would we test it?
We have a fairly good idea of how people weight decisions based on probabilities via offering different bets and seeing which ones get chosen.
I don’t know how much quantification has been done on incorrect Bayesian updates. Could one suggest trades where one is given options, one of which has been recommended by an “expert” who has made the correct prediction to a 50:50 question on a related topic x times in a row? How much do people adjust based on the evidence of the expert? This doesn’t sound perfect to me; maybe someone else has a better version, or maybe people are already doing this research?!
- Donald Hobson 11 Feb 2019 22:50 UTC
  8 points
  Parent
  Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card, and it’s blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from implicit probability to betting behaviour. The people should perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50 : 50.