Five views of Bayes’ Theorem

I think rearranging Bayes’ Theorem sheds light on it in interesting ways.

P(A|B) = P(B|A) P(A)/P(B)

This is the most common form: you want to know the conditional probability of A given B, but what you actually know is the probability of B given A (and your priors on A and B). Bayes lets you swap the two events.

One way to think about it: to swap the conditional around, multiply by a correction factor of P(A)/P(B) to change the “units” from “some sort of probability of B” to “some sort of probability of A”. (This is literally a unit conversion if P(A) and P(B) are measures over two different spaces!)
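Here is a quick numeric sanity check of this form, as a minimal Python sketch. The three input probabilities are made up for illustration (they don't come from any real data), and P(B) is derived from them with the law of total probability.

```python
from fractions import Fraction

# Hypothetical inputs, chosen only for illustration.
p_a = Fraction(1, 100)             # prior P(A)
p_b_given_a = Fraction(9, 10)      # P(B|A), the conditional we actually know
p_b_given_not_a = Fraction(1, 20)  # P(B|not A)

# Law of total probability gives the prior on B.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes: swap the conditional by multiplying by the factor P(A)/P(B).
correction = p_a / p_b
p_a_given_b = p_b_given_a * correction

print(p_b)          # 117/2000
print(p_a_given_b)  # 2/13
```

Exact fractions are used instead of floats so the result can be checked by hand.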

P(A|B) P(B) = P(B|A) P(A)

These are two ways of writing the joint probability P(A, B), corresponding to two ways of sampling (A, B) sequentially: B and then A, or A and then B. The first thing you sample comes from your prior, and the second thing comes from a conditional distribution that depends on the first thing.
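The two sampling orders can be simulated directly. This is a rough sketch with a made-up joint distribution over two binary events; the constants and function names are my own, not anything standard. Both samplers should hit (A, B) = (True, True) at about the same rate, namely P(A, B) = 0.24 for these numbers.

```python
import random

# Hypothetical distribution, for illustration only.
P_A = 0.3
P_B_GIVEN_A = {True: 0.8, False: 0.1}  # P(B | A = a)

# Quantities needed to sample in the other order, derived from the above.
P_B = P_B_GIVEN_A[True] * P_A + P_B_GIVEN_A[False] * (1 - P_A)
P_A_GIVEN_B = {
    True: P_B_GIVEN_A[True] * P_A / P_B,               # P(A | B)
    False: (1 - P_B_GIVEN_A[True]) * P_A / (1 - P_B),  # P(A | not B)
}

def sample_a_then_b(rng):
    """Draw A from its prior, then B from P(B | A)."""
    a = rng.random() < P_A
    b = rng.random() < P_B_GIVEN_A[a]
    return a, b

def sample_b_then_a(rng):
    """Draw B from its prior, then A from P(A | B)."""
    b = rng.random() < P_B
    a = rng.random() < P_A_GIVEN_B[b]
    return a, b

def joint_frequency(sampler, n, seed):
    """Fraction of n draws in which both A and B occurred."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if sampler(rng) == (True, True))
    return hits / n

freq_ab = joint_frequency(sample_a_then_b, 200_000, seed=0)
freq_ba = joint_frequency(sample_b_then_a, 200_000, seed=1)
print(freq_ab, freq_ba)  # both should land close to 0.24
```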

P(A|B)/P(B|A) = P(A)/P(B)

A lot of the time, people say things like “A is pretty likely given B, but B isn’t quite as likely given A.” Or: “many Bs are As, but not that many As are Bs.” These are equivalent to saying “A is more likely than B, a priori” or “there are more As than Bs”.
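With raw counts this is easy to see. A small sketch, using an invented population in which most Bs are As but few As are Bs:

```python
from fractions import Fraction

# Hypothetical counts, for illustration only.
n_a = 1000    # number of As
n_b = 100     # number of Bs
n_both = 80   # things that are both A and B

frac_of_bs_that_are_as = Fraction(n_both, n_b)  # P(A|B) = 4/5
frac_of_as_that_are_bs = Fraction(n_both, n_a)  # P(B|A) = 2/25

# The overlap cancels, leaving just "how many As per B".
ratio_of_conditionals = frac_of_bs_that_are_as / frac_of_as_that_are_bs
ratio_of_counts = Fraction(n_a, n_b)

print(ratio_of_conditionals, ratio_of_counts)  # 10 10
```

The size of the overlap drops out of the ratio entirely, which is why the comparison only tells you about the relative sizes of A and B.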

(I actually wrote this post because people kept saying the first thing around me and I had to think for a moment each time to realize it was the same as the second thing.)

P(A|B)/P(A) = P(B|A)/P(B)

Let’s say both sides are equal to three. That means:

Given B, A is three times as likely as its prior.

But also:

Given A, B is three times as likely as its prior.

So if people in San Francisco are three times as likely to own crypto as the general population, then people who own crypto are three times as likely to live in SF as the general population.
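To make that concrete, here is a tiny sketch with invented numbers (these are not real statistics about SF or crypto). Only the factor of three is taken from the example; the base rates are assumptions.

```python
from fractions import Fraction

# Made-up base rates, purely for illustration.
p_sf = Fraction(1, 400)            # P(lives in SF)
p_crypto = Fraction(1, 10)         # P(owns crypto)
p_crypto_given_sf = 3 * p_crypto   # SF residents are 3x as likely to own crypto

# Bayes' theorem in its most common form.
p_sf_given_crypto = p_crypto_given_sf * p_sf / p_crypto

ratio_crypto = p_crypto_given_sf / p_crypto  # P(crypto|SF) / P(crypto)
ratio_sf = p_sf_given_crypto / p_sf          # P(SF|crypto) / P(SF)

print(ratio_crypto, ratio_sf)  # 3 3
```

Changing the base rates changes P(SF|crypto) itself, but the two ratios stay equal.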

This almost implies that “the relative risk of A given B is the relative risk of B given A”, but “relative risk” is apparently defined as P(A|B)/P(A|¬B) rather than P(A|B)/P(A). Well, at least it’s approximately true when A and B are both rare (and stay rare given each other), since then P(A|¬B) ≈ P(A) and P(B|¬A) ≈ P(B).
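A rough numeric check of how good the approximation is. The helper below is my own (hypothetical name), and both scenarios use invented probabilities chosen so that P(A|B)/P(A) is exactly three.

```python
from fractions import Fraction

def ratio_and_relative_risks(p_a, p_b, p_ab):
    """Return (P(A|B)/P(A), RR of A given B, RR of B given A).

    Inputs are P(A), P(B) and the joint P(A, B).
    """
    ratio_to_prior = p_ab / (p_a * p_b)
    p_a_given_b = p_ab / p_b
    p_a_given_not_b = (p_a - p_ab) / (1 - p_b)
    p_b_given_a = p_ab / p_a
    p_b_given_not_a = (p_b - p_ab) / (1 - p_a)
    return (
        ratio_to_prior,
        p_a_given_b / p_a_given_not_b,
        p_b_given_a / p_b_given_not_a,
    )

# Rare events: P(A) = 0.1%, P(B) = 0.2%.
rare = ratio_and_relative_risks(
    Fraction(1, 1000), Fraction(1, 500), Fraction(6, 1_000_000)
)
# Common events: P(A) = 30%, P(B) = 20%.
common = ratio_and_relative_risks(
    Fraction(3, 10), Fraction(1, 5), Fraction(18, 100)
)

print([round(float(x), 3) for x in rare])    # [3.0, 3.012, 3.006]
print([round(float(x), 3) for x in common])  # [3.0, 6.0, 21.0]
```

In the rare case both relative risks sit within about half a percent of three; in the common case they are 6 and 21, so the symmetry breaks down badly.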

P(B) = P(A) P(B|A)/P(A|B)

This one’s weird. I guess it’s useful if you sampled B after A, but then realized you want to know the prior probability of B? And for some reason P(A|B) is what you have lying around?

Leave a comment if you know a cute intuition for this one!