A visual explanation of Bayesian updating

Link post

As a teaser, here is the visual version of Bayesian updating:

[Figure: visual Bayesian updating, the teaser]

But in order to understand that figure, we first need to go through the prior and the likelihood!

You find me standing on a basketball court, ready to shoot some hoops. What do you believe about my performance before I take a shot? There is no good null hypothesis here, unless you happen to have a lot of knowledge about average human basketball performance! And even so, why do you care whether I am significantly different from the average? You could fall back on the New Statistics, which is almost as good as the Bayesian approach, but it does not answer what you should believe before I take a shot.

The Beta distribution is a popular prior for binary events; when its two parameters (α and β) are equal to 1, it is uniform. Since you, my dear reader, have no concept of my basketball skills, you assume my scoring probability θ comes from a Beta(1, 1) distribution, formally:

θ ∼ Beta(1, 1)

Where θ is my probability of scoring. The distribution looks like this:

[Figure: the uniform Beta(1, 1) prior]

Completely uniform: a great prior when you are totally oblivious.
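This "totally oblivious" starting point can be sketched numerically: evaluate the uniform prior on a grid of candidate values of θ (a minimal numpy sketch; the 101-point grid is an arbitrary choice of mine, not from the post):

```python
import numpy as np

# Grid of candidate scoring probabilities theta in [0, 1].
theta = np.linspace(0, 1, 101)

# Beta(1, 1) prior: the density is 1 everywhere, i.e. completely uniform.
prior = np.ones_like(theta)
```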

I take a shot and miss (z = 0). The likelihood of a miss looks like this:

[Figure: the likelihood of a miss, P(z = 0 | θ) = 1 − θ]

(If you are extra curious, you can brush up on the math behind all the binary distributions here.)

Notice that:

  • P(z = 0 | θ = 0) = 1, the likelihood that I always miss is 1

  • P(z = 0 | θ = 0.5) = 0.5, the likelihood that I miss half the time is 0.5

  • P(z = 0 | θ = 1) = 0, the likelihood that I always hit is 0, which is obvious, as I can't score all the time if I just missed.

Notice that these are likelihoods, not probabilities: they express how likely the data are for different values of θ. So it is twice as likely that the data were generated by θ = 0 compared to θ = 0.5.
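The three likelihood values above can be checked numerically; here is a minimal sketch (numpy, with an assumed 101-point grid of θ values):

```python
import numpy as np

# Grid of candidate scoring probabilities theta.
theta = np.linspace(0, 1, 101)

# Bernoulli likelihood of one miss: P(z = 0 | theta) = 1 - theta.
likelihood_miss = 1 - theta
```

Evaluated at θ = 0, 0.5 and 1 this gives 1, 0.5 and 0, matching the bullet points.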

Bayesian Updating Math

Here is Bayes' theorem for the Bernoulli distribution with a Beta prior, where the data z is 1 when I score and 0 otherwise:

P(θ | z) = P(z | θ) × P(θ) / P(z)

For technical reasons P(z), the probability of the data, is difficult to calculate. It is, however, 'just a normalization constant': because it does not depend on θ (my scoring probability), we can simply drop it and get an unnormalized posterior:

P(θ | z) ∝ P(z | θ) × P(θ)

An unnormalized posterior is simply a density function that does not sum to 1; when we plot it, it looks 'correct' except that the numbers on the y-axis are wrong.

Visual Bayesian Updating

So now we have a 'square' prior P(θ) and a 'triangle' likelihood P(z = 0 | θ); if we multiply them together we get the unnormalized posterior:

P(θ | z = 0) ∝ P(z = 0 | θ) × P(θ)

Which intuitively can be thought of as: the square makes everything equally likely, so the likelihood will dominate the posterior, or in dodgy math:

posterior ≈ likelihood

Here is the Figure:

[Figure: square prior × triangle likelihood = unnormalized posterior]

Try putting your finger on the figure and check that the density at θ = 0.5 is 1 for the square and 0.5 for the triangle, and thus 1 × 0.5 = 0.5 in the unnormalized posterior.
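This first update can be sketched on a grid: multiply the 'square' prior by the 'triangle' likelihood point by point (a numpy sketch of the idea, not the post's original code):

```python
import numpy as np

theta = np.linspace(0, 1, 101)
prior = np.ones_like(theta)    # the 'square' Beta(1, 1) prior
likelihood = 1 - theta         # the 'triangle' likelihood of one miss

# The unnormalized posterior is the pointwise product.
unnorm_posterior = prior * likelihood
```

At θ = 0.5 the prior is 1 and the likelihood is 0.5, so the unnormalized posterior is 0.5, exactly the finger test.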

I shoot again and score!

Now we use the previous posterior as the new prior, but because I scored we get an 'opposite triangle', which is the likelihood of z = 1: P(z = 1 | θ) = θ.

Again we multiply the prior triangle by the likelihood triangle and get a blob centered on 0.5 as the posterior:

[Figure: triangle prior × opposite-triangle likelihood = blob posterior]

Notice how the posterior is peaked at θ = 0.5. This is because the two triangles at the center have an unnormalized posterior density of 0.5 × 0.5 = 0.25, whereas at the edges, such as θ = 0, they have 1 × 0 = 0.

I shoot again and score!

So now, again, the previous blob posterior is our new prior, which we multiply by the 'I scored' triangle, resulting in a blob whose mode is above 0.5, which makes sense as I made 2 of my 3 shots:

[Figure: the posterior after one miss and two hits]
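The whole three-shot sequence can be sketched as one loop, with each posterior serving as the next prior (a numpy sketch under the same arbitrary 101-point grid assumption):

```python
import numpy as np

theta = np.linspace(0, 1, 101)
posterior = np.ones_like(theta)         # start from the uniform Beta(1, 1) prior

# One miss followed by two hits: z = 0, 1, 1.
for z in [0, 1, 1]:
    likelihood = theta if z == 1 else 1 - theta
    posterior = posterior * likelihood  # yesterday's posterior is today's prior

mode = theta[np.argmax(posterior)]      # lands near 2/3, above 0.5
```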

While this may seem like a cute toy example, it is a totally valid way of computing a Bayesian posterior, and it is the way most popular Bayesian books (Gelman[1], Kruschke[2] and McElreath[3]) introduce the concept!

Bayesian Updating using Conjugation

In the case of Bernoulli events we can actually solve for the posterior easily, because the Beta is conjugate to the Bernoulli. Conjugation is simply fancy statistics speak for the posterior having a simple mathematical form, and that form is also a Beta distribution, so you can update the Beta distribution using this simple rule:

Beta(α, β) → Beta(α + z, β + 1 − z)

So we started with a Beta(1, 1) prior, i.e. α = 1 and β = 1.

Then we got a miss, z = 0: Beta(1, 2)

Then we got a hit, z = 1: Beta(2, 2)

Then we got a hit, z = 1: Beta(3, 2)
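This sequence of conjugate updates is a few lines of code (a minimal sketch of the Beta(α, β) → Beta(α + z, β + 1 − z) rule):

```python
# Conjugate Beta-Bernoulli updating: Beta(a, b) -> Beta(a + z, b + 1 - z).
a, b = 1, 1              # the uniform Beta(1, 1) prior

for z in [0, 1, 1]:      # one miss, then two hits
    a, b = a + z, b + 1 - z
```

After the three shots this ends at Beta(3, 2), matching the updates listed above.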

We can plot the Beta(3, 2) posterior:

[Figure: the Beta(3, 2) posterior]

Notice how this posterior has the exact same shape as the one we got via visual updating; the only difference is the numbers on the y-axis.

(Hi, if you made it this far, please comment if there was something that was not well explained. I care more about my statistics-communication skills than my ego, so negative feedback is very welcome.)


  1. ↩︎

    Gelman, Hill and Vehtari, “Regression and Other Stories”

  2. ↩︎

    John Kruschke, “Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan”, 2nd Edition

  3. ↩︎

    Richard McElreath, “Statistical Rethinking”