Very Short Introduction to Bayesian Model Comparison

At least within Bayesian probability, there is a single, unambiguously-correct answer to “how should we penalize for model complexity?”: calculate the probability of each model, given the data. This is hard to compute in general, which is why there’s a whole slew of other numbers which approximate it in various ways.

Here’s how it works. Want to know whether model 1 or model 2 is more consistent with the data? Then compute $P[M_1 \mid \text{data}]$ and $P[M_2 \mid \text{data}]$. Using Bayes’ rule:

$$P[M_i \mid \text{data}] = \frac{1}{Z}\, P[\text{data} \mid M_i]\, P[M_i]$$
where $Z$ is the normalizer. If we’re just comparing two models, then we can get rid of that annoying $Z$ by computing odds for the two models:

$$\frac{P[M_1 \mid \text{data}]}{P[M_2 \mid \text{data}]} = \frac{P[\text{data} \mid M_1]}{P[\text{data} \mid M_2]} \cdot \frac{P[M_1]}{P[M_2]}$$

In English: the posterior relative odds of the two models equal the prior odds times the ratio of likelihoods. That likelihood ratio is the Bayes factor: it directly describes the update in the relative odds of the two models due to the data. Calculating the Bayes factor—i.e. $P[\text{data} \mid M]$ for each model—is the main challenge of Bayesian model comparison.
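
To make the bookkeeping concrete, here’s a minimal Python sketch (my own illustration, not from the post; the function name and example numbers are made up):

```python
def posterior_odds(prior_odds: float, likelihood_1: float, likelihood_2: float) -> float:
    """Posterior odds of model 1 vs model 2.

    prior_odds    -- P[model 1] / P[model 2] before seeing the data
    likelihood_1  -- P[data | model 1]
    likelihood_2  -- P[data | model 2]
    """
    bayes_factor = likelihood_1 / likelihood_2  # the update contributed by the data
    return prior_odds * bayes_factor

# Example: even prior odds, data 3x more likely under model 1 -> posterior odds 3:1.
print(posterior_odds(1.0, 0.06, 0.02))  # 3.0
```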

Example

20 coin flips yield 16 heads and 4 tails. Is the coin biased?

Here we have two models:

  • Model 1: coin unbiased

  • Model 2: coin has some unknown probability $\theta$ of coming up heads (we’ll use a uniform prior on $\theta$ for simplicity)

The second model has one free parameter (the bias) which we can use to fit the data, but it’s more complex and prone to over-fitting. When we integrate over that free parameter, the model fits the data poorly over most of the parameter space, which drags down the average—thus the “penalty” associated with free parameters in general.
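
To see that penalty numerically, here’s a rough sketch (my own illustration; the grid and variable names are arbitrary) comparing the biased-coin model’s best-case fit to its average fit under the uniform prior:

```python
import numpy as np

heads, tails = 16, 4
theta = np.linspace(0.001, 0.999, 999)    # grid over the unknown bias theta
lik = theta**heads * (1 - theta)**tails   # probability of this exact flip sequence at each bias

print(lik.max())   # best-case fit, near theta = 0.8: ~4.5e-5
print(lik.mean())  # average over the uniform prior:   ~9.8e-6, several times lower
```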

In this example, the integral is exactly tractable (it’s a Dirichlet-multinomial model), and we get:

$$P[\text{data} \mid M_1] = \binom{20}{16} \left(\tfrac{1}{2}\right)^{20} \approx 0.0046$$

$$P[\text{data} \mid M_2] = \binom{20}{16} \int_0^1 \theta^{16} (1-\theta)^4 \, d\theta = \binom{20}{16} \frac{16!\, 4!}{21!} \approx 0.048$$

So the Bayes factor is 0.048/0.0046 ≈ 10, in favor of a biased coin. In practice, I’d say unbiased coins are at least 10x more likely than biased coins a priori in day-to-day life, so we might still think the coin is unbiased. But if we were genuinely unsure to start with, then this would be pretty decent evidence in favor of bias.
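
For anyone who wants to check the arithmetic, here’s a short sketch of the exact calculation (my own code, not from the post). Note that the binomial coefficient appears in both marginal likelihoods, so it cancels in the Bayes factor:

```python
from math import comb, factorial

heads, tails = 16, 4
n = heads + tails

# P[data | unbiased coin]: 16 heads out of 20 fair flips.
p_data_m1 = comb(n, heads) * 0.5**n                          # ~0.0046

# P[data | unknown bias, uniform prior]: the Beta integral of
# theta^16 (1-theta)^4 over [0,1] equals 16! 4! / 21!.
p_data_m2 = comb(n, heads) * factorial(heads) * factorial(tails) / factorial(n + 1)  # ~0.048

print(p_data_m1, p_data_m2, p_data_m2 / p_data_m1)           # Bayes factor ~10
```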