Bayes’ rule =/= Bayesian inference

Related to: Bayes’ Theorem Illustrated, What is Bayesianism?, An Intuitive Explanation of Bayes’ Theorem

(Bayes’ theorem is something Bayesians need to use more often than Frequentists do, but Bayes’ theorem itself isn’t Bayesian. This post is meant to be a light introduction to the difference between Bayes’ theorem and Bayesian data analysis.)

Bayes’ Theorem

Bayes’ theorem is just a way to get (e.g.) P(B|A) from P(A|B), P(B) and P(A). The classic example of Bayes’ theorem is diagnostic testing. Suppose someone either has the disease (D+) or does not have the disease (D-) and either tests positive (T+) or tests negative (T-). If we knew the sensitivity P(T+|D+), specificity P(T-|D-) and disease prevalence P(D+), then we could get the positive predictive value P(D+|T+) using Bayes’ theorem:

P(D+|T+) = P(T+|D+) P(D+) / [ P(T+|D+) P(D+) + (1 - P(T-|D-)) (1 - P(D+)) ]

For example, suppose we know that the sensitivity is 0.9, the specificity is 0.8 and the disease prevalence is 0.01. Then,

P(D+|T+) = (0.9 × 0.01) / (0.9 × 0.01 + 0.2 × 0.99) = 0.009 / 0.207 ≈ 0.043

This answer is not Bayesian or frequentist; it’s just correct.
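
As a quick check of the arithmetic, the calculation can be sketched in a few lines of Python. The function name `positive_predictive_value` is just an illustrative choice, not something from a particular library.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(D+|T+) from P(T+|D+), P(T-|D-) and P(D+) via Bayes' theorem."""
    true_pos = sensitivity * prevalence               # P(T+|D+) P(D+)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(T+|D-) P(D-)
    return true_pos / (true_pos + false_pos)


# Values from the example: sensitivity 0.9, specificity 0.8, prevalence 0.01.
example_ppv = positive_predictive_value(0.9, 0.8, 0.01)
print(round(example_ppv, 3))  # 0.043
```

Even with a fairly accurate test, most positives are false positives here, because the disease is rare.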

Diagnostic testing study

Typically we will not know P(T+|D+) or P(T-|D-). We would consider these unknown parameters. Let’s denote them by Θsens and Θspec. For simplicity, let’s assume we know the disease prevalence P(D+) (we often have a lot of data on this).

Suppose 1000 subjects with the disease were tested, and 900 of them tested positive. Suppose 1000 disease-free subjects were tested and 200 of them tested positive. Finally, suppose 1% of the population has the disease.

Frequentist approach

Estimate the two parameters (sensitivity and specificity) using their sample proportions, 900/1000 = 0.9 and 800/1000 = 0.8, and plug them into Bayes’ formula above. This results in a point estimate for P(D+|T+) of 0.043. A standard error or confidence interval could be obtained using the delta method or bootstrapping.

Even though Bayes’ theorem was used, this is not a Bayesian approach.
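
One way to sketch the frequentist calculation above is the plug-in estimate plus a delta-method standard error. This is a minimal illustration rather than the only way to do it: it assumes the two samples are independent, treats the prevalence as known, and uses a normal-approximation (Wald-type) 95% interval. All variable names are my own.

```python
import math

# Study data from the text.
n_diseased, pos_diseased = 1000, 900   # diseased subjects, positives among them
n_healthy, pos_healthy = 1000, 200     # disease-free subjects, positives among them
prevalence = 0.01                      # assumed known

sens_hat = pos_diseased / n_diseased   # estimate of P(T+|D+)
fpr_hat = pos_healthy / n_healthy      # estimate of P(T+|D-) = 1 - specificity

# Plug-in estimate of P(D+|T+).
denom = sens_hat * prevalence + fpr_hat * (1 - prevalence)
ppv_hat = sens_hat * prevalence / denom

# Delta method: partial derivatives of PPV with respect to each estimate.
d_sens = prevalence * (1 - prevalence) * fpr_hat / denom**2
d_fpr = -prevalence * (1 - prevalence) * sens_hat / denom**2

# Binomial variances of the two sample proportions.
var_sens = sens_hat * (1 - sens_hat) / n_diseased
var_fpr = fpr_hat * (1 - fpr_hat) / n_healthy

se = math.sqrt(d_sens**2 * var_sens + d_fpr**2 * var_fpr)
ci_low, ci_high = ppv_hat - 1.96 * se, ppv_hat + 1.96 * se

print(f"estimate {ppv_hat:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
```

Most of the uncertainty comes from the false positive rate, since false positives dominate the denominator when the disease is rare.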

Bayesian approach

The Bayesian approach is to specify prior distributions for all unknowns. For example, we might specify independent uniform(0,1) priors for Θsens and Θspec. However, we should expect the test to do at least as well as guessing (guessing would mean randomly selecting 1% of people and calling them T+). In addition, we expect Θsens > 1-Θspec. So, I might go with a Beta(4,2.5) prior for Θsens and a Beta(2.5,4) prior for Θspec.

Combining these priors with the data yields a posterior distribution for P(D+|T+) with posterior median 0.043 and 95% credible interval (0.038, 0.049). In this case, the Bayesian and frequentist approaches give essentially the same results (not surprising since the priors are relatively flat and there are a lot of data). However, the methodology is quite different.
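
The posterior summary above can be approximated by simulation. The sketch below assumes the Beta priors exactly as stated in the text and relies on Beta-binomial conjugacy: a Beta(a, b) prior combined with x successes in n trials gives a Beta(a + x, b + n - x) posterior. The seed, the number of draws and the variable names are arbitrary choices of mine, and the results should land near the quoted figures only up to Monte Carlo and rounding error.

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is reproducible
prevalence = 0.01        # assumed known

# Posterior for sensitivity: Beta(4, 2.5) prior, 900 positives out of 1000 diseased.
a_sens, b_sens = 4 + 900, 2.5 + 100
# Posterior for specificity: Beta(2.5, 4) prior, 800 negatives out of 1000 disease-free.
a_spec, b_spec = 2.5 + 800, 4 + 200


def ppv(sens, spec, prev):
    """P(D+|T+) via Bayes' theorem."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))


# Draw from both posteriors and push each pair of draws through Bayes' theorem.
ppv_draws = sorted(
    ppv(rng.betavariate(a_sens, b_sens), rng.betavariate(a_spec, b_spec), prevalence)
    for _ in range(20000)
)

n_draws = len(ppv_draws)
post_median = ppv_draws[n_draws // 2]
post_low = ppv_draws[int(0.025 * n_draws)]
post_high = ppv_draws[int(0.975 * n_draws)]

print(f"median {post_median:.3f}, 95% interval ({post_low:.3f}, {post_high:.3f})")
```

Note that Bayes’ theorem shows up twice here: once inside `ppv`, exactly as in the frequentist calculation, and once implicitly in the prior-to-posterior update, which is the part that makes the analysis Bayesian.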

Example that illustrates benefit of Bayesian data analysis

(example edited to focus on credible/confidence intervals)

Suppose someone shows you what looks like a fair coin (you confirm heads on one side and tails on the other) and makes the claim: “This coin will land with heads up 90% of the time.”

Suppose the coin is flipped 5 times and lands with heads up 4 times.

Frequentist approach

“A 95% confidence interval for the binomial parameter is (0.38, 0.99) using the Agresti-Coull method.” Because 0.9 is within the confidence limits, the usual conclusion would be that we do not have enough evidence to rule it out.

Bayesian approach

“I don’t believe you. Based on experience and what I know about the laws of physics, I think it’s very unlikely that your claim is accurate. I feel very confident that the probability is close to 0.5. However, I don’t want to rule out something a little bit unusual (like a probability of 0.4). Thus, my prior for the probability of heads is a Beta(30,30) distribution.”

After seeing the data (4 heads and 1 tail), we update our belief about the binomial parameter: the posterior is a Beta(34,31) distribution. The 95% credible interval for it is (0.40, 0.64). Thus, a value of 0.9 is still considered extremely unlikely.

This illustrates the idea that, from a Bayesian perspective, implausible claims require more evidence than plausible claims. Frequentists have no formal way of including that type of prior information.
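
The coin update above can be sketched as follows. It assumes the Beta(30,30) prior from the text and uses the conjugate update (add heads to the first parameter, tails to the second). Because the standard library has no Beta quantile function, the interval is approximated by simulation; the seed and number of draws are arbitrary choices of mine.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is reproducible

prior_a, prior_b = 30, 30   # Beta(30, 30) prior centered on 0.5
heads, tails = 4, 1         # observed data: 4 heads in 5 flips

# Conjugate update: Beta(a, b) prior -> Beta(a + heads, b + tails) posterior.
post_a, post_b = prior_a + heads, prior_b + tails

coin_draws = sorted(rng.betavariate(post_a, post_b) for _ in range(100000))
n_coin = len(coin_draws)
coin_low = coin_draws[int(0.025 * n_coin)]
coin_high = coin_draws[int(0.975 * n_coin)]

# Share of posterior draws at or above the claimed value of 0.9.
mass_above_claim = sum(d >= 0.9 for d in coin_draws) / n_coin

print(f"95% credible interval ({coin_low:.2f}, {coin_high:.2f})")
print(f"posterior mass at or above 0.9: {mass_above_claim:.5f}")
```

Five flips barely move a prior this concentrated: the posterior mean shifts from 0.5 to 34/65, or about 0.52.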