Frequentist vs Bayesian breakdown: interpretation vs inference

Academian 30 Aug 2011 15:58 UTC

36 points

Suppose we have two different human beings, Connor and Diane, who agree to interpret their subjective anticipations as probabilities, thereby commonly earning them the title “Bayesian”. On a particular project or venture, they might disagree on whether to use Trick A or Trick B to decide the next step in the project. It might be that Trick A is commonly labelled a “Frequentist inference method” and B a “Bayesian inference method”. Why might they disagree?

As far as I can see, there are 3 disagreements that get labelled “Bayesian vs Frequentist” debates, and conflating them is a problem:

(1) Whether to interpret all subjective anticipations as probabilities.

(2) Whether to interpret all probabilities as subjective anticipations.

(3) Whether, on a particular project, to use Statistical Trick B instead of Statistical Trick A to infer the best course of action, when B is commonly labelled a “Bayesian method” and A is a “Frequentist method”.

(Regarding 3, UC Berkeley professor Michael Jordan offers a good heuristic for how statistical tricks get labelled as Bayesian or Frequentist, in terms of which terms in a loss function one treats as fixed or variable. I recommend watching the first twenty minutes of his video lecture on this if you’re not familiar.)

The question “is Connor a Bayesian or a Frequentist?” is commonly posed as though Connor’s position on 1, 2, and 3 must be either “yes, yes, yes” or “no, no, no”. I don’t believe this is often the case. For example, my position is:

(1) - Yes. Insofar as we have subjective anticipations, I agree normatively that they should behave and update as probabilities.

(2) - Don’t care much. Expressions like P(X|Y) and P(X and Y) are useful for denoting both subjective anticipations and proportions of a whole, and in particular, proportions of real future events. Whether to use the word “probability” is a terminological question. Personally I try to reserve the word “probability” for when I mean subjective anticipations, and say “proportion” when I mean proportions of real future events, but this is word choice. Unfortunately this word choice is strongly associated and confused with positions on (1) and (3).

(3) - It depends. In statistical inference, we commonly consider data sets x, world models M, and parameters θ that specify the model M more precisely. I consider the separation of belief into M and θ to be purely formal. When guessing the next data set y, one considers expressions of the form P(x|M,θ) in some way. If I’m already very confident in a specific world model M, and expect θ to actually vary from situation to situation, I’ll probably try to estimate the parameters θ from x in a way that has the best expected success rate across all possible data sets M would generate. You might say here that I “trust the model more than the data” (though what I really don’t trust are the changing model parameters), and this is a trick commonly referred to as “Frequentist”. If I’m not confident in the model M, or expect the parameters θ to be the same in many future situations, I’ll probably try to estimate M,θ from x in a way that has the best expected success rate assuming x. You might say here that I “trust the data more than the model”, and label this a “Bayesian” trick.
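As a minimal sketch of the two objectives in (3), assume a toy model M in which the data set x is the number of heads k in n coin flips and θ is the coin's bias. The squared-error loss, the Beta(1, 1) prior, and every function name below are illustrative assumptions, not something the argument above specifies. `risk_across_datasets` scores an estimator by averaging over every data set M would generate at a fixed θ; `posterior_mean` instead conditions on the one x actually observed.

```python
from math import comb

def binom_pmf(k, n, theta):
    """P(x = k | M, theta) for the toy coin-flip model M."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def mle(k, n):
    """Maximum-likelihood estimate of theta: the observed frequency."""
    return k / n

def risk_across_datasets(estimator, n, theta):
    """Expected squared error at a fixed theta, averaged over all
    data sets k = 0..n that the model would generate."""
    return sum(
        binom_pmf(k, n, theta) * (estimator(k, n) - theta) ** 2
        for k in range(n + 1)
    )

def posterior_mean(k, n, a=1.0, b=1.0):
    """Mean of the Beta(a + k, b + n - k) posterior over theta,
    conditioning only on the observed data (assumed Beta(a, b) prior)."""
    return (k + a) / (n + a + b)

if __name__ == "__main__":
    n, k = 10, 7
    print("MLE:", mle(k, n))
    print("Risk of MLE at theta=0.5:", risk_across_datasets(mle, n, 0.5))
    print("Posterior mean given x:", posterior_mean(k, n))
```

The first quantity depends on θ but not on the observed x, while the second depends on x but needs a prior over θ. Which of the two is the better guide is exactly the judgment call described in (3).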

Throughout (3), since my position in (1) is not changing, a member of the Bayes Tribe will say I’m “really a Bayesian all along”, but I don’t want to continue with this conflation of position names. It’s true that if I use the “Frequentist trick”, it will be because I’ve updated in favor of it, i.e. my subjective confidence levels in the various theory elements are appropriate for it.

… But from now on, when the term “Bayesian” or “Frequentist” arises in a debate, my plan is to taboo the terms immediately, and proceed to either dissolve the issue into (1), (2), and (3) above, or change the conversation if people don’t have the energy or interest for that length of conversation.

Do people agree with this breakdown? I think I could be persuaded otherwise and would of course appreciate it if I were :)

ETA: I think the wisdom to treat beliefs as anticipation controllers and update our confidences based on evidence might be too precious to alienate people from it with the label “Bayesian”, especially if the label is as ambiguous as my breakdown has found it to be.

pragmatist 30 Aug 2011 20:23 UTC
9 points
I like issues (2) and (3) in your breakdown, but I don’t think (1) captures an important aspect of the Bayesian/frequentist debate. I don’t really associate frequentism with a denial of probabilism (the claim that the degrees of belief of a rational agent obey the probability calculus). I do think there is an interesting disagreement in the vicinity of (1) about how degrees of belief should be set.

My model of a frequentist is someone who thinks relative frequency should be treated as an expert function: If rf(X) is the relative frequency with which propositions like X are true in some appropriate reference class, then P(X | rf(X) = x) = x. This seems to me the most natural interpretation of the claim that probabilities are just relative frequencies. My frequentist doesn’t answer “no” to (1). She does think that subjective anticipations obey the probability calculus, and this is because relative frequencies obey the calculus and subjective anticipations should be guided by knowledge of relative frequencies. So she treats relative frequency as an expert function, which means she tries to maximize her calibration.

The Bayesian does not think the rational agent should always try to maximize calibration. There are situations where one should be willing to sacrifice calibration for discrimination. Eliezer has a good example of this in A Technical Explanation of Technical Explanation. Here’s my understanding of the difference: The Bayesian treats the truth function (the function that assigns 1 to truths and 0 to falsehoods) as an expert function, and this is incompatible with treating relative frequency as an expert function. Trying to estimate truth can lead you to intentionally sacrifice calibration for discrimination; trying to maximize calibration cannot.
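A small numerical sketch may make the calibration/discrimination trade-off concrete. The ten outcomes, the two forecasters, and the choice of the Brier score as the measure of closeness to the truth function are all illustrative assumptions, not taken from the essay mentioned above. The "hedger" always states the base rate and is perfectly calibrated; the "bold" forecaster is slightly miscalibrated but separates truths from falsehoods.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared distance between stated probabilities and the
    truth function (1 for truths, 0 for falsehoods). Lower is better."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(outcomes)

def calibration_gap(forecasts, outcomes):
    """Weighted mean |stated probability - observed frequency|,
    grouping propositions by the probability assigned to them."""
    buckets = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        buckets[p].append(y)
    total = len(outcomes)
    return sum(
        len(ys) * abs(p - sum(ys) / len(ys)) for p, ys in buckets.items()
    ) / total

# Hypothetical data: 5 of 10 propositions are true.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
hedger = [0.5] * 10              # always states the base rate
bold = [0.9] * 5 + [0.1] * 5     # confident, right 4 times out of 5

if __name__ == "__main__":
    for name, f in (("hedger", hedger), ("bold", bold)):
        print(name, calibration_gap(f, outcomes), brier_score(f, outcomes))
```

On this made-up data the hedger has the smaller calibration gap, yet the bold forecaster has the lower Brier score, so a forecaster aiming at the truth function can end up preferring the less calibrated set of forecasts.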

So maybe (1) should be supplemented with something like this:

(1′) If the answer to (1) is “yes”, whether subjective anticipations should always be guided by beliefs about relative frequencies.

Hyena 30 Aug 2011 20:49 UTC
2 points
Yes; a million times yes.

I have felt for years that this whole debate was really about linguistic fuzziness and people believing that they can take substantial positions in a debate internal to the language in which it is expressed. And, of course, believing that they have to.

Personally, though, I just use the term “Bayesian” because of the theorem and will continue to do that.

Oscar_Cunningham 30 Aug 2011 16:42 UTC
2 points

I’ll probably try to estimate the parameters θ from x in a way that has the best expected success rate across all possible data sets M would generate.

Isn’t this a Bayesian method? The phrase “best expected” seems like a decent hint that it is. A Frequentist method would try to guarantee something like a minimax bound: how good your estimate is if θ changes across trials in a maximally inconvenient way.
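The distinction between a worst-case guarantee and a "best expected" criterion can be sketched on a toy problem. Assume k heads in n coin flips, squared-error loss, and a uniform prior over θ; these choices, the grid over θ, and the function names are all assumptions made for illustration. The estimator (k + √n/2)/(n + √n) is the standard constant-risk minimax estimator for this setting, and (k + 1)/(n + 2) is the posterior mean under the uniform prior.

```python
from math import comb, sqrt

def risk(estimator, n, theta):
    """Expected squared error of the estimator at a fixed theta,
    averaged over all data sets k = 0..n."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k)
        * (estimator(k, n) - theta) ** 2
        for k in range(n + 1)
    )

def minimax_estimator(k, n):
    """Constant-risk (minimax) estimator for the binomial, squared error."""
    return (k + sqrt(n) / 2) / (n + sqrt(n))

def uniform_prior_mean(k, n):
    """Posterior mean of theta under an assumed uniform prior."""
    return (k + 1) / (n + 2)

# Evenly spaced grid standing in for "all values theta could take".
THETAS = [i / 1000 for i in range(1001)]

def worst_case_risk(estimator, n):
    """Risk when theta is chosen in the most inconvenient way."""
    return max(risk(estimator, n, t) for t in THETAS)

def average_risk(estimator, n):
    """Risk averaged over the grid, approximating a uniform prior."""
    return sum(risk(estimator, n, t) for t in THETAS) / len(THETAS)

if __name__ == "__main__":
    n = 16
    for est in (minimax_estimator, uniform_prior_mean):
        print(est.__name__, worst_case_risk(est, n), average_risk(est, n))
```

With n = 16 the minimax estimator has the lower worst-case risk, while the uniform-prior posterior mean has the lower average risk: the two criteria pick different estimators even though both are computed from the same θ-dependent risk function.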

I know you said not to claim that you were a Bayesian all along, but it seems to me that calculating a risk that depends on θ is just plain the wrong thing to do.

jsteinhardt 3 Sep 2011 4:50 UTC
1 point
Also agree with this breakdown.

I also think hardly anyone labeling themselves a “frequentist” would disagree with your position on (1).