Believing others’ priors

Meet the Bayesians

In one way of looking at Bayesian reasoners, there are a bunch of possible worlds and a bunch of people, who start out with some guesses about what possible world we’re in. Everyone knows everyone else’s initial guesses. As evidence comes in, agents change their guesses about which world they’re in via Bayesian updating.

The Bayesians can share information just by sharing how their beliefs have changed.

“Bob initially thought that last Monday would be sunny with probability 0.8, but now he thinks it was sunny with probability 0.9, so he must have seen evidence that he judges as 4/9ths as likely if it wasn’t sunny as if it was.”
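
To make that arithmetic explicit, here is a minimal sketch in Python (the function name is just for illustration) of how an onlooker can back out the implied likelihood ratio from Bob’s reported prior and posterior, using the odds form of Bayes’ theorem.

```python
def implied_likelihood_ratio(prior, posterior):
    """Infer how likely the evidence was under not-H versus under H,
    given only the reported prior and posterior for H (odds form of Bayes)."""
    prior_odds = prior / (1 - prior)              # odds of sunny before the evidence
    posterior_odds = posterior / (1 - posterior)  # odds of sunny after the evidence
    # posterior_odds = prior_odds * P(E|H)/P(E|not H), so:
    return prior_odds / posterior_odds            # = P(E | not sunny) / P(E | sunny)

# Bob: prior 0.8 that Monday was sunny, posterior 0.9 after seeing some evidence.
print(implied_likelihood_ratio(0.8, 0.9))  # ≈ 0.444, i.e. 4/9
```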

If they have the same priors, they’ll converge to the same beliefs. But if they don’t, it seems they can agree to disagree. This is a bit frustrating, because we don’t want people to ignore our very convincing evidence just because they’ve gotten away with having a stupid weird prior.

What can we say about which priors are permissible? Robin Hanson offers an argument that we must either (a) believe our prior was created by a special process that correlated it with the truth more than everyone else’s, or (b) have the same prior as everyone else.

Meet the pre-Bayesians

How does that argument go? Roughly, Hanson describes a slightly more nuanced set of reasoners: the pre-Bayesians. The pre-Bayesians are not only uncertain about what world they’re in, but also about what everyone’s priors are.

These uncertainties can be tangled together (the joint distribution doesn’t have to factorise into their beliefs about everyone’s priors and their beliefs about worlds). Facts about the world can change their opinions about what prior assignments people have.

Hanson then imposes a pre-rationality condition: if you find out what priors everyone has, you should agree with your prior about how likely different worlds are. In other words, you should trust your prior in the future. Once you have this condition, it seems that it’s impossible to both (a) believe that some other people’s priors were generated in a way that makes them as likely to be good as yours and (b) have different priors from those people.

Let’s dig into the sort of things this pre-rationality condition commits you to.

Consider the class of worlds where you are generated by a machine that randomly generates a prior and sticks it in your head. The pre-rationality rule says that worlds where this randomly-generated prior describes the world well are more likely than worlds where it is a poor description.

So if I pop out with a very certain belief that I have eleven toes, such that no amount of visual evidence that I have ten toes can shake my faith, the pre-prior should indeed place more weight on those worlds where I have eleven toes and various optical trickery conspires to make it look like I have ten.

If this seems worrying to you, consider that you may be asking too much of this pre-rationality condition. After all, if you have a weird prior, you have a weird prior. In the machine-generating-random-priors world, you already believe that your prior is a good fit for the world. That’s what it is to have a prior. Yes, according to our actual posteriors it seems like there should be no correlation between these random priors and the world they’re in, but asking the pre-rationality condition to make our actual beliefs win out seems like a pretty illicit move.

Another worry is that it seems there’s some spooky action-at-a-distance going on between the pre-rationality condition and the assignment of priors. Once everyone has their priors, the pre-rationality condition is powerless to change them. So how is the pre-rationality condition making it so that everyone has the same prior?

I claim that actually, this presentation of the pre-Bayesian proof is not quite right. According to me, if I’m a Bayesian and believe our priors are equally good, then we must have the same priors. If I’m a pre-Bayesian and believe our priors are equally good, then I must believe that your prior averages out to mine. This latter move is open to the pre-Bayesian (who has uncertainty about priors) but not to the Bayesian (who knows the priors).

I’ll make an argument purely within Bayesianism from believing our priors are equally good to having the same prior, and then we’ll see how beliefs about priors come in for a pre-Bayesian.

Bayesian prior equality

To get this off the ground, I want to make precise the claim of believing someone’s prior is as good as yours. I’m going to look at 3 ways of doing this. Note that Hanson doesn’t suggest a particular one, so he doesn’t have to accept any of these as what he means, and that might change how well my argument works.

Let’s suppose my prior is p and yours is q. Note, these are fixed functions, not references pointing at my prior and your prior. In the Bayesian framework, we just have our priors, end of story. We don’t reason about cases where our priors were different.

Let’s suppose score is a strictly proper scoring rule (if you don’t know what that means, I’ll explain in a moment). score takes in a probability distribution over a random variable and an actual value for that random variable. It gives more points the more of the probability distribution’s mass is on (or near) the actual value. For it to be strictly proper, I uniquely maximise my expected score by reporting my true probability distribution. That is, writing E_d[·] for an expectation with X distributed according to d, the expected score E_p[score(f, X)] is uniquely maximised over distributions f when f = p.
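
As a concrete illustration, here is a minimal sketch using the log score, one standard strictly proper scoring rule (the distributions and the grid search are made up for the example): no fixed report f does better in expectation than reporting p itself.

```python
import itertools
import math

def log_score(dist, outcome):
    """Log score: the log of the probability the reported distribution
    assigned to the outcome that actually happened."""
    return math.log(dist[outcome])

def expected_score(true_dist, reported_dist):
    """Expected score of reporting `reported_dist` when outcomes are
    really drawn from `true_dist`."""
    return sum(prob * log_score(reported_dist, w) for w, prob in true_dist.items())

# My prior p over three possible worlds (made up for the example).
p = {"w1": 0.5, "w2": 0.3, "w3": 0.2}

# Search a grid of alternative reports f; none should beat reporting p itself.
best_f, best_val = None, -float("inf")
for a, b in itertools.product(range(1, 99), repeat=2):
    if a + b >= 100:
        continue
    f = {"w1": a / 100, "w2": b / 100, "w3": (100 - a - b) / 100}
    val = expected_score(p, f)
    if val > best_val:
        best_f, best_val = f, val

print(best_f)                            # ≈ p: {"w1": 0.5, "w2": 0.3, "w3": 0.2}
print(expected_score(p, p) >= best_val)  # True: reporting p is optimal
```

The grid search is only there to make the uniqueness visible; the claim itself is just the definition of strict propriety.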

Let’s also suppose my posterior is p|B, that is (using notation a bit loosely) my prior probability conditioned on some background information B.

Here are some attempts to precisely claim someone’s prior is as good as mine:

  1. For all X, E_p[score(p, X)] = E_p[score(q, X)].

  2. For all X, E_{p|B}[score(p|B, X)] = E_{p|B}[score(q|B, X)].

  3. For all X, E_{p|B}[score(p, X)] = E_{p|B}[score(q, X)].

(1) says that, according to my prior, your prior is as good as mine. By the definition of a strictly proper scoring rule, this means that your prior is the same as mine.
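
To spell out why the equality in (1) pins down q, here is the standard argument for one concrete choice of scoring rule, the log score (this choice is mine; the argument above doesn’t fix a particular rule):

$$E_p[\mathrm{score}(p, X)] - E_p[\mathrm{score}(q, X)] = \sum_x p(x)\log\frac{p(x)}{q(x)} = D_{\mathrm{KL}}(p \,\|\, q) \ge 0,$$

with equality if and only if q = p. So if I judge your prior to score as well as mine under my own prior, the two must coincide.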

(2) says that, according to my posterior, the posterior you’d have with my current information is as good as the posterior I have. By the definition of a strictly proper scoring rule, this means that your posterior is equal to my posterior. This is a bit broader than (1), and allows your prior to have already “priced in” some information that I now have.

(3) says that given what we know now, your prior was as good as mine.

That rules out q = p|B. That would be a prior that’s strictly better than mine, not merely as good: it’s just what you get from mine when you’re already certain you’ll observe some evidence (like an apple falling in 1663). Observing that evidence then doesn’t change your beliefs.

In general, it can’t be the case that you predicted B as more likely than me, which can be seen by taking X = B.
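
To unpack the X = B step (again using the log score as a stand-in for whichever strictly proper rule we chose, and taking X to be the indicator of B): under my posterior p|B the event B is certain, so the expected scores in (3) collapse to

$$E_{p|B}[\mathrm{score}(q, X)] = \log q(B), \qquad E_{p|B}[\mathrm{score}(p, X)] = \log p(B),$$

and the equality in (3) then gives q(B) = p(B). In particular, you can’t have predicted B as more likely than I did.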

On future events, your prior can match my prior, or it can diverge from my posterior as far as my prior does, but in the opposite direction.

I don’t really like (3), because while it accepts that your prior was as good as mine in the past, it lets me think that even after you update your prior, you’ll still be doing worse than me.

That leaves us with (1) and (2) then. If (1) or (2) is our precise notion, then it follows quickly that we have common priors (for (1)), or at least common posteriors given my current information (for (2)).

This is just a notion of logical consistency though; I don’t have room for believing that our prior-generating processes make yours as likely to be true as mine. It’s just that if the probability distribution that happens to be your prior appears to me as good as the probability distribution that happens to be my prior, they are the same probability distribution.

Pre-Bayesian prior equality

How do we make the pre-Bayesian claim that your prior is as good as mine?

Here, let pᵢ be my prior as a reference, rather than as a concrete probability distribution. Claims about pᵢ are claims about my prior, no matter what function that actually ends up being. So, for example, claiming that pᵢ scores well is claiming that as we look at different worlds, we see it is likely that my prior is a well-adapted prior for that specific world. In contrast, a claim that p scores well would be a claim that the actual world looks a lot like p.

Similarly, pⱼ is your prior as a reference. Let 𝐩 be a vector assigning a prior to each agent (so pᵢ is its entry for me and pⱼ its entry for you).

Let f be my pre-prior. That is, my initial beliefs over combinations of worlds and prior assignments. Similarly to above, let f|B be my pre-posterior (a bit of an awkward term, I admit).

For ease of exposition (and I don’t think entirely unreasonably), I’m going to imagine that I know my own prior precisely. That is, f(w, 𝐩) = 0 whenever pᵢ ≠ p: my pre-prior rules out any prior assignment that gives me anything other than my actual prior p.
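
For reference, here is one way to write the pre-rationality condition from earlier in this notation (my paraphrase, not a formula from the original argument): conditional on a full prior assignment, my pre-prior over worlds should agree with the prior that assignment gives me,

$$f(w \mid \mathbf{p}) = p_i(w) \quad \text{whenever } f(\mathbf{p}) > 0.$$

Together with knowing my own prior, this is what lets expectations over worlds below be taken with respect to p rather than f.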

Here are some ways of making the belief that your prior is as good as mine precise in the pre-Bayesian framework.

  1. For all X, E_p[score(p, X)] = E_f[score(pⱼ, X)].

  2. For all X, E_{p|B}[score(p|B, X)] = E_{f|B}[score(pⱼ|B, X)].

  3. For all X, E_{p|B}[score(p, X)] = E_{f|B}[score(pⱼ, X)].

On the LHS, the expectation uses p rather than f, because of the pre-rationality condition. Knowing my prior, my updated pre-prior agrees with it about the probability of the ground events. But I still don’t know your prior, so I have to use f on the RHS to “expect” over the event and your prior itself.

(1) says that, according to my pre-prior, your prior is as good as mine in expectation. The strictly proper scoring rule says that my prior is the unique maximiser among fixed probability distributions. But your prior here is a reference, not a fixed distribution: I could, in principle, believe that your prior is better adapted to each world than mine is, while still not being certain which world we’re in (or what your prior is), so I can’t use that belief to update my own.

Given the equality, I can’t want to switch priors with you in general, but I could think you have a prior that’s more correlated with truth than mine in some cases and less so in others.
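
Here is a minimal numerical sketch of that situation (the numbers, the two candidate priors, and the Brier-style rule are all made up for the example): my prior is uniform over two worlds, I don’t know which of two opinionated priors you have, and yet, under my pre-prior, your prior scores exactly as well as mine in expectation, as (1) requires.

```python
def brier_score(dist, world, worlds=("w0", "w1")):
    """Negative Brier score: higher is better, strictly proper."""
    return -sum((dist[w] - (1.0 if w == world else 0.0)) ** 2 for w in worlds)

# My prior p: uniform over the two worlds.
p = {"w0": 0.5, "w1": 0.5}

# Your two possible priors (I don't know which one you have).
q_a = {"w0": 0.8, "w1": 0.2}
q_b = {"w0": 0.2, "w1": 0.8}
your_priors = {"q_a": q_a, "q_b": q_b}

# My pre-prior f over (world, your prior). Its marginal over worlds is (0.5, 0.5),
# matching p, as pre-rationality (plus knowing my own prior) demands.
f = {
    ("w0", "q_a"): 0.325, ("w0", "q_b"): 0.175,
    ("w1", "q_a"): 0.175, ("w1", "q_b"): 0.325,
}

# LHS of (1): expected score of my prior, expecting over worlds with p.
lhs = sum(p[w] * brier_score(p, w) for w in p)

# RHS of (1): expected score of your (unknown) prior, expecting over worlds
# and prior assignments with f.
rhs = sum(prob * brier_score(your_priors[name], w) for (w, name), prob in f.items())

print(lhs, rhs)  # both -0.5 (up to floating point), even though neither
                 # candidate prior equals mine.
```

The equality constrains the mixture, not the individual candidates: the worlds where your prior does better than mine are exactly balanced by the worlds where it does worse.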

(2) says that, according to my pre-posterior, your prior conditioned on my info is, in expectation, as good as my prior conditioned on my info.

I like this better than (1). Evidence in the real world leads me to beliefs about the prior production mechanisms (like genes, nurture and so on). These don’t seem to give a good reason for my innate beliefs to be better than anyone else’s. Therefore, I believe your prior is probably as good as mine on average.

But note, I don’t actually know what your prior is. It’s just that I believe we probably share similar priors. The spooky action-at-a-distance is eliminated. This is just (again) a claim about consistent beliefs: if I believe that your prior got generated in a way that made it as good as mine, then I must believe it’s not too divergent from mine.

(3) says that, given what we now know, I think your prior is no better or worse than mine in expectation. This is about as unpalatable in the pre-Bayesian case as in the Bayesian one.

So, on either (1) or (2), I believe that your prior will, on average, do as well as mine. I may not be sure what your prior is, but cases where it’s far better will be matched by cases where it’s far worse. Even knowing that your prior performs exactly as well as mine, I might not know exactly which prior you have. I know that all the places it does worse will be matched by an equal weight of places where it does better, so I can’t appeal to my prior as a good reason for us to diverge.