# Believing others’ priors

## Meet the Bayesians

In one way of looking at Bayesian reasoners, there are a bunch of possible worlds and a bunch of people, who start out with some guesses about what possible world we’re in. Everyone knows everyone else’s initial guesses. As evidence comes in, agents change their guesses about which world they’re in via Bayesian updating.

The Bayesians can share information just by sharing how their beliefs have changed.

“Bob initially thought that last Monday would be sunny with probability 0.8, but now he thinks it was sunny with probability 0.9, so he must have seen evidence that he judges as 4/9ths as likely if it wasn’t sunny as if it was.”
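This kind of back-calculation is just the odds form of Bayes’ rule. A minimal sketch (the function name is mine, nothing standard):

```python
# Recover the likelihood ratio implied by a reported belief change,
# using the odds form of Bayes' rule:
#   posterior odds = prior odds * P(E | H) / P(E | not-H)

def implied_likelihood_ratio(prior, posterior):
    """Return P(E | not-H) / P(E | H) implied by moving prior -> posterior."""
    prior_odds = prior / (1 - prior)              # 0.8 gives 4:1
    posterior_odds = posterior / (1 - posterior)  # 0.9 gives 9:1
    return prior_odds / posterior_odds

# Bob: 0.8 -> 0.9, so the evidence was 4/9 as likely if it wasn't sunny.
print(implied_likelihood_ratio(0.8, 0.9))  # ~0.444
```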

If they have the same priors, they’ll converge to the same beliefs. But if they don’t, it seems they can agree to disagree. This is a bit frustrating, because we don’t want people to ignore our very convincing evidence just because they’ve gotten away with having a stupid weird prior.

What can we say about which priors are permissible? Robin Hanson offers an argument that we must either (a) believe that our prior was created by a special process that correlated it with the truth more than everyone else’s, or (b) have the same prior as everyone else.

## Meet the pre-Bayesians

How does that argument go? Roughly, Hanson describes a slightly more nuanced set of reasoners: the pre-Bayesians. The pre-Bayesians are not only uncertain about what world they’re in, but also about what everyone’s priors are.

These uncertainties can be tangled together (the joint distribution doesn’t have to factorise into their beliefs about everyone’s priors and their beliefs about worlds). Facts about the world can change their opinions about what prior assignments people have.

Hanson then imposes a pre-rationality condition: if you find out what priors everyone has, you should agree with your prior about how likely different worlds are. In other words, you should trust your prior in the future. Once you have this condition, it seems that it’s impossible to both (a) believe that some other people’s priors were generated in a way that makes them as likely to be good as yours and (b) have different priors from those people.
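Concretely, the condition says that conditioning your pre-prior on a prior assignment must recover exactly that assignment’s predictions about the worlds. A minimal sketch, with two worlds and two candidate priors whose names and numbers are all invented:

```python
# Toy pre-Bayesian setup: f is a pre-prior, a joint distribution over
# (world, which-prior-I-have). All names and numbers are invented.

worlds = ["sunny", "rainy"]
priors = {
    "optimist":  {"sunny": 0.9, "rainy": 0.1},
    "pessimist": {"sunny": 0.5, "rainy": 0.5},
}

# Pre-rationality forces f(world | my prior) = my prior(world), so any
# pre-rational f has the form f(w, p) = f(p) * p(w) for some marginal over priors.
marginal = {"optimist": 0.3, "pessimist": 0.7}
f = {(w, name): marginal[name] * priors[name][w]
     for w in worlds for name in priors}

# Check: conditioning the pre-prior on which prior I have recovers
# exactly that prior's probabilities for the worlds.
for name in priors:
    f_given_prior = sum(f[(w, name)] for w in worlds)
    for w in worlds:
        assert abs(f[(w, name)] / f_given_prior - priors[name][w]) < 1e-12
print("pre-rationality condition holds")
```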

Let’s dig into the sort of things this pre-rationality condition commits you to.

Consider the class of worlds where you are generated by a machine that randomly generates a prior and sticks it in your head. The pre-rationality rule says that worlds where this randomly-generated prior describes the world well are more likely than worlds where it is a poor description.

So if I pop out with a very certain belief that I have eleven toes, such that no amount of visual evidence that I have ten toes can shake my faith, the pre-prior should indeed place more weight on those worlds where I have eleven toes and various optical trickery conspires to make it look like I have ten.

If this seems worrying to you, consider that you may be asking too much of this pre-rationality condition. After all, if you have a weird prior, you have a weird prior. In the machine-generating-random-priors world, you already believe that your prior is a good fit for the world. That’s what it is to have a prior. Yes, according to our actual posteriors it seems like there should be no correlation between these random priors and the world they’re in, but asking the pre-rationality condition to make our actual beliefs win out seems like a pretty illicit move.

Another worry is that it seems there’s some spooky action-at-a-distance going on between the pre-rationality condition and the assignment of priors. Once everyone has their priors, the pre-rationality condition is powerless to change them. So how is the pre-rationality condition making it so that everyone has the same prior?

I claim that actually, this presentation of the pre-Bayesian proof is not quite right. According to me, if I’m a Bayesian and believe our priors are equally good, then we must have the same priors. If I’m a pre-Bayesian and believe our priors are equally good, then I must believe that your prior averages out to mine. This latter move is open to the pre-Bayesian (who has uncertainty about priors) but not to the Bayesian (who knows the priors).

I’ll make an argument, purely within Bayesianism, from believing our priors are equally good to having the same prior, and then we’ll see how belief in priors comes in for a pre-Bayesian.

## Bayesian prior equality

To get this off the ground, I want to make precise the claim of believing someone’s priors are as good as yours. I’m going to look at three ways of doing this. Note that Hanson doesn’t suggest a particular one, so he doesn’t have to accept any of these as what he means, and that might change how well my argument works.

Let’s suppose my prior is p and yours is q. Note, these are fixed functions, not references pointing at my prior and your prior. In the Bayesian framework, we just have our priors, end of story. We don’t reason about cases where our priors were different.

Let’s suppose score is a strictly proper scoring rule (if you don’t know what that means, I’ll explain in a moment). score takes in a probability distribution over a random variable and an actual value for that random variable. It gives more points the more of the probability distribution’s mass is near the actual value. For it to be strictly proper, I uniquely maximise my expected score by reporting my true probability distribution. That is, E_p[score(f, X)] is uniquely maximised when f = p.
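For concreteness, the log score, score(f, x) = log f(x), is one standard strictly proper scoring rule. A small sketch (the example distribution is invented) of honesty uniquely maximising expected score:

```python
import math

# With the log score, my expected score under my own beliefs p,
# E_p[score(f, X)] = sum_x p(x) * log f(x), is uniquely maximised at f = p.

p = {"sunny": 0.8, "rainy": 0.2}  # my true distribution (invented numbers)

def expected_log_score(report, true_dist):
    """E_{x ~ true_dist}[log report(x)]"""
    return sum(true_dist[x] * math.log(report[x]) for x in true_dist)

honest = expected_log_score(p, p)
for other in ({"sunny": 0.7, "rainy": 0.3}, {"sunny": 0.9, "rainy": 0.1}):
    assert expected_log_score(other, p) < honest  # any other report does worse
print("honest report wins")
```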

Let’s also suppose my posterior is p|B, that is (using notation a bit loosely) my prior probability conditioned on some background information B.

Here are some attempts to precisely claim someone’s prior is as good as mine:

1. For all X, E_p[score(p, X)] = E_p[score(q, X)].

2. For all X, E_{p|B}[score(p|B, X)] = E_{p|B}[score(q|B, X)].

3. For all X, E_{p|B}[score(p, X)] = E_{p|B}[score(q, X)].

(1) says that, according to my prior, your prior is as good as mine. By the definition of a proper scoring rule, this means that your prior is the same as mine.

(2) says that, according to my posterior, the posterior you’d have with my current information is as good as the posterior I have. By the definition of the proper scoring rule, this means that your posterior is equal to my posterior. This is a bit broader than (1), and allows your prior to have already “priced in” some information that I now have.

(3) says that, given what we know now, your prior was as good as mine.

That rules out q = p|B. That would be a prior that’s better than mine: it’s just what you get from mine when you’re already certain you’ll observe some evidence (like an apple falling in 1663). Observing that evidence doesn’t change your beliefs.

In general, it can’t be the case that you predicted B as more likely than I did, which can be seen by taking X = B.

On future events, your prior can match my prior, or diverge from my posterior equally as far as my prior, but in the opposite direction.

I don’t really like (3), because while it accepts that your prior was as good as mine in the past, it can think that after you update your prior you’ll still be worse than me.

That leaves us with (1) and (2) then. If (1) or (2) is our precise notion, then it follows quickly that we have common priors.

This is just a notion of logical consistency though; I don’t have room for believing that our prior-generating processes make yours as likely to be true as mine. It’s just that if the probability distribution that happens to be your prior appears to me as good as the probability distribution that happens to be my prior, they are the same probability distribution.

## Pre-Bayesian prior equality

How can a pre-Bayesian claim that your prior is as good as mine?

Here, let pᵢ be my prior as a reference, rather than as a concrete probability distribution. Claims about pᵢ are claims about my prior, no matter what function that actually ends up being. So, for example, claiming that pᵢ scores well is claiming that as we look at different worlds, we see it is likely that my prior is a well-adapted prior for that specific world. In contrast, a claim that p scores well would be a claim that the actual world looks a lot like p.

Similarly, pⱼ is your prior as a reference. Let p be a vector assigning a prior to each agent.

Let f be my pre-prior. That is, my initial beliefs over combinations of worlds and prior assignments. Similarly to above, let f|B be my pre-posterior (a bit of an awkward term, I admit).

For ease of exposition (and I don’t think entirely unreasonably), I’m going to imagine that I know my prior precisely. That is, f(w, p) = 0 whenever the prior assignment’s component for me, pᵢ, differs from my actual prior p.

Here are some ways of making the belief that your prior is as good as mine precise in the pre-Bayesian framework.

1. For all X, E_p[score(pᵢ, X)] = E_f[score(pⱼ, X)].

2. For all X, E_{p|B}[score(pᵢ|B, X)] = E_{f|B}[score(pⱼ|B, X)].

3. For all X, E_{p|B}[score(pᵢ, X)] = E_{f|B}[score(pⱼ, X)].

On the LHS, the expectation uses p rather than f, because of the pre-rationality condition. Knowing my prior, my updated pre-prior agrees with it about the probability of the ground events. But I still don’t know your prior, so I have to use f on the RHS to “expect” over the event and your prior itself.

(1) says that, according to my pre-prior, your prior is as good as mine in expectation. The proper scoring rule says that my prior is the unique maximum for a fixed function. But pⱼ is a reference, not a fixed function: I could, in principle, believe that your prior is better adapted to each world than mine, while still being uncertain which world we’re in (and what your prior is), so I can’t update my beliefs.
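A toy illustration of that point (log score, all numbers invented): because pⱼ varies with the world, my pre-prior f can coherently expect your prior to outscore my own fixed prior p.

```python
import math

# Toy pre-Bayesian numbers (all invented): two worlds; my prior p is 50/50
# and known to me; your prior p_j is correlated with the world under my
# pre-prior f, so in expectation it scores better than mine (log score).

worlds = ["A", "B"]
p = {"A": 0.5, "B": 0.5}  # my prior

your_priors = {"A-ish": {"A": 0.9, "B": 0.1},
               "B-ish": {"A": 0.1, "B": 0.9}}
# f: joint over (world, your prior); your prior usually "matches" the world.
f = {("A", "A-ish"): 0.45, ("B", "A-ish"): 0.05,
     ("A", "B-ish"): 0.05, ("B", "B-ish"): 0.45}

# f's marginal over worlds agrees with my prior, as pre-rationality requires.
for w in worlds:
    assert abs(sum(f[(w, n)] for n in your_priors) - p[w]) < 1e-12

my_score = sum(p[w] * math.log(p[w]) for w in worlds)     # E_p[score(p, X)]
your_score = sum(f[(w, n)] * math.log(your_priors[n][w])  # E_f[score(p_j, X)]
                 for w in worlds for n in your_priors)
print(your_score > my_score)  # True: I expect your prior to do better
```

Condition (1) would instead demand these two expectations be equal; the point is just that the reference pⱼ can, in principle, beat any fixed function.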

Given the equality, I can’t want to switch priors with you in general, but I could think you have a prior that’s more correlated with truth than mine in some cases and less so in others.

(2) says that, according to my pre-posterior, your prior conditioned on my info is, in expectation, as good as my prior conditioned on my info.

I like this better than (1). Evidence in the real world leads me to beliefs about the prior production mechanisms (like genes, nurture and so on). These don’t seem to give a good reason for my innate beliefs to be better than anyone else’s. Therefore, I believe your prior is probably as good as mine on average.

But note, I don’t actually know what your prior is. It’s just that I believe we probably share similar priors. The spooky action-at-a-distance is eliminated. This is just (again) a claim about consistent beliefs: if I believe that your prior got generated in a way that made it as good as mine, then I must believe it’s not too divergent from mine.

(3) says that, given what we now know, I think your prior is no better or worse than mine in expectation. This is about as unpalatable in the pre-Bayesian case as in the Bayesian case.

So, on either (1) or (2), I believe that your prior will, on average, do as well as mine. I may not be sure what your prior is, but cases where it’s far better will be matched by cases where it’s far worse. Even knowing that your prior performs exactly as well as mine, I might not know exactly which prior you have. I know that all the places it does worse will be matched by an equal weight of places where it does better, so I can’t appeal to my prior as a good reason for us to diverge.

• A bit of a side point:

So if I pop out with a very certain belief that I have eleven toes, such that no amount of visual evidence that I have ten toes can shake my faith, the pre-prior should indeed place more weight on those worlds where I have eleven toes and various optical trickery conspires to make it look like I have ten.

We all have these blind spots. The one about free will is very common. Physics says there is no such thing: all your decisions are either predetermined, random, or chaotic. Yet most of this site is about making rational decisions. We all believe in 11 toes.

• Suppose I realized the truth of what you say, and came to believe that I have no free will. How would I go about acting on this belief?

Isn’t the answer to that question “this question is incoherent; you might do some things as a deterministic consequence of coming to hold this new belief, but you have no ability to choose any actions differently on that basis”?

And if there is no way for me to act on a belief, in what sense can that belief be said to have content?

• I tried to express my thoughts on the topic in one of the posts. Instead of thinking of ourselves as agents with free will, we can think of agents as observers of the world unfolding before them, continuously adjusting their internal models of it. In this approach there is no such thing as a logical counterfactual; all seeming counterfactuals are artifacts of the observer’s map not reflecting the territory faithfully enough, so two different agents look like the same agent able to make separate decisions. I am acutely aware of the irony and the ultimate futility (or, at least, inconsistency) of one collection of quantum fields (me) trying to convince (= change the state of) another collection of quantum fields (you) as if there were anything other than physical processes involved, but it’s not like I have a choice in the matter.

• Physics says there is no such thing, all your decisions are either predetermined, or random or chaotic

No, free will is not a topic covered in physics textbooks or lectures. You are appealing to an implicit definition of free will that libertarians don’t accept.

• It’s not covered, no; it’s not a physics topic at all. I wish I could imagine the mechanism of top-down causation that creates non-illusionary free will that is implicit in MIRI’s work. The most earnest non-BS attempt I have seen so far is Scott Aaronson’s The Ghost in the Quantum Turing Machine, and he basically concedes that, barring an exotic freebit mechanism, we are all quantum automatons.

• I think Sean Carroll does a pretty good job, e.g. in Free Will Is As Real As Baseball.

• I like that post a lot, too. It puts compatibilism into the context of modern physics. One does not need to worry about physics when dealing with human interactions, and vice versa. Those are different levels of abstraction, different models. The problem arises when one goes outside the domain of validity of a given model without realizing it. And I see the AI alignment work as crossing that boundary. Counterfactuals are a perfectly fine concept when looking to make better decisions. They are a hindrance when trying to prove theorems about decision making. Hence my original point about blind spots.

• You are telescoping a bunch of issues there. It is not at all clear that top-down causation is needed for libertarian free will, for instance. And MIRI says free will is illusory.

• I would like to see at least some ideas as to how agency can arise without top-down causation.

• You are assuming some relationship between agency and free will that has not been spelt out. Also, an entirely woo-free notion of agency is a ubiquitous topic on this site, as has been pointed out to you before.

• I must have missed it, or it didn’t make sense to me...

• In my view, agency is sort of like life—it’s hard to define itself, but the results are fairly obvious. Life tends to spread to fill all possible niches. Agents tend to steer the world toward certain states. But this only shows that you don’t need top-down causation to notice an agent (ignoring how, hypothetically, “notice” is a top-down process; what it means is that “is this an agent” is a fairly objective and non-agenty decision problem).

How can you affect lower levels by doing things on higher levels? By “doing things on a higher level”, what you are really doing is changing the lower level so that it appears a certain way on a higher level.

If what you say is correct, we should expect MIRI to claim that two atom-level identical computers could nonetheless differ in Agency. I strongly predict the opposite—MIRI’s viewpoint is reductionist and physicalist, to the best of my knowledge.

• I am not denying that the more complex an animal is, the more agency it appears to possess. On the other hand, the more we know about an agent, the less agency it appears to possess. What ancients thought of as agent-driven behavior, we see as natural phenomena not associated with free will or decision making. We still tend to anthropomorphize natural phenomena a lot (e.g. evolution, fortune), often implicitly assigning agency to them without realizing or admitting it. Teleology can at times be a useful model, of course, even in physics, but especially in programming.

It also ought to be obvious, but isn’t, that there is a big disconnect between “I decide” and “I am an algorithm”. You can often read here and even in MIRI papers that agents can act contrary to their programming (that’s where counterfactuals show up). A quote from Abram Demski:

Suppose you know that you take the $10. How do you reason about what would happen if you took the $5 instead?

Your prediction, as far as I can tell, has been falsified. An agent magically steps away from its own programming by thinking about counterfactuals.

• It’s been programmed to think about counterfactuals.

• Either you are right that “you know that you take the $10”, or you are mistaken about this knowledge, not both, unless you subscribe to the model of changing a programming certainty.

• 1. Are you saying that the idea of a counterfactual inherently requires Transcending Programming, or that thinking about personal counterfactuals requires ignoring the fact that you are programmed?

2. Counterfactuals aren’t real. They do not correspond to logical possibilities. That is what the word means—they are “counter” to “fact”. But in the absence of perfect self-knowledge, and even in the knowledge that one is fully deterministic, you can still not know what you are going to do. So you’re not required to think about something that you know would require Transcending Programming, even if it is objectively the case that you would have to Transcend Programming to do that in reality.

• I posted about it before here. Logical Counterfactuals are low-res. I think you are saying the same thing here. And yes, analyzing one’s own decision-making algorithms and adjusting them can be very useful. However, Abram’s statement, as I understand it, does not have the explicit qualifier of incomplete knowledge of self. Quite the opposite, it says “Suppose you know that you take the $10”, not “You start with a first approximation that you take $10 and then explore further”.

• You’re right—I didn’t see my confusion before, but Demski’s views don’t actually make much sense to me. The agent knows for certain that it will take $X? How can it know that without simulating its decision process? But if “simulate what my decision process is, then use that as the basis for counterfactuals” is part of the decision process, you’d get infinite regress. (Possible connection to fixed points?)

I don’t think Demski is saying that the agent would magically jump from taking $X to taking $Y. I think he’s saying that agents which fully understand their own behavior would be trapped by this knowledge because they can no longer form “reasonable” counterfactuals. I don’t think he’d claim that Agenthood can override fundamental physics, and I don’t see how you’re arguing that his beliefs, unbeknownst to him, are based on the assumption that Agenthood can override fundamental physics.

• I cannot read his mind; odds are, I misinterpreted what he meant. But if MIRI doesn’t think that counterfactuals as they appear to be (“I could have made a different decision but didn’t, by choice”) are fundamental, then I would expect a careful analysis of that issue somewhere. Maybe I missed it. I have posted on a related topic some five months ago, and had some interesting feedback from jessicata (Jessica Taylor of MIRI) in the comments.