Radical Probabilism

This is an ex­panded ver­sion of my talk. I as­sume a high de­gree of fa­mil­iar­ity with Bayesian prob­a­bil­ity the­ory.

Toward a New Tech­ni­cal Ex­pla­na­tion of Tech­ni­cal Ex­pla­na­tion—an at­tempt to con­vey the prac­ti­cal im­pli­ca­tions of log­i­cal in­duc­tion—was one of my most-ap­pre­ci­ated posts, but I don’t re­ally get the feel­ing that very many peo­ple have re­ceived the up­date. Granted, that post was spec­u­la­tive, sketch­ing what a new tech­ni­cal ex­pla­na­tion of tech­ni­cal ex­pla­na­tion might look like. I think I can do a bit bet­ter now.

If the im­plied pro­ject of that post had re­ally been com­pleted, I would ex­pect new prac­ti­cal prob­a­bil­is­tic rea­son­ing tools, ex­plic­itly vi­o­lat­ing Bayes’ law. For ex­am­ple, we might ex­pect:

  • A new ver­sion of in­for­ma­tion the­ory.

    • An up­date to the “pre­dic­tion=com­pres­sion” maxim, ei­ther re­pairing it to in­cor­po­rate the new cases, or ex­plic­itly deny­ing it and pro­vid­ing a good in­tu­itive ac­count of why it was wrong.

    • A new ac­count of con­cepts such as mu­tual in­for­ma­tion, al­low­ing for the fact that vari­ables have be­hav­ior over think­ing time; for ex­am­ple, vari­ables may ini­tially be very cor­re­lated, but lose cor­re­la­tion as our pic­ture of each vari­able be­comes more de­tailed.

  • New ways of think­ing about episte­mol­ogy.

    • One thing that my post did man­age to do was to spell out the im­por­tance of “mak­ing ad­vanced pre­dic­tions”, a facet of episte­mol­ogy which Bayesian think­ing does not do jus­tice to.

    • How­ever, I left as­pects of the prob­lem of old ev­i­dence open, rather than giv­ing a com­plete way to think about it.

  • New prob­a­bil­is­tic struc­tures.

    • Bayesian Net­works are one re­ally nice way to cap­ture the struc­ture of prob­a­bil­ity dis­tri­bu­tions, mak­ing them much eas­ier to rea­son about. Is there any­thing similar for the new, wider space of prob­a­bil­is­tic rea­son­ing which has been opened up?

Un­for­tu­nately, I still don’t have any of those things to offer. The aim of this post is more hum­ble. I think what I origi­nally wrote was too am­bi­tious for di­dac­tic pur­poses. Where the pre­vi­ous post aimed to com­mu­ni­cate the in­sights of log­i­cal in­duc­tion by sketch­ing broad im­pli­ca­tions, I here aim to com­mu­ni­cate the in­sights in them­selves, fo­cus­ing on the de­tailed differ­ences be­tween clas­si­cal Bayesian rea­son­ing and the new space of ways to rea­son.

Rather than talking about logical induction directly, I’m mainly going to explain things in terms of a very similar philosophy which Richard Jeffrey invented—apparently starting with his PhD dissertation in the 1950s, although I’m unable to get my hands on it or other early references to see how fleshed-out the view was at that point. He called this philosophy radical probabilism. Unlike logical induction, radical probabilism appears not to have any roots in worries about logical uncertainty or bounded rationality. Instead it appears to be motivated simply by a desire to generalize, and a refusal to accept unjustified assumptions. Nonetheless, it carries most of the same insights.

Rad­i­cal Prob­a­bil­ism has not been very con­cerned with com­pu­ta­tional is­sues, and so con­struct­ing an ac­tual al­gorithm (like the log­i­cal in­duc­tion al­gorithm) has not been a fo­cus. (How­ever, there have been some de­vel­op­ments—see his­tor­i­cal notes at the end.) This could be seen as a weak­ness. How­ever, for the pur­pose of com­mu­ni­cat­ing the core in­sights, I think this is a strength—there are fewer tech­ni­cal de­tails to com­mu­ni­cate.

A ter­minolog­i­cal note: I will use “rad­i­cal prob­a­bil­ism” to re­fer to the new the­ory of ra­tio­nal­ity (treat­ing log­i­cal in­duc­tion as merely a spe­cific way to flesh out Jeffrey’s the­ory). I’m more con­flicted about how to re­fer to the older the­ory. I’m tempted to just use the term “Bayesian”, im­ply­ing that the new the­ory is non-Bayesian—this high­lights its re­jec­tion of Bayesian up­dates. How­ever, rad­i­cal prob­a­bil­ism is Bayesian in the most im­por­tant sense. Bayesi­anism is not about Bayes’ Law. Bayesi­anism is, at core, about the sub­jec­tivist in­ter­pre­ta­tion of prob­a­bil­ity. Rad­i­cal prob­a­bil­ism is, if any­thing, much more sub­jec­tivist.

How­ever, this choice of ter­minol­ogy makes for a con­fu­sion which read­ers (and my­self) will have to care­fully avoid: con­fu­sion be­tween Bayesian prob­a­bil­ity the­ory and Bayesian up­dates. The way I’m us­ing the term, a Bayesian need not en­dorse Bayesian up­dates.

In any case, I’ll de­fault to Jeffrey’s term for the op­pos­ing view­point: dog­matic prob­a­bil­ism. (I will oc­ca­sion­ally fall into call­ing it “clas­si­cal Bayesi­anism” or similar.)

What Is Dog­matic Prob­a­bil­ism?

Dogmatic Probabilism is the doctrine that conditional probability is also how we update probabilistic beliefs: any rational change in beliefs should be explained by a Bayesian update.

We can un­pack this a lit­tle:

  1. (Dynamic Belief:) A rational agent is understood to have different beliefs over time—call these P_1, P_2, P_3, and so on.

  2. (Static Ra­tion­al­ity:) At any one time, a ra­tio­nal agent’s be­liefs are prob­a­bil­is­ti­cally co­her­ent (obey the Kol­mogorov ax­ioms, or a similar ax­iom­a­ti­za­tion of prob­a­bil­ity the­ory).

  3. (Em­piri­cism:) Rea­sons for chang­ing be­liefs across time are given en­tirely by ob­ser­va­tions—that is, propo­si­tions which the agent learns.

  4. (Dog­ma­tism of Per­cep­tion:) Ob­ser­va­tions are be­lieved with prob­a­bil­ity one, once learned.

  5. (Rigidity:) Upon observing a proposition e, conditional probabilities given e are unmodified: P_new(A|e) = P_old(A|e).

The assumptions minus empiricism imply that an update on observing e is a Bayesian update: if we start with P_old and update on e to get P_new, then P_new(e) must equal 1, and P_new(A|e) = P_old(A|e). So we must have P_new(A) = P_new(A|e) = P_old(A|e). Then, empiricism says that this is the only kind of update we can possibly have.
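To make this concrete, here is a minimal sketch in Python (the two-variable world and its numbers are my own illustration, not from the post) of an update satisfying these assumptions: the observed proposition goes to probability one, conditionals on it are untouched, and the new unconditional beliefs are just the old conditional ones.

```python
# Minimal sketch of a dogmatic (Bayesian) update on a tiny finite world.
# Worlds are (hypothesis, observation) pairs; the numbers are made up.
P_old = {
    ("rain", "wet sidewalk"): 0.30, ("rain", "dry sidewalk"): 0.05,
    ("no rain", "wet sidewalk"): 0.10, ("no rain", "dry sidewalk"): 0.55,
}

def bayes_update(P, observation):
    """Condition on an observation: zero out incompatible worlds, renormalize."""
    P_e = sum(p for (h, e), p in P.items() if e == observation)
    return {(h, e): (p / P_e if e == observation else 0.0)
            for (h, e), p in P.items()}

P_new = bayes_update(P_old, "wet sidewalk")
# The observation now has probability 1, and P_new(h) = P_old(h | e):
print(sum(p for (h, e), p in P_new.items() if e == "wet sidewalk"))  # ≈ 1.0
print(P_new[("rain", "wet sidewalk")])  # ≈ 0.75, i.e. 0.30 / 0.40
```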

What Is Rad­i­cal Prob­a­bil­ism?

Rad­i­cal prob­a­bil­ism ac­cepts as­sump­tions #1 and #2, but re­jects the rest. (Log­i­cal In­duc­tion need not fol­low ax­iom #2, ei­ther, since be­liefs at any given time only ap­prox­i­mately fol­low the prob­a­bil­ity laws—how­ever, it’s not nec­es­sary to dis­cuss this com­pli­ca­tion here. Jeffrey’s philos­o­phy did not at­tempt to tackle such things.)

Jeffrey seemed uncomfortable with updating to 100% on anything, making dogmatism of perception untenable. A similar view is already popular on LessWrong, but it seems that no one here took the implication and denied Bayesian updates as a result. (Bayesian updates have been questioned for other reasons, of course.) This is a bit of an embarrassment. But fans of Bayesian updates reading this are more likely to accept that zero and one are probabilities than to give up Bayes.

For­tu­nately, this isn’t ac­tu­ally the crux. Rad­i­cal prob­a­bil­ism is a pure gen­er­al­iza­tion of or­tho­dox Bayesi­anism; you can have zero and one as prob­a­bil­ities, and still be a rad­i­cal prob­a­bil­ist. The real fun be­gins not with the re­jec­tion of dog­ma­tism of per­cep­tion, but with the re­jec­tion of rigidity and em­piri­cism.

This gives us a view in which a rational update from P_old to P_new can be almost anything. (You still can’t update from P(A)=0 to P(A)>0.) Simply put, you are allowed to change your mind. This doesn’t make you irrational.

Yet, there are still some ra­tio­nal­ity con­straints. In fact, we can say a lot about how ra­tio­nal agents think in this model. In place of as­sump­tions #3-#5, we as­sume ra­tio­nal agents can­not be Dutch Booked.

Rad­i­cal Prob­a­bil­ism and Dutch Books

Re­ject­ing the Dutch Book for Bayesian Updates

At this point, if you’re fa­mil­iar with the philos­o­phy of prob­a­bil­ity the­ory, you might be think­ing: wait a minute, isn’t there a Dutch Book ar­gu­ment for Bayesian up­dates? If rad­i­cal prob­a­bil­ism ac­cepts the val­idity of Dutch Book ar­gu­ments, shouldn’t it thereby be forced into Bayesian up­dates?

No!

As it turns out, there is a ma­jor flaw in the Dutch Book for Bayesian up­dates. The ar­gu­ment as­sumes that the bookie knows how the agent will up­date. (I en­courage the in­ter­ested reader to read the SEP sec­tion on di­achronic Dutch Book ar­gu­ments for de­tails.) Nor­mally, a Dutch Book ar­gu­ment re­quires the bookie to be ig­no­rant. It’s no sur­prise if a bookie can take our lunch money by get­ting us to agree to bets when the bookie knows some­thing we don’t know. So what’s ac­tu­ally es­tab­lished by these ar­gu­ments is: if you know how you’re go­ing to up­date, then your up­date had bet­ter be Bayesian.

Ac­tu­ally, that’s not quite right: the ar­gu­ment for Bayesian up­dates also still as­sumes dog­ma­tism of per­cep­tion. If we re­lax that as­sump­tion, all we can re­ally ar­gue for is rigidity: if you know how you are go­ing to up­date, then your up­date had bet­ter be rigid.

This leads to a gen­er­al­ized up­date rule, called Jeffrey up­dat­ing (or Jeffrey con­di­tion­ing).

Gen­er­al­ized Updates

Jeffrey updates keep the rigidity assumption, but reject dogmatism of perception. So, we’re changing the probability of some sentence e to a new value P_new(e), without changing any P(A|e) or P(A|¬e). There’s only one way to do this:

P_new(A) = P_new(e)·P_old(A|e) + P_new(¬e)·P_old(A|¬e)

In other words, the Jeffrey update interpolates linearly between the Bayesian update on e and the Bayesian update on ¬e. This generalizes Bayesian updates to allow for uncertain evidence: we’re not sure we just saw someone duck behind the corner, but we’re 40% sure.
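As a sketch of the formula above (in Python, with toy numbers and proposition names of my own), here is a Jeffrey update that moves P(e) to 40% while leaving the conditionals on e and ¬e alone:

```python
# Minimal sketch of a Jeffrey update. 'duck' plays the role of the uncertain
# evidence e; 'followed' is some other proposition A. Numbers are illustrative.
P_old = {("followed", "duck"): 0.10, ("followed", "no duck"): 0.05,
         ("not followed", "duck"): 0.20, ("not followed", "no duck"): 0.65}

def jeffrey_update(P, evidence, new_prob):
    """Shift P(evidence) to new_prob, keeping P(A|evidence) and P(A|not evidence)
    fixed -- a linear mix of the two Bayesian updates."""
    old = sum(p for (a, e), p in P.items() if e == evidence)
    return {(a, e): (new_prob * p / old if e == evidence
                     else (1 - new_prob) * p / (1 - old))
            for (a, e), p in P.items()}

P_new = jeffrey_update(P_old, "duck", 0.40)  # 40% sure we saw someone duck away
print(sum(p for (a, e), p in P_new.items() if e == "duck"))  # ≈ 0.40, as requested
print(P_new[("followed", "duck")] / 0.40)    # ≈ 1/3 = P_old(followed | duck): rigidity holds
```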

If this way of updating seems a bit arbitrary to you, Jeffrey would agree. It offers only a small generalization of Bayes. Jeffrey wants to open up a much broader space:

[Figure: classifying updates by the assumptions made.]

As I’ve already said, the rigidity as­sump­tion can only be jus­tified if the agent knows how it will up­date. Philoso­phers like to say the agent has a plan for up­dat­ing: “If I saw a UFO land in my yard and lit­tle green men come out, I would be­lieve I was hal­lu­ci­nat­ing.” This is some­thing we’ve worked out ahead of time.

A non-rigid up­date, on the other hand, means you don’t know how you’d re­act: “If I saw a con­vinc­ing proof of P=NP, I wouldn’t know what to think. I’d have to con­sider it care­fully.” I’ll call non-rigid up­dates fluid up­dates.

For me, fluid up­dates are pri­mar­ily about hav­ing longer to think, and reach­ing bet­ter con­clu­sions as a re­sult. That’s be­cause my main mo­ti­va­tion for ac­cept­ing a rad­i­cal-prob­a­bil­ist view is log­i­cal un­cer­tainty. Without such a mo­ti­va­tion, I can’t re­ally imag­ine be­ing very in­ter­ested. I bog­gle at the fact that Jeffrey ar­rived at this view with­out such a mo­ti­va­tion.

Dog­matic Prob­a­bil­ist: All I can say is: why??

Richard Jeffrey: I’ve ex­plained to you how the Dutch Book for Bayesian up­dates fails. What more do you want? My view is sim­ply what you get when you re­move the faulty as­sump­tions and keep the rest.

Dog­matic Prob­a­bil­ist (DP): I un­der­stand that, but why should any­one be in­ter­ested in this the­ory? OK, sure, I CAN make Jeffrey up­dates with­out get­ting Dutch Booked. But why ever would I? If I see a cloth in dim light­ing, and up­date to 80% con­fi­dent the cloth is red, I up­date in that way be­cause of the ev­i­dence which I’ve seen, which is it­self fully con­fi­dent. How could it be any other way?

Richard Jeffrey (RJ): Tell me one piece of information you’re absolutely certain of in such a situation.

DP: I’m cer­tain I had that ex­pe­rience, of look­ing at the cloth.

RJ: Surely you aren’t 100% sure you were look­ing at cloth. It’s merely very prob­a­ble.

DP: Fine then. The ex­pe­rience of look­ing at … what I was look­ing at.

RJ: I’ll grant you that tau­tolo­gies have prob­a­bil­ity one.

DP: It’s not a tau­tol­ogy… it’s the fact that I had an ex­pe­rience, rather than none!

RJ: OK, but you are try­ing to defend the po­si­tion that there is some ob­ser­va­tion, which you con­di­tion on, which ex­plains your 80% con­fi­dence the cloth is red. Con­di­tion­ing on “I had an ex­pe­rience, rather than none” won’t do that. What propo­si­tion are you con­fi­dent in, which ex­plains your less-con­fi­dent up­dates?

DP: The pho­tons hit­ting my reti­nas, which I di­rectly ex­pe­rience.

RJ: Surely not. You don’t have any de­tailed knowl­edge of that.

DP: OK, fine, the in­di­vi­d­ual rods and cones.

RJ: I doubt that. Within the retina, be­fore any mes­sage gets sent to the brain, these get put through an op­po­nent pro­cess which sharp­ens the con­trast and col­ors. You’re not per­ceiv­ing rods and cones di­rectly, but rather a prob­a­bil­is­tic guess at light con­di­tions based on rod and cone ac­ti­va­tion.

DP: The out­put of that pro­cess, then.

RJ: Again I doubt it. You’re en­gag­ing in in­ner-outer ho­cus pocus.* There is no clean di­vid­ing line be­fore which a sig­nal is ex­ter­nal, and af­ter which that sig­nal has been “ob­served”. The op­tic nerve is a noisy chan­nel, warp­ing the sig­nal. And the out­put of the op­tic nerve it­self gets pro­cessed at V1, so the rest of your vi­sual pro­cess­ing doesn’t get di­rect ac­cess to it, but rather a pro­cessed ver­sion of the in­for­ma­tion. And all this pro­cess­ing is noisy. Nowhere is any­thing cer­tain. Every­thing is a guess. If, any­where in the brain, there were a sharp 100% ob­ser­va­tion, then the nerves car­ry­ing that sig­nal to other parts of the brain would rapidly turn it into a 99% ob­ser­va­tion, or a 90% ob­ser­va­tion...

DP: I be­gin to sus­pect you are try­ing to de­scribe hu­man fal­li­bil­ity rather than ideal ra­tio­nal­ity.

RJ: Not so! I’m de­scribing how to ra­tio­nally deal with un­cer­tain ob­ser­va­tions. The source of this un­cer­tainty could be any­thing. I’m merely giv­ing hu­man ex­am­ples to es­tab­lish that the the­ory has prac­ti­cal in­ter­est for hu­mans. The the­ory it­self only throws out un­nec­es­sary as­sump­tions from the usual the­ory of ra­tio­nal­ity—as we’ve already dis­cussed.

DP: (sigh...) OK. I’m still never go­ing to de­sign an ar­tifi­cial in­tel­li­gence to have un­cer­tain ob­ser­va­tions. It just doesn’t seem like some­thing you do on pur­pose. But let’s grant, pro­vi­sion­ally, that ra­tio­nal agents could do so and still be called ra­tio­nal.

RJ: Great.

DP: So what’s this about giv­ing up rigidity??

RJ: It’s the same story: it’s just an­other as­sump­tion we don’t need.

DP: Right, but then how do we up­date?

RJ: How­ever we want.

DP: Right, but how? I want a con­struc­tive story for where my up­dates come from.

RJ: Well, if you pre­com­mit to up­date in a pre­dictable fash­ion, you’ll be Dutch-Book­able un­less it’s a rigid fash­ion.

DP: So you ad­mit it! Up­dates need to be rigid!

RJ: By no means!

DP: But up­dates need to come from some­where. Whether you know it or not, there’s some mechanism in your brain which pro­duces the up­dates.

RJ: Whether you know it or not is a crit­i­cal fac­tor. Up­dates you can’t an­ti­ci­pate need not be Bayesian.

DP: Right, but… the point of episte­mol­ogy is to give guidance about form­ing ra­tio­nal be­liefs. So you should provide some for­mula for up­dat­ing. But any for­mula is pre­dictable. So a for­mula has to satisfy the rigidity con­di­tion. So it’s got to be a Bayesian up­date, or at least a Jeffrey up­date. Right?

RJ: I see the con­fu­sion. But episte­mol­ogy does not have to re­duce things to a strict for­mula in or­der to provide use­ful ad­vice. Rad­i­cal prob­a­bil­ism can still say many use­ful things. In­deed, I think it’s more use­ful, since it’s closer to real hu­man ex­pe­rience. Hu­mans can’t always ac­count for why they change their minds. They’ve up­dated, but they can’t give any ac­count of where it came from.

DP: OK… but… I’m sure as hell never de­sign­ing an ar­tifi­cial in­tel­li­gence that way.

I hope you see what I mean. It’s all ter­ribly un­in­ter­est­ing to a typ­i­cal Bayesian, es­pe­cially with the de­sign of ar­tifi­cial agents in mind. Why con­sider un­cer­tainty about ev­i­dence? Why study up­dates which don’t obey any con­crete up­date rules? What would it even mean for an ar­tifi­cial in­tel­li­gence to be de­signed with such up­dates?

In the light of log­i­cal un­cer­tainty, how­ever, it all be­comes well-mo­ti­vated. Up­dates are un­pre­dictable not be­cause there’s no rule be­hind them—nor be­cause we lack knowl­edge of what ex­actly that rule is—but be­cause we can’t always an­ti­ci­pate the re­sults of com­pu­ta­tions be­fore we finish run­ning them. There are up­dates with­out cor­re­spond­ing ev­i­dence be­cause we can think longer to reach bet­ter con­clu­sions, and do­ing so does not re­duce to Bayesian con­di­tion­ing on the out­put of some com­pu­ta­tion. This doesn’t im­ply un­cer­tain ev­i­dence in ex­actly Jeffrey’s sense, but it does give us cases where we up­date spe­cific propo­si­tions to con­fi­dence lev­els other than 100%, and want to know how to move other be­liefs in re­sponse. For ex­am­ple, we might ap­ply a heuris­tic to de­ter­mine that some num­ber is very very likely to be prime, and up­date on this in­for­ma­tion.

Still, I’m very im­pressed with Jeffrey for reach­ing so many of the right con­clu­sions with­out this mo­ti­va­tion.

Other Ra­tion­al­ity Properties

So far, I’ve em­pha­sized that fluid up­dates “can be al­most any­thing”. This makes it sound as if there are es­sen­tially no ra­tio­nal­ity con­straints at all! How­ever, this is far from true. We can es­tab­lish some very im­por­tant prop­er­ties via Dutch Book.

Convergence

No sin­gle up­date can be con­demned as ir­ra­tional. How­ever, if you keep chang­ing your mind again and again with­out ever set­tling down, that is ir­ra­tional. Ra­tional be­liefs are re­quired to even­tu­ally move less and less, con­verg­ing to a sin­gle value.

Proof: If there exists a value p which your beliefs forever oscillate around (that is, your belief falls above p+ε infinitely often, and falls below p−ε infinitely often, for some ε>0), then a bookie can make money off of you as follows: when your belief is below p−ε, the bookie makes a bet in favor of the proposition in question, at p:(1−p) odds. When your belief is above p+ε, the bookie offers to cancel that bet for a small fee. The bookie earns the fee with certainty, since your beliefs are sure to swing down eventually (allowing the bet to be placed) and are sure to swing up some time after that (allowing the fee to be collected). What’s more, the bookie can do this again and again and again, turning you into a money pump.

If there exists no such p, then your beliefs must converge to some value.

Caveat: this is the proof in the con­text of log­i­cal in­duc­tion. There are other ways to es­tab­lish con­ver­gence in other for­mal­iza­tions of rad­i­cal prob­a­bil­ism.

In any case, this is really important. This isn’t just a nice rationality property. It’s a nice rationality property which dogmatic probabilists don’t have. Lack of a convergence guarantee is one of the main criticisms Frequentists make of Bayesian updates. And it’s a good critique!

Consider a simple coin-tossing scenario, in which we have two hypotheses: H1 posits that the probability of heads is p, and H2 posits that the probability of heads is 1−p (for some p ≠ 1/2). The prior places probability 1/2 on both of these hypotheses. The only problem is that the true coin is fair: the probability of heads is really 1/2. What happens? The probabilities P(H1) and P(H2) will oscillate forever without converging.

Proof: The quantity (number of heads so far) minus (number of tails so far) will take a random walk as we keep flipping the fair coin. A random walk returns to zero infinitely often (a phenomenon known as gambler’s ruin). At each such point, evidence is evenly balanced between the two hypotheses, so we’ve returned to the prior. Then, the next flip is either heads or tails. This results in a probability of p for one of the hypotheses, and 1−p for the other. This sequence of events happens infinitely often, so P(H1) and P(H2) keep experiencing changes of size at least |p − 1/2|, never settling down.

Now, the objection to Bayesian updates here isn’t just that oscillating forever looks irrational. Bayesian updates are supposed to help us predict the data well; in particular, you might think they’re supposed to help us minimize log-loss. But here, we would be doing much better if beliefs converged toward a 50-50 mixture of the two hypotheses, which predicts heads at 1/2. The problem is, Bayes takes each new bit of evidence just as seriously as the last. Really, though, a rational agent in this situation should be saying: “Ugh, this again! If I send my probability up, it’ll come crashing right back down some time later. I should skip all the hassle and keep my probability close to where it is.”
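Here is a quick simulation sketch of the oscillation (my own illustration; the mirror-hypothesis bias p = 2/3 and everything else about the code are arbitrary choices, not from the post):

```python
# Simulation sketch of the non-convergence example: two mirror hypotheses,
# heads-probability p and 1-p, updated by Bayes on flips of a genuinely fair
# coin. Here p = 2/3 is an arbitrary illustrative choice.
import random

random.seed(0)
p = 2 / 3
posteriors = []
heads_minus_tails = 0
for _ in range(10_000):
    heads_minus_tails += 1 if random.random() < 0.5 else -1
    odds = (p / (1 - p)) ** heads_minus_tails      # posterior odds of H1 over H2
    posteriors.append(odds / (1 + odds))           # P(H1 | flips so far)

# Count how often P(H1) crosses 0.4, i.e. swings between roughly even
# and clearly favoring H2. This count keeps growing as we flip more:
crossings = sum(1 for a, b in zip(posteriors, posteriors[1:]) if (a < 0.4) != (b < 0.4))
print(crossings)   # beliefs keep oscillating; they never settle down
```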

In other words, a ra­tio­nal agent should be look­ing out for Dutch Books against it­self, in­clud­ing the non-con­ver­gence Dutch Book. Its prob­a­bil­ities should be ad­justed to avoid such Dutch Books.

DP: Why should I be both­ered by this ex­am­ple? If my prior is as you de­scribe it, I as­sign liter­ally zero prob­a­bil­ity to the world you de­scribe—I know the coin isn’t fair. I’m fine with my in­fer­ence pro­ce­dure dis­play­ing patholog­i­cal be­hav­ior in a uni­verse I’m ab­solutely con­fi­dent I’m not in.

RJ: So you’re fine with an in­fer­ence pro­ce­dure which performs abysmally in the real world?

DP: What? Of course not.

RJ: But the real world can­not pos­si­bly be in your hy­poth­e­sis space. It’s too big. You can’t ex­plic­itly write it down.

DP: Physi­cists seem to be mak­ing good progress.

RJ: Sure, but those aren’t hy­pothe­ses which you can di­rectly use to an­ti­ci­pate your ex­pe­riences. They re­quire too much com­pu­ta­tion. Any­thing that can fit in your head, can’t be the real world.

DP: You’re deal­ing with hu­man frailty again.

RJ: On the con­trary. Even ideal­ized agents can’t fit in­side a uni­verse they can perfectly pre­dict. To see the con­tra­dic­tion, just let two of them play rock-pa­per-scis­sors with each other. Any­thing that can an­ti­ci­pate what you ex­pect, and then do some­thing else, can’t be in your hy­poth­e­sis space. But let me try a differ­ent an­gle of at­tack. Bayesi­anism is sup­posed to be the philos­o­phy of sub­jec­tive prob­a­bil­ity. Here, you’re ar­gu­ing as if the prior rep­re­sented an ob­jec­tive fact about how the uni­verse is. It isn’t, and can’t be.

DP: I’ll deal with both of those points at once. I don’t really need to assume that the actual universe is within my hypothesis space. Constructing a prior over a set of hypotheses guarantees you this: if there is a best element in that class, you will converge to it. In the coin-flip example, I don’t have the objective universe in my set of hypotheses unless I can perfectly predict every coin-flip. But the subjective hypothesis which treats the coin as fair is the best of its kind. In the rock-paper-scissors example, rational players would similarly converge toward treating each other’s moves as random, with probability 1/3 on each move.

RJ: Good. But you’ve set up the punch­line for me: if there is no best el­e­ment, you lack a con­ver­gence guaran­tee.

DP: But it seems as if good pri­ors usu­ally do have a best el­e­ment. Us­ing Laplace’s rule of suc­ces­sion, I can pre­dict coins of any bias with­out di­ver­gence.

RJ: What if the coin lands as fol­lows: 5 heads in a row, then 25 tails, then 125 heads, and so on, each run last­ing for the next power of five. Then you di­verge again.

DP: Ok, sure… but if the coin flips might not be in­de­pen­dent, then I should have hy­pothe­ses like that in my prior.

RJ: I could keep try­ing to give ex­am­ples which break your prior, and you could keep try­ing to patch it. But we have agreed on the im­por­tant thing: good pri­ors should have the con­ver­gence prop­erty. At least you’ve agreed that this is a de­sir­able prop­erty not always achieved by Bayes.

DP: Sure.

In the end, I’m not sure who would win the counterexample/patch game: it’s quite possible that there are general priors with convergence guarantees. No computable prior has convergence guarantees for “sufficiently rich” observables (ie, observables including logical combinations of observables). However, that’s a theorem with a lot of caveats. In particular, Solomonoff Induction isn’t computable, so it might be immune to the critique. And we can certainly get rid of the problem by restricting the observables, EG by conditioning on their sequential order rather than just their truth. Yet, I suspect all such solutions will either be really dumb, or uncomputable.

So there’s work to be done here.

But, in gen­eral (ie with­out any spe­cial prior which does guaran­tee con­ver­gence for re­stricted ob­ser­va­tion mod­els), a Bayesian re­lies on a re­al­iz­abil­ity (aka grain-of-truth) as­sump­tion for con­ver­gence, as it does for some other nice prop­er­ties. Rad­i­cal prob­a­bil­ism de­mands these prop­er­ties with­out such an as­sump­tion.

So much for tech­ni­cal de­tails. Another point I want to make is that con­ver­gence points at a no­tion of “ob­jec­tivity” for the rad­i­cal prob­a­bil­ist. Although the in­di­vi­d­ual up­dates a rad­i­cal prob­a­bil­ist makes can go all over the place, the be­liefs must even­tu­ally set­tle down to some­thing. The goal of rea­son­ing is to set­tle down to that an­swer as quickly as pos­si­ble. Up­dates may ap­pear ar­bi­trary from the out­side, but in­ter­nally, they are always mov­ing to­ward this goal.

This point is fur­ther em­pha­sized by the next ra­tio­nal­ity prop­erty: con­ser­va­tion of ex­pected ev­i­dence.

Con­ser­va­tion of Ex­pected Evidence

The law of con­ser­va­tion of ex­pected ev­i­dence is a dearly be­loved Bayesian prin­ci­ple. You’ll be glad to hear that it sur­vives un­scathed:

P_n(A) = E_n[P_m(A)]

In the above, P_n(A) is your current belief in some proposition A; P_m(A) is some future belief about A (so I’m assuming m > n); and E_n is the expected value operator according to your current beliefs. So what the equation says is: your current beliefs equal your expected value of your future beliefs. This is just like the usual formulation of no-expected-net-update, except we no longer take the expectation with respect to evidence, since a non-Bayesian update may not be grounded in evidence.

Proof: Suppose P_n(A) ≠ E_n[P_m(A)]. One of the two numbers is higher, and the other lower. Suppose E_n[P_m(A)] is the lower number. Then a bookie can buy a certificate paying $P_m(A) on day m; we will willingly sell the bookie this for $E_n[P_m(A)]. The bookie can also sell us a certificate paying $1 if A, for a price of $P_n(A). At time m, the bookie gains $P_m(A) due to the first certificate. It can then buy the second certificate back from us for $P_m(A), using the winnings. Overall, the bookie has now paid $E_n[P_m(A)] to us, but we have paid the bookie $P_n(A), which we assumed was greater. So the bookie profits the difference.

If P_n(A) is the lower number instead, the same strategy works, reversing all buys and sells.

The key idea here is that both a direct bet on A and a bet on the value of P_m(A) will be worth $P_m(A) at time m, so they’d better have the same price now, too.
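A toy numeric check of the bookie strategy may help (the numbers are mine, purely illustrative):

```python
# Toy numeric check of the conservation-of-expected-evidence Dutch Book.
# Suppose the agent's current belief is P_n(A) = 0.7, but it expects its
# future belief P_m(A) to be 0.5 or 0.6 with equal probability,
# so E_n[P_m(A)] = 0.55 < 0.7.
P_n_A = 0.7
future_beliefs = [(0.5, 0.5), (0.6, 0.5)]            # (P_m(A), probability)
E_n_future = sum(p * w for p, w in future_beliefs)   # 0.55

for P_m_A, _ in future_beliefs:
    paid_to_agent = E_n_future       # bookie buys the "$P_m(A)" certificate now
    received_from_agent = P_n_A      # bookie sells the "$1 if A" certificate now
    # At time m the two certificates cancel: the first pays the bookie P_m(A),
    # which it spends buying the second certificate back at the agent's new price.
    bookie_profit = received_from_agent - paid_to_agent
    print(f"future belief {P_m_A}: bookie profit = {bookie_profit:.2f}")
# The bookie pockets 0.15 no matter how the agent's beliefs actually evolve.
```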

I see this prop­erty as be­ing even more im­por­tant for a rad­i­cal prob­a­bil­ist than it is for a dog­matic prob­a­bil­ist. For a dog­matic prob­a­bil­ist, it’s a con­se­quence of Bayesian con­di­tional prob­a­bil­ity. For a rad­i­cal prob­a­bil­ist, it’s a ba­sic con­di­tion on ra­tio­nal up­dates. With up­dates be­ing so free to go in any di­rec­tion, it’s an im­por­tant an­chor-point.

Another name for this law is the mar­t­in­gale prop­erty. This is a prop­erty of many stochas­tic pro­cesses, such as Brow­n­ian mo­tion. From wikipe­dia:

In prob­a­bil­ity the­ory, a mar­t­in­gale is a se­quence of ran­dom vari­ables (i.e., a stochas­tic pro­cess) for which, at a par­tic­u­lar time, the con­di­tional ex­pec­ta­tion of the next value in the se­quence, given all prior val­ues, is equal to the pre­sent value.

It’s im­por­tant that a se­quence of ra­tio­nal be­liefs have this prop­erty. Other­wise, fu­ture be­liefs are differ­ent from cur­rent be­liefs in a pre­dictable way, and we would be bet­ter off up­dat­ing ahead of time.

Ac­tu­ally, that’s not im­me­di­ately ob­vi­ous, right? The bookie in the Dutch Book ar­gu­ment doesn’t make money by up­dat­ing to the fu­ture be­lief faster than the agent, but rather, by play­ing the agent’s be­liefs off of each other.

This leads me to a stronger prop­erty, which has the mar­t­in­gale prop­erty as an im­me­di­ate con­se­quence (strong self trust):

P_n(A | P_m(A) = p) = p

Again I’m assuming m > n. The idea here is supposed to be: if you knew your own future belief, you would believe it already. Furthermore, you believe A and P_m(A) are perfectly correlated: the only way you’d have high confidence in A would be if it were very probably true, and the only way you’d have low confidence would be for it to be very probably false.

I won’t try to prove this one. In fact, be wary: this rationality condition is a bit too strong. The condition holds true in the radical-probabilism formalization of Diachronic Coherence and Radical Probabilism by Brian Skyrms, so long as the conditional probability is well-defined, that is, so long as P_n(P_m(A) = p) > 0 (see section 6 for statement and proof). However, Logical Induction argues persuasively that this condition is undesirable in specific cases, and replaces it with a slightly weaker condition (see section 4.12).

Nonethe­less, for sim­plic­ity, I’ll pro­ceed as if strong self trust were pre­cisely true.

At the end of the pre­vi­ous sec­tion, I promised that the cur­rent sec­tion would fur­ther illu­mi­nate my re­mark:

The goal of rea­son­ing is to set­tle down to that an­swer as quickly as pos­si­ble. Up­dates may ap­pear ar­bi­trary from the out­side, but in­ter­nally, they are always mov­ing to­ward this goal.

The way rad­i­cal prob­a­bil­ism al­lows just about any change when be­liefs shift from to may make its up­dates seem ir­ra­tional. How can the up­date be any­thing, and still be called ra­tio­nal? Doesn’t that mean a rad­i­cal prob­a­bil­ist is open to garbage up­dates?

No. A rad­i­cal prob­a­bil­ist doesn’t sub­jec­tively think all up­dates are equally ra­tio­nal. A rad­i­cal prob­a­bil­ist trusts the pro­gres­sion of their own think­ing, and also does not yet know the out­come of their own think­ing; this is why I as­serted ear­lier that a fluid up­date can be just about any­thing (bar­ring the trans­for­ma­tion of a zero into a pos­i­tive prob­a­bil­ity). How­ever, this does not mean that a rad­i­cal prob­a­bil­ist would ac­cept a psychedelic pill which ar­bi­trar­ily mod­ified their be­liefs.

Suppose a radical probabilist has a sequence of beliefs P_1, P_2, P_3. If they thought hard for a while, they could update to P_4. On the other hand, if they took the psychedelic pill, their beliefs would be modified to become Q_4. The sequence would be abruptly disrupted, and go off the rails:

P_1, P_2, P_3, Q_4, Q_5, …

The radical probabilist does not trust whatever they believe next. Rather, the radical probabilist has a concept of virtuous epistemic process, and is willing to believe the next output of such a process. Disruptions to the epistemic process do not get this sort of trust without reason. (For those familiar with The Abolition of Man, this concept is very reminiscent of Lewis’s “Tao”.)

On the other hand, a radical probabilist could trust a different process. One person, A, might trust that another person, B, is better-informed about any subject:

P^A_n(X | P^B_n(X) = p) = p

This says that A trusts B on any subject if they’ve had the same amount of time to think. This leaves open the question of what A thinks if B has had longer to think. In the extreme case, it might be that A thinks B is better no matter how long A has to think:

P^A_n(X | P^B_m(X) = p) = p, for any n and m

On the other hand, A and B can both be perfectly rational by the standards of radical probabilism and not trust each other at all. A might not trust B’s opinion no matter how long B thinks.

(Note, how­ever, that you do get even­tual agree­ment on mat­ters where good feed­back is available—much like in dog­matic Bayesi­anism, it’s difficult for two Bayesi­ans to dis­agree about em­piri­cal pre­dic­tions for long.)

This means you can’t necessarily replace one “virtuous epistemic process” with another. The P sequence and the Q sequence might both be perfectly rational by the standards of radical probabilism, and yet the disrupted sequence would not be, because P_3 does not necessarily trust Q_4 or subsequent Q’s.

Real­is­ti­cally, we can be in this kind of po­si­tion and not even know what con­sti­tutes a vir­tu­ous rea­son­ing pro­cess by our stan­dards. We gen­er­ally think that we can “do philos­o­phy” and reach bet­ter con­clu­sions. But we don’t have a clean speci­fi­ca­tion of our own think­ing pro­cess. We don’t know ex­actly what counts as a vir­tu­ous con­tinu­a­tion of our think­ing vs a dis­rup­tion.

This has some im­pli­ca­tions for AI al­ign­ment, but I won’t try to spell them out here.

Calibration

One more ra­tio­nal­ity prop­erty be­fore we move on.

One could be for­given for read­ing Eliezer’s A Tech­ni­cal Ex­pla­na­tion of Tech­ni­cal Ex­pla­na­tion and com­ing to be­lieve that Bayesian rea­son­ers are cal­ibrated. Eliezer goes so far as to sug­gest that we define prob­a­bil­ity in terms of cal­ibra­tion, so that what it means to say “90% prob­a­bil­ity” is that, in cases where you say 90%, the thing hap­pens 9 out of 10 times.

How­ever, the truth is that cal­ibra­tion is a ne­glected prop­erty in Bayesian prob­a­bil­ity the­ory. Bayesian up­dates do not help you learn to be cal­ibrated, any more than they help your be­liefs to be con­ver­gent.

We can make a sort of Dutch Book argument for calibration: if things happen nine out of ten times when the agent says 80%, then a bookie can place bets with the agent at 85:15 odds and profit in the long run. (Note, however, that this is a bit different from typical Dutch Book arguments: it’s a strategy in which the bookie risks some money, rather than just getting a sure gain. What I can say is that Logical Induction treats this as a valid Dutch Book, and so, we get a calibration property in that formalism. I’m not sure about other formalizations of Radical Probabilism.)

The in­tu­ition is similar to con­ver­gence: even lack­ing a hy­poth­e­sis to ex­plain it, a ra­tio­nal agent should even­tu­ally no­tice “hey, when I say 80%, the thing hap­pens 90% of the time!”. It can then im­prove its be­liefs in fu­ture cases by ad­just­ing up­wards.
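As a rough sketch of that adjustment (the forecast history and bucket width below are invented for illustration), a reasoner can bucket its past stated probabilities, compare them with observed frequencies, and nudge new forecasts toward the empirical rate:

```python
# Sketch of the calibration adjustment: bucket past forecasts by stated
# probability, compare with observed frequency, and shift future forecasts
# toward the empirical rate. Data and bucket width are illustrative.
from collections import defaultdict

history = [(0.8, True), (0.8, True), (0.8, False), (0.8, True), (0.8, True),
           (0.8, True), (0.8, True), (0.8, True), (0.8, True), (0.8, True)]

buckets = defaultdict(list)
for stated, happened in history:
    buckets[round(stated, 1)].append(happened)

empirical = {b: sum(outcomes) / len(outcomes) for b, outcomes in buckets.items()}
print(empirical)  # {0.8: 0.9} -- "when I say 80%, it happens 90% of the time"

def recalibrate(stated):
    """Nudge a new forecast toward the observed frequency for its bucket."""
    b = round(stated, 1)
    return empirical.get(b, stated)

print(recalibrate(0.8))  # 0.9 -- a non-Bayesian, but rationally motivated, shift
```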

This illus­trates “meta-prob­a­bil­is­tic be­liefs”: a rad­i­cal prob­a­bil­ist can have in­formed opinions about the be­liefs them­selves. By de­fault, a clas­si­cal Bayesian doesn’t have be­liefs-about-be­liefs ex­cept as a re­sult of learn­ing about the world and rea­son­ing about them­selves as a part of the world, which is prob­le­matic in the clas­si­cal Bayesian for­mal­ism. It is pos­si­ble to add sec­ond-or­der prob­a­bil­ities, third-or­der, etc. But cal­ibra­tion is a case which col­lapses all those lev­els, illus­trat­ing how the rad­i­cal prob­a­bil­ist can han­dle all of this more nat­u­rally.

I’m struck by the way cal­ibra­tion is some­thing Bayesi­ans ob­vi­ously want. The set of peo­ple who ad­vo­cate ap­ply­ing Bayes Law and the set of peo­ple who look at cal­ibra­tion charts for their own prob­a­bil­ities has a very sig­nifi­cant over­lap. Yet, Bayes’ Law does not give you cal­ibra­tion. It makes me feel like more peo­ple should have no­ticed this sooner and made a big­ger deal about it.

Bayes From a Distance

Be­fore any more tech­ni­cal de­tails about rad­i­cal prob­a­bil­ism, I want to take a step back and give one in­tu­ition for what’s go­ing on here.

We can see rad­i­cal prob­a­bil­ism as what a dog­matic Bayesian looks like if you can’t see all the de­tails.

The Ra­tion­al­ity of Acquaintances

Imag­ine you have a room­mate who is perfectly ra­tio­nal in the dog­matic sense: this room­mate has low-level ob­ser­va­tions which are 100% con­fi­dent, and performs a perfect Bayesian up­date on those ob­ser­va­tions.

How­ever, ob­serv­ing your room­mate, you can’t track all the de­tails of this. You talk to your room­mate about some im­por­tant be­liefs, but you can’t track ev­ery lit­tle Bayesian up­date—that would mean track­ing ev­ery sen­sory stim­u­lus.

From your per­spec­tive, your room­mate has con­stantly shift­ing be­liefs, which can’t quite be ac­counted for. If you are par­tic­u­larly puz­zled by a shift in be­lief, you can dis­cuss rea­sons. “I up­dated against get­ting a cat be­cause I ob­served a hair­ball in our neigh­bor’s apart­ment.” Yet, none of the ev­i­dence dis­cussed is it­self 100% con­fi­dent—it’s at least a lit­tle bit re­moved from low-level sense-data, and at least a lit­tle un­cer­tain.

Yet, this is not a big ob­sta­cle to view­ing your room­mate’s be­liefs as ra­tio­nal. You can eval­u­ate these be­liefs on their own mer­its.

I’ve heard this model called Bayes-with-a-side-chan­nel. You have an agent up­dat­ing via Bayes, but part of the ev­i­dence is hid­den. You can’t give a for­mula for changes in be­lief over time, but you can still as­sert that they’ll fol­low con­ser­va­tion of ex­pected ev­i­dence, and some other ra­tio­nal­ity con­di­tions.

What Jeffrey pro­poses is that we al­low these dy­nam­ics with­out nec­es­sar­ily posit­ing a side-chan­nel to ex­plain the un­pre­dictable up­dates. This has an anti-re­duc­tion­ist fla­vor to it: up­dates do not have to re­duce to ob­ser­va­tions. But why should we be re­duc­tion­ist in that way? Why would sub­jec­tive be­lief up­dates need to re­duce to ob­ser­va­tions?

(Note that Bayes-with-a-side-chan­nel does not im­ply con­di­tions such as con­ver­gence and cal­ibra­tion; so, Jeffrey’s the­ory of ra­tio­nal­ity is more de­mand­ing.)

Wet­ware Bayes

Of course, Jeffrey would say that our re­la­tion­ship with our­selves is much like the room­mate in my story. Our be­liefs move around, and while we can of­ten give some ac­count of why, we can’t give a full ac­count in terms of things we’ve learned with 100% con­fi­dence. And it’s not sim­ply be­cause we’re a Bayesian rea­soner who lacks in­tro­spec­tive ac­cess to the low-level in­for­ma­tion. The na­ture of our wet­ware is such that there isn’t re­ally any place you can point to and say “this is a 100% known ob­ser­va­tion”. Jeffrey would go on to point out that there’s no clean di­vid­ing line be­tween ex­ter­nal and in­ter­nal, so you can’t re­ally draw a bound­ary be­tween ex­ter­nal event and in­ter­nal ob­ser­va­tion-of-that-event.

(I would re­mark that Jeffrey doesn’t ex­actly give us a way to han­dle that prob­lem; he just offers an ab­strac­tion which doesn’t chafe on that as­pect of re­al­ity so badly.)

Rather than imag­in­ing that there are perfect ob­ser­va­tions some­where in the ner­vous sys­tem, we can in­stead imag­ine that a sen­sory stim­u­lus ex­erts a kind of “ev­i­den­tial pres­sure” which can be less than 100%. Th­ese ev­i­den­tial pres­sures can also come from within the brain, as is the case with log­i­cal up­dates.

But Where Do Up­dates Come From?

Dog­matic prob­a­bil­ism raises the all-im­por­tant ques­tion “where do pri­ors come from?”—but once you an­swer that, ev­ery­thing else is sup­posed to be set­tled. There have been many de­bates about what con­sti­tutes a ra­tio­nal prior.

Q. How can I find the pri­ors for a prob­lem?
A. Many com­monly used pri­ors are listed in the Hand­book of Chem­istry and Physics.

Q. Where do pri­ors origi­nally come from?
A. Never ask that ques­tion.

Q. Uh huh. Then where do sci­en­tists get their pri­ors?
A. Pri­ors for sci­en­tific prob­lems are es­tab­lished by an­nual vote of the AAAS. In re­cent years the vote has be­come frac­tious and con­tro­ver­sial, with wide­spread ac­rimony, fac­tional po­lariza­tion, and sev­eral out­right as­sas­si­na­tions. This may be a front for in­fight­ing within the Bayes Coun­cil, or it may be that the dis­putants have too much spare time. No one is re­ally sure.

Q. I see. And where does ev­ery­one else get their pri­ors?
A. They down­load their pri­ors from Kazaa.

Q. What if the pri­ors I want aren’t available on Kazaa?
A. There’s a small, clut­tered an­tique shop in a back alley of San Fran­cisco’s Chi­na­town. Don’t ask about the bronze rat.

-- Eliezer Yud­kowsky, An In­tu­itive Ex­pla­na­tion of Bayes’ Theorem

Rad­i­cal prob­a­bil­ists put less em­pha­sis on the prior, since a rad­i­cal prob­a­bil­ist can effec­tively “de­cide to have a differ­ent prior” (up­dat­ing their be­liefs as if they’d swapped out one prior for an­other). How­ever, they face a similarly large prob­lem of where up­dates come from.

We are given a pic­ture in which be­liefs are like a small par­ti­cle in a fluid, re­act­ing to all sorts of forces (some strong and some weak). Its lo­ca­tion grad­u­ally shifts as a re­sult of Brow­n­ian mo­tion. Pre­sum­ably, the in­ter­est­ing work is be­ing done be­hind the scenes, by what­ever is gen­er­at­ing these up­dates. Yet, Jeffrey’s pic­ture seems to mainly be about the dance of the par­ti­cle, while the fluid around it re­mains a mys­tery.

A full an­swer to that ques­tion is be­yond the scope of this post. (Log­i­cal In­duc­tion offers one fully de­tailed an­swer to that ques­tion.) How­ever, I do want to make a few re­marks on this prob­lem.

  • It might at first seem strange for be­liefs to be so rad­i­cally malle­able to ex­ter­nal pres­sures. But, ac­tu­ally, this is already the fa­mil­iar Bayesian pic­ture: ev­ery­thing hap­pens due to ex­ter­nally-driven up­dates.

  • Bayesian up­dates don’t re­ally an­swer the ques­tion of where up­dates come from, ei­ther. They take it as given that there are some “ob­ser­va­tions”. Rad­i­cal prob­a­bil­ism sim­ply al­lows for a more gen­eral sort of feed­back for learn­ing.

  • An or­tho­dox prob­a­bil­ist might an­swer this challenge by say­ing some­thing like: when we de­sign an agent, we de­sign sen­sors for it. Th­ese are con­nected in such a way as to feed in sen­sory ob­ser­va­tions. A rad­i­cal prob­a­bil­ist can similarly say: when we de­sign an agent, we get to de­cide what sort of feed­back it uses to im­prove its be­liefs.

The next sec­tion will give some prac­ti­cal, hu­man ex­am­ples of non-Bayesian up­dates.

Vir­tual Evidence

Bayesian updates are path-independent: it does not matter in what order you apply them. If you first learn A and then learn B, your updated probability distribution is P( · | A&B). If you learn these facts the other way around, it’s still P( · | A&B).
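A quick sketch of this commutativity, with toy numbers of my own:

```python
# Quick check that strict Bayesian conditioning commutes: conditioning on A
# then B gives the same distribution as B then A. Toy worlds and numbers.
P = {("a", "b"): 0.1, ("a", "not-b"): 0.2, ("not-a", "b"): 0.3, ("not-a", "not-b"): 0.4}

def condition(P, fact):
    """Zero out worlds incompatible with the fact, then renormalize."""
    z = sum(p for world, p in P.items() if fact in world)
    return {world: (p / z if fact in world else 0.0) for world, p in P.items()}

print(condition(condition(P, "a"), "b"))   # both orders yield P( . | a & b)
print(condition(condition(P, "b"), "a"))
```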

Jeffrey up­dates are path-de­pen­dent. Sup­pose my prob­a­bil­ity dis­tri­bu­tion is as fol­lows:

        A      ¬A
B      30%    20%
¬B     20%    30%

I then ap­ply the Jeffrey up­date P(B)=60%:

        A      ¬A
B      36%    24%
¬B     16%    24%

Now I ap­ply P(A)=60%:

        A         ¬A
B      41.54%    20%
¬B     18.46%    20%

Since this is asym­met­ric, but the ini­tial dis­tri­bu­tion was sym­met­ric, ob­vi­ously this would turn out differ­ently if we had ap­plied the Jeffrey up­dates in a differ­ent or­der.

Jeffrey considered this to be a bug—although he seems fine with path-dependence under some circumstances, he used examples like the above to motivate a different way of handling uncertain evidence, which I’ll call virtual evidence. (Judea Pearl strongly advocated virtual evidence over Jeffrey’s rule near the beginning of Probabilistic Reasoning in Intelligent Systems (Sections 2.2.2 and 2.3.3), in what can easily be read as a critique of Jeffrey’s theory—if one does not realize that Jeffrey is largely in agreement with Pearl. I thoroughly recommend Pearl’s discussion of the details.)

Recall the basic anatomy of a Bayesian update:

P(h|e) = P(e|h)·P(h) / P(e)

That is: the posterior P(h|e) is the prior P(h) times the likelihood P(e|h), divided by the normalizing constant P(e).

The idea of vir­tual ev­i­dence is to use ev­i­dence ‘e’ which is not an event in our event space. We’re just act­ing as if there were ev­i­dence ‘e’ which jus­tifies our up­date. Terms such as P(e), P(e&h), P(e|h), P(h|e), and so on are not given the usual prob­a­bil­is­tic in­ter­pre­ta­tion; they just stand as a con­ve­nient no­ta­tion for the up­date. All we need to know is the like­li­hood func­tion for the up­date. We then mul­ti­ply our prob­a­bil­ities by the like­li­hood func­tion as usual, and nor­mal­ize. P(e) is easy to find, since it’s just what­ever fac­tor makes ev­ery­thing sum to one at the end. This is good, since it isn’t clear what P(e) would mean for a vir­tual event.

Ac­tu­ally, we can sim­plify even fur­ther. All we re­ally need to know is the like­li­hood ra­tio: the ra­tio be­tween the two num­bers in the like­li­hood func­tion. (I will illus­trate this with an ex­am­ple soon). How­ever, it may some­times be eas­ier to find the whole like­li­hood func­tion in prac­tice.

Let’s look at the path-de­pen­dence ex­am­ple again. As be­fore, we start with:

        A      ¬A
B      30%    20%
¬B     20%    30%

I want to ap­ply a Jeffrey up­date which makes P(B)=60%. How­ever, let’s rep­re­sent the up­date via vir­tual ev­i­dence this time. Cur­rently, P(B)=50%. To take it to 60%, we need to see vir­tual ev­i­dence with a 60:40 like­li­hood ra­tio, such as P(B|E)=60%, P(¬B|E)=40%. This gives us the same up­date as be­fore:

        A      ¬A
B      36%    24%
¬B     16%    24%

(Note that we would have got­ten the same re­sult with a like­li­hood func­tion of P(B|E)=3%, P(¬B|E)=2%, since 60:40 is the same as 3:2. That’s what I meant when I said that only the ra­tio mat­ters.)

But now we want to ap­ply the same up­date to A as we did to B. So now we up­date on vir­tual ev­i­dence P(A|E)=60%, P(¬A|E)=40%. This gives us the fol­low­ing (ap­prox­i­mately):

        A      ¬A
B      43%    19%
¬B     19%    19%

As you can see, the re­sult is quite sym­met­ric. In gen­eral, vir­tual ev­i­dence up­dates will be path-in­de­pen­dent, be­cause mul­ti­pli­ca­tion is com­mu­ta­tive (and the nor­mal­iza­tion step of up­dat­ing doesn’t mess with this com­mu­ta­tivity).
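Here is a sketch reproducing the tables above in Python (my own code; the helper names are mine), showing that the two Jeffrey-update orders disagree while the corresponding virtual-evidence updates commute:

```python
# Jeffrey updates applied in different orders disagree, while the
# corresponding virtual-evidence (likelihood-ratio) updates commute.
def jeffrey(P, var, new_prob):
    """Jeffrey update: set P(var)=new_prob, keeping conditionals on var fixed."""
    old = sum(p for cell, p in P.items() if var in cell)
    return {cell: p * (new_prob / old if var in cell else (1 - new_prob) / (1 - old))
            for cell, p in P.items()}

def virtual(P, var, likelihood_ratio):
    """Virtual-evidence update: multiply by the likelihood ratio, renormalize."""
    unnorm = {cell: p * (likelihood_ratio if var in cell else 1.0)
              for cell, p in P.items()}
    z = sum(unnorm.values())
    return {cell: p / z for cell, p in unnorm.items()}

prior = {("A", "B"): 0.30, ("A", "¬B"): 0.20, ("¬A", "B"): 0.20, ("¬A", "¬B"): 0.30}

print(jeffrey(jeffrey(prior, "B", 0.6), "A", 0.6))   # B-then-A
print(jeffrey(jeffrey(prior, "A", 0.6), "B", 0.6))   # A-then-B: different numbers

print(virtual(virtual(prior, "B", 60 / 40), "A", 60 / 40))  # same either way
print(virtual(virtual(prior, "A", 60 / 40), "B", 60 / 40))
```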

So, vir­tual ev­i­dence is a re­for­mu­la­tion of Jeffrey up­dates with a lot of ad­van­tages:

  • Un­like raw Jeffrey up­dates, vir­tual ev­i­dence is path-in­de­pen­dent.

  • You don’t have to de­cide right away what you’re up­dat­ing to; you just have to de­cide the strength and di­rec­tion of the up­date.

  • I don’t fully dis­cuss this here, but Pearl ar­gues per­sua­sively that it’s eas­ier to tell when a vir­tual-ev­i­dence up­date is ap­pro­pri­ate than when a Jeffrey up­date is ap­pro­pri­ate.

Be­cause of these fea­tures, vir­tual ev­i­dence is much more use­ful for in­te­grat­ing in­for­ma­tion from mul­ti­ple sources.

In­te­grat­ing Ex­pert Opinions

Sup­pose you have an an­cient arte­fact. You want to know whether this arte­fact was made by an­cient aliens. You have some friends who are also cu­ri­ous about an­cient aliens, so you en­list their help.

You ask one friend who is a metallurgist. After performing experiments (the details of which you don’t understand), the metallurgist isn’t sure, but gives an 80% probability that the tests would have turned out that way if the metal were of terrestrial origin, and 20% if it were of non-terrestrial origin. (Let’s pretend that ancient aliens would 100% use metals of non-Earth origin, and that ancient humans would 100% use Earth metals.)

You then ask a sec­ond friend, who is an an­thro­pol­o­gist. The an­thro­pol­o­gist uses cul­tural signs, iden­ti­fy­ing the style of the art and writ­ing. Based on that in­for­ma­tion, the an­thro­pol­o­gist es­ti­mates that it’s half as likely to be of ter­res­trial ori­gin as alien.

How do we in­te­grate this in­for­ma­tion? Ac­cord­ing to Jeffrey and Pearl, we can ap­ply the vir­tual ev­i­dence for­mula if we think the two ex­pert judge­ments are in­de­pen­dent. What ‘in­de­pen­dence’ means for vir­tual ev­i­dence is a bit murky, since the ev­i­dence is not part of our prob­a­bil­ity calcu­lus, so we can’t ap­ply the usual prob­a­bil­is­tic defi­ni­tion. How­ever, Pearl ar­gues per­sua­sively that this con­di­tion is eas­ier to eval­u­ate in prac­tice than the rigidity con­di­tion which gov­erns the ap­pli­ca­bil­ity of Jeffrey up­dates. (He also gives an ex­am­ple where rigidity is vi­o­lated, so a naive Jeffrey up­date gives a non­sen­si­cal re­sult but where vir­tual ev­i­dence can still be eas­ily ap­plied to get a cor­rect re­sult.)

The in­for­ma­tion pro­vided by the an­thro­pol­o­gist and the met­al­lur­gist seem to be quite in­de­pen­dent types of in­for­ma­tion (at least, if we ig­nore the fact that both ex­perts are bi­ased by an in­ter­est in an­cient aliens), so let’s ap­ply the vir­tual ev­i­dence rule. The like­li­hood ra­tio from the met­al­lur­gist was 80:20, which sim­plifies to 4:1. The like­li­hood ra­tio from the an­thro­pol­o­gist was 1:2. That makes the com­bined like­li­hood vec­tor 2:1 in fa­vor of ter­res­trial ori­gin. We would then com­bine this with our prior; for ex­am­ple, if we had a prior of 3:1 in fa­vor of a ter­res­trial ori­gin, our pos­te­rior would be 6:1 in fa­vor.

(Note that we also have to think that the vir­tual ev­i­dence is in­de­pen­dent of our prior in­for­ma­tion.)
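The arithmetic of the combination is just odds multiplication; a tiny sketch (my own):

```python
# Combining the two experts' reports as virtual evidence: multiply the prior
# odds by each (assumed independent) likelihood ratio.
from fractions import Fraction

prior_odds = Fraction(3, 1)          # 3:1 in favor of terrestrial origin
metallurgist = Fraction(80, 20)      # 4:1 toward terrestrial
anthropologist = Fraction(1, 2)      # 1:2 (i.e., 2:1 toward alien)

posterior_odds = prior_odds * metallurgist * anthropologist
print(posterior_odds)                                 # 6 -- i.e., 6:1 terrestrial
print(float(posterior_odds / (1 + posterior_odds)))   # ≈ 0.857 posterior probability
```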

So, vir­tual ev­i­dence offers a prac­ti­cal way to in­te­grate in­for­ma­tion when we can­not quan­tify ex­actly what the ev­i­dence was—a con­di­tion which is es­pe­cially likely when con­sult­ing ex­perts. This illus­trates the util­ity of the bayes-with-a-side-chan­nel model men­tioned ear­lier; we are able to deal effec­tively with ev­i­dence, even when the ex­act na­ture of the ev­i­dence is hid­den to us.

A few notes on how we gath­ered ex­pert in­for­ma­tion in our hy­po­thet­i­cal ex­am­ple.

  • We asked for like­li­hood ra­tios, rather than pos­te­rior prob­a­bil­ities. This al­lows us to com­bine the in­for­ma­tion as vir­tual ev­i­dence.

  • In the case of the met­al­lur­gist, it makes sense to ask for like­li­hood ra­tios, since the met­al­lur­gist is un­likely to have good prior in­for­ma­tion about the arte­fact. Ask­ing only for like­li­hoods al­lows us to fac­tor out any effect from this poor prior (and in­stead use our own prior, which may still be poor, but has the ad­van­tage of be­ing ours).

  • In the case of the an­thro­pol­o­gist, how­ever, it doesn’t make as much sense—if we trust their ex­per­tise, we’re likely to think the an­thro­pol­o­gist has a good prior about arte­facts. It might have made more sense to ask for the an­thro­pol­o­gist’s pos­te­rior, take it as our own, and then ap­ply a vir­tual-ev­i­dence up­date to in­te­grate the met­al­lur­gist’s re­port. (How­ever, if we weren’t able to prop­erly com­mu­ni­cate our own prior in­for­ma­tion to the an­thro­pol­o­gist, it would be ig­nored in such an ap­proach.)

  • In the case of the met­al­lur­gist, it felt more nat­u­ral to give a full like­li­hood func­tion, rather than a like­li­hood ra­tio. It makes sense to know the prob­a­bil­ity of test re­sult given a par­tic­u­lar sub­stance. It would have made even more sense if the like­li­hood func­tion were a func­tion of each metal the arte­fact could be made of, rather than just “ter­res­trial” or “ex­trater­res­trial”—us­ing broad cat­e­gories al­lows the met­al­lur­gist’s prior about spe­cific sub­stances to creep in, which might be un­for­tu­nate.

  • In the case of the an­thro­pol­o­gist, how­ever, it didn’t make sense to give a full like­li­hood func­tion. “The prob­a­bil­ity that the arte­fact would look ex­actly the way it looks as­sum­ing that it’s made by hu­mans” is very very low, and seems quite difficult and un­nat­u­ral to eval­u­ate. It seems much eas­ier to come up with a like­li­hood ra­tio, com­par­ing the prob­a­bil­ity of ter­res­trial and ex­trater­res­trial ori­gin.

Why did Pearl de­vote sev­eral sec­tions to vir­tual ev­i­dence, in a book which is oth­er­wise a bible for dog­matic prob­a­bil­ists? I think the main rea­son is the close anal­ogy to the math­e­mat­ics of Bayesian net­works. The mes­sage-pass­ing al­gorithm which makes Bayesian net­works effi­cient is al­most ex­actly the vir­tual ev­i­dence pro­ce­dure I’ve de­scribed. If we think of each node as an ex­pert try­ing to in­te­grate in­for­ma­tion from its neigh­bors, then the effi­ciency of Bayes nets comes from the fact that they can use vir­tual ev­i­dence to up­date on like­li­hood func­tions rather than need­ing to know about the ev­i­dence in de­tail. This may have even been one source of in­spira­tion for Pearl’s be­lief prop­a­ga­tion al­gorithm?

Can Dog­matic Prob­a­bil­ists Use Vir­tual Ev­i­dence?

OK, so we’ve put Jeffrey’s rad­i­cal up­dates into a more palat­able form—one which bor­rows the struc­ture and no­ta­tion of clas­si­cal Bayesian up­dates.

Does this mean or­tho­dox Bayesi­ans can join the party, and use vir­tual ev­i­dence to ac­com­plish ev­ery­thing a rad­i­cal prob­a­bil­ist can do?

No.

Vir­tual ev­i­dence aban­dons the ra­tio for­mula.

One of the longstanding axioms of classical Bayesian thought is the ratio formula for conditional probability that Bayes himself introduced:

P(h|e) = P(h&e)/P(e)

Virtual evidence, as an updating practice, holds that P(h|e) can be usefully defined in cases where the ratio P(h&e)/P(e) cannot be usefully defined. Indeed, virtual evidence treats Bayes’ Law (which is usually a derived theorem) as more fundamental than the ratio formula (which is usually taken as a definition).

Granted, dogmatic probabilism as I defined it at the beginning of this post does not explicitly assume the ratio formula. But the assumption is so ingrained that I assume most readers took the conditional probability P(h|e) to mean the ratio.

Still, even so, we can con­sider a ver­sion of dog­matic prob­a­bil­ism which re­jects the ra­tio for­mula. Couldn’t they use vir­tual ev­i­dence?

Vir­tual ev­i­dence re­quires prob­a­bil­ity func­tions to take ar­gu­ments which aren’t part of the event space.

Even abandoning the ratio formula, it’s still hard to see how a dogmatic probabilist could use virtual evidence without abandoning the Kolmogorov axioms as the foundation of probability theory. The Kolmogorov axioms make probabilities a function of events; and events are taken from a pre-defined event space. Virtual evidence constructs new events at will, and does not include them in an overarching event space (so that, for example, a piece of virtual evidence e can be defined—so that P(A|e) is meaningful for all A from the event space—without events like A&e being meaningful, as would be required for a sigma-algebra).

I left some wig­gle room in my defi­ni­tion, say­ing that a dog­matic prob­a­bil­ist might en­dorse the Kol­mogorov ax­ioms “or a similar ax­iom­a­ti­za­tion of prob­a­bil­ity the­ory”. But even the Jeffrey-Bolker ax­ioms, which are pretty liberal, don’t al­low enough flex­i­bil­ity for this!

Rep­re­sent­ing Fluid Updates

A fi­nal point about vir­tual ev­i­dence and Jeffrey up­dates.

Near the be­gin­ning of this es­say, I gave a pic­ture in which Jeffrey up­dates gen­er­al­ize Bayesian up­dates, but fluid up­dates gen­er­al­ize things even fur­ther, open­ing up the space of pos­si­bil­ities when rigidity does not hold.

How­ever, I should point out that any up­date is a Jeffrey up­date on a suffi­ciently fine par­ti­tion.

So far, for simplicity, I’ve focused on binary partitions: we’re judging between H and ¬H, rather than a larger set such as {H1, H2, …, Hn}. However, we can generalize everything to arbitrarily sized partitions, and will often want to do so. I noted that a larger set might have been better when asking the metallurgist about the artefact, since it’s easier to judge the probability of test results given specific metals rather than broad categories.

If we make a par­ti­tion large enough to cover ev­ery pos­si­ble com­bi­na­tion of events, then a Jeffrey up­date is now just a com­pletely ar­bi­trary shift in prob­a­bil­ity. Or, al­ter­na­tively, we can rep­re­sent ar­bi­trary shifts via vir­tual ev­i­dence, by con­vert­ing to like­li­hood-ra­tio for­mat.
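A small sketch of that conversion (toy numbers of my own): the likelihood ratios needed to move from one distribution over a full partition to another are just the pointwise ratios new/old.

```python
# Any shift over a full partition can be written as a virtual-evidence update:
# the required "virtual likelihoods" are just new probability / old probability.
old = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
new = {"w1": 0.1, "w2": 0.6, "w3": 0.3}

likelihoods = {w: new[w] / old[w] for w in old}
print(likelihoods)   # ≈ {'w1': 0.5, 'w2': 2.0, 'w3': 0.6}

# Multiplying the old distribution by these ratios and renormalizing
# recovers the new distribution.
unnorm = {w: old[w] * likelihoods[w] for w in old}
z = sum(unnorm.values())
print({w: unnorm[w] / z for w in old})   # ≈ new
```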

So, these up­dates are com­pletely gen­eral af­ter all.

Granted, there might not be any point to see­ing things that way.

Non-Se­quen­tial Prediction

One ad­van­tage of rad­i­cal prob­a­bil­ism is that it offers a more gen­eral frame­work for statis­ti­cal learn­ing the­ory. I already men­tioned, briefly, that it al­lows one to do away with the re­al­iz­abil­ity/​grain-of-truth as­sump­tion. This is very im­por­tant, but not what I’m go­ing to dwell on here. In­stead I’m go­ing to talk about non-se­quen­tial pre­dic­tion, which is a benefit of log­i­cal in­duc­tion which I think has been un­der-em­pha­sized so far.

In­for­ma­tion the­ory—in par­tic­u­lar, al­gorith­mic in­for­ma­tion the­ory—in par­tic­u­lar, Solomonoff in­duc­tion—is re­stricted to a se­quen­tial pre­dic­tion frame. This means there’s a very rigid ob­ser­va­tion model: ob­ser­va­tions are a se­quence of to­kens and you always ob­serve the nth to­ken af­ter ob­serv­ing to­kens one through n-1.

Granted, you can fit lots of things into a sequential prediction model. However, it is a flaw in the otherwise close relationship between Bayesian probability and information theory. You'll run into this if you try to relate information theory and logic. Can you give an information-theoretic intuition for the laws of probability that deal with logical combinations, such as P(A or B) + P(A and B) = P(A) + P(B)?

I’ve com­plained about this be­fore, offer­ing a the­o­rem which (some­what) prob­le­ma­tizes the situ­a­tion, and sug­gest­ing that peo­ple should no­tice whether or not they’re mak­ing se­quen­tial-pre­dic­tion style as­sump­tions. I al­most in­cluded re­lated as­sump­tions in my defi­ni­tion of dog­matic prob­a­bil­ism at the be­gin­ning of this post, but ul­ti­mately it makes more sense to con­trast rad­i­cal prob­a­bil­ism to the more gen­eral doc­trine of Bayesian up­dates.

Se­quen­tial pre­dic­tion cares only about the ac­cu­racy of be­liefs at the mo­ment of ob­ser­va­tion; the ac­cu­racy of the full dis­tri­bu­tion over the fu­ture is re­duced to the ac­cu­racy about each next bit as it is ob­served.

If in­for­ma­tion is com­ing in “in any old way” rather than ac­cord­ing to the as­sump­tions of se­quen­tial pre­dic­tion, then we can con­struct prob­le­matic cases for Solomonoff in­duc­tion. For ex­am­ple, if we con­di­tion the nth bit to be 1 (or 0) when a the­o­rem prover proves (or re­futes) the nth sen­tence of Peano ar­ith­metic, then Solomonoff in­duc­tion will never as­sign pos­i­tive prob­a­bil­ity to hy­pothe­ses con­sis­tent with Peano ar­ith­metic, and will there­fore do poorly on this pre­dic­tion task. This is de­spite the fact that there are com­putable pro­grams which do bet­ter at this pre­dic­tion task; for ex­am­ple, the same the­o­rem prover run­ning just a lit­tle bit faster can have highly ac­cu­rate be­liefs at the mo­ment of ob­ser­va­tion.

In non-sequential prediction, however, we care about accuracy at every moment, rather than just at the moment of observation. Running the same theorem prover, just one step faster, doesn't do very well on that metric. It allows you to get things right just in time, but you won't have any clue about what probabilities to assign before that. We don't just want the right conclusion; we want to get there as fast as possible, and (in a subtle sense) via a rational path.

Part of the difficulty of non-sequential prediction is how to score it. Bayes loss applied to your predictions at the moment of observation, in a sequential prediction setting, seems quite useful. Bayes loss applied to all your beliefs, at every moment, does not seem very useful.

Rad­i­cal prob­a­bil­ism gives us a way to eval­u­ate the ra­tio­nal­ity of non-se­quen­tial pre­dic­tions—namely, how vuln­er­a­ble the se­quence of be­lief dis­tri­bu­tions was to los­ing money via some se­quence of bets.
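As a cartoon of what "vulnerable to losing money" means here (the trading rule and the numbers are entirely made up), beliefs that oscillate predictably can be pumped for money by a bettor who buys when the announced probability is low and sells when it is high:

```python
# A cartoon of the money-pump criterion (numbers and trading rule made up):
# buy a $1 claim on the event when the announced probability is low, sell it
# back when the announced probability is high.

def exploit_oscillation(belief_sequence, low=0.3, high=0.6):
    """belief_sequence: probabilities an agent announces for one fixed event
    over time. Returns the trader's net cash from buying low and selling high;
    each completed round trip is a sure profit, whatever the event's outcome."""
    profit, holding = 0.0, False
    for p in belief_sequence:
        if not holding and p <= low:
            profit -= p      # buy the claim at price p
            holding = True
        elif holding and p >= high:
            profit += p      # sell it back at price p
            holding = False
    return profit

print(exploit_oscillation([0.3, 0.6, 0.3, 0.6, 0.3, 0.6]))  # ~0.9
```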

Sadly, I'm not yet aware of any appropriate generalization of information theory—at least not one that's very interesting. (You can index information by time, to account for the way probabilities shift over time… but that does not come with a nice theory of communication or compression, which are fundamental to classical information theory.) This is why I objected to prediction=compression in the discussion section of Alkjash's talk.

To sum­ma­rize, se­quen­tial pre­dic­tion makes three crit­i­cal as­sump­tions which may not be true in gen­eral:

  • It as­sumes ob­ser­va­tions will always in­form us about one of a set of ob­serv­able vari­ables. In gen­eral, Bayesian up­dates can in­stead in­form us about any event, in­clud­ing com­plex log­i­cal com­bi­na­tions (such as “ei­ther the first bit is 1, or the sec­ond bit is 0”).

  • It as­sumes these ob­ser­va­tions will be made in a spe­cific se­quence, whereas in gen­eral up­dates could come in in any or­der.

  • It as­sumes that what we care about is the ac­cu­racy of be­lief at the time of ob­ser­va­tion; in gen­eral, we may care about the ac­cu­racy of be­liefs at other times.

The only way I cur­rently know how to get the­o­ret­i­cal benefits similar to those of Solomonoff in­duc­tion while avoid­ing all three of these as­sump­tions is rad­i­cal prob­a­bil­ism (in par­tic­u­lar, as for­mal­ized by log­i­cal in­duc­tion).

(The con­nec­tion be­tween this sec­tion and rad­i­cal prob­a­bil­ism is no­tably weaker than the other parts of this es­say. I think there is a lot of low-hang­ing fruit here, flesh­ing out the space of pos­si­ble prop­er­ties, the re­la­tion­ship be­tween var­i­ous prob­lems and var­i­ous as­sump­tions, try­ing to gen­er­al­ize in­for­ma­tion the­ory, clar­ify­ing our con­cept of ob­ser­va­tion mod­els, et cetera.)

Mak­ing the Meta-Bayesian Update

In Pas­cal’s Mug­gle (long ver­sion, short ver­sion) Eliezer dis­cusses situ­a­tions in which he would be forced to make a non-Bayesian up­date:

But if I actually see strong evidence for something I previously thought was super-improbable, I don't just do a Bayesian update, I should also question whether I was right to assign such a tiny probability in the first place—whether the scenario was really as complex, or unnatural, as I thought. In real life, you are not ever supposed to have a prior improbability of 10^-100 for some fact distinguished enough to be written down, and yet encounter strong evidence, say 10^10 to 1, that the thing has actually happened. If something like that happens, you don't do a Bayesian update to a posterior of 10^-90. Instead you question both whether the evidence might be weaker than it seems, and whether your estimate of prior improbability might have been poorly calibrated, because rational agents who actually have well-calibrated priors should not encounter situations like that until they are ten billion days old. Now, this may mean that I end up doing some non-Bayesian updates: I say some hypothesis has a prior probability of a quadrillion to one, you show me evidence with a likelihood ratio of a billion to one, and I say 'Guess I was wrong about that quadrillion to one thing' rather than being a Muggle about it.

At the risk of be­ing too cutesy, I want to make two re­lated points:

  • At the ob­ject level, rad­i­cal prob­a­bil­ism offers a frame­work in which we can make these sorts of non-Bayesian up­dates. We can en­counter some­thing which makes us ques­tion our whole way of think­ing. It also al­lows us to sig­nifi­cantly re­vise that way of think­ing, with­out mod­el­ing the situ­a­tion as some­thing ex­treme like self-mod­ifi­ca­tion (or even some­thing very out of the or­di­nary).

  • At the meta level, updating to radical probabilism is itself one of these non-Bayesian updates. Of course, if you were really a hard-wired dogmatic probabilist at core, you would be unable to make such an update (except perhaps if we model it as self-modification). But, since you are already using reasoning which is actually closer in spirit to radical probabilism, you can start to model yourself in this way and use radical-probabilist ideas to guide future updates.

So, I wanted to use this penul­ti­mate sec­tion for some ad­vice about mak­ing the leap.

It All Adds Up to Normality

Radical Probabilism is not a license to update however you want, nor even an invitation to massively change the way you update. It is primarily a new way to understand what you are already doing. Yes, it's possible that viewing things through this lens (rather than the narrower lens of dogmatic probabilism) will change the way you see things, and as a consequence, change the way you do things. However, you are not (usually) making some sort of mistake by engaging in the sort of Bayesian reasoning you are familiar with—there is no need to abandon large portions of your thinking.

In­stead, try to no­tice or­di­nary up­dates you make which are not perfectly un­der­stood as Bayesian up­dates.

  • Calibration corrections are not well-modeled as Bayesian updates. If you say to yourself "I've been overconfident in similar situations", and lower your probability, your shift is better understood as a fluid update (see the sketch after this list).

  • Many in­stances of “out­side view” are not well-mod­eled in a Bayesian up­date frame­work. You’ve prob­a­bly seen out­side view ex­plained as prior prob­a­bil­ity. How­ever, you of­ten take the out­side view on one of your own ar­gu­ments, e.g. “I’ve of­ten made ar­gu­ments like this and been wrong”. This kind of re­flec­tion doesn’t fit well in the frame­work of Bayesian up­dates, but fits fine in a rad­i­cal-prob­a­bil­ist pic­ture.

  • It is of­ten war­ranted to down­grade the prob­a­bil­ity of a hy­poth­e­sis with­out hav­ing an al­ter­na­tive in mind to up­grade. You can start to find a hy­poth­e­sis sus­pi­cious with­out hav­ing any bet­ter way of pre­dict­ing ob­ser­va­tions. For ex­am­ple, a se­quence of sur­pris­ing events might stick out to you as ev­i­dence that your hy­poth­e­sis is wrong, even though your hy­poth­e­sis is still the best way that you know to try and pre­dict the data. This is hard to for­mal­ize as a Bayesian up­date. Changes in prob­a­bil­ity be­tween hy­pothe­ses always re­main bal­anced. It’s true that you move the prob­a­bil­ity to a “not the hy­pothe­ses I know” cat­e­gory which bal­ances the prob­a­bil­ity loss, but it’s not true that this cat­e­gory earned the in­creased prob­a­bil­ity by pre­dict­ing the data bet­ter. In­stead, you used a set of heuris­tics which have worked well in the past to de­cide when to move prob­a­bil­ities around.
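To illustrate the first bullet (the shrinkage rule and numbers are invented purely for illustration), a calibration correction adjusts a stated probability using your track record rather than any new evidence about the event itself:

```python
import math

# A toy calibration correction (the shrinkage rule is invented for
# illustration): pull an announced probability toward 0.5 in log-odds space,
# by an amount estimated from one's past track record. Nothing here
# conditions on new evidence about the event itself.

def shrink_toward_even(p, shrinkage):
    """Scale the log-odds of p by (1 - shrinkage), moving it toward 0.5.
    Requires 0 < p < 1 and 0 <= shrinkage <= 1."""
    log_odds = math.log(p / (1 - p))
    adjusted = log_odds * (1 - shrinkage)
    return 1 / (1 + math.exp(-adjusted))

# Suppose my track record suggests I overstate log-odds by roughly 30%.
print(shrink_toward_even(0.95, 0.3))  # roughly 0.89: same direction, less extreme
```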

Don’t Pre­dictably Vio­late Bayes

Again, this is not a li­cense to vi­o­late Bayes’ Rule when­ever you feel like it.

A rad­i­cal prob­a­bil­ist should obey Bayes’ Law in ex­pec­ta­tion, in the fol­low­ing sense:

If some evidence E or ¬E is bound to be observed by time m>n, then the following should hold (writing P_n for your beliefs at time n, and E_n for expectations taken with respect to those beliefs):

E_n[ P_m(H) | E ] = P_n(H | E)

And the same for ¬E. In other words, you should not expect your updated beliefs to differ from your conditional probabilities on average.

(You should sus­pect from the fact that I’m not prov­ing this one that I’m play­ing a bit fast and loose—whether this law holds may de­pend on the for­mal­iza­tion of rad­i­cal prob­a­bil­ism, and it prob­a­bly needs some ex­tra con­di­tions I haven’t stated, such as P(E)>0.)
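For what it's worth, a plain Bayesian conditioner satisfies the closely related unconditional form of this law (conservation of expected evidence) exactly; here is a small numerical sanity check with toy numbers of my own:

```python
# A small numerical sanity check (toy numbers of my own) for a plain Bayesian
# conditioner: averaging the possible post-update beliefs by how likely each
# observation currently is gives back the current belief, so no drift in H is
# expected before E or ¬E is actually seen.

p_h = 0.4                                  # P_n(H)
p_e_given_h, p_e_given_not_h = 0.9, 0.2    # P_n(E | H), P_n(E | ¬H)

p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h   # P_n(E)
p_h_given_e = p_h * p_e_given_h / p_e                   # belief after seeing E
p_h_given_not_e = p_h * (1 - p_e_given_h) / (1 - p_e)   # belief after seeing ¬E

expected_future_belief = p_e * p_h_given_e + (1 - p_e) * p_h_given_not_e
print(expected_future_belief, p_h)  # both 0.4 (up to rounding): no expected drift
```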

And re­mem­ber, ev­ery up­date is a Bayesian up­date, with the right vir­tual ev­i­dence.

Ex­change Vir­tual Evidence

Play around with the epistemic prac­tice Jeffrey sug­gests. I sus­pect some of you already do some­thing similar, just not nec­es­sar­ily call­ing it by this name or look­ing so closely at what you’re do­ing.

Don’t Be So Real­ist About Your Own Utility Function

Note that the pic­ture here is quite com­pat­i­ble with what I said in An Ortho­dox Case Against Utility Func­tions. Your util­ity func­tion need not be com­putable, and there need not be some­thing in your on­tol­ogy which you can think of your util­ity as a func­tion of. All you need are util­ity ex­pec­ta­tions, and the abil­ity to up­date those ex­pec­ta­tions. Rad­i­cal Prob­a­bil­ism adds a fur­ther twist: you don’t need to be able to pre­dict those up­dates ahead of time; in­deed, you prob­a­bly can’t. Your val­ues aren’t tied to a func­tion, but rather, are tied to your trust in the on­go­ing pro­cess of rea­son­ing which re­fines and ex­tends those val­ues (very much like the self-trust dis­cussed in the sec­tion on con­ser­va­tion of ex­pected ev­i­dence).

Not So Rad­i­cal After All

And re­mem­ber, ev­ery up­date is a Bayesian up­date, with the right vir­tual ev­i­dence.

Recom­mended Reading

Di­achronic Co­her­ence and Rad­i­cal Prob­a­bil­ism, Brian Skyrms

  • This pa­per is re­ally nice in that it con­structs Rad­i­cal Prob­a­bil­ism from the ground up, rather than start­ing with reg­u­lar prob­a­bil­ity the­ory and re­lax­ing it. It pro­vides a view in which di­achronic co­her­ence is foun­da­tional, and reg­u­lar one-time-slice prob­a­bil­is­tic co­her­ence is de­rived. Like log­i­cal in­duc­tion, it rests on a mar­ket metaphor. It also briefly cov­ers the ar­gu­ment that rad­i­cal-prob­a­bil­ism be­liefs must have a con­ver­gence prop­erty.

Rad­i­cal Prob­a­bil­ism and Bayesian Con­di­tion­ing, Richard Bradley

  • This is a more thor­ough com­par­i­son of rad­i­cal prob­a­bil­ism to stan­dard bayesian prob­a­bil­ism, which breaks down the de­par­ture care­fully, while cov­er­ing the fun­da­men­tals of rad­i­cal prob­a­bil­ism. In ad­di­tion to Bayesian con­di­tion­ing and Jeffrey con­di­tion­ing, it in­tro­duces Adams con­di­tion­ing, a new type of con­di­tion­ing which will be valid in many cases (for the same sort of rea­son as why Jeffrey con­di­tion­ing or Bayesian con­di­tion­ing can be valid). He con­tends that there are, nonethe­less, many more ways to up­date be­yond these; and, he illus­trates this with a pur­ported ex­am­ple where none of those up­dates seems to be the cor­rect one.

Episte­mol­ogy Prob­a­bi­lized, Richard Jeffrey

  • The man him­self. This es­say fo­cuses mainly on how to up­date on like­li­hood ra­tios rather than di­rectly perform­ing Jeffrey up­dates (what I called vir­tual ev­i­dence). The mo­ti­va­tions are rather prac­ti­cal—up­dat­ing on ex­pert ad­vice when you don’t know pre­cisely what ob­ser­va­tions lead to that ad­vice.

I was a Teenage Log­i­cal Pos­i­tivist (Now a Sep­tu­a­ge­nar­ian Rad­i­cal Prob­a­bil­ist), Richard Jeffrey.

  • Richard Jeffrey re­flects on his life and philos­o­phy.

Prob­a­bil­is­tic Rea­son­ing in In­tel­li­gent Sys­tems, Judea Pearl.

  • See es­pe­cially chap­ter 2, es­pe­cially 2.2.2 and 2.3.3.

Log­i­cal In­duc­tion, Garrabrant et al.

*: Jeffrey ac­tu­ally used this phrase. See I was a Teenage Log­i­cal Pos­i­tivist, linked above.