# What is Bayesianism?

You've probably seen the word 'Bayesian' used a lot on this site, but may be a bit uncertain about what exactly we mean by it. You may have read the intuitive explanation, but that only seems to explain a certain math formula. There's a wiki entry about "Bayesian", but that doesn't help much. And the LW usage seems different from just the "Bayesian and frequentist statistics" thing, too. As far as I can tell, there's no article explicitly defining what's meant by Bayesianism. The core ideas are sprinkled across a large number of posts, and 'Bayesian' has its own tag, but there's not a single post that explicitly comes out to make the connections and say "this is Bayesianism". So let me try to offer my definition, which boils Bayesianism down to three core tenets.

We’ll start with a brief example, illustrating Bayes’ theorem. Suppose you are a doctor, and a patient comes to you, complaining about a headache. Further suppose that there are two reasons why people get headaches: they might have a brain tumor, or they might have a cold. A brain tumor always causes a headache, but exceedingly few people have a brain tumor. In contrast, a headache is rarely a symptom of a cold, but most people manage to catch a cold every single year. Given no other information, do you think it more likely that the headache is caused by a tumor, or by a cold?

If you thought a cold was more likely, well, that was the an­swer I was af­ter. Even if a brain tu­mor caused a headache ev­ery time, and a cold caused a headache only one per cent of the time (say), hav­ing a cold is so much more com­mon that it’s go­ing to cause a lot more headaches than brain tu­mors do. Bayes’ the­o­rem, ba­si­cally, says that if cause A might be the rea­son for symp­tom X, then we have to take into ac­count both the prob­a­bil­ity that A caused X (found, roughly, by mul­ti­ply­ing the fre­quency of A with the chance that A causes X) and the prob­a­bil­ity that any­thing else caused X. (For a thor­ough math­e­mat­i­cal treat­ment of Bayes’ the­o­rem, see Eliezer’s In­tu­itive Ex­pla­na­tion.)
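To make the headache example concrete, here is a minimal sketch of the calculation. The two likelihoods (a tumor always causes a headache, a cold does so one per cent of the time) come from the text above; the base rates for tumors and colds are made-up numbers chosen only for illustration, and the sketch assumes for simplicity that these are the only two possible causes and that they never occur together.

```python
def posterior(priors, likelihoods):
    """P(cause | symptom) for mutually exclusive causes, via Bayes' theorem."""
    # Joint probability of each cause together with the symptom.
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    # Total probability of the symptom (the normalizing constant).
    p_symptom = sum(joint.values())
    return {cause: joint[cause] / p_symptom for cause in joint}


# Hypothetical base rates: 1 in 10,000 patients has a tumor, 1 in 10 has a cold.
priors = {"tumor": 0.0001, "cold": 0.1}
# Chance that each cause produces a headache, as stated in the example.
likelihoods = {"tumor": 1.0, "cold": 0.01}

post = posterior(priors, likelihoods)
for cause, p in post.items():
    print(f"P({cause} | headache) = {p:.3f}")
```

With these assumed base rates the cold ends up about ten times more probable than the tumor, even though the tumor is a hundred times more likely to produce a headache when present.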

There should be noth­ing sur­pris­ing about that, of course. Sup­pose you’re out­side, and you see a per­son run­ning. They might be run­ning for the sake of ex­er­cise, or they might be run­ning be­cause they’re in a hurry some­where, or they might even be run­ning be­cause it’s cold and they want to stay warm. To figure out which one is the case, you’ll try to con­sider which of the ex­pla­na­tions is true most of­ten, and fits the cir­cum­stances best.

Core tenet 1: Any given ob­ser­va­tion has many differ­ent pos­si­ble causes.

Acknowledging this, however, leads to a somewhat less intuitive realization. For any given observation, how you should interpret it always depends on previous information. Simply seeing that the person was running wasn’t enough to tell you that they were in a hurry, or that they were getting some exercise. Or suppose you had to choose between two competing scientific theories about the motion of planets: a theory about the laws of physics governing the motion of planets, devised by Sir Isaac Newton, or a theory simply stating that the Flying Spaghetti Monster pushes the planets forwards with His Noodly Appendage. If both of these theories made the same predictions, you’d have to depend on your prior knowledge (your prior, for short) to judge which one was more likely. And even if they didn’t make the same predictions, you’d need some prior knowledge that told you which of the predictions were better, or that the predictions matter in the first place (as opposed to, say, theoretical elegance).

Or take the de­bate we had on 9/​11 con­spir­acy the­o­ries. Some peo­ple thought that un­ex­plained and oth­er­wise sus­pi­cious things in the offi­cial ac­count had to mean that it was a gov­ern­ment con­spir­acy. Others con­sid­ered their prior for “the gov­ern­ment is ready to con­duct mas­sively risky op­er­a­tions that kill thou­sands of its own cit­i­zens as a pub­lic­ity stunt”, judged that to be over­whelm­ingly un­likely, and thought it far more prob­a­ble that some­thing else caused the sus­pi­cious things.

Again, this might seem obvious. But there are many well-known instances in which people forget to apply this information. Take supernatural phenomena: yes, if there were spirits or gods influencing our world, some of the things people experience would certainly be the kinds of things that supernatural beings cause. But then there are also countless mundane explanations, from coincidences to mental disorders to an overactive imagination, that could cause them to be perceived. Most of the time, postulating a supernatural explanation shouldn’t even occur to you, because the mundane causes already have lots of evidence in their favor and supernatural causes have none.

Core tenet 2: How we in­ter­pret any event, and the new in­for­ma­tion we get from any­thing, de­pends on in­for­ma­tion we already had.

Sub-tenet 1: If you ex­pe­rience some­thing that you think could only be caused by cause A, ask your­self “if this cause didn’t ex­ist, would I re­gard­less ex­pect to ex­pe­rience this with equal prob­a­bil­ity?” If the an­swer is “yes”, then it prob­a­bly wasn’t cause A.

This re­al­iza­tion, in turn, leads us to

Core tenet 3: We can use the con­cept of prob­a­bil­ity to mea­sure our sub­jec­tive be­lief in some­thing. Fur­ther­more, we can ap­ply the math­e­mat­i­cal laws re­gard­ing prob­a­bil­ity to choos­ing be­tween differ­ent be­liefs. If we want our be­liefs to be cor­rect, we must do so.

The fact that anything can be caused by an infinite number of things explains why Bayesians are so strict about the theories they’ll endorse. It isn’t enough that a theory explains a phenomenon; if it can explain too many things, it isn’t a good theory. Remember that if you’d expect to experience something even when your supposed cause was untrue, then that’s no evidence for your cause. Likewise, if a theory can explain anything you see (if the theory allowed any possible event), then nothing you see can be evidence for the theory.
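A small numerical sketch may help here. It assumes a single yes/no hypothesis and made-up probabilities; the function name `update` is just an illustration. The point it tries to show is that when the observation is equally expected whether or not the cause is real, the posterior equals the prior.

```python
def update(prior, p_obs_if_true, p_obs_if_false):
    """Posterior probability of a hypothesis after one observation."""
    numerator = prior * p_obs_if_true
    return numerator / (numerator + (1 - prior) * p_obs_if_false)


prior = 0.2  # hypothetical starting belief in cause A

# The observation is just as likely without cause A: nothing changes.
no_evidence = update(prior, p_obs_if_true=0.9, p_obs_if_false=0.9)

# The observation is much more likely with cause A: belief goes up.
real_evidence = update(prior, p_obs_if_true=0.9, p_obs_if_false=0.1)

print(f"equally expected either way: {no_evidence:.3f}")
print(f"expected mainly under A:     {real_evidence:.3f}")
```

The same arithmetic covers the theory that "explains anything": if it assigns the same probability to every possible observation as its rivals do, no observation can move your belief in it.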

At its heart, Bayesi­anism isn’t any­thing more com­plex than this: a mind­set that takes three core tenets fully into ac­count. Add a sprin­kle of ideal­ism: a perfect Bayesian is some­one who pro­cesses all in­for­ma­tion perfectly, and always ar­rives at the best con­clu­sions that can be drawn from the data. When we talk about Bayesi­anism, that’s the ideal we aim for.

Fully in­ter­nal­ized, that mind­set does tend to color your thought in its own, pe­cu­liar way. Once you re­al­ize that all the be­liefs you have to­day are based—in a mechanis­tic, lawful fash­ion—on the be­liefs you had yes­ter­day, which were based on the be­liefs you had last year, which were based on the be­liefs you had as a child, which were based on the as­sump­tions about the world that were em­bed­ded in your brain while you were grow­ing in your mother’s womb… it does make you ques­tion your be­liefs more. Won­der about whether all of those pre­vi­ous be­liefs re­ally cor­re­sponded max­i­mally to re­al­ity.

And that’s ba­si­cally what this site is for: to help us be­come good Bayesi­ans.

• Is there a simple explanation of the conflict between Bayesianism and frequentism? I have sort of a feel for it from reading background materials, but a specific example where they yield different predictions would be awesome. Has such an example already been posted?

• Eliezer’s views as ex­pressed in Blue­berry’s links touch on a key iden­ti­fy­ing char­ac­ter­is­tic of fre­quen­tism: the ten­dency to think of prob­a­bil­ities as in­her­ent prop­er­ties of ob­jects. More con­cretely, a pure fre­quen­tist (a be­ing as rare as a pure Bayesian) treats prob­a­bil­ities as proper only to out­comes of a re­peat­able ran­dom ex­per­i­ment. (The defi­ni­tion of such a thing is pretty tricky, of course.)

What does that mean for fre­quen­tist statis­ti­cal in­fer­ence? Well, it’s for­bid­den to as­sign prob­a­bil­ities to any­thing that is de­ter­minis­tic in your model of re­al­ity. So you have es­ti­ma­tors, which are func­tions of the ran­dom data and thus ran­dom them­selves, and you as­sess how good they are for your pur­pose by look­ing at their sam­pling dis­tri­bu­tions. You have con­fi­dence in­ter­val pro­ce­dures, the end­points of which are ran­dom vari­ables, and you as­sess the sam­pling prob­a­bil­ity that the in­ter­val con­tains the true value of the pa­ram­e­ter (and the width of the in­ter­val, to avoid patholog­i­cal in­ter­vals that have noth­ing to do with the data). You have statis­ti­cal hy­poth­e­sis test­ing, which cat­e­go­rizes a sim­ple hy­poth­e­sis as “re­jected” or “not re­jected” based on a pro­ce­dure as­sessed in terms of the sam­pling prob­a­bil­ity of an er­ror in the cat­e­go­riza­tion. You have, ba­si­cally, any­thing you can come up with, pro­vided you jus­tify it in terms of its sam­pling prop­er­ties over in­finitely re­peated ran­dom ex­per­i­ments.

• Here is a more gen­eral defi­ni­tion of “pure fre­quen­tism” (which in­cludes fre­quen­tists such as Re­ichen­bach):

Con­sider an as­ser­tion of prob­a­bil­ity of the form “This X has prob­a­bil­ity p of be­ing a Y.” A fre­quen­tist holds that this as­ser­tion is mean­ingful only if the fol­low­ing con­di­tions are met:

1. The speaker has already speci­fied a de­ter­mi­nate set X of things that ac­tu­ally have or will ex­ist, and this set con­tains “this X”.

2. The speaker has already speci­fied a de­ter­mi­nate set Y con­tain­ing all things that have been or will be Ys.

The as­ser­tion is true if the pro­por­tion of el­e­ments of X that are also in Y is pre­cisely p.

A few re­marks:

1. The as­ser­tion would mean some­thing differ­ent if the speaker had speci­fied differ­ent sets X and Y, even though X and Y aren’t men­tioned ex­plic­itly in the as­ser­tion.

2. If no such sets had been speci­fied in the pre­ced­ing dis­course, the as­ser­tion by it­self would be mean­ingless.

3. How­ever, the speaker has com­plete free­dom in what to take as the set X con­tain­ing “this X”, so long as X con­tains X. In par­tic­u­lar, the other el­e­ments don’t have to be ex­actly like X, or be gen­er­ated by ex­actly the same re­peat­able pro­ce­dure, or any­thing like that. There are prac­ti­cal con­straints on X, though. For ex­am­ple, X should be an in­ter­est­ing set.

4. [ETA:] An im­por­tant dis­tinc­tion be­tween Bayesi­anism and Fre­quen­tism is this: Note that, ac­cord­ing to the above, the cor­rect prob­a­bil­ity has noth­ing to do with the state of knowl­edge of the speaker. Once the sets X and Y are de­ter­mined, there is an ob­jec­tive fact of the mat­ter re­gard­ing the pro­por­tion of things in X that are also in Y. The speaker is ob­jec­tively right or wrong in as­sert­ing that this pro­por­tion is p, and that right­ness or wrong­ness had noth­ing to do with what the speaker knew. It had only to do with the ob­jec­tive fre­quency of el­e­ments of Y among the el­e­ments of X.

• I’m sorry to see such wrongheaded views of frequentism here. Frequentists also assign probabilities to events where the probabilistic introduction is entirely based on limited information rather than a literal randomly generated phenomenon. If Fisher or Neyman were ever actually read by people purporting to understand frequentist/Bayesian issues, they’d have a radically different idea. Readers of this blog should take it upon themselves to check out some of the vast oversimplifications… And I’m sorry, but Reichenbach’s frequentism has very little to do with frequentist statistics. Reichenbach, a philosopher, had an idea that propositions had frequentist probabilities. So scientific hypotheses, which would not be assigned probabilities by frequentist statisticians, could have frequentist probabilities for Reichenbach, even though he didn’t think we knew enough yet to judge them. He thought at some point we’d be able to judge, of a hypothesis of a given type, how frequently hypotheses like it would be true. I think it’s a problematic idea, but my point was just to illustrate that some large items are being misrepresented here, and people sold a wrongheaded view. Just in case anyone cares. Sorry to interrupt the conversation (errorstatistics.com)

• What does that mean for fre­quen­tist statis­ti­cal in­fer­ence? Well, it’s for­bid­den to as­sign prob­a­bil­ities to any­thing that is de­ter­minis­tic in your model of re­al­ity.

Wait—Bayesi­ans can as­sign prob­a­bil­ities to things that are de­ter­minis­tic? What does that mean?

What would a Bayesian do in­stead of a T-test?

• Wait—Bayesi­ans can as­sign prob­a­bil­ities to things that are de­ter­minis­tic? What does that mean?

Ab­solutely!

The Bayesian philos­o­phy is that prob­a­bil­ities are about states of knowl­edge. Prob­a­bil­ity is rea­son­ing with in­com­plete in­for­ma­tion, not about whether an event is “de­ter­minis­tic”, as prob­a­bil­ities do still make sense in a com­pletely de­ter­minis­tic uni­verse. In a poker game, there are al­most surely no quan­tum events in­fluenc­ing how the deck is shuffled. Clas­si­cal me­chan­ics, which is de­ter­minis­tic, suffices to pre­dict the or­der­ing of cards. Even so, we have nei­ther suffi­cient ini­tial con­di­tions (on all the par­ti­cles in the dealer’s body and brain, and any in­com­ing sig­nals), nor com­pu­ta­tional power to calcu­late the or­der­ing of the cards. In this case, we can still use prob­a­bil­ity the­ory to figure out prob­a­bil­ities of var­i­ous hand com­bi­na­tions that we can use to guide our bet­ting. In­cor­po­rat­ing knowl­edge of what cards I’ve been dealt, and what (if any) are pub­lic is straight­for­ward. In­cor­po­rat­ing player’s ac­tions and re­ac­tions is much harder, and not re­ally well enough defined that there is a math­e­mat­i­cally cor­rect an­swer, but clearly we should use that knowl­edge in de­ter­min­ing what types of hands we think it likely for our op­po­nents to have. If we count as the dealer shuffles, and see he only shuffled three or four times, in prin­ci­ple we can (given a rea­son­able math­e­mat­i­cal model of shuffling, such as the one Di­a­co­nis con­structed to give the re­sult that 7 shuffles are needed to ran­dom­ize a deck) use the cor­re­la­tions left in there to give us even more clues about op­po­nents’ likely hands.

What would a Bayesian do in­stead of a T-test?

In most cases we’d step back, and ask what you were try­ing to do, such that a T-test seemed like a good idea.

For those un­aware, a t-test is a way of calcu­lat­ing the “like­li­hood” for the null hy­poth­e­sis, which mea­sures how likely the data are given that model. If the data is even mod­er­ately com­pat­i­ble, Fre­quen­tists say “we can’t re­ject it”. If it is ter­ribly un­likely, the Fre­quen­tists say that it can be re­jected—that it’s worth look­ing at an­other model.

From a Bayesian perspective, this is somewhat backwards: we don’t really care how likely the data is given this model, P(D|M); after all, we actually got the data. We effectively want to know how useful the model is, now that we know this data. Some simple consistency requirements and scaling constraints mean that this usefulness has to act just like a probability. So let’s just call it the probability of the model, given the data: P(M|D). A small bit of algebra gives us that P(M|D) = P(D|M) * P(M) / P(D), where P(D) is the sum over all models i of P(D|M_i) * P(M_i), and P(M_i) is some “prior probability” of each model: how useful we think that model would be, even without any data collected (but, importantly, with some background knowledge).
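Here is a rough sketch of that formula in action, under assumptions that are not in the comment itself: the "models" are three hypothetical coin biases, the prior is uniform over them, and the data are 8 heads in 10 tosses. The function name `model_posteriors` is made up for the example.

```python
from math import comb


def model_posteriors(biases, priors, heads, tosses):
    """P(M_i | D) for a finite set of coin-bias models M_i."""
    # P(D | M_i): binomial probability of the observed head count.
    likelihoods = [
        comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)
        for p in biases
    ]
    # P(D): sum over all models of P(D | M_i) * P(M_i).
    p_data = sum(l * pr for l, pr in zip(likelihoods, priors))
    return [l * pr / p_data for l, pr in zip(likelihoods, priors)]


biases = [0.3, 0.5, 0.7]      # hypothetical model space
priors = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform prior
posteriors = model_posteriors(biases, priors, heads=8, tosses=10)

for p, post in zip(biases, posteriors):
    print(f"P(bias={p} | data) = {post:.3f}")
```

Note that P(D) depends on which models are in the list, so adding or removing a model rescales every posterior. That is the sense in which these numbers are only meaningful relative to the chosen model space.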

In this frame­work, we don’t have ab­solute ob­jec­tive lev­els of con­fi­dence in our the­o­ries. All that is ab­solute and ob­jec­tive is how the data should change our con­fi­dence in var­i­ous the­o­ries. We can’t just re­ject a the­ory if the data don’t match well, un­less we have a bet­ter al­ter­na­tive the­ory to which we can switch. In many cases these mod­els can be con­tin­u­ously in­dexed, such that the in­dex cor­re­sponds to a pa­ram­e­ter in a unified model, then this be­comes pa­ram­e­ter es­ti­ma­tion—we get a range of the­o­ries with prob­a­bil­ity den­si­ties in­stead of prob­a­bil­ities, or equiv­a­lently, one the­ory with a prob­a­bil­ity den­sity on a pa­ram­e­ter, and get­ting new data me­chan­i­cally turns a crank to give us a new prob­a­bil­ity den­sity on this pa­ram­e­ter.

There are a cou­ple un­satis­fy­ing bits here:
First it re­ally would be nice to say “this the­ory is ridicu­lous be­cause it doesn’t ex­plain the data” with­out any refer­ence to any other the­ory. But if we know it’s the only the­ory in town, we don’t have a choice. If it’s not the only the­ory in town, then how use­ful it is can re­ally only co­her­ently be mea­sured rel­a­tive to how use­ful other the­o­ries are.
Se­cond, we need to give “prior prob­a­bil­ities” to our var­i­ous the­o­ries, and the math doesn’t give any di­rect jus­tifi­ca­tions for what these should be. How­ever, as long as these aren’t crazy, the in­com­ing data will con­tin­u­ously up­date these so that the ones that seem more use­ful will get weighted as more use­ful, and the ones that aren’t will get weighted as less use­ful. This of course means we need rea­son­able spaces of the­o­ries to work over, and we’ll only pick a good model if we have a good model in this space of the­o­ries. If you even­tu­ally re­al­ize that “hey, all these mod­els are crappy”, there is no good way of ex­pand­ing the set of mod­els you’re will­ing to con­sider, though a com­mon way is to just “start over” with an ex­panded model space, and re­al­lo­cated prior prob­a­bil­ities. You can’t just pre­tend that the first anal­y­sis was over some sub­set of this anal­y­sis, be­cause the rescal­ing due to the P(D) term de­pends on the set of mod­els you have. (Though you can hand­wave that you weren’t ac­tu­ally calcu­lat­ing P(M_i|D), but P(M_i|D, {M}), the prob­a­bil­ity of each model given the data, as­sum­ing that it was one of these mod­els).

A sometimes useful shortcut, rather than working directly with the probabilities and hence needing the rescaling, is to work with the likelihoods (or, more tractably, their logs). The difference of the log likelihoods of two different theories for some data is a reasonable measure of how much that data should affect their relative ranking. But any given likelihood by itself hasn’t much meaning; only comparison with the rest of a set tells you anything useful.
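As an illustration of the log-likelihood shortcut, the sketch below compares two hypothetical coin-bias theories on assumed data of 8 heads and 2 tails. Nothing here comes from the original comment beyond the idea itself; the names are made up.

```python
from math import exp, isclose, log


def log_likelihood(bias, heads, tails):
    """Log of P(data | bias), dropping the binomial coefficient,
    which is the same for every theory and cancels in differences."""
    return heads * log(bias) + tails * log(1 - bias)


heads, tails = 8, 2  # assumed data

ll_fair = log_likelihood(0.5, heads, tails)
ll_biased = log_likelihood(0.7, heads, tails)

# Positive means the data favor the biased theory over the fair one.
log_ratio = ll_biased - ll_fair
print(f"log-likelihood difference: {log_ratio:.3f}")
print(f"likelihood ratio:          {exp(log_ratio):.2f}")
```

The difference does not involve P(D), so it stays the same no matter which other theories are added to the set. Each individual log likelihood, on the other hand, is just a negative number with no obvious interpretation on its own.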

• Very nice! I’d only re­place “use­ful” with “plau­si­ble”. (Sure, it’s hard to define plau­si­bil­ity, but use­ful­ness is not re­ally the right con­cept.)

• “Use­ful­ness” cer­tainly isn’t the or­tho­dox Bayesian phras­ing. I call my­self a Bayesian be­cause I rec­og­nize that Bayes’s Rule is the right thing to use in these situ­a­tions. Whether or not the prob­a­bil­ities as­signed to hy­pothe­ses “ac­tu­ally are” prob­a­bil­ities (what­ever that means), they should obey the same math­e­mat­i­cal rules of calcu­la­tion as prob­a­bil­ities.

But pre­cisely be­cause only the ma­nipu­la­tion rules mat­ter, I’m not sure it is worth em­pha­siz­ing that “to be a good Bayesian” you must ac­cord these prob­a­bil­ities the same sta­tus as other prob­a­bil­ities. A hard­core Fre­quen­tist is not go­ing to be com­fortable do­ing that. Heck, I’m not sure I’m com­fortable do­ing that. Data and event prob­a­bil­ities are things that can even­tu­ally be “re­solved” to true or false, by look­ing af­ter the fact. Prob­a­bil­ity as plau­si­bil­ity makes sense for these things.

But for hy­pothe­ses and mod­els, I ask my­self “plau­si­bil­ity of what? Be­ing true?” Al­most cer­tainly, the “real” model (when that even makes sense) isn’t in our space of mod­els. For ex­am­ple, a com­mon, al­most nec­es­sary, as­sump­tion is ex­change­abil­ity: that any given per­mu­ta­tion of the data is equally likely—effec­tively that all data points are drawn from the same dis­tri­bu­tion. Data of­ten doesn’t be­have like that, in­stead hav­ing a time drift. Coins be­ing tossed de­velop wear, cards be­ing shuffled and dealt get bent.

I re­ally do pre­fer to think of some mod­els be­ing more or less use­ful. Of course, fol­low­ing this path shades into de­ci­sion the­ory: we might want to as­sign pri­ors ac­cord­ing to how “tractable” the mod­els are, in­clud­ing both in speci­fi­ca­tion (stupid mod­els that just spec­ify what the data will be take lots of speci­fi­ca­tion, so should have lower ini­tial prob­a­bil­ities). Models that take longer to com­pute data prob­a­bil­ities should similarly have a prob­a­bil­ity penalty, not sim­ply be­cause they’re im­plau­si­ble, but be­cause we don’t want to use them un­less the data force us to.

• ...shades into de­ci­sion the­ory...Models that take longer to com­pute data prob­a­bil­ities should similarly have a prob­a­bil­ity penalty, not sim­ply be­cause they’re im­plau­si­ble, but be­cause we don’t want to use them un­less the data force us to.

Whoa! that sounds dan­ger­ous! Why not keep the be­liefs and costs sep­a­rate and only ap­ply this penalty at the de­ci­sion the­ory stage?

• Well, I said shaded into the lines of de­ci­sion the­ory...

Yes, it ab­solutely is dan­ger­ous, and think­ing about it more I agree it should not be done this way. Prob­a­bil­ity penalties do not scale cor­rectly with the data col­lected: they’re es­sen­tially just a fixed offset. Mod­ified util­ity of us­ing a par­tic­u­lar method re­ally is differ­ent. If a method is un­us­able, we shouldn’t use it, and meth­ods that trade off ac­cu­racy for man­age­abil­ity should be de­cided at that level, once we can judge the ac­cu­racy—not ear­lier.

EDIT: I sup­pose I was hop­ing for a valid way of jus­tify­ing the fact that we throw out mod­els that are too hard to use or an­a­lyze—they never make it into our set of hy­pothe­ses in the first place. It’s amaz­ing how of­ten con­ju­gate pri­ors “just hap­pen” to be cho­sen...

• Models that take longer to com­pute data prob­a­bil­ities should similarly have a prob­a­bil­ity penalty, not sim­ply be­cause they’re im­plau­si­ble, but be­cause we don’t want to use them un­less the data force us to.

I am much more com­fortable leav­ing prob­a­bil­ity as it is but us­ing a differ­ent term for use­ful­ness.

• But for hy­pothe­ses and mod­els, I ask my­self “plau­si­bil­ity of what? Be­ing true?”

Plau­si­bil­ity of be­ing true given the prior in­for­ma­tion. Just as Aris­totelian logic gives valid ar­gu­ments (but not nec­es­sar­ily sound ones), Bayes’s the­o­rem gives valid but not nec­es­sar­ily sound plau­si­bil­ity as­sess­ments.

fol­low­ing this path shades into de­ci­sion theory

That’s pretty much why I wanted to make the dis­tinc­tion be­tween plau­si­bil­ity and use­ful­ness. One of the things I like about the Cox-Jaynes ap­proach is that it cleanly splits in­fer­ence and de­ci­sion-mak­ing apart.

• Plau­si­bil­ity of be­ing true given the prior in­for­ma­tion.

Okay, sure we can go back to the Bayesian mantra of “all prob­a­bil­ities are con­di­tional prob­a­bil­ities”. But our prior in­for­ma­tion effec­tively in­cludes the state­ment that one of our mod­els is the “true one”. And that’s never the ac­tual case, so our ar­gu­ments are never sound in this sense, be­cause we are forced to work from prior in­for­ma­tion that isn’t true. This isn’t a huge prob­lem, but it in some sense un­der­mines the mo­ti­va­tion for find­ing these prob­a­bil­ities and treat­ing them se­ri­ously—they’re con­di­tional prob­a­bil­ities be­ing ap­plied in a case where we know that what is be­ing con­di­tioned on is false. What is the ground­ing to our ac­tual situ­a­tion? I like to take the stance that in prac­tice this is still use­ful—as an ap­prox­i­ma­tion pro­ce­dure—sort­ing through mod­els that are ap­prox­i­mately right.

• And that’s never the ac­tual case, so our ar­gu­ments are never sound in this sense, be­cause we are forced to work from prior in­for­ma­tion that isn’t true.

One does gen­er­ally re­sort to non-Bayesian model check­ing meth­ods. An­drew Gel­man likes to in­clude such checks un­der the rubric of “Bayesian data anal­y­sis”; he calls the com­put­ing of pos­te­rior prob­a­bil­ities and den­si­ties “Bayesian in­fer­ence”, a pre­ced­ing sub­com­po­nent of Bayesian data anal­y­sis. This makes for sen­si­ble statis­ti­cal prac­tice, but the un­der­pin­nings aren’t strong. One might con­sider it an at­tempt to ap­prox­i­mate the Solomonoff prior.

• Yes, in prac­tice peo­ple re­sort to less mo­ti­vated meth­ods that work well.

I’d re­ally like to see some prin­ci­pled an­swer that has the same feel as Bayesi­anism though. As it stands, I have no prob­lem us­ing Bayesian meth­ods for pa­ram­e­ter es­ti­ma­tion. This is nat­u­ral be­cause we re­ally are get­ting pdf(pa­ram­e­ters | data, model). But for model se­lec­tion and eval­u­a­tion (i.e. non-para­met­ric Bayes) I always feel that I need an “es­cape hatch” to in­clude new mod­els that the Bayes for­mal­ism sim­ply doesn’t have any place for.

• the ten­dency to think of prob­a­bil­ities as in­her­ent prop­er­ties of ob­jects.

yeah, this was my in­tu­itive rea­son for think­ing fre­quen­tists are a lit­tle crazy.

• On the other hand, it’s ev­i­dence to me that we’re talk­ing about differ­ent types of minds. Have we iden­ti­fied whether this as­pect of fre­quen­tism is a choice, or just the way their minds work?

I’m a fre­quen­tist, I think, and when I in­ter­ro­gate my in­tu­ition about whether 50% heads /​ 50% tails is a prop­erty of a fair coin, it re­turns ‘yes’. How­ever, I un­der­stand that this prop­erty is an ab­stract one, and my in­tu­ition doesn’t make any differ­ent em­piri­cal pre­dic­tions about the coin than a Bayesian would. Thus, what differ­ence does it make if I find it nat­u­ral to as­sign this prop­erty?

In other words, in what (em­piri­cally mea­surable!) sense could it be crazy?

Well, the immediate objection is that if you hand the coin to a skilled tosser, the frequencies of heads and tails in the tosses can be markedly different from 50%. If you put this probability in the coin, then you really aren’t modeling things in a manner that accords with results. You can, of course, talk instead about a procedure of coin-tossing, which naturally has to specify the coin as well.

Of course, that merely pushes things back a level. If you com­pletely spec­ify the toss­ing pro­ce­dure (peo­ple have built coin-toss­ing ma­chines), then you can re­peat­edly get 100%/​0% splits by care­ful tun­ing. If you don’t know whether it is tuned to 100% heads or 100% tails, is it still use­ful to de­scribe this situ­a­tion prob­a­bil­is­ti­cally? A hard-core Fre­quen­tist “should” say no, as ev­ery­thing is de­ter­minis­tic. Most peo­ple are will­ing to al­low that 50% prob­a­bil­ity is a rea­son­able de­scrip­tion of the situ­a­tion. To the ex­tent that you do al­low this, you are Bayesian. To the ex­tent that you don’t, you’re miss­ing an ap­par­ently valuable tech­nique.

• The fre­quen­tist can ac­count for the bi­ased toss and de­ter­minism, in var­i­ous ways.

My preferred reply would be that the 50/50 is a property of the symmetry of the coin. (Of course, it’s a property of an idealized coin. Heck, a real coin can land balanced on its edge.) If someone tosses the coin in a way that biases the coin, she has actually broken the symmetry in some way with her initial conditions. In particular, the tosser must begin with the knowledge of which way she is holding the coin; if she doesn’t know, she can’t bias the outcome of the coin.

I understand that Bayesians don’t tend to abstract things to their idealized forms … I wonder to what extent Frequentism does this necessarily. (What is the relationship between Frequentism and Platonism?)

• The fre­quen­tist can ac­count for these things, in var­i­ous ways.

Oh, ab­solutely. The typ­i­cal way is choos­ing some refer­ence class of ideal­ized ex­per­i­ments that could be done. Of course, the right choice of refer­ence class is just as ar­bi­trary as the right choice of Bayesian prior.

My preferred reply would be that the 50/50 is a property of the symmetry of the coin.

Whereas the Bayesian would argue that the 50/50 property is a symmetry about our knowledge of the coin. It holds even for a coin that you know is biased, but for which you have no evidence about which way it is biased.

I understand that Bayesians don’t tend to abstract things to their idealized forms

Well, I don’t think Bayesi­ans are par­tic­u­larly re­luc­tant to look at ideal­ized forms, it’s just that when you can make your model more closely match the situ­a­tion (with­out in­cur­ring hor­ren­dous calcu­la­tional difficul­ties) there is a benefit to do so.

And of course, the ques­tion is “which ideal­ized form?” There are many ways to ideal­ize al­most any situ­a­tion, and I think talk­ing about “the” ideal­ized form can be mis­lead­ing. Talk­ing about a “fair coin” is already a se­ri­ous ab­strac­tion and ideal­iza­tion, but it’s one that has, of course, proven quite use­ful.

I won­der to what ex­tent Fre­quen­tism does this nec­es­sar­ily. (What is the re­la­tion­ship be­tween Fre­quen­tism and Pla­ton­ism?)

That’s a very in­ter­est­ing ques­tion.

• What is the re­la­tion­ship be­tween Fre­quen­tism and Pla­ton­ism?

To quote from Gel­man’s re­join­der that Phil Goetz men­tioned,

In a nut­shell: Bayesian statis­tics is about mak­ing prob­a­bil­ity state­ments, fre­quen­tist statis­tics is about eval­u­at­ing prob­a­bil­ity state­ments.

So, speak­ing very loosely, Bayesi­anism is to sci­ence, in­duc­tive logic, and Aris­totelianism as fre­quen­tism is to math, de­duc­tive logic, and Pla­ton­ism. That is, Bayesi­anism is syn­the­sis; fre­quen­tism is anal­y­sis.

• In­ter­est­ing! That makes a lot of sense to me, be­cause I had already made con­nec­tions be­tween sci­ence and Aris­totelianism, pure math and Pla­ton­ism.

• If it helps, I think this is an ex­am­ple of a prob­lem where they give differ­ent an­swers to the same prob­lem. From Jaynes; see http://​​bayes.wustl.edu/​​etj/​​ar­ti­cles/​​con­fi­dence.pdf , page 22 for the de­tails, and please let me know if I’ve erred or mis­in­ter­preted the ex­am­ple.

Three iden­ti­cal com­po­nents. You run them through a re­li­a­bil­ity test and they fail at times 12, 14, and 16 hours. You know that these com­po­nents fail in a par­tic­u­lar way: they last at least X hours, then have a life­time that you as­sess as an ex­po­nen­tial dis­tri­bu­tion with an av­er­age of 1 hour. What is the short­est 90% con­fi­dence in­ter­val /​ prob­a­bil­ity in­ter­val for X, the time of guaran­teed safe op­er­a­tion?

Fre­quen­tist 90% con­fi­dence in­ter­val: 12.1 hours − 13.8 hours

Bayesian 90% prob­a­bil­ity in­ter­val: 11.2 hours − 12.0 hours

Note: the frequentist interval has the strange property that we know for sure that the 90% confidence interval does not contain X (from the data we know that X ≤ 12). The Bayesian interval seems to match our common sense better.

• Heh, that’s a cheeky ex­am­ple. To ex­plain why it’s cheeky, I have to briefly run through it, which I’ll do here (us­ing Jaynes’s sym­bols so who­ever clicked through and has pages 22-24 open can di­rectly com­pare my sum­mary with Jaynes’s ex­po­si­tion).

Call N the sample size and θ the minimum possible widget lifetime (what bill calls X). Jaynes first builds a frequentist confidence interval around θ by defining the unbiased estimator θ∗, which is the observations’ mean minus one. (Subtracting one accounts for the sample mean being >θ.) θ∗’s probability distribution turns out to be proportional to y^(N-1) exp(-Ny), where y = θ∗ - θ + 1. Note that y is essentially a measure of how far our estimator θ∗ is from the true θ, so Jaynes now has a pdf for that. Jaynes integrates that pdf to get y’s cdf, which he calls F(y). He then makes the 90% CI by computing [y1, y2] such that F(y2) - F(y1) = 0.9. That gives [0.1736, 1.8529]. Substituting in N and θ∗ for the sample and a little algebra (to get a CI corresponding to θ rather than y) gives his θ CI of [12.1471, 13.8264].

For the Bayesian CI, Jaynes takes a constant prior, then jumps straight to the posterior being N exp(N(θ - x1)), where x1 is the smallest lifetime in the sample (12 in this case). He then comes up with the smallest interval that encompasses 90% of the posterior probability, and it turns out to be [11.23, 12].
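The Bayesian interval can be reproduced in a few lines. This is a sketch assuming the posterior stated above, N exp(N(θ - x1)) for θ ≤ x1 and zero above x1; because that density rises all the way up to x1, the shortest interval has to end at x1. The function name `posterior_mass` is made up for illustration.

```python
import math

lifetimes = [12.0, 14.0, 16.0]
n = len(lifetimes)
x1 = min(lifetimes)  # smallest observed lifetime
mass = 0.90          # desired posterior probability

# Posterior density: n * exp(n * (theta - x1)) for theta <= x1, zero above x1.
# Solve 1 - exp(n * (lower - x1)) = mass for the lower endpoint.
lower = x1 + math.log(1.0 - mass) / n
upper = x1

def posterior_mass(a, b):
    """Posterior probability that a <= theta <= b, assuming b <= x1."""
    return math.exp(n * (b - x1)) - math.exp(n * (a - x1))

print(f"shortest {mass:.0%} interval: [{lower:.2f}, {upper:.2f}]")
print(f"posterior mass inside: {posterior_mass(lower, upper):.4f}")
```

Rounded to two decimals the lower endpoint is 11.23, matching the interval Jaynes reports.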

Jaynes rightly ob­serves that the Bayesian CI ac­cords with com­mon sense, and the fre­quen­tist CI does not. This com­par­i­son is what feels cheeky to me.

Why? Be­cause Jaynes has used differ­ent es­ti­ma­tors for the two meth­ods [edit: I had pre­vi­ously writ­ten here that Jaynes im­plic­itly used differ­ent es­ti­ma­tors, but this is ac­tu­ally false; when he dis­cusses the ex­am­ple sub­se­quently (see p. 25 of the PDF) he fleshes out this point in terms of suffi­cient v. non-suffi­cient statis­tics.]. For the Bayesian CI, Jaynes effec­tively uses the min­i­mum life­time as his es­ti­ma­tor for θ (by defin­ing the like­li­hood to be solely a func­tion of the small­est ob­ser­va­tion, in­stead of all of them), but for the fre­quen­tist CI, he ex­plic­itly uses the mean life­time minus 1. If Jaynes-as-fre­quen­tist had hap­pened to use the max­i­mum like­li­hood es­ti­ma­tor—which turns out to be the min­i­mum life­time here—in­stead of an ar­bi­trary un­bi­ased es­ti­ma­tor he would’ve got­ten pre­cisely the same re­sult as Jaynes-as-Bayesian.

So it seems to me that the ex­er­cise just demon­strates that Bayesi­anism-done-slyly out­performed fre­quen­tism-done-mind­lessly. I can imag­ine that if I had tried to do the same ex­er­cise from scratch, I would have ended up faux-prov­ing the re­verse: that the Bayesian CI was dumber than the fre­quen­tist’s. I would’ve just picked up a bor­ing, old-fash­ioned, not es­pe­cially Bayesian refer­ence book to look up the MLE, and used its sam­pling dis­tri­bu­tion to get my fre­quen­tist CI: that would’ve given me the com­mon sense CI [11.23, 12]. Then I’d con­struct the Bayesian CI by me­chan­i­cally defin­ing the like­li­hood as the product of the in­di­vi­d­ual ob­ser­va­tions’ like­li­hoods. That last step, I am pretty sure but can­not im­me­di­ately prove, would give me a crappy Bayesian CI like [12.1471, 13.8264], if not that very in­ter­val.

Ul­ti­mately, at least in this case, I reckon your choice of es­ti­ma­tor is far more im­por­tant than whether you have a por­trait of Bayes or Ney­man on your wall.

[Edited to re­place my as­ter­isks with ∗ so I don’t mess up the for­mat­ting.]

• So it seems to me that the ex­er­cise just demon­strates that Bayesi­anism-done-slyly out­performed fre­quen­tism-done-mind­lessly.

This ex­am­ple re­ally is Bayesi­anism-done-straight­for­wardly. The point is that you re­ally don’t need to be sly to get rea­son­able re­sults.

For the Bayesian CI, Jaynes takes a con­stant prior, then jumps straight to the pos­te­rior be­ing N exp(N(θ - x1))

A con­stant prior ends up us­ing only the like­li­hoods. The jump straight to the pos­te­rior is a com­pletely me­chan­i­cal calcu­la­tion, just prod­ucts, and nor­mal­iza­tion.

Then I’d con­struct the Bayesian CI by me­chan­i­cally defin­ing the like­li­hood as the product of the in­di­vi­d­ual ob­ser­va­tions’ like­li­hoods.

Each in­di­vi­d­ual like­li­hood goes to zero for (x < θ). This means that product also does for the small­est (x1 < θ). You will get out the same PDF as Jaynes. CIs can be con­structed many ways from PDFs, but con­struct­ing the small­est one will give you the same one as Jaynes.

EDIT: for full effect, please do the calcu­la­tion your­self.
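For anyone who would rather check this numerically first, here is a minimal sketch of the mechanical route. It assumes each lifetime x has density exp(-(x - θ)) for x ≥ θ and zero otherwise, which is my reading of the problem statement, and uses a constant prior. All function names are hypothetical.

```python
import math

lifetimes = [12.0, 14.0, 16.0]
n = len(lifetimes)
x1 = min(lifetimes)

def likelihood_one(x, theta):
    """Density of a single lifetime x given minimum lifetime theta."""
    return math.exp(-(x - theta)) if x >= theta else 0.0

def likelihood_all(theta):
    """Product of the individual likelihoods (unnormalized posterior under a flat prior)."""
    return math.prod(likelihood_one(x, theta) for x in lifetimes)

def closed_form_posterior(theta):
    """The posterior quoted from Jaynes: n * exp(n * (theta - x1)) for theta <= x1."""
    return n * math.exp(n * (theta - x1)) if theta <= x1 else 0.0

# If the two agree up to normalization, this ratio is the same at every theta <= x1.
thetas = [8.0, 10.0, 11.5, 12.0]
ratios = [likelihood_all(t) / closed_form_posterior(t) for t in thetas]

print("ratios:", ratios)
print("product of likelihoods above x1:", likelihood_all(12.5))
```

The ratio is constant and the product vanishes above x1, so normalizing the product gives the same posterior, and therefore the same shortest interval.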

• I stopped read­ing cupholder’s com­ment be­fore the last para­graph (to write my own re­ply) and com­pletely missed this! D’oh!

• Jaynes does go on to dis­cuss ev­ery­thing you have pointed out here. He noted that con­fi­dence in­ter­vals had com­monly been held not to re­quire suffi­cient statis­tics, pointed out that some fre­quen­tist statis­ti­ci­ans had been doubt­ful on that point, and re­marked that if the fre­quen­tist es­ti­ma­tor had been the suffi­cient statis­tic (the min­i­mum life­time) then the re­sults would have agreed. I think the real point of the story is that he ran through the fre­quen­tist calcu­la­tion for a group of peo­ple who did this sort of thing for a liv­ing and shocked them with it.

• You got me: I didn’t read the what-went-wrong sub­sec­tion that fol­lows the ex­am­ple. (In my defence, I did start read­ing it, but rol­led my eyes and stopped when I got to the claim that “there must be a very ba­sic fal­lacy in the rea­son­ing un­der­ly­ing the prin­ci­ple of con­fi­dence in­ter­vals”.)

I sus­pect I’m not the only one, though, so hope­fully my ex­pla­na­tion will catch some of the eye­balls that didn’t read Jaynes’s own post-mortem.

[Edit to add: you’re al­most cer­tainly right about the real point of the story, but I think my re­ply was fair given the spirit in which it was pre­sented here, i.e. as a fre­quen­tism-v.-Bayesian thing rather than an or­tho­dox-statis­ti­ci­ans-are-taught-badly thing.]

• I think my re­ply was fair…

In­de­pen­dently re­pro­duc­ing Jaynes’s anal­y­sis is ex­cel­lent, but call­ing him “cheeky” for “im­plic­itly us[ing] differ­ent es­ti­ma­tors” is not fair given that he’s ex­plicit on this point.

....given the spirit in which it was pre­sented here, i.e. as a fre­quen­tism-v.-Bayesian thing rather than an or­tho­dox-statis­ti­ci­ans-are-taught-badly thing.

It’s a fre­quen­tism-v.-Bayesian thing to the ex­tent that cor­rect cov­er­age is con­sid­ered a suffi­cient con­di­tion for good fre­quen­tist statis­ti­cal in­fer­ence. This is the fal­lacy that you rol­led your eyes at; the room full of shocked fre­quen­tists shows that it wasn’t a straw­man at the time. [ETA: This isn’t quite right. The “v.-Bayesian” part comes in when cor­rect cov­er­age is con­sid­ered a nec­es­sary con­di­tion, not a suffi­cient con­di­tion.]

ETA:

I sus­pect I’m not the only one, though, so hope­fully my ex­pla­na­tion will catch some of the eye­balls that didn’t read Jaynes’s own post-mortem.

This is a re­ally good point, and it makes me happy that you wrote your ex­pla­na­tion. For peo­ple for whom Jaynes’s phras­ing gets in the way, your phras­ing by­passes the polemics and lets them see the math be­hind the ex­am­ple.

• In­de­pen­dently re­pro­duc­ing Jaynes’s anal­y­sis is ex­cel­lent, but call­ing him “cheeky” for “im­plic­itly us[ing] differ­ent es­ti­ma­tors” is not fair given that he’s ex­plicit on this point.

I was wrong to say that Jaynes im­plic­itly used differ­ent es­ti­ma­tors for the two meth­ods. After the ex­am­ple he does men­tion it, a fact I missed due to skip­ping most of the post-mortem. I’ll edit my post higher up to fix that er­ror. (That said, at the risk of be­ing pedan­tic, I did take care to avoid call­ing Jaynes-the-per­son cheeky. I called his ex­am­ple cheeky, as well as his com­par­i­son of the fre­quen­tist CI to the Bayesian CI, kinda.)

It’s a fre­quen­tism-v.-Bayesian thing to the ex­tent that cor­rect cov­er­age is con­sid­ered a suffi­cient con­di­tion for good fre­quen­tist statis­ti­cal in­fer­ence. This is the fal­lacy that you rol­led your eyes at; the room full of shocked fre­quen­tists shows that it wasn’t a straw­man at the time. [ETA: This isn’t quite right. The “v.-Bayesian” part comes in when cor­rect cov­er­age is con­sid­ered a nec­es­sary con­di­tion, not a suffi­cient con­di­tion.]

When I read Jaynes’s fallacy claim, I didn’t interpret it as saying that treating coverage as necessary/sufficient was fallacious; I read it as arguing that the use of confidence intervals in general was fallacious. That was what made me roll my eyes. [Edit to clarify: that is, I was rolling my eyes at what I felt was a strawman, but a different one to the one you have in mind.] Having read his post-mortem fully and your reply, I think my initial, eye-roll-inducing interpretation was incorrect, though it was reasonable on first read-through given the context in which the “fallacy” statement appeared.

• ex­cel­lent pa­per, thanks for the link.

• My in­tu­ition would be that the in­ter­val should be bounded above by 12 - ep­silon, since the prob­a­bil­ity that we got one com­po­nent that failed at the the­o­ret­i­cally fastest rate seems un­likely (prob­a­bil­ity zero?).

• You can treat the in­ter­val as open at 12.0 if you like; it makes no differ­ence.

• If by epsilon, you mean a specific number greater than 0, the only reason to shave off an interval of length epsilon from the high end of the confidence interval is if you can get the probability contained in that epsilon-length interval back from a smaller interval attached to the low end of the confidence interval. (I haven’t worked through the math, and the pdf link is giving me “404 not found”, but presumably this is not the case in this problem.)

• This and this might be the kind of thing you’re look­ing for.

Though the conflict really only applies in the artificial context of a math problem. Frequentism is more like a special case of Bayesianism where you’re making certain assumptions about your priors, sometimes specifically stated in the problem, for ease of calculation. For instance, in a Frequentist analysis of coin flips, you might ignore all your prior information about coins, and assume the coin is fair.

• thanks, that’s what I was look­ing for. would it be cor­rect to say that in the fre­quen­tist in­ter­pre­ta­tion your con­fi­dence in­ter­val nar­rows as your tri­als ap­proach in­finity?

• That is a highly de­sired prop­erty of Fre­quen­tist meth­ods, but it’s not guaran­teed by any means.

• An­drew Gel­man wrote a par­ody of ar­gu­ments against Bayesi­anism here. Note that he says that you don’t have to choose Bayesi­anism or fre­quen­tism; you can mix and match.

I’d be obliged if some­one would ex­plain this para­graph, from his re­sponse to his par­ody:

• “Why should I be­lieve your sub­jec­tive prior? If I re­ally be­lieved it, then I could just feed you some data and ask you for your sub­jec­tive pos­te­rior. That would save me a lot of effort!”: I agree that this crit­i­cism re­veals a se­ri­ous in­co­her­ence with the sub­jec­tive Bayesian frame­work as well with in the clas­si­cal util­ity the­ory of von Neu­mann and Mor­gen­stern (1947), which si­mul­ta­neously de­mands that an agent can rank all out­comes a pri­ori and ex­pects that he or she will make util­ity calcu­la­tions to solve new prob­lems. The re­s­olu­tion of this crit­i­cism is that Bayesian in­fer­ence (and also util­ity the­ory) are ideals or as­pira­tions as much as they are de­scrip­tions. If there is se­ri­ous dis­agree­ment be­tween your sub­jec­tive be­liefs and your calcu­lated pos­te­rior, then this should send you back to re-eval­u­ate your model.

• Nice explanation. My only concern is the opening statement about “aiming low”. It makes it difficult to send this article to people without them justifiably rejecting it out of hand as a patronizing act. When the intention to aim low is truly noble, perhaps it is just as accurately described as writing clearly, writing for non-experts, or maybe even just writing an “introduction”.

• Good point. I changed “to aim low” to “to sum­ma­rize ba­sic ma­te­rial”.

• And be­sides, as a soft­ware de­vel­oper with plenty of Bayesian the­ory be­hind me, I ap­pre­ci­ate the sim­plic­ity of the ar­ti­cle for the clar­ity it pro­vides me. Thanks for “aiming low” ;-)

• Re: “Core tenet 1: For any given ob­ser­va­tion, there are lots of differ­ent rea­sons that may have caused it.”

This seems badly phrased. It is nor­mally pre­vi­ous events that cause ob­ser­va­tions. It is not clear what it means for a rea­son to cause some­thing.

• Good point. That sen­tence struc­ture was a car­ry­over from Fin­nish, where you can say that rea­sons cause things.

Would “Any given ob­ser­va­tion has many differ­ent pos­si­ble causes” be bet­ter?

• Yes, that would be bet­ter.

• Changed.

• Great, great post. I like that it’s more qual­i­ta­tive and philo­soph­i­cal than quan­ti­ta­tive, which re­ally makes it clear how to think like a Bayesian. Though I know the math is im­por­tant, hav­ing this kind of in­tu­itive, qual­i­ta­tive un­der­stand­ing is very use­ful for real life, when we don’t have ex­act statis­tics for so many things.

• I don’t know if it be­longs here or in a sep­a­rate post but afaik there is no ex­pla­na­tion of the Dutch book ar­gu­ment on Less Wrong. It seems like there should be. Just tel­ling peo­ple that struc­tur­ing your be­liefs ac­cord­ing to Bayes The­o­rem will make them ac­cu­rate might not do the trick for some. The Dutch book ar­gu­ment makes it clear why you can’t just use any old prob­a­bil­ity dis­tri­bu­tion.

• I thought about whether to in­clude a Dutch Book dis­cus­sion in this post, but felt it would have been too long and not as “deep core” as the other stuff. More like “sup­port­ing core”. But yes, it would be good to have a dis­cus­sion of that up on LW some­where.

• I don’t know if it be­longs here or in a sep­a­rate post but afaik there is no ex­pla­na­tion of the Dutch book ar­gu­ment on Less Wrong. It seems like there should be.

• I’m on it.

• Thanks Kaj,

As I stated in my last post, read­ing LW of­ten gives me the feel­ing that I have read some­thing very im­por­tant, yet I of­ten don’t im­me­di­ately know why what I just read should be im­por­tant un­til I have some later con­text in which to place the prior con­tent.

Your post just gave me the con­text in which to make bet­ter sense of all of the prior con­tent on Bayes here on LW.

It doesn’t hurt that I have fi­nally dipped my toes in the Bayesian Waters of Academia in an offi­cial ca­pac­ity with a Prob­a­bil­ity and Stats class (which seems to be a pre­req­ui­site for so many other classes). The com­bined in­for­ma­tion from school and the con­tent here have helped me to get a leg up on the other stu­dents in the us­age of Bayesian Prob­a­bil­ity at school.

I am just lack­ing one bit in or­der to fully in­te­grate Bayes into my life: How to use it to test my be­liefs against re­al­ity. I am sure that this will come with ex­pe­rience.

• A fre­quen­tist asks, “did you find enough ev­i­dence?” A Bayesian asks, “how much ev­i­dence did you find?”

Fre­quen­tists can be tricky, by say­ing that a very small amount of ev­i­dence is suffi­cient; and they can hide this claim be­hind lots of fancy calcu­la­tions, so they usu­ally get away with it. This makes for bet­ter press re­leases, be­cause say­ing “we found 10dB of ev­i­dence that X” doesn’t sound nearly as good as say­ing “we found that X”.

• Since when do fre­quen­tists mea­sure ev­i­dence in deci­bels?

• jimrandomh claimed that frequentists don’t report amounts of evidence. So you object that measuring in decibels is not how they don’t report it? If they don’t report amounts of evidence, then of course they don’t report it in the precise way in the example.

• Fre­quen­tists (or just about any­body in­volved in ex­per­i­men­tal work) re­port p-val­ues, which are their main quan­ti­ta­tive mea­sure of ev­i­dence.

• Ev­i­dence, as mea­sured in log odds, has the nice prop­erty that ev­i­dence from in­de­pen­dent sources can be com­bined by adding. Is there any way at all to com­bine p-val­ues from in­de­pen­dent sources? As I un­der­stand them, p-val­ues are used to make a sin­gle bi­nary de­ci­sion to de­clare a the­ory sup­ported or not, not to track cu­mu­la­tive strength of be­lief in a the­ory. They are not a mea­sure of ev­i­dence.

• Log odds of in­de­pen­dent events do not add up, just as the odds of in­de­pen­dent events do not mul­ti­ply. The odds of flip­ping heads is 1:1, the odds of flip­ping heads twice is not 1:1 (you have to mul­ti­ply odds by like­li­hood ra­tios, not odds by odds, and like­wise you don’t add log odds and log odds, but log odds and log like­li­hood-ra­tios). So call­ing log odds them­selves “ev­i­dence” doesn’t fit the way peo­ple use the word “ev­i­dence” as some­thing that “adds up”.

This ter­minol­ogy may have origi­nated here:

http://causalityrelay.wordpress.com/2008/06/23/odds-and-intuitive-bayes/

I’m vot­ing your com­ment up, be­cause I think it’s a great ex­am­ple of how ter­minol­ogy should be cho­sen and used care­fully. If you de­cide to edit it, I think it would be most helpful if you left your origi­nal words as a warn­ing to oth­ers :)

• By “ev­i­dence”, I re­fer to events that change an agent’s strength of be­lief in a the­ory, and the mea­sure of ev­i­dence is the mea­sure of this change in be­lief, that is, the like­li­hood-ra­tio and log like­li­hood-ra­tio you re­fer to.

I never meant for “ev­i­dence” to re­fer to the pos­te­rior strength of be­lief. “Log odds” was only meant to spec­ify a par­tic­u­lar mea­sure­ment of strength in be­lief.

• Can you be clearer? Log like­li­hood ra­tios do add up, so long as the in­de­pen­dence crite­rion is satis­fied (ie so long as P(E_2|H_x) = P(E_2|E_1,H_x) for each H_x).

• Sure, just ed­ited in the clar­ifi­ca­tion: “you have to mul­ti­ply odds by like­li­hood ra­tios, not odds by odds, and like­wise you don’t add log odds and log odds, but log odds and log like­li­hood-ra­tios”.
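To make the distinction concrete, here is a small sketch of the bookkeeping: prior odds get multiplied by likelihood ratios, and in log form the decibels of each likelihood ratio get added to the decibels of the prior odds. All the probabilities are invented for illustration, and the two pieces of evidence are assumed to be independent given each hypothesis.

```python
import math

def to_decibels(ratio):
    """Express an odds ratio or likelihood ratio in decibels (10 * log10)."""
    return 10.0 * math.log10(ratio)

prior_odds = 1.0  # H versus not-H at 1:1 (hypothetical)

# (P(E|H), P(E|not-H)) for two pieces of evidence; made-up numbers.
p_e1 = (0.8, 0.2)
p_e2 = (0.5, 0.2)
likelihood_ratios = [p_e1[0] / p_e1[1], p_e2[0] / p_e2[1]]

# Multiply odds by likelihood ratios ...
posterior_odds = prior_odds * math.prod(likelihood_ratios)
# ... or equivalently add log likelihood-ratios to the log prior odds.
posterior_db = to_decibels(prior_odds) + sum(to_decibels(lr) for lr in likelihood_ratios)

# Cross-check against Bayes' theorem applied directly to probabilities.
prior_h = prior_odds / (1.0 + prior_odds)
joint_h = prior_h * p_e1[0] * p_e2[0]
joint_not_h = (1.0 - prior_h) * p_e1[1] * p_e2[1]
direct_odds = joint_h / joint_not_h

# The coin example: odds of one head are 1:1, but two heads are not 1:1 * 1:1.
p_two_heads = 0.5 * 0.5
odds_two_heads = p_two_heads / (1.0 - p_two_heads)

print(f"posterior odds = {posterior_odds:.3f}, direct = {direct_odds:.3f}")
print(f"posterior evidence = {posterior_db:.3f} dB")
print(f"odds of two heads = {odds_two_heads:.3f}")
```

The two routes to the posterior odds agree, while the odds of two heads come out at 1:3 rather than the 1:1 that multiplying odds by odds would suggest.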

• As long as there are only two H_x, mind you. They no longer add up when you have three hy­pothe­ses or more.

• There’s lots of pa­pers on com­bin­ing p-val­ues.

• Well, just look­ing at the first re­sult, it gives a for­mula for com­bin­ing n p-val­ues that as near as I can tell, lacks the prop­erty that C(p1,p2,p3) = C(C(p1,p2),p3). I sus­pect this is a re­sult of un­spo­ken as­sump­tions that the com­bined p-val­ues were ob­tained in a similar fash­ion (which I vi­o­late by try­ing to com­bine a p-value com­bined from two ex­per­i­ments with an­other ob­tained from a third ex­per­i­ment), which would be in­for­ma­tion not con­tained in the p-value it­self. I am not sure of this be­cause I did not com­pletely fol­low the deriva­tion.

But is there a par­tic­u­lar pa­per I should look at that gives a good an­swer?

• I haven’t ac­tu­ally read any of that liter­a­ture—Cox’s the­o­rem sug­gests it would not be a wise in­vest­ment of time. I was just Googling it for you.

• Fair enough, though it prob­a­bly isn’t worth my time ei­ther.

Unless someone claims that they have a good general method for combining p-values, such that it does not matter where the p-values come from, or in what order they are combined, and can point me at one specific method that does all that.
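The associativity worry, C(p1,p2,p3) = C(C(p1,p2),p3), can be illustrated with one standard combination rule. The sketch below uses Fisher's method, which is an assumption on my part: the thread does not say which formula the "first result" gave. The p-values are made up.

```python
import math

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-squared with 2k degrees of freedom under the null."""
    k = len(p_values)
    t = -sum(math.log(p) for p in p_values)  # half of the chi-squared statistic
    # Survival function of a chi-squared with 2k d.o.f. at 2t (closed form for even d.o.f.).
    return math.exp(-t) * sum(t**j / math.factorial(j) for j in range(k))

p1, p2, p3 = 0.05, 0.05, 0.05  # hypothetical p-values from three experiments

all_at_once = fisher_combine([p1, p2, p3])
nested = fisher_combine([fisher_combine([p1, p2]), p3])

print(f"C(p1, p2, p3)    = {all_at_once:.5f}")
print(f"C(C(p1, p2), p3) = {nested:.5f}")
```

The two numbers differ, so at least for this method a combined p-value does not carry enough information to be combined again as if it were a fresh one.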

• Bayesi­anism is more than just sub­jec­tive prob­a­bil­ity; it is a com­plete de­ci­sion the­ory.

A de­cent sum­mary is pro­vided by Sven Ove Hans­son:

1. The Bayesian sub­ject has a co­her­ent set of prob­a­bil­is­tic be­liefs.

2. The Bayesian sub­ject has a com­plete set of prob­a­bil­is­tic be­liefs.

3. When ex­posed to new ev­i­dence, the Bayesian sub­ject changes his (her) be­liefs in ac­cor­dance with his (her) con­di­tional prob­a­bil­ities.

4. Fi­nally, Bayesi­anism states that the ra­tio­nal agent chooses the op­tion with the high­est ex­pected util­ity.
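Points 3 and 4 of the list can be sketched together in a toy decision problem: update a belief by conditioning, then pick the option with the highest expected utility. Every probability and utility here is invented for illustration, and the names are hypothetical.

```python
# Hypothetical prior belief and likelihoods for a "will it rain?" question.
p_rain = 0.2
p_clouds_given_rain = 0.9
p_clouds_given_dry = 0.3

# Point 3: condition on the new evidence (we observe clouds).
joint_rain = p_rain * p_clouds_given_rain
joint_dry = (1.0 - p_rain) * p_clouds_given_dry
p_rain_given_clouds = joint_rain / (joint_rain + joint_dry)

# Made-up utilities for each option under each outcome.
utilities = {
    "take umbrella": {"rain": 0.0, "dry": -1.0},
    "leave umbrella": {"rain": -10.0, "dry": 0.0},
}

def expected_utility(option, p):
    """Expected utility of an option when rain has probability p."""
    u = utilities[option]
    return p * u["rain"] + (1.0 - p) * u["dry"]

# Point 4: choose the option with the highest expected utility.
best = max(utilities, key=lambda option: expected_utility(option, p_rain_given_clouds))

print(f"P(rain | clouds) = {p_rain_given_clouds:.3f}")
for option in utilities:
    print(option, round(expected_utility(option, p_rain_given_clouds), 3))
print("choice:", best)
```

The belief update never looks at the utilities, and the choice only needs the updated probability.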

• What Bayescraft covers is a matter of tendentious definitions. I personally do not consider decision theory a necessary part of it, though it is certainly part of what we’re trying to capture at LessWrong.

• I agree. The line be­tween be­lief and de­ci­sion is the line be­tween 3 and 4 in that list and it is such a clean line that the von Neu­mann-Mor­gen­stern ax­ioms can be (and usu­ally are) pre­sented about a fre­quen­tist world.

• I re­cently started work­ing through this Ap­plied Bayesian Statis­tics course ma­te­rial, which has done won­ders for my un­der­stand­ing of Bayesi­anism vs. the bag-of-tricks statis­tics I learned in en­g­ineer­ing school.

• So I fi­nally picked up a copy of Prob­a­bil­ity The­ory: The Logic of Science, by E.T. Jaynes. It’s pretty in­timi­dat­ing and tech­ni­cal, but I was sur­prised how much prose there is, which makes it sur­pris­ingly palat­able. We should recom­mend this more here on Less Wrong.

• Just re­mem­ber that Jaynes was not a math­e­mat­i­cian and many of his claims about pure math­e­mat­ics (as op­posed to com­pu­ta­tions and their ap­pli­ca­tions) in the book are wrong. Espe­cially, in­finity is not mys­te­ri­ous.

• Espe­cially, in­finity is not mys­te­ri­ous.

It should be ob­vi­ous that in­finity (like all things) is not in­her­ently mys­te­ri­ous, and equally ob­vi­ous that it’s mys­te­ri­ous (if not un­known) to most peo­ple.

• In­finity is mys­te­ri­ous was in­tended as a para­phrase of Jaynes’ chap­ter on “para­doxes” of prob­a­bil­ity the­ory, and I in­tended mys­te­ri­ous pre­cisely in the sense of in­her­ently mys­te­ri­ous. As far as I know, Jaynes didn’t use the word mys­te­ri­ous him­self. But he cer­tainly claims that rules of rea­son­ing about in­finity (which he con­ve­niently ig­nores) are not to be trusted and that they lead to para­doxes.

• Pos­si­ble typo:

A the­ory about the laws of physics gov­ern­ing the mo­tion of planets, de­vised by Sir Isaac New­ton, or a the­ory sim­ply stat­ing that the Fly­ing Spaghetti Mon­ster pushes the planets for­ward>s< with His Noodly Ap­pendage.

In the spirit of aiming low, I don’t think you aimed nearly low enough. If I hadn’t already read a small amount from the se­quences I wouldn’t have been able to pick up too much from this ar­ti­cle. This reads as a great sum­mary; I am not con­vinced it is a good ex­pla­na­tion.

The rest of this comment is me saying the above in more detail. Do note that this is my perspective. Even a newb such as myself has been tainted with enough keywords to begin inferring details that are not explicitly mentioned. This critique is massively excessive compared to the quality of the work. This means that you did a good job but I went all pesky-picky on you anyway.

You’ve prob­a­bly seen the word ‘Bayesian’ used a lot on this site, but may be a bit un­cer­tain of what ex­actly we mean by that. You may have read the in­tu­itive ex­pla­na­tion, but that only seems to ex­plain a cer­tain math for­mula.

I don’t know which is a more suc­cess­ful way to talk to peo­ple: Us­ing “you” or not us­ing “you.” Rephras­ing those two sen­tences with­out the word, “You:”

The word “Bayesian” is used a lot on this site but it is a difficult con­cept to fully grasp. There is an in­tu­itive ex­pla­na­tion, but it fo­cuses on the math be­hind the con­cept.

• Lets the reader know that the Bayesian con­cept is deeper than math. Math is at the core but for peo­ple who are scared of Math an­other way to think about the sub­ject is pos­si­ble.

• Notes that the con­cept is difficult to un­der­stand be­cause it is difficult to un­der­stand, not be­cause the reader is an idiot

Things I don’t like:

• As much as I like the in­tu­itive ex­pla­na­tion, start­ing with Math is bad for peo­ple scared of math. Even bring­ing it up can shut them into a, “Oh no, I won’t be able to un­der­stand this,” mode. I don’t know if there is a bet­ter way to say what needs to be said, how­ever.

• “You,” in this case, is a lit­tle pa­tron­iz­ing. Not a big deal; just a minor point.

• Too defen­sive. The first cou­ple para­graphs are try­ing to con­vince the LessWrong crowd that this ex­pla­na­tion is needed. That is good, but the fi­nal edit should prob­a­bly leave it out. The in­tended au­di­ence is much, much lower than that.

• There is no men­tion of the Sim­ple Truth or an equiv­a­lent start­ing ground. This may not be needed, but it sure helped me get into the right mind­set when start­ing the se­quences.

We’ll start with a brief ex­am­ple, illus­trat­ing Bayes’ the­o­rem. Sup­pose you are a doc­tor, and a pa­tient comes to you, com­plain­ing about a headache. Fur­ther sup­pose that there are two rea­sons for why peo­ple get headaches: they might have a brain tu­mor, or they might have a cold. A brain tu­mor always causes a headache, but ex­ceed­ingly few peo­ple have a brain tu­mor. In con­trast, a headache is rarely a symp­tom for cold, but most peo­ple man­age to catch a cold ev­ery sin­gle year. Given no other in­for­ma­tion, do you think it more likely that the headache is caused by a tu­mor, or by a cold?

I would drop the term “Bayes’ the­o­rem” here. “We’ll” is an­other ex­am­ple of, “You.” This para­graph could be touched up a bit but I feel this is more me notic­ing that my writ­ing style is differ­ent from yours.

I am not sold on this be­ing a good first ex­am­ple. I like that it is some­thing that most peo­ple will iden­tify with, but the edge cases here are nuts:

• There are more than two rea­sons for headaches

• Do brain tu­mors always cause a headache?

• I don’t nor­mally get headaches from colds and don’t nor­mally as­so­ci­ate headaches with colds. When pon­der­ing why I have a headache, “colds” is pretty far down the list.

• More than “ex­ceed­ingly few” have brain tu­mors. A heck of a lot more peo­ple have colds but “ex­ceed­ingly few” doesn’t im­me­di­ately trans­late into “more peo­ple have colds.”

• Is the type of headache from a brain tu­mor the same type of headache from a com­mon cold? This doesn’t mat­ter to you, since you don’t ac­tu­ally care about the de­tails of the headache, but a reader may very well offer this sug­ges­tion as a solu­tion to figur­ing out if the headache is from a brain tu­mor or a cold. Peo­ple like to stick un­nec­es­sary de­tails into ex­am­ples be­cause that is how they solve real-world ex­am­ples. At this point in the ar­ti­cle, they don’t care about the ex­am­ple. They are imag­in­ing some­one with a cold.

Given the chance, I would re­word the para­graph as such:

A simple example can be found when someone asks a doctor why they have a headache. The doctor knows that a typical cold will only sometimes cause headaches. The doctor also knows that a brain tumor will almost always cause headaches. If the doctor compared these two causes and decided that it is more likely a brain tumor is at fault, then something went wrong. If you walked into a doctor’s office complaining of a headache and were immediately diagnosed with a brain tumor, you would probably be a little suspicious. Bayes’ theorem helps us explain what, exactly, went wrong and how to fix it. It uses math to do it, but the basic concept is easy to understand.

Do you want more of this? If not, I can stop now. If so, I can con­tinue later. If you’d like some­thing similar but much shorter and con­cise, I can do that too.

• This is ex­cel­lent feed­back; please, do go on.

I did won­der if this was still too short and not aiming low enough. I chose to go on the side of brief­ness, par­tially be­cause I was wor­ried about end­ing up with a gi­ant mam­moth post and par­tially be­cause I felt I’d just be re­peat­ing what Eliezer’s said be­fore. But yeah, look­ing at it now, I’m not at all con­vinced of how well I’d have got­ten the mes­sage if my pre-OB self had read this.

In­ter­est­ing that you find the us­age of “you” and “we” pa­tron­iz­ing. I hadn’t thought of it like that—I in­tended it as a way to make the post less for­mal and build a more com­fortable at­mo­sphere to the reader.

Your re­word­ing sounds good: not ex­actly the way I’d put it, but cer­tainly some­thing to build on.

Hmm, what do peo­ple think—if we end up rewrit­ing this, should I just edit this post? Or make an en­tirely new one? Per­haps keep this one as it is, but work the changes into a fu­ture one that’s longer?

• Con­tin­u­ing.

If you thought a cold was more likely, well, that was the an­swer I was af­ter.

Part of the great dan­ger in ex­plain­ing a High topic is that peo­ple who haven’t been able to un­der­stand High top­ics are su­per wary about look­ing like an idiot. Math is the most ob­vi­ous High topic that peo­ple hate try­ing to un­der­stand. They would much rather ad­mit to fear­ing math than try­ing and failing at un­der­stand­ing it.

This is sad, to me, be­cause math isn’t re­ally that hard to un­der­stand. It is a daunt­ing sub­ject that never ends but the fun­da­men­tals are already un­der­stood by any­one who func­tions in so­ciety. They just never put all the pieces to­gether with the right terms.

I am firmly con­vinced that the Way of Bayes is like this. The se­quences are, for the most part, about sub­jects that could be easy to un­der­stand. They make in­tu­itive sense. The de­tails and the num­bers are a pain, but the con­cept it­self is some­thing I could ex­plain to nearly ev­ery­one I know. (So I think. I haven’t ac­tu­ally tried yet.)

A sen­tence like the one I quoted above is one that will put a layper­son on defen­sive. This pushes Bayesi­anism into the realm of High top­ics: Topics that are grasped by the Smart peo­ple; the in­tel­lec­tual elite. Ask­ing them ques­tions at all makes them re­al­ize they don’t know the an­swer. This is scary. Im­me­di­ately an­swer­ing the ques­tion and tel­ling them the an­swer should be ob­vi­ous could eas­ily make them feel awk­ward, even if they got the an­swer cor­rect.

Ar­ti­cles ex­plain­ing “ob­vi­ous” things are of­ten ex­plain­ing not-ob­vi­ous things and as­sume that you are fol­low­ing them each step of the way. Th­ese ar­ti­cles are full of trick ques­tions and try to make you sec­ond guess your­self in an effort to show you what you do not know. This is scary and elitist to some­one who has sold their own in­tel­li­gence short.

Your ex­am­ple is so minor that most peo­ple wouldn’t have a prob­lem with it. I bring it up be­cause I am picky. This is an ex­am­ple of aiming far, far too high. The au­di­ence at LessWrong reads a ques­tion/​an­swer like this and en­joys it. They like learn­ing they are wrong and revel in the in­tro­spec­tion that fol­lows as they chase down the er­ror in the ma­chine so they can fix it. A layper­son dreads this. They think it means they are stupid and un­able to un­der­stand. They fail at the com­pe­ti­tion of in­tel­li­gence whether the com­pe­ti­tion ac­tu­ally ex­ists or not.

Even if a brain tu­mor caused a headache ev­ery time, and a cold caused a headache only one per cent of the time (say), hav­ing a cold is so much more com­mon that it’s go­ing to cause a lot more headaches than brain tu­mors do.

I think this be­longs in the de­scrip­tion of the ex­am­ple. You could even leave out the ac­tual num­bers be­cause they only mat­ter for the peo­ple that have the ex­act num­bers. It takes too long to ex­plain that you just made the num­bers up be­cause:

• Every word is more pro­cess­ing that needs to be done

• The in­tended au­di­ence are prob­a­bly in­ex­pe­rienced at skim­ming these sorts of topics

• The num­bers re­ally are irrelevant

• Some­one will dis­agree with the num­bers and make a big stink about some­thing that was irrelevant

Bayes’ the­o­rem, ba­si­cally, says that if cause A might be the rea­son for symp­tom X, then we have to take into ac­count both the prob­a­bil­ity that A caused X (found, roughly, by mul­ti­ply­ing the fre­quency of A with the chance that A causes X) and the prob­a­bil­ity that any­thing else caused X. (For a thor­ough math­e­mat­i­cal treat­ment of Bayes’ the­o­rem, see Eliezer’s In­tu­itive Ex­pla­na­tion.)

And… the layper­son just zoned out. This is the big ob­sta­cle in try­ing to de­scribe Bayesi­anism. Math scares peo­ple away. Even peo­ple who are good at math will glaze over when they see As and Xs and words like “prob­a­bil­ity.” I have no idea how to get around this ob­sta­cle, hon­estly. Your at­tempt was solid. But I still think this is the para­graph where you will lose the low­est rung of your au­di­ence.
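For readers who do want to see the numbers behind that paragraph, here is a minimal sketch. The two prevalence figures are made up, the 100% and 1% headache rates come from the article's own example, and the calculation assumes that tumors and colds are the only two causes and never occur together.

```python
# Hypothetical prevalences; only the two headache rates come from the article.
p_tumor = 0.0001              # fraction of people with a brain tumor (made up)
p_cold = 0.2                  # fraction of people with a cold (made up)
p_headache_given_tumor = 1.0  # a tumor always causes a headache
p_headache_given_cold = 0.01  # a cold causes a headache 1% of the time

# Frequency of each cause times the chance that it produces the symptom.
joint_tumor = p_tumor * p_headache_given_tumor
joint_cold = p_cold * p_headache_given_cold

# Share of all headaches attributable to each cause.
total = joint_tumor + joint_cold
posterior_tumor = joint_tumor / total
posterior_cold = joint_cold / total

print(f"P(tumor | headache) = {posterior_tumor:.3f}")
print(f"P(cold  | headache) = {posterior_cold:.3f}")
```

With these assumptions a cold is far more likely than a tumor, even though a tumor produces a headache every single time.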

There should be noth­ing sur­pris­ing about that, of course.

What if they were sur­prised? What if their whole world reeled at the ques­tion of what causes headaches? What if, hor­rifi­cally, they com­pletely mi­s­un­der­stood the pre­vi­ous ex­am­ple and are cur­rently pon­der­ing if their headache means they have a brain tu­mor?

If they are com­pletely be­wil­dered right now, tel­ling them they shouldn’t be sur­prised will make them feel dumb. Even if they are dumb, your ar­ti­cle shouldn’t make them feel dumb. It should make them feel smart.

Sup­pose you’re out­side, and you see a per­son run­ning. They might be run­ning for the sake of ex­er­cise, or they might be run­ning be­cause they’re in a hurry some­where, or they might even be run­ning be­cause it’s cold and they want to stay warm. To figure out which one is the case, you’ll try to con­sider which of the ex­pla­na­tions is true most of­ten, and fits the cir­cum­stances best.

I don’t think this ex­am­ple clar­ifies much. A bul­let list:

• “they’re in a hurry some­where” sounds funny to me. Per­haps, “they’re in a hurry to get some­where” or “they’re in a hurry” works bet­ter? This could be a style thing.

• Running because it’s cold will mean random things to random people. If I am outside and it’s cold I don’t think of running. I think of doing hard work like shoveling snow or simply going inside. The reason I bring this up is because every second someone thinks, “That’s weird, why would you run outside if it’s cold?” is a second that the points you made above get shoved further away from the points below.

• To figure out which one is the case, peo­ple could think of (a) ask­ing the run­ner (b) look­ing for more ev­i­dence. Judg­ing which rea­son hap­pens most of­ten may not trans­late well. I didn’t even at­tach this lan­guage to the headache on first read. If you know the an­swer you can see the re­la­tion but I am not con­fi­dent that it is available for ev­ery reader.

More com­ing if you still want it. My lunch break is over. :)

• Very in­ter­est­ing. Ac­tu­ally, I didn’t seek to aim that low—I was tar­get­ing the av­er­age LW reader (or at least an av­er­age per­son who was com­fortable with maths). How­ever, I still find this to be very valuable, since I have played around with the idea of try­ing to write a book that’d at­tempt to sell (im­plic­itly or ex­plic­itly) the idea of “maths /​ sci­ence, es­pe­cially as ap­plied to ra­tio­nal­ity /​ cog­ni­tive sci­ence is ac­tu­ally fun” to a lay au­di­ence.

So I prob­a­bly won’t al­ter the origi­nal ar­ti­cle as a re­ac­tion to this, but if you want to nev­er­the­less help me in figur­ing out how to reach to that au­di­ence, do con­tinue. :)

• So I prob­a­bly won’t al­ter the origi­nal ar­ti­cle as a re­ac­tion to this, but if you want to nev­er­the­less help me in figur­ing out how to reach to that au­di­ence, do con­tinue. :)

Haha, will do. I do re­al­ize that some of what I am bring­ing up is ex­tremely petty, but I have watched some of my ar­ti­cles get com­pletely de­railed by what I would con­sider to be a com­pletely ir­rele­vant point. Even amongst the high qual­ity dis­cus­sions in the com­ments I find my­self need­ing to back up and ask a Really Ob­vi­ous Ques­tion.

This is likely a fault in the way I com­mu­ni­cate (which is ac­cen­tu­ated on­line) and also a glitch where peo­ple are not will­ing/​able to drop sub­jects that are bug­ging them. If I was fun­da­men­tally op­posed to the claim that all brain tu­mors caused headaches I would feel com­pel­led to point it out in the com­ments. (This com­pul­sion is some­thing I am try­ing to curb.)

In any case, I am glad the com­ments are helpful and I will con­tinue as I find the time. If you ever start draft­ing some­thing like what you men­tioned I am will­ing to proofread and com­ment.

• In­ter­est­ing that you find the us­age of “you” and “we” pa­tron­iz­ing. I hadn’t thought of it like that—I in­tended it as a way to make the post less for­mal and build a more com­fortable at­mo­sphere to the reader.

Us­ing “you” is a two-edged sword; it can cre­ate greater in­ti­macy with your au­di­ence, but only if you know your au­di­ence well enough, and don’t mind po­lariz­ing your re­sponse, or are will­ing to limit your­self to hy­po­thet­i­cals (e.g. “if you walked into a doc­tor’s office”)

If you’re less cer­tain of your au­di­ence, but still want the strong in­ti­macy or iden­ti­fi­ca­tion re­sponse, you may want to use “I” in­stead. By tel­ling a story that your reader can re­late to… that is, a story of how you made this dis­cov­ery, found out why it’s im­por­tant, or ap­plied it in some way to achieve a goal the reader shares or rec­og­nizes as valuable, then you al­low the reader to sim­ply iden­tify with you on a less con­scious/​con­tentious level.

(No­tice, for ex­am­ple, how many of Eliezer’s best posts be­gin with such a story, ei­ther about Eliezer or some fic­tional char­ac­ters.)

• Hmm, what do peo­ple think—if we end up rewrit­ing this, should I just edit this post? Or make an en­tirely new one? Per­haps keep this one as it is, but work the changes into a fu­ture one that’s longer?

Per­son­ally, I think if it’s just minor stylis­tic changes in ex­press­ing the same ma­te­rial, edit­ing the post is the way to go; if it’s adding more ma­te­rial, or ex­press­ing it rad­i­cally differ­ently, then a new post is ap­pro­pri­ate.

• it’s fine the way it is I think, it cov­ers enough with­out be­ing too spe­cific. great post.

• “A might be the rea­son for symp­tom X, then we have to take into ac­count both the prob­a­bil­ity that X caused A”

I think you have ac­ci­den­tally swapped some vari­ables there

• Thanks, fixed.

• It seems there are a few meta-po­si­tions you have to hold be­fore tak­ing Bayesi­anism as talked about here; you need the con­cept of Win­ning first. Bayes is not suffi­cient for san­ity, if you have, say, an anti-Oc­camian or anti-Lapla­cian prior.

What this site is for is to help us be good rationalists; to win. Bayesianism is the best candidate methodology for dealing with uncertainty. We even have theorems that show that in its domain it’s uniquely good. My understanding of what we mean by Bayesianism is updating in the light of new evidence, and updating correctly within the constraints of sanity (cf Dutch books).

• We can dis­cuss both epistemic and in­stru­men­tal ra­tio­nal­ity.

• You are right that Bayesi­anism isn’t suffi­cient for san­ity, but why should it pre­vent a post ex­plain­ing what Bayesi­anism is? It’s pos­si­ble to be a Bayesian with wrong pri­ors. It’s also good to know what Bayesi­anism is, es­pe­cially when the term is so heav­ily used. My un­der­stand­ing is that the OP is do­ing a good job keep­ing con­cepts of win­ning and Bayesi­anism sep­a­rated. The con­trary would con­flate Bayesi­anism with ra­tio­nal­ity.

• Jonathan’s post doesn’t seem like much of an argument but more of a criticism. There’s lots more to write on this topic.

• The penul­ti­mate para­graph about our be­liefs isn’t about Bayesi­anism so much as heuris­tics and bi­ases. Un­less you were a Bayesian from birth, for at least part of your life your be­liefs evolved in a crazy fash­ion not en­tirely gov­erned by Bayes’ the­o­rem. It is for this rea­son that we should be sus­pi­cious of the be­liefs based on as­sump­tions we’ve never scru­ti­nized.

• Doesn’t “Bayesi­anism” ba­si­cally boil down to the idea that one can think of be­liefs in terms of math­e­mat­i­cal prob­a­bil­ities?

• That’s like say­ing that Sunni be­liefs boil down to be­lief in Is­lam.

• Fol­low­ing your anal­ogy, what is the equiv­a­lent to Shia Is­lam?

Put an­other way: Bayesi­anism as op­posed to what?

• Fre­quen­tism, ac­cord­ing to the posters here. Un­less I mi­s­un­der­stand what you mean by think­ing of a be­lief in terms of prob­a­bil­ities.

• But the stan­dard Fre­quen­tist stance is that prob­a­bil­ities are not de­grees of be­lief, but solely long term fre­quen­cies in ran­dom ex­per­i­ments.

• Most “fre­quen­tists” aren’t such stick­lers about ter­minol­ogy. Most peo­ple who at­tach prob­a­bil­ities to be­liefs in knowl­edge rep­re­sen­ta­tions—say, AI sys­tems—are more fa­mil­iar with fre­quen­tist than Bayesian method­ol­ogy.

• Fre­quen­tism, ac­cord­ing to the posters here

I looked up “Fre­quen­tism” on Wikipe­dia . . . .I don’t un­der­stand your point.

What con­cept am I omit­ting by char­ac­ter­iz­ing “Bayesi­anism” the way I did?

• Google fre­quen­tist in­stead of fre­quen­tism. It’s the usual way of do­ing statis­tics and work­ing with prob­a­bil­ities.

• I did and I still don’t un­der­stand your point.

Again my ques­tion: Ex­actly what con­cept am I omit­ting by char­ac­ter­iz­ing “Bayesi­anism” the way I did?

• I PM’ed you re­gard­ing this thread. (I men­tion it here be­cause I seem to re­call that you’re sub­ject to a bug that pre­vents you from get­ting mes­sage/​re­ply no­tifi­ca­tions.)

• Thanks!

And interestingly, I find myself looking at my upvotes here and there and wondering what the appropriate “conversion rate” is for purposes of feeling good over a successful post. I’ve now gotten 31 upvotes there, but only 13 here. Obviously getting upvotes over there is easier than over here, so I shouldn’t value this as much as if I’d got 13 + 31 = 44 upvotes here. On the other hand, I should probably allow myself a small bonus for writing a cross-domain post that is good enough to get upvotes on both sites. Hum. Man, this is tough.

• By any stan­dard you had a suc­cess­ful Hacker News post—it was on the front page for most of the morn­ing, which is good. The num­ber of votes is not mean­ingful at all on Hacker News so there’s no con­ver­sion rate. Also, I strongly sus­pect that many of the ini­tial early votes on HN came from pri­mary LW users fol­low­ing my link and then up­vot­ing, pos­si­bly even peo­ple that didn’t up­vote it on LW.

• The ‘Intuitive Explanation’ link has changed to http://yudkowsky.net/rational/bayes

• Or take the de­bate we had on 9/​11 con­spir­acy the­o­ries. Some peo­ple thought that un­ex­plained and oth­er­wise sus­pi­cious things in the offi­cial ac­count had to mean that it was a gov­ern­ment con­spir­acy. Others con­sid­ered their prior for “the gov­ern­ment is ready to con­duct mas­sively risky op­er­a­tions that kill thou­sands of its own cit­i­zens as a pub­lic­ity stunt”, judged that to be over­whelm­ingly un­likely, and thought it far more prob­a­ble that some­thing else caused the sus­pi­cious things.

Don’t for­get the prior: “The offi­cial ac­count of big con­flicts with a lot of differ­ent in­ter­ests in­volved will always leave some things un­ex­plained or oth­er­wise sus­pi­cious.” “Govern­ment agen­cies who fail on a mas­sive scale don’t like to be trans­par­ent about how the failure hap­pened.”

Actors in government agencies don’t think: “How can I convince the public that 9/11 wasn’t an inside job?” They think: “How can I influence the public perception of 9/11 in a way that my department gets more funding?” Or when it comes to president Bush at that time: “How can I influence the public perception in a way that makes it more likely that I’ll win the next election?”

Newspaper journalists don’t care about fact checking every single fact in their articles. It’s way too much effort. If you have background knowledge you will find facts that aren’t true in most news stories.

• Don’t for­get the prior: “The offi­cial ac­count of big con­flicts with a lot of differ­ent in­ter­ests in­volved will always leave some things un­ex­plained or oth­er­wise sus­pi­cious.” “Govern­ment agen­cies who fail on a mas­sive scale don’t like to be trans­par­ent about how the failure hap­pened.”

“Govern­ments in gen­eral, and the U.S. in spe­cific, have a his­tory of ly­ing to jus­tify war. I can think of sev­eral in­ci­dents where an offi­cial ca­sus belli turned out to be ei­ther a lie, as in the sec­ond Gulf of Tonkin in­ci­dent or the Iraqi WMD alle­ga­tion; or at least sig­nifi­cantly doubt­ful, such as the sink­ing of the Maine. In these cases, the ‘con­spir­acy the­o­rists’ and peace ac­tivists were right; and I can’t think of any where they were wrong. So they have more cred­i­bil­ity than the offi­cial re­port.”

• In these cases, the ‘con­spir­acy the­o­rists’ and peace ac­tivists were right; and I can’t think of any where they were wrong. So they have more cred­i­bil­ity than the offi­cial re­port.”

Know­ing that the offi­cial re­port con­tains in­for­ma­tion that’s false, doesn’t lead you to know what’s true.

• Others con­sid­ered their prior for “the gov­ern­ment is ready to con­duct mas­sively risky op­er­a­tions that kill thou­sands of its own cit­i­zens as a pub­lic­ity stunt”, judged that to be over­whelm­ingly un­likely,

Here I have to take objection: you framed it as a publicity stunt but actually 9-11 has shaped everything in the USA: domestic policies, foreign policies, military spending, the identity of the nation as a whole (It’s US vs. THEM) etc… So there is a lot at stake.

Btw, as far as the willingness of the government to kill its own citizens goes, more than 4,000 US soldiers have died in Iraq until now (over 30,000 wounded), more than 1,000 in Afghanistan, compared to less than 3,000 in the WTC attack. This was on known false information, remember the original claim of WMDs in Iraq? So if you seriously maintain that the government is not willing to sacrifice its own citizens I want to know where you get your priors from.

• The controlling feature for this prior isn’t “willingness to kill own citizens” or “publicity stunt” but “massively risky”. “Massively risky” is actually an incredible understatement. We’re talking about people already at the top of the social hierarchy risking death and eternal shame for them and their families in hopes the hundreds of people part of the conspiracy keep quiet and that no damning evidence of a remarkably complicated plot is left behind.

The government’s willingness to kill its own citizens, such as it is, less often carries over to civilians and even less often carries over to rich white people on Wall Street. And for something that has helped shape the country… well remarkably little has changed in the direction that administration wanted things to go. Indeed, why in all those years of waning popularity, wouldn’t they try something like it again (maybe foil the attempt this time)? If they’re so powerful why not get someone else elected President?

• You know, I have lit­tle in­ter­est in 9/​11 Truth, but I have no pa­tience for the “but it would be so ob­vi­ous” re­ply to Truthers. Here is how that con­ver­sa­tion trans­lates in my head:

Truther: I think the tow­ers came down due to a de­liber­ate de­mo­li­tion by our gov­ern­ment. I think this be­cause thus and so.

Non-Truther: But the gov­ern­ment would never have done any­thing so easy to find out about, be­cause it would carry mas­sive risk. Every­body would know about it.

Truther: Well, if peo­ple were pay­ing at­ten­tion to thus and so, they’d know -

Non-Truther: BUT SINCE I DIDN’T ALREADY KNOW ABOUT THUS AND SO IT’S CLEARLY NOT SOMETHING EVERYBODY KNOWS ABOUT AND I CAN’T HEAR YOU NANANANANANANANA.

• Just to clar­ify: Do you think that is what I’m do­ing here?

• It was at least strongly rem­i­nis­cent, enough that un­der your com­ment seemed like a good place to put mine, but I did not in­tend to at­tack you speci­fi­cally.

• obligatory XKCD comic: http://xkcd.com/690/

(actually, that’s not as relevant as I first thought, but I’ll go ahead and post it here anyway)

• I be­lieve you were un­fairly voted down. Your re­cast­ing shows that this is es­sen­tially an ap­peal to au­thor­ity, with the au­thor­ity be­ing “ev­ery­one else”.

• We’re talking about people already at the top of the social hierarchy risking death and eternal shame for them and their families in hopes the hundreds of people part of the conspiracy keep quiet and that no damning evidence of a remarkably complicated plot is left behind.

Well, there is a lot of ev­i­dence left be­hind and that has been cited over and over.

The government’s willingness to kill its own citizens, such as it is, less often carries over to civilians and even less often carries over to rich white people on Wall Street.

AFAIK none of the peo­ple kil­led was ex­cep­tion­ally rich and/​or pow­er­ful.

And for something that has helped shape the country… well remarkably little has changed in the direction that administration wanted things to go.

Wait, what???

If they’re so pow­er­ful why not get some­one else elected Pres­i­dent?

Some­one else? What are you talk­ing about, ev­ery Pres­i­dent in the last decades has been a mem­ber of one of the same two par­ties. Obama has not sig­nifi­cantly changed the for­eign policy and is mov­ing in the same di­rec­tion.

• Well, there is a lot of ev­i­dence left be­hind and that has been cited over and over.

Well we’re talking about the prior. Obviously we can then update on the evidence, whatever that is. People will also disagree about what the evidence means, but the point is this is a really unlikely event you guys are claiming took place. We can interpret the evidence, but strange coincidences or some video footage not being released is not close to sufficient for me to suddenly start believing 9/11 was an inside job.

AFAIK none of the peo­ple kil­led was ex­cep­tion­ally rich and/​or pow­er­ful.

I don’t know what ex­cep­tion­ally means here but, ya know, the WTC wasn’t a home­less shelter.

And for something that has helped shape the country… well remarkably little has changed in the direction that administration wanted things to go. Wait, what???

...

Some­one else? What are you talk­ing about, ev­ery Pres­i­dent in the last decades has been a mem­ber of one of the same two par­ties. Obama has not sig­nifi­cantly changed the for­eign policy and is mov­ing in the same di­rec­tion.

Look, I have no idea what your par­tic­u­lar con­spir­acy is. So it is a lit­tle hard to ex­am­ine the sup­posed mo­ti­va­tions. My com­ments made sense given cer­tain as­sump­tions about what the mo­ti­va­tions of such a con­spir­acy would be. Ob­vi­ously they aren’t your as­sump­tions so share yours.

• Well, there is a lot of ev­i­dence left be­hind and that has been cited over and over.

Well we’re talk­ing about the prior.

Sorry. What I should have an­swered was: un­der the as­sump­tion of the con­spir­acy the­ory the peo­ple who planned the whole thing are from the ex­ec­u­tive branch of the gov­ern­ment which is the one who took charge of the in­ves­ti­ga­tion. So they have noth­ing to fear. Or can you tell me who they have to fear?

I don’t know what ex­cep­tion­ally means here but, ya know, the WTC wasn’t a home­less shelter.

By ex­cep­tion­ally rich I mean peo­ple like bankers, etc… Most if not all of those kil­led in the WTC were: office work­ers, clean­ing staff, tourists, po­lice and fire­men.

• Well ar­gued, but if you credit the U.S. gov­ern­ment such brazen cru­elty to­ward the cit­i­zens it nom­i­nally serves, then why would the gov­ern­ment need a pre­tense at all? Why not in­vade with only forged doc­u­ments and lies? No self-in­flicted wound should be nec­es­sary; the U.S. mil­i­tary may not fear in­ter­ven­tion by other na­tions’ forces if they ap­pear to only pick on a few small oil-rich na­tions.

• Forged doc­u­ments and lies are not enough to con­vince the pub­lic opinion or bet­ter to arouse strong emo­tions, some­thing more salient is needed. You have to re­mem­ber, at 9-11 ba­si­cally the whole world stood still watch­ing the events un­fold. Wikipe­dia:

The NATO coun­cil de­clared that the at­tacks on the United States were con­sid­ered an at­tack on all NATO na­tions and, as such, satis­fied Ar­ti­cle 5 of the NATO charter

http://en.wikipedia.org/wiki/September_11_attacks#cite_note-155

Btw ar­ti­cle 5 al­lows the use of armed(mil­i­tary) force. This was the offi­cial NATO po­si­tion even be­fore there was any in­ves­ti­ga­tion as to who was sup­pos­edly be­hind the “at­tacks”.

Any­one ar­gu­ing against mil­i­tary ac­tion can be and still is de­cried as un­pa­tri­otic, cal­lous to­wards the fam­i­lies of those who died. You can­not achieve this with just a batch of doc­u­ments.

• Core tenet 3: We can use the con­cept of prob­a­bil­ity to mea­sure our sub­jec­tive be­lief in some­thing. Fur­ther­more, we can ap­ply the math­e­mat­i­cal laws re­gard­ing prob­a­bil­ity to choos­ing be­tween differ­ent be­liefs. If we want our be­liefs to be cor­rect, we must do so.

Frequently misunderstood. E.g. you have propositions A and B, you mistakenly consider that probably either one of them will happen, and you may give me money if you judge P(A)/P(B) > some threshold.

If both A and B happen to be unlikely, I can use that to make arguments which only prompt you to update (lower the probability of) B.

Likewise, if you have some A whose probability is increased by some arguments and decreased by others, I can give you only the arguments in favour of A. As a good Bayesian you are going to keep updating the belief, to my advantage.

Every­thing breaks down on in­com­plete in­fer­ence graphs that very fre­quently con­tain mis­takes (in­valid re­la­tions, in­valid nodes, etc). No mat­ter how much you in­ter­nal­ize the tenets, un­less you in­ter­nal­ize some sort of quan­tum hy­per-com­put­ing im­plant into your head, your in­fer­ence graphs will be in­com­plete to an un­known ex­tent, and only par­tially prop­a­gated. If the prop­a­ga­tions are ever to be prompted by read­ing that you should prop­a­gate some­thing, you’ll be sig­nifi­cantly un­der re­mote con­trol.
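The selective-argument worry above can be sketched numerically. This is only an illustration under stated assumptions: the likelihood ratios are made up, each argument is treated as independent evidence, and the helper name `posterior_from_ratios` is hypothetical.

```python
from math import prod


def posterior_from_ratios(prior, likelihood_ratios):
    """Update a prior probability using the odds form of Bayes' theorem.

    Assumes (for illustration) that each ratio P(arg | A) / P(arg | ~A)
    comes from an independent argument, so the ratios simply multiply.
    """
    odds = prior / (1 - prior)
    odds *= prod(likelihood_ratios)
    return odds / (1 + odds)


# Made-up ratios: two arguments favour A, two count against it.
all_arguments = [2.0, 2.0, 0.5, 0.5]
favourable_only = [r for r in all_arguments if r > 1]

balanced = posterior_from_ratios(0.5, all_arguments)
skewed = posterior_from_ratios(0.5, favourable_only)
print(f"all arguments heard:   {balanced:.2f}")
print(f"only favourable heard: {skewed:.2f}")
```

With the full set of arguments the belief stays where it started; hearing only the favourable half pushes it upward, even though each individual update was done correctly.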

• Sub-tenet 1: If you ex­pe­rience some­thing that you think could only be caused by cause A, ask your­self “if this cause didn’t ex­ist, would I re­gard­less ex­pect to ex­pe­rience this with equal prob­a­bil­ity?” If the an­swer is “yes”, then it prob­a­bly wasn’t cause A.

I don’t un­der­stand this at all—if you ex­pe­rience some­thing that you think could only be caused by A, then the ques­tion you’re sup­posed to ask your­self makes no sense what­so­ever: ab­sent A, you would ex­pect to never ex­pe­rience this thing, per the origi­nal con­di­tion! And if the an­swer to the ques­tion is any­thing above “never”, then clearly you don’t think that A is the only pos­si­ble cause!

• The point is that peo­ple can er­ro­neously re­port, even to them­selves, that they be­lieve their ex­pe­rience could only be caused by cause A. Ask­ing the ques­tion if you would still an­ti­ci­pate the ex­pe­rience if cause A did not ex­ist is a way of check­ing that you re­ally be­lieve that your ex­pe­rience could only be caused by cause A.

More gen­er­ally, it is use­ful to ex­am­ine be­liefs you have ex­pressed in high level lan­guage, to see if you still be­lieve them af­ter dig­ging deeper into what that high level lan­guage means.
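A minimal sketch of the check described above, with made-up numbers and a hypothetical helper `update`: if the experience is just as likely without cause A as with it, the belief in A does not move at all.

```python
def update(prior, p_e_given_a, p_e_given_not_a):
    """Posterior P(A | E) from Bayes' theorem for a binary hypothesis A."""
    joint_a = prior * p_e_given_a
    joint_not_a = (1 - prior) * p_e_given_not_a
    return joint_a / (joint_a + joint_not_a)


prior = 0.2  # illustrative prior belief in cause A

# "Would I expect this experience with equal probability without A?" -> yes
unchanged = update(prior, p_e_given_a=0.7, p_e_given_not_a=0.7)

# Same experience, but now it would be rare without A
raised = update(prior, p_e_given_a=0.7, p_e_given_not_a=0.1)

print(f"equally likely either way: {unchanged:.3f}")
print(f"much likelier under A:     {raised:.3f}")
```

The first case leaves the prior where it was, which is the sense in which the experience "probably wasn't" evidence for cause A.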

• I think that the in­con­sis­tency of such a po­si­tion was the point. It would prob­a­bly be bet­ter phrased as ”… some­thing that has to be caused by cause A” (or pos­si­bly just “proof of A”), which is effec­tively equiv­a­lent, but IMO some­thing that some­one who would an­swer yes to the fol­low­ing ques­tion could plau­si­bly have claimed to be­lieve (i. e. I wouldn’t be very sur­prised by the ex­is­tence of peo­ple who are that in­con­sis­tent in their be­liefs) .

• . Fur­ther sup­pose that there are two rea­sons for why peo­ple get headaches: they might have a brain tu­mor, or they might have a cold.

Or, if you’re very un­lucky, you could have a headache and a brain tu­mor.… :3

• A brain tu­mor always causes a headache, but ex­ceed­ingly few peo­ple have a brain tu­mor. In con­trast, a headache is rarely a symp­tom for cold, but most peo­ple man­age to catch a cold ev­ery sin­gle year. Given no other in­for­ma­tion, do you think it more likely that the headache is caused by a tu­mor, or by a cold?

Given no other in­for­ma­tion, we don’t know which is more likely. We need num­bers for “rarely”, “most”, and “ex­ceed­ingly few”. For ex­am­ple, if 10% of hu­mans cur­rently have a cold, and 1% of hu­mans with a cold have a headache, but 1% of hu­mans have a brain tu­mor, then the brain tu­mor is ac­tu­ally more likely.

(The calcu­la­tion we’re perform­ing is: com­pare (“rarely” times “most”) to “ex­ceed­ingly few” and see which one is larger.)
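For anyone who wants to check the comparison, here is a short sketch using the made-up figures from this comment. It assumes colds and tumors are the only causes of headaches, that they never co-occur, and that a tumor always produces a headache; the function name `headache_share` is just for illustration.

```python
def headache_share(frequency, p_headache_given_cause):
    """Fraction of the whole population that has a headache from this cause."""
    return frequency * p_headache_given_cause


# Illustrative numbers from the comment above, not real medical data.
cold = headache_share(frequency=0.10, p_headache_given_cause=0.01)
tumor = headache_share(frequency=0.01, p_headache_given_cause=1.0)

total = cold + tumor
p_cold_given_headache = cold / total
p_tumor_given_headache = tumor / total

print(f"P(cold  | headache) = {p_cold_given_headache:.3f}")
print(f"P(tumor | headache) = {p_tumor_given_headache:.3f}")
```

With these particular numbers the tumor accounts for about ten of every eleven headaches; change the three inputs and the answer can flip, which is the point about needing values for "rarely", "most", and "exceedingly few".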

You're miss­ing the point. This post is suit­able for an au­di­ence whose eyes would glaze over if you threw in num­bers, which is won­der­ful (I read the "In­tu­itive Ex­pla­na­tion of Bayes' The­o­rem" and was rant­ing for days about how there was not one in­tu­itive thing about it! it was all num­bers! and graphs!). Ad­ding num­bers would make it more strictly ac­cu­rate but would not im­prove any­one's un­der­stand­ing.

Agreed, I did not find the "In­tu­itive Ex­pla­na­tion" to be par­tic­u­larly in­tu­itive even af­ter mul­ti­ple read­ings. Un­der­stand­ing the math and prin­ci­ples is one thing, but this post ac­tu­ally made me sit up and go, "Oh, now I see what all the fuss is about," out­side a rel­a­tively nar­row range of is­sues like di­ag­nos­ing can­cer or iden­ti­fy­ing spam emails.

Now I get it well enough to sum­ma­rize: "Even if A will always cause B, that doesn't mean A did cause B. If B would hap­pen any­way, this tells you noth­ing about whether A caused B."

Which is both a "well duh" and an im­por­tant idea at the same time, when you con­sider that our brains ap­pear to be built to latch onto the first "A" that would cause B, and then stub­bornly hang onto it un­til it can be con­clu­sively dis­proven.

That's a "click" right there, that makes retroac­tively com­pre­hen­si­ble many reams of Eliezer's math rants and Beisut­sukai sto­ries.

So, yeah… this is way too im­por­tant of an idea to have math as­so­ci­ated with it in any way. ;-)

Per­son­ally it both­ers me that the ex­pla­na­tion asks a ques­tion which is nu­mer­i­cally unan­swer­able, and then as­serts that ra­tio­nal­ists would an­swer it in a given way. Sim­ple ex­pla­na­tions are good, but not when they con­tain state­ments which are fac­tu­ally in­cor­rect.

But, look­ing at the karma scores it ap­pears that you are cor­rect that this is bet­ter for many peo­ple. ^_^;

I thought Truly Part of you is an ex­cel­lent in­tro­duc­tion to ra­tio­nal­ism/​Bayesi­anism/​Less Wrong philos­o­phy that avoids much use of num­bers, graphs, and tech­ni­cal lan­guage. So I think it's more ap­pro­pri­ate for the av­er­age per­son, or for peo­ple that equa­tions don't ap­peal to.

Does any­one who meets that de­scrip­tion agree?

And could some­one ask Ali­corn if she prefers it?

Hm­mmm.… that's an in­ter­est­ing ar­ti­cle too, but it fo­cuses on a differ­ent ques­tion, the ques­tion what knowl­edge re­ally means, and uses AI con­cepts to dis­cuss that (some­what re­lated to Searle's Chi­nese Room gedanken­ex­per­i­ment.)

How­ever, I think the ar­ti­cle dis­cussed here is a bit more di­rectly con­nected to Bayesi­anism. It's clear what Bayes The­o­rem means, but what many peo­ple to­day mean with Bayesi­anism, is some­what of a loose ex­trap­o­la­tion of that -- or even just a metaphor.

I think the ar­ti­cle does a good job at ex­plain­ing the cur­rent use.

• Okay, I’m ris­ing to the bait here...

I would re­ally ap­pre­ci­ate it if peo­ple would be more care­ful about pass­ing on memes re­gard­ing sub­jects they have not re­searched prop­erly. This should be a ba­sic part of “ra­tio­nal­ist eti­quette”, in the same way that “wash your hands be­fore you han­dle food” is part of com­mon eat­ing eti­quette.

Keep­ing my com­ments on topic:

may be­lieve it likely that the gov­ern­ment did some­thing hor­ren­dous, but we re­al­ize the ev­i­dence is weak and circumstantial

Did you read the ac­tual post about Bayesi­anism? Part of the point is you're not al­lowed to do this! One can't both think some­thing is likely and think the ev­i­dence is weak and cir­cum­stan­tial! Hold­ing a be­lief but not ar­gu­ing for it be­cause you know you don't have the ev­i­dence is a defin­ing ex­am­ple of ir­ra­tional­ity. If you don't think the gov­ern­ment was in­volved, fine. But if you do you're obli­gated to defend your be­lief.

• One can’t both think some­thing is likely and think the ev­i­dence is weak and cir­cum­stan­tial!

One definitely can. What else is one sup­posed to do when ev­i­dence is weak and cir­cum­stan­tial? As­sign prob­a­bil­ities that sum to less than one?

If the ev­i­dence for a par­tic­u­lar claim is weak and cir­cum­stan­tial one should as­sign that claim a low prob­a­bil­ity and other, com­pet­ing, pos­si­bil­ities higher prob­a­bil­ities.

• What if the ev­i­dence for those is also weak and cir­cum­stan­tial?

Or what if one had as­signed that claim a very high prior prob­a­bil­ity?

• I’m al­lowed to be­lieve what­ever I want; I’m just not al­lowed to try to con­vince you of it un­less I have a ra­tio­nal ar­gu­ment.

Tra­di­tional Ra­tion­al­ity is of­ten ex­pressed as so­cial rules, un­der which this claim might work. But in Bayesian Ra­tion­al­ity, there is math that tells you ex­actly what you ought to be­lieve given the ev­i­dence you have ob­served.

Okay—but in prac­ti­cal­ity, what if I don't have time (or men­tal fo­cus, or what­ever re­sources it takes) to ex­plic­itly iden­tify, enu­mer­ate, and eval­u­ate each piece of ev­i­dence that I may be con­sid­er­ing? It took me over an hour just to get this far with a Bayesian anal­y­sis of one hy­poth­e­sis, which I'm prob­a­bly not even do­ing right.

Or do we step out­side the realm of Bayesian Ra­tion­al­ity when we look at prac­ti­cal con­sid­er­a­tions like "finite com­put­ing re­sources"?

Or do we step out­side the realm of Bayesian Ra­tion­al­ity when we look at prac­ti­cal con­sid­er­a­tions like “finite com­put­ing re­sources”?

I'd ac­tu­ally say, start with the prior and with the strongest piece of ev­i­dence you think you have. This of it­self should re­veal some­thing in­ter­est­ing and dis­putable.

As someone who recently failed at an attempt at Bayesian analysis let me try to offer a few pointers: You correctly conclude that "What is the likelihood that evidence E would occur even if H were false?" is more immediately relevant than "What is the likelihood that evidence E would not occur if H were true?", which you only asked because you got the syntax wrong: "the likelihood that evidence E would occur even if H were false" would be P(E|~H). P(H) is your prior, the probability before considering any evidence E, not the probability in absence of any evidence. The considerations you list under evidence against are of the sort you would make when determining the priors. Asking "What is the likelihood that Bush is a twit if H were true?" and so on would be very difficult to set probabilities for; you CAN treat it that way but it's far from straightforward.

Actually I have never seen a non-trivial example of this sort of analysis for this sort of real world problem done right on this site.

Actually I have never seen a non-trivial example of this sort of analysis for this sort of real world problem done right on this site.

H = this sort of anal­y­sis is practical

E = user FAWS has not seen any ex­am­ple of this sort of anal­y­sis done right.

P(H)=0.9 smart peo­ple like Eliezer seem to praise Bayesian think­ing, and peo­ple ask for pri­ors and so on.

P(E|H)= 0.3 I haven’t read ev­ery com­ment, prob­a­bly not even 10%, but if this is used any­where it would be here, and if it’s prac­ti­cal it should be used at least some­what reg­u­larly.

P(E|~H) =0.9 Might still be done even if im­prac­ti­cal when it’s a point of pride and /​ or group iden­ti­fi­ca­tion, which could be ar­gued to be the case.

Calcu­lat­ing the pos­te­rior prob­a­bil­ity P(H|E):

P(H|E) = P(H&E)/P(E) = P(H)*P(E|H)/P(E) = P(H)*P(E|H)/(P(E|H)*P(H) + P(E|~H)*P(~H)) = 0.9 * 0.3 / (0.3 * 0.9 + 0.9 * 0.1) = 0.75
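
For anyone who wants to check the arithmetic, the same calculation can be sketched in a few lines of Python. The function name `posterior` and its parameter names are just labels chosen for this illustration; the three input numbers are the ones assigned above.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) via Bayes' theorem, expanding P(E) over H and ~H."""
    # P(E) = P(E|H)*P(H) + P(E|~H)*P(~H)
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return prior * p_e_given_h / p_e


# Inputs from the comment: P(H) = 0.9, P(E|H) = 0.3, P(E|~H) = 0.9
print(round(posterior(0.9, 0.3, 0.9), 4))  # prints 0.75
```

Note that the evidence here is three times more likely under ~H than under H, which is why a 0.9 prior drops to 0.75.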

• I’m al­lowed to be­lieve what­ever I want; I’m just not al­lowed to try to con­vince you of it un­less I have a ra­tio­nal ar­gu­ment.

Isn’t this what Bayesi­anism is all about—reach­ing the most likely con­clu­sion in the face of weak or in­con­clu­sive ev­i­dence? Or am I mi­s­un­der­stand­ing some­thing?

The best source to look at here is Probability is Subjectively Objective. You cannot (in the Bayesian sense) believe whatever you ‘want’. There is precisely one set of beliefs to which you are epistemically entitled given your current evidence, even though I am obliged to form a different set of beliefs given what I have been exposed to.

Reaching the most likely conclusion while uncertain, yes. But that doesn't mean believing things without evidence.

• (Some folks have ex­pressed dis­ap­proval of this con­ver­sa­tion con­tin­u­ing in this thread; iron­i­cally, though, it’s be­com­ing more and more an ex­plicit les­son in Bayesi­anism—as this com­ment in par­tic­u­lar will demon­strate. Nev­er­the­less, af­ter this com­ment, I am will­ing to move it el­se­where, if peo­ple in­sist.)

Again, I think this ques­tion is a di­ver­sion from what I have been ar­gu­ing; its truth or false­ness does not sub­stan­tially af­fect the truth or false­ness of my ac­tual claims (as op­posed to be­liefs men­tioned in pass­ing)

You’re in Bayes-land here, not a de­bat­ing so­ciety. Beliefs are what we’re in­ter­ested in. There’s no dis­tinc­tion be­tween an ar­gu­ment that a cer­tain point of view should be taken se­ri­ously and an ar­gu­ment that the point of view in ques­tion has a sig­nifi­cant prob­a­bil­ity of be­ing true. If you want to make a case for the former, you’ll nec­es­sar­ily have to make a case for the lat­ter.

That said, I made a start at a Bayesian anal­y­sis, but ran out of men­tal swap-space. If some­one wants to sug­gest what I need to do next, I might be able to do it.

Here’s how you do a Bayesian anal­y­sis: you start with a prior prob­a­bil­ity P(H). Then you con­sider how much more likely the ev­i­dence is to oc­cur if your hy­poth­e­sis is true (P(E|H)) than it is in gen­eral (P(E)) -- that is, you calcu­late P(E|H)/​P(E). Mul­ti­ply­ing this “strength of ev­i­dence” ra­tio P(E|H)/​P(E) by the prior prob­a­bil­ity P(H) gives you your pos­te­rior (up­dated) prob­a­bil­ity P(H|E).

Alter­na­tively, you could think in terms of odds: start­ing with the prior odds P(H)/​P(~H), and con­sid­er­ing how much more likely the ev­i­dence is to oc­cur if your hy­poth­e­sis is true (P(E|H)) than if it is false (P(E|~H)); the ra­tio P(E|H)/​P(E|~H) is called the “like­li­hood ra­tio” of the ev­i­dence. Mul­ti­ply­ing the prior odds by the like­li­hood ra­tio gives you the pos­te­rior odds P(H|E)/​P(~H|E).
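
The odds form is easy to sketch in code. What follows is a minimal Python illustration, not a canonical implementation: the function names are made up for this example, and the inputs (0.9, 0.3, 0.9) are simply the illustrative numbers used earlier in this thread.

```python
def posterior_odds(prior_prob, p_e_given_h, p_e_given_not_h):
    """Multiply the prior odds P(H)/P(~H) by the likelihood ratio P(E|H)/P(E|~H)."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    likelihood_ratio = p_e_given_h / p_e_given_not_h
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of H back into a probability."""
    return odds / (1.0 + odds)


# Illustrative inputs: P(H) = 0.9, P(E|H) = 0.3, P(E|~H) = 0.9
odds = posterior_odds(0.9, 0.3, 0.9)
print(round(odds, 4), round(odds_to_probability(odds), 4))  # prints 3.0 0.75
```

Prior odds of 9:1 times a likelihood ratio of 1/3 gives posterior odds of 3:1, i.e. a probability of 0.75, the same answer the probability form gives for these inputs.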

One of the two ques­tions you need to an­swer is: by what fac­tor do you think the ev­i­dence raises the prob­a­bil­ity/​odds of your hy­poth­e­sis be­ing true? Are we talk­ing twice as likely? Ten times? A hun­dred times?

If you know that, plus your cur­rent es­ti­mate of how likely your hy­poth­e­sis is, di­vi­sion will tell you what your prior was—which is the other ques­tion you need to an­swer.
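
To make the "division" step concrete, here is a small Python sketch. It assumes you state the factor in odds terms (as a likelihood ratio); the function names and the example numbers (a current belief of 50% and a factor of ten) are hypothetical, chosen only to illustrate the arithmetic.

```python
def implied_prior_odds(current_prob, likelihood_ratio):
    """Recover the prior odds by dividing the posterior odds by the likelihood ratio."""
    current_odds = current_prob / (1.0 - current_prob)
    return current_odds / likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of H back into a probability."""
    return odds / (1.0 + odds)


# Hypothetical: you now put H at 50%, and think the evidence
# multiplied the odds of H by ten.
prior_odds = implied_prior_odds(0.5, 10.0)
print(round(prior_odds, 4), round(odds_to_probability(prior_odds), 4))  # prints 0.1 0.0909
```

In other words, someone who ends up at even odds after evidence they consider ten-to-one in favour must have started at about 9%.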

Is there enough in­for­ma­tion there to calcu­late some odds, or are there still bits miss­ing?

If there’s enough in­for­ma­tion for you to have a be­lief, then there’s enough in­for­ma­tion to calcu­late the odds. Be­cause, if you’re a Bayesian, that’s what these num­bers rep­re­sent in the first place: your de­gree of be­lief.

I’m not sure what you’re sug­gest­ing by your use of the word “failure” here

“Your failure to dis­miss...” is sim­ply an English-lan­guage lo­cu­tion that means “The fact that you did not dis­miss...”

