Thinking About Filtered Evidence Is (Very!) Hard

The content of this post would not exist if not for conversations with Zack Davis, and owes something to conversations with Sam Eisenstat.

There’s been some talk about filtered evidence recently. I want to make a mathematical observation which causes some trouble for the Bayesian treatment of filtered evidence. [OK, when I started writing this post, it was “recently”. It’s been on the back burner for a while.]

This is also a continuation of the line of research about trolling mathematicians, and hence, relevant to logical uncertainty.

I’m going to be making a mathematical argument, but I’m going to keep things rather informal. I think this increases the clarity of the argument for most readers. I’ll make some comments on proper formalization at the end.

Alright, here’s my argument.

According to the Bayesian treatment of filtered evidence, you need to update on the fact that the fact was presented to you, rather than the raw fact. This involves reasoning about the algorithm which decided which facts to show you. The point I want to make is that this can be incredibly computationally difficult, even if the algorithm is so simple that you can predict what it will say next. IE, I don’t need to rely on anything like “humans are too complex for humans to really treat as well-specified evidence-filtering algorithms”.
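Before diving in, it may help to have the standard case in front of us: a minimal Monty Hall sketch (my own illustration, not part of the argument below) showing the difference between updating on the event “door 3 was opened for me” versus the raw content “no car behind door 3”:

```python
from fractions import Fraction

def monty_posterior():
    # Worlds are (car_door, door_host_opens); you always pick door 1.
    weights = {}
    for car in (1, 2, 3):
        prior = Fraction(1, 3)
        if car == 1:
            # Host may open door 2 or 3, chosen uniformly.
            for opened in (2, 3):
                weights[(car, opened)] = prior * Fraction(1, 2)
        else:
            # Host must open the remaining goat door.
            weights[(car, 2 if car == 3 else 3)] = prior

    # Correct update: condition on the *presentation* "host opened door 3".
    seen = {w: p for w, p in weights.items() if w[1] == 3}
    p_switch_wins = sum(p for (car, _), p in seen.items() if car == 2) / sum(seen.values())

    # Naive update: condition only on the *content* "no car behind door 3".
    naive = {w: p for w, p in weights.items() if w[0] != 3}
    p_naive = sum(p for (car, _), p in naive.items() if car == 2) / sum(naive.values())
    return p_switch_wins, p_naive
```

Conditioning on the presentation gives the familiar 2/3 for switching, while conditioning on the raw fact gives 1/2; the extra 1/6 of probability comes from reasoning about which facts the host could have shown you.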

For my result, we imagine that a Bayesian reasoner (the “listener”) is listening to a series of statements made by another agent (the “speaker”).

First, I need to establish some terminology:

Assumption 1. A listener will be said to have a rich hypothesis space if the listener assigns some probability to the speaker enumerating any computably enumerable set of statements.

The intuition behind this assumption is supposed to be: due to computational limitations, the listener may need to restrict to some set of easily computed hypotheses; for example, the hypotheses might be poly-time or even log-poly. This prevents hypotheses such as “the speaker is giving us the bits of a halting oracle in order”, as well as “the speaker has a little more processing power than the listener”. However, the hypothesis space is not so restricted as to limit the world to being a finite-state machine. The listener can imagine the speaker proving complicated theorems, so long as it is done sufficiently slowly for the listener to keep up. In such a model, the listener might imagine the speaker staying quiet for quite a long time (observing the null string over and over, or some simple sentence such as 1=1) while a long computation completes; and only then making a complicated claim.
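A toy version of such a hypothesis, sketched in Python (the names and the filler sentence are mine, purely illustrative): a speaker that emits a trivial truth while a slow computation runs, then emits the next theorem. Each step is cheap to simulate, so a bounded listener can keep up, even though the eventual output can encode complicated facts:

```python
from itertools import islice

def slow_enumerator(theorems, delay):
    # Hypothetical speaker: emit the filler "1=1" for delay(i) steps
    # (modeling a long computation), then emit the i-th theorem.
    for i, thm in enumerate(theorems):
        for _ in range(delay(i)):
            yield "1=1"
        yield thm

# The listener simulates the speaker one cheap step at a time.
transcript = list(islice(slow_enumerator(["0=0", "2+2=4"], lambda i: 2), 6))
```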

This is also not to say that I assume my listener considers only hypotheses in which it can 100% keep up with the speaker’s reasoning. The listener can also have probabilistic hypotheses which recognize its inability to perfectly anticipate the speaker. I’m only pointing out that my result does not rely on a speaker which the listener can’t keep up with.

What it does rely on is that there are not too many restrictions on what the speaker eventually says.

Assumption 2. A listener believes a speaker to be honest if the listener distinguishes between “X” and “the speaker claims X at time t” (aka “claims-X”), and also has beliefs such that P(X | claims-X) = 1 when P(claims-X) > 0.

This assumption is, basically, saying that the agent trusts its observations; the speaker can filter evidence, but the speaker cannot falsify evidence.

Maybe this assumption seems quite strong. I’ll talk about relaxing it after I sketch the central result.

Assumption 3. A listener is said to have minimally consistent beliefs if each proposition X has a negation X*, and P(X) + P(X*) ≤ 1.

The idea behind minimally consistent beliefs is that the listener need not be logically omniscient, but does avoid outright contradictions. This is important, since assuming logical omniscience would throw out computability from the start, making any computational-difficulty result rather boring; but totally throwing out logic would make my result impossible. Minimal consistency keeps an extremely small amount of logic, but it is enough to prove my result.

Theorem(/Conjecture). It is not possible for a Bayesian reasoner, observing a sequence of remarks made by a speaker, to simultaneously:

  • Have a rich hypothesis space.

  • Believe the speaker to be honest.

  • Have minimally consistent beliefs.

  • Have computable beliefs.

Proof sketch. Suppose assumptions 1-3. Thanks to the rich hypothesis space assumption, the listener will assign some probability to the speaker enumerating theorems of PA (Peano Arithmetic). Since this hypothesis makes distinct predictions, it is possible for the confidence to rise above 50% after finitely many observations. At that point, since the listener expects each theorem of PA to eventually be listed with probability > 50%, and the listener believes the speaker, the listener must assign > 50% probability to each theorem of PA! But this implies that the listener’s beliefs are not computable, since if we had access to them we could separate theorems of PA from contradictions by checking whether a sentence’s probability is > 50%.
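To spell out the last step: if the listener’s beliefs were given by a computable function P, the following (hypothetical) procedure would decide theoremhood. The mock belief function below is just a stand-in over a toy language; the point is only that thresholding a computable P at 1/2 yields a decision procedure, which cannot exist for PA:

```python
def separate(sentence, P):
    # If P were the listener's computable belief function, with P > 0.5 on
    # every theorem and P(X) + P(X*) <= 1 (minimal consistency) forcing
    # P < 0.5 on every contradiction, this thresholding would computably
    # separate the two classes.
    return "theorem" if P(sentence) > 0.5 else "not-a-theorem"

# Mock stand-in: a toy "logic" where even numbers code theorems.
def mock_P(sentence):
    return 0.9 if sentence % 2 == 0 else 0.05
```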

So goes my argument.

What does the argument basically establish?

The argument is supposed to be surprising, because minimally consistent beliefs are compatible with computable beliefs, and a rich hypothesis space is compatible with beliefs which are computable on observations alone; yet, when combined with a belief that the speaker is honest, we get an incomputability result.

My take-away from this result is that we cannot simultaneously use our unrestricted ability to predict sensory observations accurately and have completely coherent beliefs about the world which produces those sensory observations, at least if our “bridge” between the sensory observations and the world includes something like language (whereby sensory observations contain complex “claims” about the world).

This is because using the full force of our ability to predict sensory experiences includes some hypotheses which eventually make surprising claims about the world, by incrementally computing increasingly complicated information (like a theorem prover which slowly but inevitably produces all theorems of PA). In other words, a rich sensory model contains implicit information about the world which we cannot immediately compute the consequences of (in terms of probabilities about the hidden variables out there in the world). This “implicit” information can be necessarily implicit, in the same way that PA is necessarily incomplete.

To give a non-logical example: suppose that your moment-to-moment anticipations of your relationship with a friend are pretty accurate. It might be that if you roll those anticipations forward, you inevitably become closer and closer until the friendship becomes a romance. However, you can’t necessarily predict that right now; even though the anticipation of each next moment is relatively easy, you face a halting-problem-like difficulty if you try to anticipate the eventual behavior of your relationship. Because our ability to look ahead is bounded, each new consequence can be predictable without the overall outcome being predictable.

Thus, in order for an agent to use the full force of its computational power on predicting sensory observations, it must have partial hypotheses—similar to the way logical induction contains traders which focus only on special classes of sentences, or Vanessa’s incomplete Bayesianism contains incomplete hypotheses which do not try to predict everything.

So, this is an argument against strict Bayesianism. In particular, it is an argument against strict Bayesianism as a model of updating on filtered evidence! I’ll say more about this, but first, let’s talk about possible holes in my argument.

Here are some concerns you might have with the argument.

One might possibly object that the perfect honesty requirement is unrealistic, and therefore conclude that the result does not apply to realistic agents.

  • I would point out that the assumption is not so important, so long as the listener can conceive of the possibility of perfect honesty, and assigns it nonzero probability. In that case, we can consider P(X | honesty) rather than P(X). Establishing that some conditional beliefs are not computable seems similarly damning.

  • Furthermore, because the “speaker” is serving the role of our observations, the perfect honesty assumption is just a version of P(X | observe-X) = 1. IE, observing X gives us X. This is true in typical filtered-evidence setups; IE, filtered evidence can be misleading, but it can’t be false.

  • However, one might further object that agents need not be able to conceive of “perfect honesty”, because this assumption has an unrealistically aphysical, “perfectly logical” character. One might say that all observations are imperfect; none are perfect evidence of what is observed. In doing so, we can get around my result. This has some similarity to the assertion that zero is not a valid probability. I don’t find this response particularly appealing, but I also don’t have a strong argument against it.

Along similar lines, one might object that the result depends on an example (“the speaker is enumerating theorems”) which comes from logic, as opposed to any realistic physical world-model. The example does have a “logical” character—we’re not explicitly reasoning about evidence-filtering algorithms interfacing with an empirical world and selectively telling us some things about it. However, I want to point out that I’ve assumed extremely little “logic”—the only thing I use is that you don’t expect a sentence and its negation to both be true. Observations corresponding to theorems of PA are just an example used to prove the result. The fact that P(X) can be very hard to compute even when we restrict to easily computed P(claims-X) is very general; even if we do restrict attention to finite-state-machine hypotheses, we are in P-vs-NP territory.

What does this result say about logical uncertainty?

Sam’s untrollable prior beat the trollable-mathematician problem by the usual Bayesian trick of explicitly modeling the sequence of observations—updating on I-observed-X-at-this-time rather than only X. (See also the illustrated explanation.)

However, it did so at a high cost: Sam’s prior is dumb. It isn’t able to perform rich Occam-style induction to divine the hidden rules of the universe. It doesn’t believe in hidden rules; it believes “if there’s a law of nature constraining everything to fit into a pattern, I will eventually observe that law directly.” It shifts its probabilities when it makes observations, but, in some sense, it doesn’t shift them very much; and indeed, that property seems key to the computability of that prior.

So, a natural question arises: is this an essential property of an untrollable prior? Or can we construct a “rich” prior which entertains hypotheses about the deep structure of the universe, learning about them in an Occam-like way, which is nonetheless still untrollable?

The present result is a first attempt at an answer: given my (admittedly a bit odd) notion of rich hypothesis space, it is indeed impossible to craft a computable prior over logic with some minimal good properties (like believing what’s proved to it). I don’t directly address a trollability-type property, unfortunately; but I do think I get close to the heart of the difficulty: a “deep” ability to adapt in order to predict data better stands in contradiction with computability of the latent probability-of-a-sentence.

So, how should we think about filtered evidence?

Orthodox Bayesian (OB): We can always resolve the problem by distinguishing between X and “I observe X”, and conditioning on all the evidence available. Look how nicely it works out in the Monty Hall problem and other simple examples we can write down.

Skeptical Critique (SC): You’re ignoring the argument. You can’t handle cases where running your model forward is easier than answering questions about what happens eventually; in those cases, many of your beliefs will either be uncomputable or incoherent.

OB: That’s not a problem for me. Bayesian ideals of rationality apply to the logically omniscient case. What they give you is an idealized notion of rationality, which defines the best an agent could do.

SC: Really? Surely your Bayesian perspective is supposed to have some solid implications for finite beings who are not logically omniscient. I see you giving out all this advice to machine learning programmers, statisticians, doctors, and so on.

OB: Sure. We might not be able to achieve perfect Bayesian rationality, but whenever we see something less Bayesian than it could be, we can correct it. That’s how we get closer to the Bayesian ideal!

SC: That sounds like cargo-cult Bayesianism to me. If you spot an inconsistency, it matters how you correct it; you don’t want to go around correcting for the planning fallacy by trying to do everything faster, right? Similarly, if your rule-of-thumb for the frequency of primes is a little off, you don’t want to add composite numbers to your list of primes to fudge the numbers.

OB: No one would make those mistakes.

SC: That’s because there are, in fact, rationality principles which apply. You don’t just cargo-cult Bayesianism by correcting inconsistencies any old way. A boundedly rational agent has rationality constraints which apply, guiding it to better approximate “ideal” rationality. And those rationality constraints don’t actually need to refer to the “ideal” rationality. The rationality constraints are about the update, not about the ideal which the update limits to.

OB: Maybe we can imagine some sort of finite Bayesian reasoner, who treats logical uncertainty as a black box, and follows the evidence toward unbounded-Bayes-optimality in a bounded-Bayes-optimal way...

SC: Maybe, but I don’t know of a good picture which looks like that. The picture we do have is given by logical induction: we learn to avoid Dutch books by noticing lots of Dutch books against ourselves, and gradually becoming less exploitable.

OB: That sounds a lot like the picture I gave.

SC: Sure, but it’s more precise. And more importantly, it’s not a Bayesian update—there is a kind of family resemblance in the math, but it isn’t learning through a Bayesian update in a strict sense.

OB: Ok, so what does all this have to do with filtered evidence? I still don’t see why the way I handle that is wrong.

SC: Well, isn’t the standard Bayesian answer a little suspicious? The numbers conditioning on X don’t come out to what you want, so you introduce something new to condition on, observe-X, which can have different conditional probabilities. Can’t you get whatever answer you want, that way?

OB: I don’t think so? The numbers are dictated by the scenario. The Monty Hall problem has a right answer, which determines how you should play the game if you want to win. You can’t fudge it without changing the game.

SC: Fair enough. But I still feel funny about something. Isn’t there an infinite regress? We jump to updating on observe-X when X is filtered. What if observe-X is filtered? Do we jump to observe-observe-X? What if we can construct a “meta Monty-Hall problem” where it isn’t sufficient to condition on observe-X?

OB: If you observe, you observe that you observe. And if you observe that you observe, then you must observe. So there’s no difference.

SC: If you’re logically perfect, sure. But a boundedly rational agent need not realize immediately that it observed X. And certainly it need not realize and update on the entire sequence “X”, “I observed X”, “I observed that I observed X”, and so on.

OB: Ok...

SC: To give a simple example: call a sensory impression “subliminal” when it is weak enough that only X is registered. A stronger impression also registers “observe-X”, making the sensory impression more “consciously available”. Then, we cannot properly track the effects of filtered evidence for subliminal impressions. Subliminal impressions would always register as if they were unfiltered evidence.

OB: …no.

SC: What’s wrong?

OB: An agent should come with a basic notion of sensory observation. If you’re a human, that could be activation in the nerves running to sensory cortex. If you’re a machine, it might be RGB pixel values coming from a camera. That’s the only thing you ever have to condition on; all your evidence has that form. Observing a rabbit means getting pixel values corresponding to a rabbit. We don’t start by conditioning on “rabbit” and then patch things by adding “observe-rabbit” as an additional fact. We condition on the complicated observation corresponding to the rabbit, which happens to, by inference, tell us that there is a rabbit.

SC: That’s… a bit frustrating.

OB: How so?

SC: The core Bayesian doctrine is the Kolmogorov axioms, together with the rule that we update beliefs via Bayesian conditioning. A common extension of Bayesian doctrine grafts on a distinction between observations and hypotheses, naming some special events as observable, and others as non-observable hypotheses. I want you to notice when you’re using the extension rather than the core.

OB: How is that even an extension? It just sounds like a special case, which happens to apply to just about any organism.

SC: But you’re restricting the rule “update beliefs by Bayesian conditioning”—you’re saying that it only works for observations, not for other kinds of events.

OB: Sure, but you could never update on those other kinds of events anyway.

SC: Really, though? Can’t you? Some information you update on comes from sensory observations, but other information comes from reasoning. Something like a feedforward neural network just computes one big function on sense-data, and can probably be modeled in the way you’re suggesting. But something like a memory network has a nontrivial reasoning component. A Bayesian can’t handle “updating” on internal calculations it’s completed; at best they’re treated as if they’re black boxes whose outputs are “observations” again.

OB: Ok, I see you’re backing me into a corner with logical uncertainty stuff again. I still feel like there should be a Bayesian way to handle it. But what does this have to do with filtered evidence?

SC: The whole point of the argument we started out discussing is that if you have this kind of observation/hypothesis divide, and have sufficiently rich ways of predicting sensory experiences, and remain a classical Bayesian, then your beliefs about the hidden information are not going to be computable, even if your hypotheses themselves are easy to compute. So we can’t realistically reason about the hidden information just by Bayes-conditioning on the observables. The only way to maintain both computability and a rich hypothesis space under these conditions is to be less Bayesian, allowing for more inconsistencies in your beliefs. Which means, reasoning about filtered evidence doesn’t reduce to applying Bayes’ Law.

OB: That… seems wrong.

SC: Now we’re getting somewhere!

All that being said, reasoning about filtered evidence via Bayes’ Law in the orthodox way still seems quite practically compelling. The perspective SC puts forward in the above dialogue would be much more compelling if I had more practical/interesting “failure-cases” for Bayes’ Law, and more to say about alternative ways of reasoning which work better for those cases. A real “meta Monty-Hall problem”.

Arguably, logical induction doesn’t use the “condition on the fact that X was observed” solution:

  • Rather than the usual sequential prediction model, logical induction accommodates information coming in for any sentence, in any order. So, like the “core of Bayesianism” mentioned by SC, it maintains its good properties without special assumptions about what is being conditioned on. This is in contrast to, e.g., Solomonoff induction, which uses the sequential prediction model.

  • In particular, in Monty Hall, although there is a distinction between the sentence “there is a goat behind door 3” and “the LI discovers, at time t, that there is a goat behind door 3” (or suitable arithmetizations of these sentences), we can condition on the first rather than the second. A logical inductor would learn to react to this in the appropriate way, since doing otherwise would leave it Dutch-bookable.

One might argue that the traders are implicitly using the standard Bayesian “condition on the fact that X was observed” solution in order to accomplish this. Or that the update an LI performs upon seeing X is always that it saw X. But to me, this feels like stretching things. The core of the Bayesian method for handling filtered evidence is to distinguish between X and the observation of X, and update on the latter. A logical inductor doesn’t explicitly follow this, and indeed appears to violate it. Part of the usual idea seems to be that a Bayesian needs to “update on all the evidence”—but a logical inductor just gets a black-box report of X, without any information on how X was concluded or where it came from. So information can be arbitrarily excluded, and the logical inductor will still do its best (which, in the case of Monty Hall, appears to be sufficient to learn the correct result).

A notable thing about the standard sort of cases, where the Bayesian way of reasoning about filtered evidence is entirely adequate, is that you have a gears-level model of what is going on—a causal model, which you can turn the crank on. If you run such a model “forward”—in causal order—you compute the hidden causes before you compute the filtered evidence about them. This makes it sound as if predicting the hidden variables should be easier than predicting the sensory observations; and it certainly makes it hard to visualize the situation where it is much, much harder.

However, even in cases where we have a nice causal model like that, inferring the hidden variables from what is observed can be intractably computationally difficult, since it requires reverse-engineering the computation from its outputs. Forward-sampling causal models is always efficient; running them backwards, not so.
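A sketch of this asymmetry (the hash-based “causal model” here is my own stand-in, chosen because it makes the gap extreme): running the model forward from hidden cause to observation is one cheap pass, while inferring the cause from the observation has no better generic method than search:

```python
import hashlib

def forward(hidden_cause: int) -> str:
    # Forward direction of a toy causal model: cause -> observation.
    # One cheap pass, like forward-sampling a causal graph.
    return hashlib.sha256(str(hidden_cause).encode()).hexdigest()[:8]

def infer_cause(observation: str, search_space) -> int:
    # Backward direction: reverse-engineering the computation from its
    # output. Generically, nothing beats trying candidate causes one by one.
    for cause in search_space:
        if forward(cause) == observation:
            return cause
    raise ValueError("no cause found in search space")
```

Each call to `forward` is constant-time, but `infer_cause` does work proportional to the size of the space of hidden causes.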

So even with causal models, there can be good reason to engage more directly with logical uncertainty rather than use pure Bayesian methods.

However, I suspect that one could construct a much more convincing example if one were to use partial models explicitly in the construction of the example. Perhaps something involving an “outside view” with strong empirical support, but lacking a known “inside view” (lacking a single consistent causal story).

Unfortunately, such an example escapes me at the moment.

Finally, some notes on further formalisation of my main argument.

The listener is supposed to have probabilistic beliefs of the standard variety—an event space which is a sigma-algebra, and which has a P(event) obeying the Kolmogorov axioms. In particular, the beliefs are supposed to be perfectly logically consistent in the usual way.

However, in order to allow logical uncertainty, I’m assuming that there is some embedding of arithmetic; call it E[]. So, for each arithmetic sentence S, there is an event E[S]. Negation gets mapped to the “star” of an event: E[¬S] = (E[S])*. This need not be the complement of the event E[S]. Similarly, the embedding E[A∧B] need not be E[A]∩E[B]; E[A∨B] need not be E[A]∪E[B]; and so on. That’s what allows for logical non-omniscience—the probability distribution doesn’t necessarily know that E[A∧B] should act like E[A]∩E[B], and so on.

The more we impose requirements which force the embedding to act like it should, the more logical structure we are forcing onto the beliefs. If we impose very much consistency, however, then that would already imply uncomputability and the central result would not be interesting. So, the “minimal consistency” assumption requires very little of our embedding. Still, it is enough for the embedding of PA to cause trouble in connection with the other assumptions.

In addition to all this, we have a distinguished set of events which count as observations. A first pass on this is that for any event A, there is an associated event obs(A) which is the observation of A. But I do worry that this includes more observation events than we want to require. Some events A do not correspond to sentences; sigma-algebras are closed under countable unions. If we think of the observation events as claims made by the speaker, it doesn’t make sense to imagine the speaker claiming a countable union of sentences (particularly not the union of an uncomputable collection).

So, more conservatively, we might say that for events E[S]—that is, events in the image of the embedding—we also have an event obs(E[S]). In any case, this is closer to the minimal thing we need to establish the result.

I don’t know if the argument works out exactly as I sketched; it’s possible that the rich hypothesis assumption needs to be “and also positive weight on a particular enumeration”. Given that, we can argue: take one such enumeration; as we continue getting observations consistent with that enumeration, the hypothesis which predicts it loses no weight, and hypotheses which (eventually) predict other things must (eventually) lose weight; so, the updated probability eventually believes that particular enumeration will continue with probability > 1/2.
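The weight dynamics in this patched argument can be sketched with a toy Bayesian mixture over deterministic speaker hypotheses (the hypothesis names are illustrative, not from the argument): a hypothesis whose predictions keep coming true never loses weight under renormalization, while hypotheses that eventually predict something else are eventually refuted, so the confirmed hypothesis eventually exceeds probability 1/2:

```python
from fractions import Fraction

def update_mixture(observations, hypotheses, prior):
    # hypotheses: name -> function giving the predicted statement at time t.
    # A deterministic hypothesis is refuted (weight 0) by any wrong
    # prediction; surviving weights are renormalized, so a hypothesis whose
    # predictions are all confirmed can only gain probability.
    weights = dict(prior)
    for t, obs in enumerate(observations):
        for name, predict in hypotheses.items():
            if weights[name] > 0 and predict(t) != obs:
                weights[name] = Fraction(0)
        total = sum(weights.values())
        weights = {name: w / total for name, w in weights.items()}
    return weights

hypotheses = {
    "always-0": lambda t: 0,
    "alternating": lambda t: t % 2,
    "always-1": lambda t: 1,
}
prior = {name: Fraction(1, 3) for name in hypotheses}
posterior = update_mixture([0, 0, 0], hypotheses, prior)
```

After three observations of 0, the competing hypotheses have each mispredicted at least once, and “always-0” holds more than half the posterior.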

On the other hand, that patched definition is certainly less nice. Perhaps there is a better route.