The Equation of Knowledge

My book The Equation of Knowledge has just been published by CRC Press, and I’m guessing that it may be of interest to readers of LessWrong. The book aims to be a somewhat accessible and very complete introduction to Bayesianism. No prior knowledge is needed, though some sections require substantial familiarity with mathematics and computer science. The book has been designed so that these sections can be skipped without hindering the reading.

The aim of the book is to (1) highlight the most compelling arguments, theorems and empirical evidence in favor of Bayesianism, (2) present numerous applications in a very wide variety of domains, and (3) discuss solutions for pragmatic Bayesianism with limited computational resources. A promotional 5-minute video of the book is also available.

In this post, I will briefly sketch the outline of the book. Just like the book, I’ll divide the post into four sections.

Pure Bayesianism

The first section of the book is a gentle introduction to pure Bayesianism, which is defined as strictly obeying the laws of probability theory. The key equation is of course Bayes rule, which I like to write as follows:

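$$\mathbb{P}[T|D] = \frac{\mathbb{P}[D|T]\,\mathbb{P}[T]}{\mathbb{P}[D|T]\,\mathbb{P}[T] + \sum_{A \neq T} \mathbb{P}[D|A]\,\mathbb{P}[A]}.$$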

This equation says that the critical variable is $\mathbb{P}[T|D]$, that is, the credence of theory $T$ given data $D$. Computing this is arguably the end goal of Bayes rule. Bayes rule thus does not quite aim to distinguish truth from falsehood; it rather motivates us to assign quantitative measures of reliability to different theories, given observed data. It suggests that we should replace questions like “is $T$ true?” by “how credible is $T$?” (or perhaps even by “how much should I trust the predictions of theory $T$?”). I argue in the book that this is a great way to improve the quality of many debates.

Bayes rule then goes on to tell us how to compute the credence of a theory given empirical data. Importantly, on the right-hand side, we have the term $\mathbb{P}[T]$, which measures the credence of the theory prior to the observation of data $D$. This is critical. A theory which was extremely unlikely before we knew $D$ will likely remain unlikely even given $D$, unless $D$ is overwhelmingly compelling. This corresponds to Carl Sagan’s phrase “extraordinary claims require extraordinary evidence” (which was analyzed mathematically by Laplace back in 1814!).

Bayes rule then tells us to update our prior beliefs based on observed data, depending on how well theory $T$ predicts data $D$. Essentially, we can see any theory as a betting individual. If $T$ bets on $D$, which corresponds to a large value of $\mathbb{P}[D|T]$, then $T$ should gain credence once $D$ is observed. But if theory $T$ found the observed data $D$ unlikely (i.e. $\mathbb{P}[D|T]$ is small), then we should decrease our belief in $T$ once we observe $D$.

Well, actually, Bayes rule tells us that this update also depends on how well alternative theories $A$ perform. Indeed, the denominator orchestrates a sort of competition between the different theories. In particular, the credence of theory $T$ will decrease only if its bet $\mathbb{P}[D|T]$ is outperformed by the bets $\mathbb{P}[D|A]$ of alternative theories $A$. This means that Bayes rule forbids the analysis of a theory independently of others; the credence of a theory is only relative to the set of alternatives.
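To make this competition between theories concrete, here is a minimal Python sketch of a single Bayesian update. The theory names and all the numbers are hypothetical and chosen only for illustration; they are not taken from the book.

```python
def bayes_update(priors, likelihoods):
    """Return the posterior credence P[T|D] of every theory T.

    priors:      dict mapping theory name -> P[T]
    likelihoods: dict mapping theory name -> P[D|T] (the "bet" of T on D)
    """
    # Denominator of Bayes rule: the bets of all theories, weighted by their priors.
    evidence = sum(likelihoods[t] * priors[t] for t in priors)
    return {t: likelihoods[t] * priors[t] / evidence for t in priors}


# Hypothetical setup: an "extraordinary" theory T versus a mundane alternative A.
priors = {"T": 0.001, "A": 0.999}
likelihoods = {"T": 0.9, "A": 0.1}  # T bets heavily on D, A bets little on it

posterior = bayes_update(priors, likelihoods)
print(posterior)
```

With these made-up numbers, the credence of T rises roughly ninefold because T outbet A, and yet it stays below 1%: the data were compelling, but not extraordinary enough to overcome the prior.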

Chapters 2 to 5 of the book detail the analysis of Bayes rule, and illustrate it through a large number of examples, like Sally Clark’s infamous trial, Hempel’s raven paradox, Einstein’s discovery of general relativity and the Linda problem, among many others. They also discuss connections and tensions with first-order logic, Popper’s falsifiability and null hypothesis statistical tests.

Chapter 6 then discusses the history of Bayesianism, which also hints at the importance of probability theory in essentially all human endeavors. Finally, Chapter 7 concludes the first part of the book by introducing Solomonoff’s induction, which I call pure Bayesianism. In brief, Bayes rule requires any theory $T$ to bet on any imaginable observable data (formally, $T$ needs to define a probability measure on the space of data, otherwise the quantity $\mathbb{P}[D|T]$ is ill-defined). Solomonoff’s genius was to simply also demand that this bet be computable. It turns out that the rest of Solomonoff’s theory essentially falls out beautifully from this simple additional constraint.

Of course, many more explanations and details can be found in the book!

Applied Bayesianism

The second section of the book goes deeper into applications of Bayesianism to numerous different fields. Chapter 8 discusses the strong connection between Bayesianism and privacy. After all, if Bayesianism is the right theory of knowledge, it is clearly critical to any theory of how to prevent the acquisition of knowledge. And indeed, the leading concept of privacy, namely differential privacy, has a very natural definition in terms of probability theory.
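As a hedged illustration of that probabilistic definition, the sketch below uses randomized response, a textbook mechanism that is my choice of example and not necessarily the one used in the book. A mechanism is ε-differentially private if changing one person's data changes the probability of any output by a factor of at most $e^\varepsilon$. The function names and the truth-telling probability of 0.75 are assumptions made for illustration.

```python
import math


def randomized_response_distribution(true_answer, p_truth):
    """Distribution of the reported answer, given the respondent's true answer.

    With probability p_truth the truth is reported, otherwise the opposite.
    """
    return {true_answer: p_truth, (not true_answer): 1.0 - p_truth}


def privacy_loss(p_truth):
    """Smallest epsilon such that, for every possible report r,
    P[r | true answer is yes] <= exp(epsilon) * P[r | true answer is no],
    and vice versa."""
    dist_yes = randomized_response_distribution(True, p_truth)
    dist_no = randomized_response_distribution(False, p_truth)
    return max(abs(math.log(dist_yes[r] / dist_no[r])) for r in (True, False))


# Hypothetical setting: respondents tell the truth 75% of the time.
epsilon = privacy_loss(0.75)
print(epsilon, math.log(3))
```

In Bayesian terms, the likelihood ratio between "yes" and "no" is bounded by $e^\varepsilon$ whatever the report, so an observer's odds about the true answer can shift by at most that factor (here, a factor of 3).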

Chapter 9 dwells on the strong connection between Bayesianism and economics, and in particular game theory. Nobel Prize winner Roger Myerson once argued that “the unity and scope of modern information economics was found in Harsanyi’s framework”. Again, this is made evident by the fact that much of modern economics focuses on the consequences of incomplete (e.g. asymmetric) information.

Chapter 10 moves on to the surprisingly strong connections between Darwinian evolution and Bayes rule. In particular, the famous Lotka-Volterra equations for population dynamics feature an intriguing resemblance to Bayes rule. This resemblance is then exploited to discuss to what extent the spread of ideas within the scientific community can be compared to the growth of the credence in a theory for a Bayesian. This allows us to identify reliable rules of thumb to determine when a scientific consensus or a (prediction) market price is credible, and when it is less so.

Chapter 11 discusses exponential growth, which emerges out of repeated multiplications. Understanding such growth is critical to developing an intuitive feel for Bayes rule, as repeated Bayesian updates are typically multiplicative. The chapter also draws a fascinating connection between Bayes rule and the multiplicative weights update algorithm and its variants, like AdaBoost. It argues that the success of these methods is no accident, and that their late discovery may be due to mathematicians’ poor intuitive understanding of exponential growth.
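The connection can be sketched in a few lines of Python. The code below is an illustration under stated assumptions, not the book's presentation: it uses the exponential-weights form of multiplicative weights, with the loss of each theory taken to be its log loss and the learning rate `eta` set to 1, in which case the update coincides with repeated Bayesian updating. The likelihood values are hypothetical.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def bayes_sequence(prior, likelihood_rows):
    """Repeated Bayes updates: multiply each credence by P[D_t | theory], renormalize."""
    weights = list(prior)
    for row in likelihood_rows:
        weights = normalize([w * l for w, l in zip(weights, row)])
    return weights


def multiplicative_weights(prior, loss_rows, eta=1.0):
    """Exponential-weights update: multiply each weight by exp(-eta * loss), renormalize."""
    weights = list(prior)
    for row in loss_rows:
        weights = normalize([w * math.exp(-eta * loss) for w, loss in zip(weights, row)])
    return weights


# Hypothetical data: ten observations, each only mildly favoring the first theory.
likelihood_rows = [[0.6, 0.4]] * 10
loss_rows = [[-math.log(l) for l in row] for row in likelihood_rows]  # log losses

bayes = bayes_sequence([0.5, 0.5], likelihood_rows)
mwu = multiplicative_weights([0.5, 0.5], loss_rows)
print(bayes, mwu)
```

Each observation favors the first theory only by a ratio of 1.5, but ten of them multiply into a ratio of about 58, which pushes its credence above 98%. This is the exponential growth that the chapter argues we tend to underestimate.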

Chapter 12 presents numerous applications of Ockham’s razor to avoid erroneous conclusions. It also shows that the practical usefulness of Ockham’s razor is intimately connected to the importance of priors in Bayesian thinking, as evidenced by the compelling theorem that says that, under mild assumptions, only Bayesian methods are “statistically admissible”. Finally, the chapter concludes with another stunning theorem: it can be proved in one line that a version of Ockham’s razor is a theorem under Bayesianism (I’ll keep this one line secret to tease you!).

Chapter 13 then stresses the danger of Simpson’s paradox and the importance of confounding variables when analyzing uncontrolled empirical data. After discussing the value and limits of randomized controlled trials, I then reformulate the necessary analysis of plausible confounding variables in data analysis as an instance of the unavoidability of priors for thinking correctly. The chapter closes with some philosophical discussions on the ontology of these confounding variables.
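For readers unfamiliar with Simpson’s paradox, here is a small Python sketch. The counts are illustrative only and are not taken from the book: treatment A has the higher success rate within each severity group, yet the lower success rate once the groups are pooled, because A was mostly given to the harder cases. Severity is the confounding variable.

```python
# Illustrative counts as (successes, patients) per treatment and severity group.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def rate(successes, patients):
    return successes / patients


def pooled_rate(groups):
    """Success rate when the severity groups are merged, ignoring the confounder."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return successes / patients


for severity in ("mild", "severe"):
    a = rate(*counts["A"][severity])
    b = rate(*counts["B"][severity])
    print(f"{severity}: A={a:.3f}  B={b:.3f}")

print(f"pooled: A={pooled_rate(counts['A']):.3f}  B={pooled_rate(counts['B']):.3f}")
```

Which comparison to trust depends on what we believe about how treatments were assigned, which is exactly where prior knowledge about plausible confounders enters.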

Pragmatic Bayesianism

Unfortunately, pure Bayesianism demands unreasonable computational capabilities. Neither our brains nor our machines have access to such capabilities. As a result, in practice, pure Bayesianism is doomed to fail. In other words, we cannot strictly obey the laws of probability. We’ll have to content ourselves with approximations of these laws.

Chapter 14 contextualizes this strategy within the more general theory of computational complexity. It gives numerous examples where this strategy has been used, for instance to study prime numbers or Ramsey theory. It also presents Turing’s compelling 1950 argument for the need for machine learning to achieve human-level AI, based on computational complexity. The chapter also draws connections with Kahneman’s System 1 / System 2 model.

Chapter 15 then stresses the need to embrace (quantitative) uncertainty. It provides numerous arguments for why this uncertainty will always remain, from chaos theory to quantum mechanics, statistical mechanics and automata with irreducible computations. It then discusses ways to measure success under uncertainty, using cross-entropy for instance, or more general proper scoring rules. Finally, it draws connections with modern machine learning, in particular generative adversarial networks (GANs).
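To see what makes a scoring rule “proper”, consider the following Python sketch, with a hypothetical event of true probability 0.7. Under the log loss (the cross-entropy), the forecast that minimizes the expected loss is the true probability itself. Under the absolute error, which is not a proper scoring rule, the best strategy is instead to report an overconfident forecast.

```python
import math


def expected_log_loss(p_true, q):
    """Expected cross-entropy of forecasting q for an event of true probability p_true."""
    return -(p_true * math.log(q) + (1 - p_true) * math.log(1 - q))


def expected_absolute_loss(p_true, q):
    """Expected |outcome - q|, with outcome equal to 1 with probability p_true, else 0."""
    return p_true * (1 - q) + (1 - p_true) * q


p_true = 0.7  # hypothetical true probability of the event
grid = [i / 100 for i in range(1, 100)]  # candidate forecasts 0.01, ..., 0.99

best_log = min(grid, key=lambda q: expected_log_loss(p_true, q))
best_abs = min(grid, key=lambda q: expected_absolute_loss(p_true, q))
print(best_log, best_abs)
```

On this grid, the log loss is minimized by the honest forecast 0.7, whereas the absolute error is minimized by the most extreme forecast available, 0.99. A proper scoring rule thus rewards reporting one’s actual credence.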

Chapter 16 then discusses the challenges posed by having limited information storage space, both from a computational and from a cognitive perspective. The chapter discusses things like Kalman filters, false memory, recurrent neural networks, attention mechanisms and what should be taught in our modern world, where we can now exploit much better information storage systems than our brains.

Chapter 17 discusses approximations of Bayes rule using sampling. It is a gentle introduction to Monte Carlo methods, and then to Markov Chain Monte Carlo (MCMC) methods. It then argues that our brains probably run MCMC-like algorithms, and discusses the consequences for cognitive biases. Indeed, MCMC only has asymptotic guarantees; if MCMC does not run for long, it will be heavily biased by its starting point. Arguably, something similar occurs in our brains.
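This starting-point bias is easy to reproduce. The Python sketch below is a minimal random-walk Metropolis sampler, one simple MCMC algorithm among many. The target distribution (a standard normal), the starting point of 50, the step size and the chain lengths are all arbitrary choices made for illustration.

```python
import math
import random


def metropolis(log_density, start, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler; returns the list of visited states."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.uniform(-step_size, step_size)
        log_ratio = log_density(proposal) - log_density(x)
        # Always accept moves to higher density; otherwise accept with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def standard_normal_log_density(x):
    return -0.5 * x * x  # log density up to an additive constant


def mean(values):
    return sum(values) / len(values)


# Both chains start far from where the target puts its mass (around 0).
short_chain = metropolis(standard_normal_log_density, start=50.0, steps=20)
long_chain = metropolis(standard_normal_log_density, start=50.0, steps=20000)

print("short chain mean:", mean(short_chain))
print("long chain mean: ", mean(long_chain))
```

Since each step moves by at most `step_size`, the short chain cannot get far from 50, so its estimate of the mean mostly reflects where it started. The long chain has time to forget its starting point and its estimate ends up close to the true mean of 0.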

Chapter 18 addresses a fundamental question of epistemology, namely the unreasonable effectiveness of abstraction. This chapter draws heavily on theoretical computer science, and in particular on Kolmogorov sophistication and Bennett’s logical depth, to suggest explanations of the success of abstractions based on computational properties of our current universe. It is interesting to note that, in the far past or the very far future, the state of the universe may be such that deep abstraction would be unlikely to remain useful (and thus “effective”).

Chapter 19 introduces the Bayesian brain hypothesis, and the numerous fascinating recent discoveries of the cognitive sciences in this regard. Amazingly, Bayes rule has been suggested again and again to explain our vulnerability to optical illusions, our ability to generalize from few examples or babies’ learning capabilities. The Bayesian perspective has fascinating consequences for the famous Nature vs Nurture debate.

Beyond Bayesianism

The last section of the book steps back a bit from Bayesianism, though it is still strongly connected to the laws of probability. Chapter 20 discusses what I argue to be natural consequences of pure Bayesian thinking for scientific realism. In particular, it argues that theories are mostly tools to predict past and future data. As a result, it seems pointless to argue about the truth of their components; what matters rather seems to be the usefulness of thinking with these components. I discuss consequences for how we ought to discuss concepts like money, life or electrons.

Chapter 21 is my best effort to encourage readers to question their most strongly held beliefs. It does so by providing the example of my own journey, and by stressing the numerous cognitive biases that I have been suffering from. It then goes on to underline what seem to me to be the key reasons for my progress towards Bayesianism, namely the social and informational environment I have been so lucky to end up in. Improving this environment may indeed be key for anyone to question their most strongly held beliefs.

Finally, Chapter 22 briefly goes beyond epistemology to enter the realm of moral philosophy. After discussions on the importance of descriptive moral theories to understand human interactions, the chapter gives a brief classical introduction to the main moral theories, in particular deontology and utilitarianism. It then argues that consequentialism somehow generalizes these theories, but that only Bayesian consequentialism is consistent with the laws of probability. It then illustrates decision-making under Bayesian consequentialism with examples, and stresses the importance of catastrophic events, as long as their probability is not negligible.

One last thing I’d add is that I have made a lot of effort to make the book enjoyable. It is written in a very informal style, often with personal examples. I have also tried to share complex ideas with a lot of enthusiasm, not because it makes them more convincing, but because it seems necessary to me to motivate readers to really ponder these complex ideas.

Finally, note that French-speaking readers can also watch the series of videos I’ve made on Bayesianism on YouTube!