Pascal’s Muggle: Infinitesimal Priors and Strong Evidence

Followup to: Pascal’s Mugging: Tiny Probabilities of Vast Utilities, The Pascal’s Wager Fallacy Fallacy, Being Half-Rational About Pascal’s Wager Is Even Worse

Short form: Pascal’s Muggle

tl;dr: If you assign superexponentially infinitesimal probability to claims of large impacts, then apparently you should ignore the possibility of a large impact even after seeing huge amounts of evidence. If a poorly-dressed street person offers to save 10^(10^100) lives (googolplex lives) for $5 using their Matrix Lord powers, and you claim to assign this scenario less than 10^-(10^100) probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky. For the same reason, any evidence you encounter showing that the human species could create a sufficiently large number of descendants—no matter how normal the corresponding laws of physics appear to be, or how well-designed the experiments which told you about them—must be rejected out of hand. There is a possible reply to this objection using Robin Hanson’s anthropic adjustment against the probability of large impacts, and in this case you will treat a Pascal’s Mugger as having decision-theoretic importance exactly proportional to the Bayesian strength of evidence they present you, without quantitative dependence on the number of lives they claim to save. This however corresponds to an odd mental state which some, such as myself, would find unsatisfactory. In the end, however, I cannot see any better candidate for a prior than having a leverage penalty plus a complexity penalty on the prior probability of scenarios.

In late 2007 I coined the term “Pascal’s Mugging” to describe a problem which seemed to me to arise when combining conventional decision theory and conventional epistemology in the obvious way. On conventional epistemology, the prior probability of hypotheses diminishes exponentially with their complexity; if it would take 20 bits to specify a hypothesis, then its prior probability receives a 2^-20 penalty factor and it will require evidence with a likelihood ratio of 1,048,576:1—evidence which we are 1,048,576 times more likely to see if the theory is true, than if it is false—to make us assign it around 50-50 credibility. (This isn’t as hard as it sounds. Flip a coin 20 times and note down the exact sequence of heads and tails. You now believe in a state of affairs you would have assigned a million-to-one probability beforehand—namely, that the coin would produce the exact sequence HTHHHHTHTTH… or whatever—after experiencing sensory data which are more than a million times more probable if that fact is true than if it is false.) The problem is that although this kind of prior probability penalty may seem very strict at first, it’s easy to construct physical scenarios that grow in size vastly faster than they grow in complexity.
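
As a minimal sketch of the bookkeeping just described (the odds form of Bayes's theorem; the function names and the Python rendering are mine, not anything from the original post):

```python
# A 20-bit hypothesis gets a 2^-20 prior penalty, so it needs evidence with
# a likelihood ratio of about 2^20 : 1 to climb back to roughly even odds.

def prior_odds_against(bits):
    """Prior odds against a hypothesis that takes `bits` bits to specify."""
    return 2 ** bits                      # 20 bits -> 1,048,576 : 1 against

def posterior_odds_for(bits, likelihood_ratio):
    """Posterior odds for the hypothesis after seeing the evidence."""
    return likelihood_ratio / prior_odds_against(bits)

print(prior_odds_against(20))             # 1048576
print(posterior_odds_for(20, 2 ** 20))    # 1.0, i.e. roughly 50-50 credibility
```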

I originally illustrated this using Pascal’s Mugger: A poorly dressed street person says “I’m actually a Matrix Lord running this world as a computer simulation, along with many others—the universe above this one has laws of physics which allow me easy access to vast amounts of computing power. Just for fun, I’ll make you an offer—you give me five dollars, and I’ll use my Matrix Lord powers to save 3↑↑↑↑3 people inside my simulations from dying and let them live long and happy lives” where ↑ is Knuth’s up-arrow notation. This was originally posted in 2007, when I was a bit more naive about what kind of mathematical notation you can throw into a random blog post without creating a stumbling block. (E.g.: On several occasions now, I’ve seen someone on the Internet approximate the number of dust specks from this scenario as being a “billion”, since any incomprehensibly large number equals a billion.) Let’s try an easier (and way smaller) number instead, and suppose that Pascal’s Mugger offers to save a googolplex lives, where a googol is 10^100 (a 1 followed by a hundred zeroes) and a googolplex is 10 to the googol power, so 10^(10^100) or 10^10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 lives saved if you pay Pascal’s Mugger five dollars, if the offer is honest.

If Pascal’s Mugger had only offered to save a mere googol lives (10^100), we could perhaps reply that although the notion of a Matrix Lord may sound simple to say in English, if we actually try to imagine all the machinery involved, it works out to a substantial amount of computational complexity. (Similarly, Thor is a worse explanation for lightning bolts than the laws of physics because, among other points, an anthropomorphic deity is more complex than calculus in formal terms—it would take a larger computer program to simulate Thor as a complete mind, than to simulate Maxwell’s Equations—even though in mere human words Thor sounds much easier to explain.) To imagine this scenario in formal detail, we might have to write out the laws of the higher universe the Mugger supposedly comes from, the Matrix Lord’s state of mind leading them to make that offer, and so on. And so (we reply) when mere verbal English has been translated into a formal hypothesis, the Kolmogorov complexity of this hypothesis is more than 332 bits—it would take more than 332 ones and zeroes to specify—where 2^-332 ~ 10^-100. Therefore (we conclude) the net expected value of the Mugger’s offer is still tiny, once its prior improbability is taken into account.
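
As a quick check of the conversion used above (332 bits against a factor of 10^-100), under no additional assumptions:

```python
import math

# 2^-332 = 10^-(332 * log10(2)), and 332 * log10(2) is just under 100,
# so a 332-bit complexity penalty is on the order of 10^-100.
print(332 * math.log10(2))   # ~99.94
```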

But once Pascal’s Mugger offers to save a googolplex lives—offers us a scenario whose value is constructed by twice-repeated exponentiation—we seem to run into some difficulty using this answer. Can we really claim that the complexity of this scenario is on the order of a googol bits—that to formally write out the hypothesis would take one hundred billion billion times more bits than there are atoms in the observable universe?

And a tiny, paltry number like a googolplex is only the beginning of computationally simple numbers that are unimaginably huge. Exponentiation is defined as repeated multiplication: If you see a number like 3^5, it tells you to multiply five 3s together: 3×3×3×3×3 = 243. Suppose we write 3^5 as 3↑5, so that a single arrow ↑ stands for exponentiation, and let the double arrow ↑↑ stand for repeated exponentiation, or tetration. Thus 3↑↑3 would stand for 3↑(3↑3) or 3^(3^3) = 3^27 = 7,625,597,484,987. Tetration is also written as follows: ³3 = 3↑↑3. Thus ⁴2 = 2^2^2^2 = 2^(2^4) = 2^16 = 65,536. Then pentation, or repeated tetration, would be written with 3↑↑↑3 = ³³3 = 3↑↑7,625,597,484,987 = 3^3^…^3 where the … summarizes an exponential tower of 3s seven trillion layers high.
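
For concreteness, here is a small sketch of the up-arrow recursion just described; the helper function is mine, and it is only usable for tiny arguments, since 3↑↑↑3 itself is far too large to ever compute:

```python
def hyper(a, n, b):
    """Compute a ↑^n b, where one arrow (n = 1) means ordinary exponentiation
    and each extra arrow means repeating the previous operation b times."""
    if n == 1:
        return a ** b
    result = a
    for _ in range(b - 1):
        result = hyper(a, n - 1, result)
    return result

print(hyper(3, 1, 5))   # 3↑5  = 3^5 = 243
print(hyper(3, 2, 3))   # 3↑↑3 = 3^(3^3) = 7,625,597,484,987
print(hyper(2, 2, 4))   # 2↑↑4 = 2^2^2^2 = 65,536
```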

But 3↑↑↑3 is still quite simple computationally—we could describe a small Turing machine which computes it—so a hypothesis involving 3↑↑↑3 should not therefore get a large complexity penalty, if we’re penalizing hypotheses by algorithmic complexity.

I had originally intended the scenario of Pascal’s Mugging to point up what seemed like a basic problem with combining conventional epistemology with conventional decision theory: Conventional epistemology says to penalize hypotheses by an exponential factor of computational complexity. This seems pretty strict in everyday life: “What? for a mere 20 bits I am to be called a million times less probable?” But for stranger hypotheses about things like Matrix Lords, the size of the hypothetical universe can blow up enormously faster than the exponential of its complexity. This would mean that all our decisions were dominated by tiny-seeming probabilities (on the order of 2^-100 and less) of scenarios where our lightest action affected 3↑↑4 people… which would in turn be dominated by even more remote probabilities of affecting 3↑↑5 people...

This problem is worse than just giving five dollars to Pascal’s Mugger—our expected utilities don’t converge at all! Conventional epistemology tells us to sum over the predictions of all hypotheses weighted by their computational complexity and evidential fit. This works fine with epistemic probabilities and sensory predictions because no hypothesis can predict more than probability 1 or less than probability 0 for a sensory experience. As hypotheses get more and more complex, their contributed predictions have tinier and tinier weights, and the sum converges quickly. But decision theory tells us to calculate expected utility by summing the utility of each possible outcome, times the probability of that outcome conditional on our action. If hypothetical utilities can grow faster than hypothetical probability diminishes, the contribution of an average term in the series will keep increasing, and this sum will never converge—not if we try to do it the same way we got our epistemic predictions, by summing over complexity-weighted possibilities. (See also this similar-but-different paper by Peter de Blanc.)
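
Here is a toy numerical illustration of that non-convergence; the specific numbers are my own artificial choices (a 2^-k weight on the k-th hypothesis, with 2^(2^k) utility at stake), not anything from the post:

```python
# If the utility at stake grows doubly-exponentially while the prior weight
# shrinks only singly-exponentially, the individual terms of the expected
# utility sum keep growing, so the partial sums never converge.

def term(k):
    prior_weight = 2.0 ** -k
    utility_at_stake = 2 ** (2 ** k)
    return prior_weight * utility_at_stake

partial_sum = 0.0
for k in range(1, 8):
    partial_sum += term(k)
    print(k, term(k), partial_sum)   # both the terms and the partial sums explode
```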

Unfortunately I failed to make it clear in my original writeup that this was where the problem came from, and that it was general to situations beyond the Mugger. Nick Bostrom’s writeup of Pascal’s Mugging for a philosophy journal used a Mugger offering a quintillion days of happiness, where a quintillion is merely 1,000,000,000,000,000,000 = 10^18. It takes at least two exponentiations to outrun a singly-exponential complexity penalty. I would be willing to assign a probability of less than 1 in 10^18 to a random person being a Matrix Lord. You may not have to invoke 3↑↑↑3 to cause problems, but you’ve got to use something like 10^(10^100) - double exponentiation or better. Manipulating ordinary hypotheses about the ordinary physical universe taken at face value, which just contains 10^80 atoms within range of our telescopes, should not lead us into such difficulties.

(And then the phrase “Pascal’s Mugging” got completely bastardized to refer to an emotional feeling of being mugged that some people apparently get when a high-stakes charitable proposition is presented to them, regardless of whether it’s supposed to have a low probability. This is enough to make me regret having ever invented the term “Pascal’s Mugging” in the first place; and for further thoughts on this see The Pascal’s Wager Fallacy Fallacy (just because the stakes are high does not mean the probabilities are low, and Pascal’s Wager is fallacious because of the low probability, not the high stakes!) and Being Half-Rational About Pascal’s Wager Is Even Worse. Again, when dealing with issues the mere size of the apparent universe, on the order of 10^80—for small large numbers—we do not run into the sort of decision-theoretic problems I originally meant to single out by the concept of “Pascal’s Mugging”. My rough intuitive stance on x-risk charity is that if you are one of the tiny fraction of all sentient beings who happened to be born here on Earth before the intelligence explosion, when the existence of the whole vast intergalactic future depends on what we do now, you should expect to find yourself surrounded by a smorgasbord of opportunities to affect small large numbers of sentient beings. There is then no reason to worry about tiny probabilities of having a large impact when we can expect to find medium-sized opportunities of having a large impact, so long as we restrict ourselves to impacts no larger than the size of the known universe.)

One proposal which has been floated for dealing with Pascal’s Mugger in the decision-theoretic sense is to penalize hypotheses that let you affect a large number of people, in proportion to the number of people affected—what we could call perhaps a “leverage penalty” instead of a “complexity penalty”.

Unfortunately this potentially leads us into a different problem, that of Pascal’s Muggle.

Suppose a poorly-dressed street person asks you for five dollars in exchange for doing a googolplex’s worth of good using his Matrix Lord powers.

“Well,” you reply, “I think it very improbable that I would be able to affect so many people through my own, personal actions—who am I to have such a great impact upon events? Indeed, I think the probability is somewhere around one over googolplex, maybe a bit less. So no, I won’t pay five dollars—it is unthinkably improbable that I could do so much good!”

“I see,” says the Mugger.

A wind begins to blow about the alley, whipping the Mugger’s loose clothes about him as they shift from ill-fitting shirt and jeans into robes of infinite blackness, within whose depths tiny galaxies and stranger things seem to twinkle. In the sky above, a gap edged by blue fire opens with a horrendous tearing sound—you can hear people on the nearby street yelling in sudden shock and terror, implying that they can see it too—and displays the image of the Mugger himself, wearing the same robes that now adorn his body, seated before a keyboard and a monitor.

“That’s not actually me,” the Mugger says, “just a conceptual representation, but I don’t want to drive you insane. Now give me those five dollars, and I’ll save a googolplex lives, just as promised. It’s easy enough for me, given the computing power my home universe offers. As for why I’m doing this, there’s an ancient debate in philosophy among my people—something about how we ought to sum our expected utilities—and I mean to use the video of this event to make a point at the next decision theory conference I attend. Now will you give me the five dollars, or not?”

“Mm… no,” you reply.

“No?” says the Mugger. “I understood earlier when you didn’t want to give a random street person five dollars based on a wild story with no evidence behind it. But now I’ve offered you evidence.”

“Unfortunately, you haven’t offered me enough evidence,” you explain.

“Really?” says the Mugger. “I’ve opened up a fiery portal in the sky, and that’s not enough to persuade you? What do I have to do, then? Rearrange the planets in your solar system, and wait for the observatories to confirm the fact? I suppose I could also explain the true laws of physics in the higher universe in more detail, and let you play around a bit with the computer program that encodes all the universes containing the googolplex people I would save if you gave me the five dollars—”

“Sorry,” you say, shaking your head firmly, “there’s just no way you can convince me that I’m in a position to affect a googolplex people, because the prior probability of that is one over googolplex. If you wanted to convince me of some fact of merely 2^-100 prior probability, a mere decillion to one—like that a coin would come up heads and tails in some particular pattern of a hundred coinflips—then you could just show me 100 bits of evidence, which is within easy reach of my brain’s sensory bandwidth. I mean, you could just flip the coin a hundred times, and my eyes, which send my brain a hundred megabits a second or so—though that gets processed down to one megabit or so by the time it goes through the lateral geniculate nucleus—would easily give me enough data to conclude that this decillion-to-one possibility was true. But to conclude something whose prior probability is on the order of one over googolplex, I need on the order of a googol bits of evidence, and you can’t present me with a sensory experience containing a googol bits. Indeed, you can’t ever present a mortal like me with evidence that has a likelihood ratio of a googolplex to one—evidence I’m a googolplex times more likely to encounter if the hypothesis is true, than if it’s false—because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex. You know the old saying about how once you assign something probability one, or probability zero, you can never change your mind regardless of what evidence you see? Well, odds of a googolplex to one, or one to a googolplex, work pretty much the same way.”

“So no matter what evidence I show you,” the Mugger says—as the blue fire goes on crackling in the torn sky above, and screams and desperate prayers continue from the street beyond—“you can’t ever notice that you’re in a position to help a googolplex people.”

“Right!” you say. “I can believe that you’re a Matrix Lord. I mean, I’m not a total Muggle, I’m psychologically capable of responding in some fashion to that giant hole in the sky. But it’s just completely forbidden for me to assign any significant probability whatsoever that you will actually save a googolplex people after I give you five dollars. You’re lying, and I am absolutely, absolutely, absolutely confident of that.”

“So you weren’t just invoking the leverage penalty as a plausible-sounding way of getting out of paying me the five dollars earlier,” the Mugger says thoughtfully. “I mean, I’d understand if that was just a rationalization of your discomfort at forking over five dollars for what seemed like a tiny probability, when I hadn’t done my duty to present you with a corresponding amount of evidence before demanding payment. But you… you’re acting like an AI would if it was actually programmed with a leverage penalty on hypotheses!”

“Exactly,” you say. “I’m forbidden a priori to believe I can ever do that much good.”

“Why?” the Mugger says curiously. “I mean, all I have to do is press this button here and a googolplex lives will be saved.” The figure within the blazing portal above points to a green button on the console before it.

“Like I said,” you explain again, “the prior probability is just too infinitesimal for the massive evidence you’re showing me to overcome it—”

The Mugger shrugs, and vanishes in a puff of purple mist.

The portal in the sky above closes, taking with it the console and the green button.

(The screams go on from the street outside.)

A few days later, you’re sitting in your office at the physics institute where you work, when one of your colleagues bursts in through your door, seeming highly excited. “I’ve got it!” she cries. “I’ve figured out that whole dark energy thing! Look, these simple equations retrodict it exactly, there’s no way that could be a coincidence!”

At first you’re also excited, but as you pore over the equations, your face configures itself into a frown. “No...” you say slowly. “These equations may look extremely simple so far as computational complexity goes—and they do exactly fit the petabytes of evidence our telescopes have gathered so far—but I’m afraid they’re far too improbable to ever believe.”

“What?” she says. “Why?”

“Well,” you say reasonably, “if these equations are actually true, then our descendants will be able to exploit dark energy to do computations, and according to my back-of-the-envelope calculations here, we’d be able to create around a googolplex people that way. But that would mean that we, here on Earth, are in a position to affect a googolplex people—since, if we blow ourselves up via a nanotechnological war or (cough) make certain other errors, those googolplex people will never come into existence. The prior probability of us being in a position to impact a googolplex people is on the order of one over googolplex, so your equations must be wrong.”

“Hmm...” she says. “I hadn’t thought of that. But what if these equations are right, and yet somehow, everything I do is exactly balanced, down to the googolth decimal point or so, with respect to how it impacts the chance of modern-day Earth participating in a chain of events that leads to creating an intergalactic civilization?”

“How would that work?” you say. “There’s only seven billion people on today’s Earth—there’s probably been only a hundred billion people who ever existed total, or will exist before we go through the intelligence explosion or whatever—so even before analyzing your exact position, it seems like your leverage on future affairs couldn’t reasonably be less than a one in ten trillion part of the future or so.”

“But then given this physical theory which seems obviously true, my acts might imply expected utility differentials on the order of 10^(10^100 - 13),” she explains, “and I’m not allowed to believe that no matter how much evidence you show me.”


This problem may not be as bad as it looks; with some further reasoning, the leverage penalty may lead to more sensible behavior than depicted above.

Robin Hanson has suggested that the logic of a leverage penalty should stem from the general improbability of individuals being in a unique position to affect many others (which is why I called it a leverage penalty). At most 10 out of 3↑↑↑3 people can ever be in a position to be “solely responsible” for the fate of 3↑↑↑3 people if “solely responsible” is taken to imply a causal chain that goes through no more than 10 people’s decisions; i.e. at most 10 people can ever be solely_10 responsible for any given event. Or if “fate” is taken to be a sufficiently ultimate fate that there’s at most 10 other decisions of similar magnitude that could cumulate to determine someone’s outcome utility to within ±50%, then any given person could have their fate_10 determined on at most 10 occasions. We would surely agree, while assigning priors at the dawn of reasoning, that an agent randomly selected from the pool of all agents in Reality has at most a 100/X chance of being able to be solely_10 responsible for the fate_10 of X people. Any reasoning we do about universes, their complexity, sensory experiences, and so on, should maintain this net balance. You can even strip out the part about agents and carry out the reasoning on pure causal nodes; the chance of a randomly selected causal node being in a unique_100 position on a causal graph with respect to 3↑↑↑3 other nodes ought to be at most 100/3↑↑↑3 for finite causal graphs. (As for infinite causal graphs, well, if problems arise only when introducing infinity, maybe it’s infinity that has the problem.)
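
A minimal sketch of that bound, in my own notation rather than anything from Hanson or the post: cap the prior probability of occupying the leveraged position at (number of people allowed to share the responsibility) divided by (number of people affected).

```python
from fractions import Fraction

def leverage_prior_bound(people_affected, sharers=100):
    """Upper bound on the prior probability of being in a position of
    'sole' responsibility for `people_affected` others, where at most
    `sharers` agents are allowed to share that responsibility."""
    return min(Fraction(1), Fraction(sharers, people_affected))

print(float(leverage_prior_bound(10 ** 18)))   # ~1e-16 for Bostrom's quintillion
print(leverage_prior_bound(10 ** 80))          # at most 100 / 10^80
```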

Suppose we apply the Hansonian leverage penalty to the face-value scenario of our own universe, in which there are apparently no aliens and the galaxies we can reach in the future contain on the order of 10^80 atoms; which, if the intelligence explosion goes well, might be transformed into on the very loose order of… let’s ignore a lot of intermediate calculations and just call it the equivalent of 10^80 centuries of life. (The neurons in your brain perform lots of operations; you don’t get only one computing operation per element, because you’re powered by the Sun over time. The universe contains a lot more negentropy than just 10^80 bits due to things like the gravitational potential energy that can be extracted from mass. Plus we should take into account reversible computing. But of course it also takes more than one computing operation to implement a century of life. So I’m just going to xerox the number 10^80 for use in these calculations, since it’s not supposed to be the main focus.)

Wouldn’t it be terribly odd to find ourselves—where by ‘ourselves’ I mean the hundred billion humans who have ever lived on Earth, for no more than a century or so apiece—solely_100,000,000,000 responsible for the fate_10 of around 10^80 units of life? Isn’t the prior probability of this somewhere around 10^-68?

Yes, according to the leverage penalty. But a prior probability of 10^-68 is not an insurmountable epistemological barrier. If you’re taking things at face value, 10^-68 is just 226 bits of evidence or thereabouts, and your eyes are sending you a megabit per second. Becoming convinced that you, yes you, are an Earthling is epistemically doable; you just need to see a stream of sensory experiences which is 10^68 times more probable if you are an Earthling than if you are someone else. If we take everything at face value, then there could be around 10^80 centuries of life over the history of the universe, and only 10^11 of those centuries will be lived by creatures who discover themselves occupying organic bodies. Taking everything at face value, the sensory experiences of your life are unique to Earthlings and should immediately convince you that you’re an Earthling—just looking around the room you occupy will provide you with sensory experiences that plausibly belong to only 10^11 out of 10^80 life-centuries.
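
The arithmetic behind those figures, as a quick sketch (the megabit-per-second rate is the post's number; the rest is just logarithms):

```python
import math

prior = 1e-68
bits_needed = -math.log2(prior)        # evidence needed to overcome the prior
print(bits_needed)                     # ~225.9, the "226 bits or thereabouts"

sensory_rate = 1e6                     # bits per second, as stated above
print(bits_needed / sensory_rate)      # ~0.0002 seconds of sensory stream
```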

If we don’t take everything at face value, then there might be such things as ancestor simulations, and it might be that your experience of looking around the room is something that happens in 10^20 ancestor simulations for every time that it happens in ‘base level’ reality. In this case your probable leverage on the future is diluted (though it may be large even post-dilution). But this is not something that the Hansonian leverage penalty forces you to believe—not when the putative stakes are still as small as 10^80. Conceptually, the Hansonian leverage penalty doesn’t interact much with the Simulation Hypothesis (SH) at all. If you don’t believe SH, then you think that the experiences of creatures like yours are rare in the universe and hence present strong, convincing evidence for you occupying the leverage-privileged position of an Earthling—much stronger evidence than its prior improbability. (There’s some separate anthropic issues here about whether or not this is itself evidence for SH, but I don’t think that question is intrinsic to leverage penalties per se.)

A key point here is that even if you accept a Hanson-style leverage penalty, it doesn’t have to manifest as an inescapable commandment of modesty. You need not refuse to believe (in your deep and irrevocable humility) that you could be someone as special as an Ancient Earthling. Even if Earthlings matter in the universe—even if we occupy a unique position to affect the future of galaxies—it is still possible to encounter pretty convincing evidence that you’re an Earthling. Universes the size of 10^80 do not pose problems to conventional decision-theoretic reasoning, or to conventional epistemology.

Things play out similarly if—still taking everything at face value—you’re wondering about the chance that you could be special even for an Earthling, because you might be one of say 10^4 people in the history of the universe who contribute a major amount to an x-risk reduction project which ends up actually saving the galaxies. The vast majority of the improbability here is just in being an Earthling in the first place! Thus most of the clever arguments for not taking this high-impact possibility at face value would also tell you not to take being an Earthling at face value, since Earthlings as a whole are much more unique within the total temporal history of the universe than you are supposing yourself to be unique among Earthlings. But given ¬SH, the prior improbability of being an Earthling can be overcome by a few megabits of sensory experience from looking around the room and querying your memories—it’s not like 10^80 is enough future beings that the number of agents randomly hallucinating similar experiences outweighs the number of real Earthlings. Similarly, if you don’t think lots of Earthlings are hallucinating the experience of going to a donation page and clicking on the Paypal button for an x-risk charity, that sensory experience can easily serve to distinguish you as one of 10^4 people donating to an x-risk philanthropy.

Yes, there are various clever-sounding lines of argument which involve not taking things at face value—“Ah, but maybe you should consider yourself as an indistinguishable part of this here large reference class of deluded people who think they’re important.” Which I consider to be a bad idea because it renders you a permanent Muggle by putting you into an inescapable reference class of self-deluded people and then dismissing all your further thoughts as insufficient evidence because you could just be deluding yourself further about whether these are good arguments. Nor do I believe the world can only be saved by good people who are incapable of distinguishing themselves from a large class of crackpots, all of whom have no choice but to continue based on the tiny probability that they are not crackpots. (For more on this see Being Half-Rational About Pascal’s Wager Is Even Worse.) In this case you are a Pascal’s Muggle not because you’ve explicitly assigned a probability like one over googolplex, but because you took an improbability like 10^-6 at unquestioning face value and then cleverly questioned all the evidence which could’ve overcome that prior improbability, and so, in practice, you can never climb out of the epistemological sinkhole. By the same token, you should conclude that you are just self-deluded about being an Earthling since real Earthlings are so rare and privileged in their leverage.

In general, leverage penalties don’t translate into advice about modesty or that you’re just deluding yourself—they just say that to be rationally coherent, your picture of the universe has to imply that your sensory experiences are at least as rare as the corresponding magnitude of your leverage.

Which brings us back to Pascal’s Mugger, in the original alleyway version. The Hansonian leverage penalty seems to imply that to be coherent, either you believe that your sensory experiences are really actually 1 in a googolplex—that only 1 in a googolplex beings experiences what you’re experiencing—or else you really can’t take the situation at face value.

Suppose the Mugger is telling the truth, and a googolplex other people are being simulated. Then there are at least a googolplex people in the universe. Perhaps some of them are hallucinating a situation similar to this one by sheer chance? Rather than telling you flatly that you can’t have a large impact, the Hansonian leverage penalty implies a coherence requirement on how uniquely you think your sensory experiences identify the position you believe yourself to occupy. When it comes to believing you’re one of 10^11 Earthlings who can impact 10^80 other life-centuries, you need to think your sensory experiences are unique to Earthlings—identify Earthlings with a likelihood ratio on the order of 10^69. This is quite achievable, if we take the evidence at face value. But when it comes to improbability on the order of 1/3↑↑↑3, the prior improbability is inescapable—your sensory experiences can’t possibly be that unique—which is assumed to be appropriate because almost-everyone who ever believes they’ll be in a position to help 3↑↑↑3 people will in fact be hallucinating. Boltzmann brains should be much more common than people in a unique position to affect 3↑↑↑3 others, at least if the causal graphs are finite.

Furthermore—although I didn’t realize this part until recently—applying Bayesian updates from that starting point may partially avert the Pascal’s Muggle effect:

Mugger: “Give me five dollars, and I’ll save 3↑↑↑3 lives using my Matrix Powers.”

You: “Nope.”

Mugger: “Why not? It’s a really large impact.”

You: “Yes, and I assign a probability on the order of 1 in 3↑↑↑3 that I would be in a unique position to affect 3↑↑↑3 people.”

Mugger: “Oh, is that really the probability that you assign? Behold!”

(A gap opens in the sky, edged with blue fire.)

Mugger: “Now what do you think, eh?”

You: “Well… I can’t actually say this observation has a likelihood ratio of 3↑↑↑3 to 1. No stream of evidence that can enter a human brain over the course of a century is ever going to have a likelihood ratio larger than, say, 10^(10^26) to 1 at the absurdly most, assuming one megabit per second of sensory data, for a century, each bit of which has at least a 1-in-a-trillion error probability. I’d probably start to be dominated by Boltzmann brains or other exotic minds well before then.”

Mugger: “So you’re not convinced.”

You: “Indeed not. The probability that you’re telling the truth is so tiny that God couldn’t find it with an electron microscope. Here’s the five dollars.”

Mugger: “Done! You’ve saved 3↑↑↑3 lives! Congratulations, you’re never going to top that, your peak life accomplishment will now always lie in your past. But why’d you give me the five dollars if you think I’m lying?”

You: “Well, because the evidence you did present me with had a likelihood ratio of at least a billion to one—I would’ve assigned less than 10^-9 prior probability of seeing this when I woke up this morning—so in accordance with Bayes’s Theorem I promoted the probability from 1/3↑↑↑3 to at least 10^9/3↑↑↑3, which when multiplied by an impact of 3↑↑↑3, yields an expected value of at least a billion lives saved for giving you five dollars.”


I confess that I find this line of reasoning a bit suspicious—it seems overly clever. But on the level of intuitive virtues of rationality, it does seem less stupid than the original Pascal’s Muggle; this muggee is at least behaviorally reacting to the evidence. In fact, they’re reacting in a way exactly proportional to the evidence—they would’ve assigned the same net importance to handing over the five dollars if the Mugger had offered 3↑↑↑4 lives, so long as the strength of the evidence seemed the same.
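
Spelled out as arithmetic (my rendering of the numbers above, taking them at face value rather than endorsing them):

```latex
P(\text{offer genuine} \mid \text{sky splits})
  \;\approx\; \frac{1}{3\uparrow\uparrow\uparrow 3} \times 10^{9}
  \;=\; \frac{10^{9}}{3\uparrow\uparrow\uparrow 3},
\qquad
\mathbb{E}[\text{lives saved}]
  \;\approx\; \frac{10^{9}}{3\uparrow\uparrow\uparrow 3} \times 3\uparrow\uparrow\uparrow 3
  \;=\; 10^{9}.
```

The 3↑↑↑3 in the prior cancels against the 3↑↑↑3 in the payoff, which is why the conclusion depends only on the strength of the evidence and not on the size of the promised impact.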

(Anyone who tries to apply the lessons here to actual x-risk reduction charities (which I think is probably a bad idea), keep in mind that the vast majority of the improbable-position-of-leverage in any x-risk reduction effort comes from being an Earthling in a position to affect the future of a hundred billion galaxies, and that sensory evidence for being an Earthling is what gives you most of your belief that your actions can have an outsized impact.)

So why not just run with this—why not just declare the decision-theoretic problem resolved, if we have a rule that seems to give reasonable behavioral answers in practice? Why not just go ahead and program that rule into an AI?

Well… I still feel a bit nervous about the idea that Pascal’s Muggee, after the sky splits open, is handing over five dollars while claiming to assign probability on the order of 10^9/3↑↑↑3 that it’s doing any good.

I think that my own reaction in a similar situation would be along these lines instead:


Mugger: “Give me five dollars, and I’ll save 3↑↑↑3 lives using my Matrix Powers.”

Me: “Nope.”

Mugger: “So then, you think the probability I’m telling the truth is on the order of 1/3↑↑↑3?”

Me: “Yeah… that probably has to follow. I don’t see any way around that revealed belief, given that I’m not actually giving you the five dollars. I’ve heard some people try to claim silly things like, the probability that you’re telling the truth is counterbalanced by the probability that you’ll kill 3↑↑↑3 people instead, or something else with a conveniently equal and opposite utility. But there’s no way that things would balance out exactly in practice, if there was no a priori mathematical requirement that they balance. Even if the prior probability of your saving 3↑↑↑3 people and killing 3↑↑↑3 people, conditional on my giving you five dollars, exactly balanced down to the log(3↑↑↑3) decimal place, the likelihood ratio for your telling me that you would “save” 3↑↑↑3 people would not be exactly 1:1 for the two hypotheses down to the log(3↑↑↑3) decimal place. So if I assigned probabilities much greater than 1/3↑↑↑3 to your doing something that affected 3↑↑↑3 people, my actions would be overwhelmingly dominated by even a tiny difference in likelihood ratio elevating the probability that you saved 3↑↑↑3 people over the probability that you did something bad to them. The only way this hypothesis can’t dominate my actions—really, the only way my expected utility sums can converge at all—is if I assign probability on the order of 1/3↑↑↑3 or less. I don’t see any way of escaping that part.”

Mugger: “But can you, in your mortal uncertainty, truly assign a probability as low as 1 in 3↑↑↑3 to any proposition whatever? Can you truly believe, with your error-prone neural brain, that you could make 3↑↑↑3 statements of any kind one after another, and be wrong, on average, about once?”

Me: “Nope.”

Mugger: “So give me five dollars!”

Me: “Nope.”

Mugger: “Why not?”

Me: “Because even though I, in my mortal uncertainty, will eventually be wrong about all sorts of things if I make enough statements one after another, this fact can’t be used to increase the probability of arbitrary statements beyond what my prior says they should be, because then my prior would sum to more than 1. There must be some kind of required condition for taking a hypothesis seriously enough to worry that I might be overconfident about it—”

Mugger: “Then behold!”

(A gap opens in the sky, edged with blue fire.)

Mugger: “Now what do you think, eh?”

Me (staring up at the sky): “...whoa.” (Pause.) “You turned into a cat.”

Mugger: “What?”

Me: “Private joke. Okay, I think I’m going to have to rethink a lot of things. But if you want to tell me about how I was wrong to assign a prior probability on the order of 1/3↑↑↑3 to your scenario, I will shut up and listen very carefully to what you have to say about it. Oh, and here’s the five dollars, can I pay an extra twenty and make some other requests?”

(The thought bubble pops, and we return to two people standing in an alley, the sky above perfectly normal.)

Mugger: “Now, in this scenario we’ve just imagined, you were taking my case seriously, right? But the evidence there couldn’t have had a likelihood ratio of more than 10^(10^26) to 1, and probably much less. So by the method of imaginary updates, you must assign probability at least 10^-(10^26) to my scenario, which when multiplied by a benefit on the order of 3↑↑↑3, yields an unimaginable bonanza in exchange for just five dollars—”

Me: “Nope.”

Mugger: “How can you possibly say that? You’re not being logically coherent!”

Me: “I agree that I’m not being logically coherent, but I think that’s acceptable in this case.”

Mugger: “This ought to be good. Since when are rationalists allowed to deliberately be logically incoherent?”

Me: “Since we don’t have infinite computing power—”

Mugger: “That sounds like a fully general excuse if I ever heard one.”

Me: “No, this is a specific consequence of bounded computing power. Let me start with a simpler example. Suppose I believe in a set of mathematical axioms. Since I don’t have infinite computing power, I won’t be able to know all the deductive consequences of those axioms. And that means I will necessarily fall prey to the conjunction fallacy, in the sense that you’ll present me with a theorem X that is a deductive consequence of my axioms, but which I don’t know to be a deductive consequence of my axioms, and you’ll ask me to assign a probability to X, and I’ll assign it 50% probability or something. Then you present me with a brilliant lemma Y, which clearly seems like a likely consequence of my mathematical axioms, and which also seems to imply X—once I see Y, the connection from my axioms to X, via Y, becomes obvious. So I assign P(X&Y) = 90%, or something like that. Well, that’s the conjunction fallacy—I assigned P(X&Y) > P(X). The thing is, if you then ask me P(X), after I’ve seen Y, I’ll reply that P(X) is 91% or at any rate something higher than P(X&Y). I’ll have changed my mind about what my prior beliefs logically imply, because I’m not logically omniscient, even if that looks like assigning probabilities over time which are incoherent in the Bayesian sense.”

Mugger: “And how does this work out to my not getting five dollars?”

Me: “In the scenario you’re asking me to imagine, you present me with evidence which I currently think Just Plain Shouldn’t Happen. And if that actually does happen, the sensible way for me to react is by questioning my prior assumptions and the reasoning which led me to assign such low probability. One way that I handle my lack of logical omniscience—my finite, error-prone reasoning capabilities—is by being willing to assign infinitesimal probabilities to non-privileged hypotheses so that my prior over all possibilities can sum to 1. But if I actually see strong evidence for something I previously thought was super-improbable, I don’t just do a Bayesian update, I should also question whether I was right to assign such a tiny probability in the first place—whether it was really as complex, or unnatural, as I thought. In real life, you are not ever supposed to have a prior improbability of 10^-100 for some fact distinguished enough to be written down in advance, and yet encounter strong evidence, say 10^10 to 1, that the thing has actually happened. If something like that happens, you don’t do a Bayesian update to a posterior of 10^-90. Instead you question both whether the evidence might be weaker than it seems, and whether your estimate of prior improbability might have been poorly calibrated, because rational agents who actually have well-calibrated priors should not encounter situations like that until they are ten billion days old. Now, this may mean that I end up doing some non-Bayesian updates: I say some hypothesis has a prior probability of a quadrillion to one, you show me evidence with a likelihood ratio of a billion to one, and I say ‘Guess I was wrong about that quadrillion to one thing’ rather than being a Muggle about it. And then I shut up and listen to what you have to say about how to estimate probabilities, because on my worldview, I wasn’t expecting to see you turn into a cat. But for me to make a super-update like that—reflecting a posterior belief that I was logically incorrect about the prior probability—you have to really actually show me the evidence, you can’t just ask me to imagine it. This is something that only logically incoherent agents ever say, but that’s all right because I’m not logically omniscient.”


At some point, we’re going to have to build some sort of actual prior into, you know, some sort of actual self-improving AI.

(Scary thought, right?)

So far as I can presently see, the logic requiring some sort of leverage penalty—not just so that we don’t pay $5 to Pascal’s Mugger, but also so that our expected utility sums converge at all—seems clear enough that I can’t yet see a good alternative to it (feel welcome to suggest one), and Robin Hanson’s rationale is by far the best I’ve heard.

In fact, what we actually need is more like a combined leverage-and-complexity penalty, to avoid scenarios like this:


Mugger: “Give me $5 and I’ll save 3↑↑↑3 people.”

You: “I assign probability exactly 1/3↑↑↑3 to that.”

Mugger: “So that’s one life saved for $5, on average. That’s a pretty good bargain, right?”

You: “Not by comparison with x-risk reduction charities. But I also like to do good on a smaller scale now and then. How about a penny? Would you be willing to save 3↑↑↑3/500 lives for a penny?”

Mugger: “Eh, fine.”

You: “Well, the probability of that is 500/3↑↑↑3, so here’s a penny!” (Goes on way, whistling cheerfully.)


Adding a complexity penalty and a leverage penalty is necessary, not just to avert this exact scenario, but so that we don’t get an infinite expected utility sum over a 1/3↑↑↑3 probability of saving 3↑↑↑3 lives, 1/(3↑↑↑3 + 1) probability of saving 3↑↑↑3 + 1 lives, and so on. If we combine the standard complexity penalty with a leverage penalty, the whole thing should converge.
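
A toy sketch of the difference this makes, with made-up stand-in numbers of my own: under a leverage penalty alone, a scenario offering N lives at probability 1/N contributes about one expected life no matter how large N is, so infinitely many such scenarios still diverge; an additional 2^-k complexity penalty on the k-th simplest scenario makes the series summable.

```python
def leverage_only_term(n_lives):
    return (1.0 / n_lives) * n_lives                 # always ~1, so the series diverges

def combined_term(k, n_lives):
    return (2.0 ** -k) * (1.0 / n_lives) * n_lives   # ~2^-k, so the series converges

print(sum(leverage_only_term(10 ** k) for k in range(1, 50)))   # grows linearly with the cutoff
print(sum(combined_term(k, 10 ** k) for k in range(1, 50)))     # approaches 1
```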

Probability penalties are epistemic features—they affect what we believe, not just what we do. Maps, ideally, correspond to territories. Is there any territory that this complexity+leverage penalty can correspond to—any state of a single reality which would make these the true frequencies? Or is it only interpretable as pure uncertainty over realities, with there being no single reality that could correspond to it? To put it another way, the complexity penalty and the leverage penalty seem unrelated, so perhaps they’re mutually inconsistent; can we show that the union of these two theories has a model?

As near as I can figure, the corresponding state of affairs to a complexity+leverage prior improbability would be a Tegmark Level IV multiverse in which each reality got an amount of magical-reality-fluid corresponding to the complexity of its program (1/2 to the power of its Kolmogorov complexity) and then this magical-reality-fluid had to be divided among all the causal elements within that universe—if you contain 3↑↑↑3 causal nodes, then each node can only get 1/3↑↑↑3 of the total realness of that universe. (As always, the term “magical reality fluid” reflects an attempt to demarcate a philosophical area where I feel quite confused, and try to use correspondingly blatantly wrong terminology so that I do not mistake my reasoning about my confusion for a solution.) This setup is not entirely implausible because the Born probabilities in our own universe look like they might behave like this sort of magical-reality-fluid—quantum amplitude flowing between configurations in a way that preserves the total amount of realness while dividing it between worlds—and perhaps every other part of the multiverse must necessarily work the same way for some reason. It seems worth noting that part of what’s motivating this version of the ‘territory’ is that our sum over all real things, weighted by reality-fluid, can then converge. In other words, the reason why complexity+leverage works in decision theory is that the union of the two theories has a model in which the total multiverse contains an amount of reality-fluid that can sum to 1 rather than being infinite. (Though we need to suppose that either (a) only programs with a finite number of causal nodes exist, or (b) programs can divide finite reality-fluid among an infinite number of nodes via some measure that gives every experience-moment a well-defined relative amount of reality-fluid. Again see caveats about basic philosophical confusion—perhaps our map needs this property over its uncertainty but the territory doesn’t have to work the same way, etcetera.)
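
As a sketch of one way to write that measure down (my own formalization of the rule stated above, assuming prefix-free programs P with Kolmogorov complexity K(P) and a finite number of causal nodes N(P) each):

```latex
\mu(\text{node } i \text{ of program } P) \;=\; \frac{2^{-K(P)}}{N(P)},
\qquad
\sum_{P} \sum_{i=1}^{N(P)} \mu(i, P) \;=\; \sum_{P} 2^{-K(P)} \;\le\; 1.
```

The final inequality is Kraft's inequality for prefix-free programs, which is what lets the total amount of reality-fluid sum to at most 1 rather than diverging.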

If an AI’s overall architecture is also such as to enable it to carry out the “You turned into a cat” effect—where if the AI actually ends up with strong evidence for a scenario it assigned super-exponential improbability, the AI reconsiders its priors and the apparent strength of evidence rather than executing a blind Bayesian update, though this part is formally a tad underspecified—then at the moment I can’t think of anything else to add in.

In other words: This is my best current idea for how a prior, e.g. as used in an AI, could yield decision-theoretic convergence over explosively large possible worlds.

However, I would still call this a semi-open FAI problem (edit: wide-open) because it seems quite plausible that somebody is going to kick holes in the overall view I’ve just presented, or come up with a better solution, possibly within an hour of my posting this—the proposal is both recent and weak even by my standards. I’m also worried about whether it turns out to imply anything crazy on anthropic problems. Over to you, readers.