# Pascal’s Muggle: Infinitesimal Priors and Strong Evidence

Short form: Pas­cal’s Muggle

tl;dr: If you assign superexponentially infinitesimal probability to claims of large impacts, then apparently you should ignore the possibility of a large impact even after seeing huge amounts of evidence. If a poorly-dressed street person offers to save 10^(10^100) lives (a googolplex lives) for $5 using their Matrix Lord powers, and you claim to assign this scenario less than 10^-(10^100) probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky. For the same reason, any evidence you encounter showing that the human species could create a sufficiently large number of descendants—no matter how normal the corresponding laws of physics appear to be, or how well-designed the experiments which told you about them—must be rejected out of hand. There is a possible reply to this objection using Robin Hanson’s anthropic adjustment against the probability of large impacts, in which case you will treat a Pascal’s Mugger as having decision-theoretic importance exactly proportional to the Bayesian strength of evidence they present you, without quantitative dependence on the number of lives they claim to save. This, however, corresponds to an odd mental state which some, such as myself, would find unsatisfactory. In the end, however, I cannot see any better candidate for a prior than a leverage penalty plus a complexity penalty on the prior probability of scenarios.

In late 2007 I coined the term “Pascal’s Mugging” to describe a problem which seemed to me to arise when combining conventional decision theory and conventional epistemology in the obvious way. On conventional epistemology, the prior probability of hypotheses diminishes exponentially with their complexity; if it would take 20 bits to specify a hypothesis, then its prior probability receives a 2^-20 penalty factor, and it will require evidence with a likelihood ratio of 1,048,576:1—evidence which we are 1,048,576 times more likely to see if the theory is true than if it is false—to make us assign it around 50-50 credibility. (This isn’t as hard as it sounds. Flip a coin 20 times and note down the exact sequence of heads and tails. You now believe in a state of affairs you would have assigned a million-to-one probability beforehand—namely, that the coin would produce the exact sequence HTHHHHTHTTH… or whatever—after experiencing sensory data which are more than a million times more probable if that fact is true than if it is false.) The problem is that although this kind of prior probability penalty may seem very strict at first, it’s easy to construct physical scenarios that grow in size vastly faster than they grow in complexity.
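As a quick check on the arithmetic above, here is a minimal sketch in Python; the 20-bit penalty and the likelihood ratio are from the text, and the hypothesis itself is left schematic:

```python
from math import log2

# A hypothesis that takes 20 bits to specify gets a 2^-20 prior penalty.
prior = 2 ** -20
print(2 ** 20)  # 1048576, the likelihood ratio needed

# Updating the prior odds by a 1,048,576:1 likelihood ratio
# restores roughly 50-50 credibility, as the text says.
prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * 2 ** 20
posterior = posterior_odds / (1 + posterior_odds)
print(round(posterior, 4))  # 0.5
```

The coin-flip example works the same way: each observed flip is one bit of evidence, so 20 flips exactly pay off the 20-bit complexity penalty of the specific sequence.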

I originally illustrated this using Pascal’s Mugger: A poorly dressed street person says “I’m actually a Matrix Lord running this world as a computer simulation, along with many others—the universe above this one has laws of physics which allow me easy access to vast amounts of computing power. Just for fun, I’ll make you an offer—you give me five dollars, and I’ll use my Matrix Lord powers to save 3↑↑↑↑3 people inside my simulations from dying and let them live long and happy lives,” where ↑ is Knuth’s up-arrow notation. This was originally posted in 2007, when I was a bit more naive about what kind of mathematical notation you can throw into a random blog post without creating a stumbling block. (E.g.: On several occasions now, I’ve seen someone on the Internet approximate the number of dust specks from this scenario as being a “billion”, since any incomprehensibly large number equals a billion.) Let’s try an easier (and way smaller) number instead, and suppose that Pascal’s Mugger offers to save a googolplex lives, where a googol is 10^100 (a 1 followed by a hundred zeroes) and a googolplex is 10 to the googol power, so 10^(10^100) or 10^10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 lives saved if you pay Pascal’s Mugger five dollars, if the offer is honest.

If Pascal’s Mugger had only offered to save a mere googol lives (10^100), we could perhaps reply that although the notion of a Matrix Lord may sound simple to say in English, if we actually try to imagine all the machinery involved, it works out to a substantial amount of computational complexity. (Similarly, Thor is a worse explanation for lightning bolts than the laws of physics because, among other points, an anthropomorphic deity is more complex than calculus in formal terms—it would take a larger computer program to simulate Thor as a complete mind, than to simulate Maxwell’s Equations—even though in mere human words Thor sounds much easier to explain.) To imagine this scenario in formal detail, we might have to write out the laws of the higher universe the Mugger supposedly comes from, the Matrix Lord’s state of mind leading them to make that offer, and so on. And so (we reply) when mere verbal English has been translated into a formal hypothesis, the Kolmogorov complexity of this hypothesis is more than 332 bits—it would take more than 332 ones and zeroes to specify—where 2^-332 ≈ 10^-100. Therefore (we conclude) the net expected value of the Mugger’s offer is still tiny, once its prior improbability is taken into account.

But once Pas­cal’s Mug­ger offers to save a googol­plex lives—offers us a sce­nario whose value is con­structed by twice-re­peated ex­po­nen­ti­a­tion—we seem to run into some difficulty us­ing this an­swer. Can we re­ally claim that the com­plex­ity of this sce­nario is on the or­der of a googol bits—that to for­mally write out the hy­poth­e­sis would take one hun­dred billion billion times more bits than there are atoms in the ob­serv­able uni­verse?

And a tiny, paltry number like a googolplex is only the beginning of computationally simple numbers that are unimaginably huge. Exponentiation is defined as repeated multiplication: If you see a number like 3^5, it tells you to multiply five 3s together: 3×3×3×3×3 = 243. Suppose we write 3^5 as 3↑5, so that a single arrow ↑ stands for exponentiation, and let the double arrow ↑↑ stand for repeated exponentiation, or tetration. Thus 3↑↑3 would stand for 3↑(3↑3) or 3^(3^3) = 3^27 = 7,625,597,484,987. Tetration can also be written with a leading superscript: ³3 = 3↑↑3. Thus ⁴2 = 2^(2^(2^2)) = 2^(2^4) = 2^16 = 65,536. Then pentation, or repeated tetration, would be written with 3↑↑↑3 = 3↑↑(3↑↑3) = 3↑↑7,625,597,484,987 = 3^3^…^3, where the … summarizes an exponential tower of 3s seven trillion layers high.
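The up-arrow hierarchy is straightforward to compute for small arguments. A minimal sketch (the function name `up_arrow` is my own; the values it checks are the ones worked out above):

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow notation a ↑^n b: n=1 is exponentiation,
    n=2 tetration, n=3 pentation, and so on."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1  # standard base case for iterated arrows
    # a ↑^n b = a ↑^(n-1) (a ↑^n (b-1))
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

print(up_arrow(3, 1, 5))  # 243
print(up_arrow(3, 2, 3))  # 7625597484987
print(up_arrow(2, 2, 4))  # 65536
```

3↑↑↑3 itself, of course, cannot be evaluated this way: its decimal expansion would not fit in the observable universe, even though the program computing it is tiny — which is exactly the point about complexity penalties.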

But 3↑↑↑3 is still quite sim­ple com­pu­ta­tion­ally—we could de­scribe a small Tur­ing ma­chine which com­putes it—so a hy­poth­e­sis in­volv­ing 3↑↑↑3 should not there­fore get a large com­plex­ity penalty, if we’re pe­nal­iz­ing hy­pothe­ses by al­gorith­mic com­plex­ity.

I had originally intended the scenario of Pascal’s Mugging to point up what seemed like a basic problem with combining conventional epistemology with conventional decision theory: Conventional epistemology says to penalize hypotheses by an exponential factor of computational complexity. This seems pretty strict in everyday life: “What? For a mere 20 bits I am to be called a million times less probable?” But for stranger hypotheses about things like Matrix Lords, the size of the hypothetical universe can blow up enormously faster than the exponential of its complexity. This would mean that all our decisions were dominated by tiny-seeming probabilities (on the order of 2^-100 and less) of scenarios where our lightest action affected 3↑↑4 people… which would in turn be dominated by even more remote probabilities of affecting 3↑↑5 people…

This prob­lem is worse than just giv­ing five dol­lars to Pas­cal’s Mug­ger—our ex­pected util­ities don’t con­verge at all! Con­ven­tional episte­mol­ogy tells us to sum over the pre­dic­tions of all hy­pothe­ses weighted by their com­pu­ta­tional com­plex­ity and ev­i­den­tial fit. This works fine with epistemic prob­a­bil­ities and sen­sory pre­dic­tions be­cause no hy­poth­e­sis can pre­dict more than prob­a­bil­ity 1 or less than prob­a­bil­ity 0 for a sen­sory ex­pe­rience. As hy­pothe­ses get more and more com­plex, their con­tributed pre­dic­tions have tinier and tinier weights, and the sum con­verges quickly. But de­ci­sion the­ory tells us to calcu­late ex­pected util­ity by sum­ming the util­ity of each pos­si­ble out­come, times the prob­a­bil­ity of that out­come con­di­tional on our ac­tion. If hy­po­thet­i­cal util­ities can grow faster than hy­po­thet­i­cal prob­a­bil­ity diminishes, the con­tri­bu­tion of an av­er­age term in the se­ries will keep in­creas­ing, and this sum will never con­verge—not if we try to do it the same way we got our epistemic pre­dic­tions, by sum­ming over com­plex­ity-weighted pos­si­bil­ities. (See also this similar-but-differ­ent pa­per by Peter de Blanc.)
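The divergence can be made concrete with a toy sketch (the hypothetical numbers here are my own choice, not from the text): suppose hypothesis k is assigned a complexity-penalized probability of 2^-k but promises a payoff of 3↑↑k lives. The terms of the expected-utility sum grow instead of shrinking:

```python
def tetration(a: int, b: int) -> int:
    """a↑↑b: a tower of b copies of a."""
    result = 1
    for _ in range(b):
        result = a ** result
    return result

# probability 2^-k times utility 3↑↑k, for k = 1, 2, 3
terms = [(2 ** -k) * tetration(3, k) for k in range(1, 4)]
print(terms)  # each term vastly larger than the last
```

An epistemic sum over these hypotheses' sensory predictions converges fine, since predictions are bounded by probability 1; it is only when the bounded predictions are replaced by unbounded utilities that the series blows up.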

Unfortunately I failed to make it clear in my original writeup that this was where the problem came from, and that it was general to situations beyond the Mugger. Nick Bostrom’s writeup of Pascal’s Mugging for a philosophy journal used a Mugger offering a quintillion days of happiness, where a quintillion is merely 1,000,000,000,000,000,000 = 10^18. It takes at least two exponentiations to outrun a singly-exponential complexity penalty. I would be willing to assign a probability of less than 1 in 10^18 to a random person being a Matrix Lord. You may not have to invoke 3↑↑↑3 to cause problems, but you’ve got to use something like 10^(10^100)—double exponentiation or better. Manipulating ordinary hypotheses about the ordinary physical universe taken at face value, which just contains 10^80 atoms within range of our telescopes, should not lead us into such difficulties.

(And then the phrase “Pascal’s Mugging” got completely bastardized to refer to an emotional feeling of being mugged that some people apparently get when a high-stakes charitable proposition is presented to them, regardless of whether it’s supposed to have a low probability. This is enough to make me regret having ever invented the term “Pascal’s Mugging” in the first place; for further thoughts on this see The Pascal’s Wager Fallacy Fallacy (just because the stakes are high does not mean the probabilities are low, and Pascal’s Wager is fallacious because of the low probability, not the high stakes!) and Being Half-Rational About Pascal’s Wager Is Even Worse. Again, when dealing with issues the mere size of the apparent universe, on the order of 10^80—for small large numbers—we do not run into the sort of decision-theoretic problems I originally meant to single out by the concept of “Pascal’s Mugging”. My rough intuitive stance on x-risk charity is that if you are one of the tiny fraction of all sentient beings who happened to be born here on Earth before the intelligence explosion, when the existence of the whole vast intergalactic future depends on what we do now, you should expect to find yourself surrounded by a smorgasbord of opportunities to affect small large numbers of sentient beings. There is then no reason to worry about tiny probabilities of having a large impact when we can expect to find medium-sized opportunities of having a large impact, so long as we restrict ourselves to impacts no larger than the size of the known universe.)

One pro­posal which has been floated for deal­ing with Pas­cal’s Mug­ger in the de­ci­sion-the­o­retic sense is to pe­nal­ize hy­pothe­ses that let you af­fect a large num­ber of peo­ple, in pro­por­tion to the num­ber of peo­ple af­fected—what we could call per­haps a “lev­er­age penalty” in­stead of a “com­plex­ity penalty”.

Un­for­tu­nately this po­ten­tially leads us into a differ­ent prob­lem, that of Pas­cal’s Mug­gle.

Sup­pose a poorly-dressed street per­son asks you for five dol­lars in ex­change for do­ing a googol­plex’s worth of good us­ing his Ma­trix Lord pow­ers.

“Well,” you re­ply, “I think it very im­prob­a­ble that I would be able to af­fect so many peo­ple through my own, per­sonal ac­tions—who am I to have such a great im­pact upon events? In­deed, I think the prob­a­bil­ity is some­where around one over googol­plex, maybe a bit less. So no, I won’t pay five dol­lars—it is un­think­ably im­prob­a­ble that I could do so much good!”

“I see,” says the Mug­ger.

A wind be­gins to blow about the alley, whip­ping the Mug­ger’s loose clothes about him as they shift from ill-fit­ting shirt and jeans into robes of in­finite black­ness, within whose depths tiny galax­ies and stranger things seem to twin­kle. In the sky above, a gap edged by blue fire opens with a hor­ren­dous tear­ing sound—you can hear peo­ple on the nearby street yel­ling in sud­den shock and ter­ror, im­ply­ing that they can see it too—and dis­plays the image of the Mug­ger him­self, wear­ing the same robes that now adorn his body, seated be­fore a key­board and a mon­i­tor.

“That’s not ac­tu­ally me,” the Mug­ger says, “just a con­cep­tual rep­re­sen­ta­tion, but I don’t want to drive you in­sane. Now give me those five dol­lars, and I’ll save a googol­plex lives, just as promised. It’s easy enough for me, given the com­put­ing power my home uni­verse offers. As for why I’m do­ing this, there’s an an­cient de­bate in philos­o­phy among my peo­ple—some­thing about how we ought to sum our ex­pected util­ities—and I mean to use the video of this event to make a point at the next de­ci­sion the­ory con­fer­ence I at­tend. Now will you give me the five dol­lars, or not?”

“Mm… no,” you re­ply.

“No?” says the Mugger. “I understood earlier when you didn’t want to give a random street person five dollars based on a wild story with no evidence behind it. But now I’ve offered you evidence.”

“Un­for­tu­nately, you haven’t offered me enough ev­i­dence,” you ex­plain.

“Really?” says the Mug­ger. “I’ve opened up a fiery por­tal in the sky, and that’s not enough to per­suade you? What do I have to do, then? Rear­range the planets in your so­lar sys­tem, and wait for the ob­ser­va­to­ries to con­firm the fact? I sup­pose I could also ex­plain the true laws of physics in the higher uni­verse in more de­tail, and let you play around a bit with the com­puter pro­gram that en­codes all the uni­verses con­tain­ing the googol­plex peo­ple I would save if you gave me the five dol­lars—”

“Sorry,” you say, shaking your head firmly, “there’s just no way you can convince me that I’m in a position to affect a googolplex people, because the prior probability of that is one over googolplex. If you wanted to convince me of some fact of merely 2^-100 prior probability, around 10^30 to one—like that a coin would come up heads and tails in some particular pattern of a hundred coinflips—then you could just show me 100 bits of evidence, which is within easy reach of my brain’s sensory bandwidth. I mean, you could just flip the coin a hundred times, and my eyes, which send my brain a hundred megabits a second or so—though that gets processed down to one megabit or so by the time it goes through the lateral geniculate nucleus—would easily give me enough data to conclude that this 10^30-to-one possibility was true. But to conclude something whose prior probability is on the order of one over googolplex, I need on the order of a googol bits of evidence, and you can’t present me with a sensory experience containing a googol bits. Indeed, you can’t ever present a mortal like me with evidence that has a likelihood ratio of a googolplex to one—evidence I’m a googolplex times more likely to encounter if the hypothesis is true, than if it’s false—because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex. You know the old saying about how once you assign something probability one, or probability zero, you can never change your mind regardless of what evidence you see? Well, odds of a googolplex to one, or one to a googolplex, work pretty much the same way.”
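The bandwidth arithmetic in this speech checks out in order of magnitude. A sketch, using the one-megabit-per-second figure from the text:

```python
from math import log2

# Bits of evidence needed to overcome a 2^-100 prior: exactly 100.
print(log2(2 ** 100))  # 100.0

# Bits needed to overcome a one-over-googolplex prior: about a googol.
googol = 10 ** 100
bits_needed = googol * log2(10)  # log2(googolplex), roughly 3.3e100

# Bits a brain could receive in a century at ~1 megabit per second:
bits_per_century = 1e6 * 3600 * 24 * 365.25 * 100  # roughly 3.2e15

print(bits_needed / bits_per_century)  # shortfall of roughly 1e85
```

A hundred coinflips fit comfortably inside the sensory stream; a googol bits fall short of it by some eighty-five orders of magnitude, which is the Muggle's whole argument.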

“So no mat­ter what ev­i­dence I show you,” the Mug­ger says—as the blue fire goes on crack­ling in the torn sky above, and screams and des­per­ate prayers con­tinue from the street be­yond—“you can’t ever no­tice that you’re in a po­si­tion to help a googol­plex peo­ple.”

“Right!” you say. “I can be­lieve that you’re a Ma­trix Lord. I mean, I’m not a to­tal Mug­gle, I’m psy­cholog­i­cally ca­pa­ble of re­spond­ing in some fash­ion to that gi­ant hole in the sky. But it’s just com­pletely for­bid­den for me to as­sign any sig­nifi­cant prob­a­bil­ity what­so­ever that you will ac­tu­ally save a googol­plex peo­ple af­ter I give you five dol­lars. You’re ly­ing, and I am ab­solutely, ab­solutely, ab­solutely con­fi­dent of that.”

“So you weren’t just in­vok­ing the lev­er­age penalty as a plau­si­ble-sound­ing way of get­ting out of pay­ing me the five dol­lars ear­lier,” the Mug­ger says thought­fully. “I mean, I’d un­der­stand if that was just a ra­tio­nal­iza­tion of your dis­com­fort at fork­ing over five dol­lars for what seemed like a tiny prob­a­bil­ity, when I hadn’t done my duty to pre­sent you with a cor­re­spond­ing amount of ev­i­dence be­fore de­mand­ing pay­ment. But you… you’re act­ing like an AI would if it was ac­tu­ally pro­grammed with a lev­er­age penalty on hy­pothe­ses!”

“Ex­actly,” you say. “I’m for­bid­den a pri­ori to be­lieve I can ever do that much good.”

“Why?” the Mug­ger says cu­ri­ously. “I mean, all I have to do is press this but­ton here and a googol­plex lives will be saved.” The figure within the blaz­ing por­tal above points to a green but­ton on the con­sole be­fore it.

“Like I said,” you ex­plain again, “the prior prob­a­bil­ity is just too in­finites­i­mal for the mas­sive ev­i­dence you’re show­ing me to over­come it—”

The Mug­ger shrugs, and van­ishes in a puff of pur­ple mist.

The portal in the sky above closes, taking with it the console and the green button.

(The screams go on from the street out­side.)

A few days later, you’re sit­ting in your office at the physics in­sti­tute where you work, when one of your col­leagues bursts in through your door, seem­ing highly ex­cited. “I’ve got it!” she cries. “I’ve figured out that whole dark en­ergy thing! Look, these sim­ple equa­tions retro­d­ict it ex­actly, there’s no way that could be a co­in­ci­dence!”

At first you’re also ex­cited, but as you pore over the equa­tions, your face con­figures it­self into a frown. “No...” you say slowly. “Th­ese equa­tions may look ex­tremely sim­ple so far as com­pu­ta­tional com­plex­ity goes—and they do ex­actly fit the petabytes of ev­i­dence our telescopes have gath­ered so far—but I’m afraid they’re far too im­prob­a­ble to ever be­lieve.”

“What?” she says. “Why?”

“Well,” you say rea­son­ably, “if these equa­tions are ac­tu­ally true, then our de­scen­dants will be able to ex­ploit dark en­ergy to do com­pu­ta­tions, and ac­cord­ing to my back-of-the-en­velope calcu­la­tions here, we’d be able to cre­ate around a googol­plex peo­ple that way. But that would mean that we, here on Earth, are in a po­si­tion to af­fect a googol­plex peo­ple—since, if we blow our­selves up via a nan­otech­nolog­i­cal war or (cough) make cer­tain other er­rors, those googol­plex peo­ple will never come into ex­is­tence. The prior prob­a­bil­ity of us be­ing in a po­si­tion to im­pact a googol­plex peo­ple is on the or­der of one over googol­plex, so your equa­tions must be wrong.”

“Hmm...” she says. “I hadn’t thought of that. But what if these equa­tions are right, and yet some­how, ev­ery­thing I do is ex­actly bal­anced, down to the googolth dec­i­mal point or so, with re­spect to how it im­pacts the chance of mod­ern-day Earth par­ti­ci­pat­ing in a chain of events that leads to cre­at­ing an in­ter­galac­tic civ­i­liza­tion?”

“How would that work?” you say. “There are only seven billion people on today’s Earth—there have probably been only a hundred billion people who ever existed total, or will exist before we go through the intelligence explosion or whatever—so even before analyzing your exact position, it seems like your leverage on future affairs couldn’t reasonably be less than a one in ten trillion part of the future or so.”

“But then given this physical theory which seems obviously true, my acts might imply expected utility differentials on the order of 10^(10^100 − 13),” she explains, “and I’m not allowed to believe that no matter how much evidence you show me.”

This prob­lem may not be as bad as it looks; with some fur­ther rea­son­ing, the lev­er­age penalty may lead to more sen­si­ble be­hav­ior than de­picted above.

Robin Hanson has suggested that the logic of a leverage penalty should stem from the general improbability of individuals being in a unique position to affect many others (which is why I called it a leverage penalty). At most 10 out of 3↑↑↑3 people can ever be in a position to be “solely responsible” for the fate of 3↑↑↑3 people if “solely responsible” is taken to imply a causal chain that goes through no more than 10 people’s decisions; i.e., at most 10 people can ever be solely(10) responsible for any given event. Or if “fate” is taken to be a sufficiently ultimate fate that there’s at most 10 other decisions of similar magnitude that could cumulate to determine someone’s outcome utility to within ±50%, then any given person could have their fate(10) determined on at most 10 occasions. We would surely agree, while assigning priors at the dawn of reasoning, that an agent randomly selected from the pool of all agents in Reality has at most a 100/X chance of being able to be solely(10) responsible for the fate(10) of X people. Any reasoning we do about universes, their complexity, sensory experiences, and so on, should maintain this net balance. You can even strip out the part about agents and carry out the reasoning on pure causal nodes; the chance of a randomly selected causal node being in a unique(100) position on a causal graph with respect to 3↑↑↑3 other nodes ought to be at most 100/3↑↑↑3 for finite causal graphs. (As for infinite causal graphs, well, if problems arise only when introducing infinity, maybe it’s infinity that has the problem.)

Suppose we apply the Hansonian leverage penalty to the face-value scenario of our own universe, in which there are apparently no aliens and the galaxies we can reach in the future contain on the order of 10^80 atoms; which, if the intelligence explosion goes well, might be transformed into on the very loose order of… let’s ignore a lot of intermediate calculations and just call it the equivalent of 10^80 centuries of life. (The neurons in your brain perform lots of operations; you don’t get only one computing operation per element, because you’re powered by the Sun over time. The universe contains a lot more negentropy than just 10^80 bits due to things like the gravitational potential energy that can be extracted from mass. Plus we should take into account reversible computing. But of course it also takes more than one computing operation to implement a century of life. So I’m just going to xerox the number 10^80 for use in these calculations, since it’s not supposed to be the main focus.)

Wouldn’t it be terribly odd to find ourselves—where by ‘ourselves’ I mean the hundred billion humans who have ever lived on Earth, for no more than a century or so apiece—solely(100,000,000,000) responsible for the fate(10) of around 10^80 units of life? Isn’t the prior probability of this somewhere around 10^-68?

Yes, according to the leverage penalty. But a prior probability of 10^-68 is not an insurmountable epistemological barrier. If you’re taking things at face value, 10^-68 is just 226 bits of evidence or thereabouts, and your eyes are sending you a megabit per second. Becoming convinced that you, yes you, are an Earthling is epistemically doable; you just need to see a stream of sensory experiences which is 10^68 times more probable if you are an Earthling than if you are someone else. If we take everything at face value, then there could be around 10^80 centuries of life over the history of the universe, and only 10^11 of those centuries will be lived by creatures who discover themselves occupying organic bodies. Taking everything at face value, the sensory experiences of your life are unique to Earthlings and should immediately convince you that you’re an Earthling—just looking around the room you occupy will provide you with sensory experiences that plausibly belong to only 10^11 out of 10^80 life-centuries.
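The 226-bit figure is just the base-2 logarithm of 10^68. A quick check, using the megabit-per-second bandwidth figure from earlier in the post:

```python
from math import log2

# Prior improbability of being an Earthling under the leverage penalty:
prior = 1e-68
bits_needed = log2(1 / prior)
print(round(bits_needed))  # 226

# At ~1 megabit per second of sensory bandwidth, that much evidence
# arrives in a fraction of a millisecond.
print(bits_needed / 1e6)  # roughly 0.000226 seconds
```

So the leverage penalty on being an Earthling, while numerically large, is trivially overcome by ordinary observation, in sharp contrast to the googol bits the Mugger's claim would require.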

If we don’t take everything at face value, then there might be such things as ancestor simulations, and it might be that your experience of looking around the room is something that happens in 10^20 ancestor simulations for every time that it happens in ‘base level’ reality. In this case your probable leverage on the future is diluted (though it may be large even post-dilution). But this is not something that the Hansonian leverage penalty forces you to believe—not when the putative stakes are still as small as 10^80. Conceptually, the Hansonian leverage penalty doesn’t interact much with the Simulation Hypothesis (SH) at all. If you don’t believe SH, then you think that the experiences of creatures like yours are rare in the universe and hence present strong, convincing evidence for you occupying the leverage-privileged position of an Earthling—much stronger evidence than its prior improbability. (There are some separate anthropic issues here about whether or not this is itself evidence for SH, but I don’t think that question is intrinsic to leverage penalties per se.)

A key point here is that even if you accept a Hanson-style leverage penalty, it doesn’t have to manifest as an inescapable commandment of modesty. You need not refuse to believe (in your deep and irrevocable humility) that you could be someone as special as an Ancient Earthling. Even if Earthlings matter in the universe—even if we occupy a unique position to affect the future of galaxies—it is still possible to encounter pretty convincing evidence that you’re an Earthling. Universes the size of 10^80 do not pose problems to conventional decision-theoretic reasoning, or to conventional epistemology.

Things play out similarly if—still taking everything at face value—you’re wondering about the chance that you could be special even for an Earthling, because you might be one of, say, 10^4 people in the history of the universe who contribute a major amount to an x-risk reduction project which ends up actually saving the galaxies. The vast majority of the improbability here is just in being an Earthling in the first place! Thus most of the clever arguments for not taking this high-impact possibility at face value would also tell you not to take being an Earthling at face value, since Earthlings as a whole are much more unique within the total temporal history of the universe than you are supposing yourself to be unique among Earthlings. But given ¬SH, the prior improbability of being an Earthling can be overcome by a few megabits of sensory experience from looking around the room and querying your memories—it’s not like 10^80 is enough future beings that the number of agents randomly hallucinating similar experiences outweighs the number of real Earthlings. Similarly, if you don’t think lots of Earthlings are hallucinating the experience of going to a donation page and clicking on the Paypal button for an x-risk charity, that sensory experience can easily serve to distinguish you as one of 10^4 people donating to an x-risk philanthropy.

Yes, there are various clever-sounding lines of argument which involve not taking things at face value—“Ah, but maybe you should consider yourself as an indistinguishable part of this here large reference class of deluded people who think they’re important.” I consider this a bad idea because it renders you a permanent Muggle by putting you into an inescapable reference class of self-deluded people, and then dismissing all your further thoughts as insufficient evidence because you could just be deluding yourself further about whether these are good arguments. Nor do I believe the world can only be saved by good people who are incapable of distinguishing themselves from a large class of crackpots, all of whom have no choice but to continue based on the tiny probability that they are not crackpots. (For more on this see Being Half-Rational About Pascal’s Wager Is Even Worse.) In this case you are a Pascal’s Muggle not because you’ve explicitly assigned a probability like one over googolplex, but because you took an improbability like 10^-6 at unquestioning face value and then cleverly questioned all the evidence which could’ve overcome that prior improbability, and so, in practice, you can never climb out of the epistemological sinkhole. By the same token, you should conclude that you are just self-deluded about being an Earthling, since real Earthlings are so rare and privileged in their leverage.

In gen­eral, lev­er­age penalties don’t trans­late into ad­vice about mod­esty or that you’re just de­lud­ing your­self—they just say that to be ra­tio­nally co­her­ent, your pic­ture of the uni­verse has to im­ply that your sen­sory ex­pe­riences are at least as rare as the cor­re­spond­ing mag­ni­tude of your lev­er­age.

Which brings us back to Pas­cal’s Mug­ger, in the origi­nal alley­way ver­sion. The Han­so­nian lev­er­age penalty seems to im­ply that to be co­her­ent, ei­ther you be­lieve that your sen­sory ex­pe­riences are re­ally ac­tu­ally 1 in a googol­plex—that only 1 in a googol­plex be­ings ex­pe­riences what you’re ex­pe­rienc­ing—or else you re­ally can’t take the situ­a­tion at face value.

Suppose the Mugger is telling the truth, and a googolplex other people are being simulated. Then there are at least a googolplex people in the universe. Perhaps some of them are hallucinating a situation similar to this one by sheer chance? Rather than telling you flatly that you can’t have a large impact, the Hansonian leverage penalty implies a coherence requirement on how uniquely you think your sensory experiences identify the position you believe yourself to occupy. When it comes to believing you’re one of 10^11 Earthlings who can impact 10^80 other life-centuries, you need to think your sensory experiences are unique to Earthlings—identify Earthlings with a likelihood ratio on the order of 10^69. This is quite achievable, if we take the evidence at face value. But when it comes to improbability on the order of 1/3↑↑↑3, the prior improbability is inescapable—your sensory experiences can’t possibly be that unique—which is assumed to be appropriate because almost-everyone who ever believes they’ll be in a position to help 3↑↑↑3 people will in fact be hallucinating. Boltzmann brains should be much more common than people in a unique position to affect 3↑↑↑3 others, at least if the causal graphs are finite.

Furthermore—although I didn’t realize this part until recently—applying Bayesian updates from that starting point may partially avert the Pascal’s Muggle effect:

Mugger: “Give me five dollars, and I’ll save 3↑↑↑3 lives using my Matrix Powers.”

You: “Nope.”

Mugger: “Why not? It’s a really large impact.”

You: “Yes, and I assign a probability on the order of 1 in 3↑↑↑3 that I would be in a unique position to affect 3↑↑↑3 people.”

Mugger: “Oh, is that really the probability that you assign? Behold!”

(A gap opens in the sky, edged with blue fire.)

Mugger: “Now what do you think, eh?”

You: “Well… I can’t actually say this observation has a likelihood ratio of 3↑↑↑3 to 1. No stream of evidence that can enter a human brain over the course of a century is ever going to have a likelihood ratio larger than, say, 10^(10^26) to 1 at the absurdly most, assuming one megabit per second of sensory data, for a century, each bit of which has at least a 1-in-a-trillion error probability. I’d probably start to be dominated by Boltzmann brains or other exotic minds well before then.”

Mugger: “So you’re not convinced.”

You: “Indeed not. The probability that you’re telling the truth is so tiny that God couldn’t find it with an electron microscope. Here’s the five dollars.”

Mugger: “Done! You’ve saved 3↑↑↑3 lives! Congratulations, you’re never going to top that, your peak life accomplishment will now always lie in your past. But why’d you give me the five dollars if you think I’m lying?”

You: “Well, because the evidence you did present me with had a likelihood ratio of at least a billion to one—I would’ve assigned less than 10^-9 prior probability of seeing this when I woke up this morning—so in accordance with Bayes’s Theorem I promoted the probability from 1/3↑↑↑3 to at least 10^9/3↑↑↑3, which when multiplied by an impact of 3↑↑↑3, yields an expected value of at least a billion lives saved for giving you five dollars.”
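The muggee’s bandwidth ceiling in this dialogue can be estimated directly. The naive version of the calculation below (one megabit per second for a century, each perfectly reliable bit at best doubling the odds) gives a ceiling near 10^(9.5×10^14); the dialogue’s 10^(10^26) is the same idea loosened to an “absurdly most” bound.

```python
import math

# Naive ceiling on the Bayesian evidence a human could accumulate:
# one megabit per second of sensory data, for one century.
bits_per_second = 1e6
seconds_per_century = 100 * 365.25 * 24 * 3600      # ~3.16e9 seconds

total_bits = bits_per_second * seconds_per_century  # ~3.2e15 bits

# Each perfectly reliable bit multiplies the odds by at most 2, so the
# maximum likelihood ratio is 2^total_bits = 10^(total_bits * log10(2)).
max_exponent = total_bits * math.log10(2)
print(f"max likelihood ratio ~ 10^({max_exponent:.2e})")
```

Allowing each bit a trillion-to-one reliability instead of two-to-one raises the exponent by a modest factor, still leaving it astronomically short of 3↑↑↑3 to 1.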

I confess that I find this line of reasoning a bit suspicious—it seems overly clever. But on the level of intuitive virtues of rationality, it does seem less stupid than the original Pascal’s Muggle; this muggee is at least behaviorally reacting to the evidence. In fact, they’re reacting in a way exactly proportional to the evidence—they would’ve assigned the same net importance to handing over the five dollars if the Mugger had offered 3↑↑↑4 lives, so long as the strength of the evidence seemed the same.

(Anyone who tries to apply the lessons here to actual x-risk reduction charities (which I think is probably a bad idea), keep in mind that the vast majority of the improbable-position-of-leverage in any x-risk reduction effort comes from being an Earthling in a position to affect the future of a hundred billion galaxies, and that sensory evidence for being an Earthling is what gives you most of your belief that your actions can have an outsized impact.)

So why not just run with this—why not just declare the decision-theoretic problem resolved, if we have a rule that seems to give reasonable behavioral answers in practice? Why not just go ahead and program that rule into an AI?

Well… I still feel a bit nervous about the idea that Pascal’s Muggee, after the sky splits open, is handing over five dollars while claiming to assign probability on the order of 10^9/3↑↑↑3 that it’s doing any good.

I think that my own reaction in a similar situation would be along these lines instead:

Mugger: “Give me five dollars, and I’ll save 3↑↑↑3 lives using my Matrix Powers.”

Me: “Nope.”

Mugger: “So then, you think the probability I’m telling the truth is on the order of 1/3↑↑↑3?”

Me: “Yeah… that probably has to follow. I don’t see any way around that revealed belief, given that I’m not actually giving you the five dollars. I’ve heard some people try to claim silly things like, the probability that you’re telling the truth is counterbalanced by the probability that you’ll kill 3↑↑↑3 people instead, or something else with a conveniently equal and opposite utility. But there’s no way that things would balance out exactly in practice, if there were no a priori mathematical requirement that they balance. Even if the prior probability of your saving 3↑↑↑3 people and killing 3↑↑↑3 people, conditional on my giving you five dollars, exactly balanced down to the log(3↑↑↑3) decimal place, the likelihood ratio for your telling me that you would “save” 3↑↑↑3 people would not be exactly 1:1 for the two hypotheses down to the log(3↑↑↑3) decimal place. So if I assigned probabilities much greater than 1/3↑↑↑3 to your doing something that affected 3↑↑↑3 people, my actions would be overwhelmingly dominated by even a tiny difference in likelihood ratio elevating the probability that you saved 3↑↑↑3 people over the probability that you did something bad to them. The only way this hypothesis can’t dominate my actions—really, the only way my expected utility sums can converge at all—is if I assign probability on the order of 1/3↑↑↑3 or less. I don’t see any way of escaping that part.”

Mugger: “But can you, in your mortal uncertainty, truly assign a probability as low as 1 in 3↑↑↑3 to any proposition whatever? Can you truly believe, with your error-prone neural brain, that you could make 3↑↑↑3 statements of any kind one after another, and be wrong, on average, about once?”

Me: “Nope.”

Mugger: “So give me five dollars!”

Me: “Nope.”

Mugger: “Why not?”

Me: “Because even though I, in my mortal uncertainty, will eventually be wrong about all sorts of things if I make enough statements one after another, this fact can’t be used to increase the probability of arbitrary statements beyond what my prior says they should be, because then my prior would sum to more than 1. There must be some kind of required condition for taking a hypothesis seriously enough to worry that I might be overconfident about it—”

Mugger: “Then behold!”

(A gap opens in the sky, edged with blue fire.)

Mugger: “Now what do you think, eh?”

Me (staring up at the sky): “...whoa.” (Pause.) “You turned into a cat.”

Mugger: “What?”

Me: “Private joke. Okay, I think I’m going to have to rethink a lot of things. But if you want to tell me about how I was wrong to assign a prior probability on the order of 1/3↑↑↑3 to your scenario, I will shut up and listen very carefully to what you have to say about it. Oh, and here’s the five dollars, can I pay an extra twenty and make some other requests?”

(The thought bubble pops, and we return to two people standing in an alley, the sky above perfectly normal.)

Mugger: “Now, in this scenario we’ve just imagined, you were taking my case seriously, right? But the evidence there couldn’t have had a likelihood ratio of more than 10^(10^26) to 1, and probably much less. So by the method of imaginary updates, you must assign probability at least 10^-(10^26) to my scenario, which when multiplied by a benefit on the order of 3↑↑↑3, yields an unimaginable bonanza in exchange for just five dollars—”

Me: “Nope.”

Mugger: “How can you possibly say that? You’re not being logically coherent!”

Me: “I agree that I’m not being logically coherent, but I think that’s acceptable in this case.”

Mugger: “This ought to be good. Since when are rationalists allowed to deliberately be logically incoherent?”

Me: “Since we don’t have infinite computing power—”

Mugger: “That sounds like a fully general excuse if I ever heard one.”

Me: “No, this is a specific consequence of bounded computing power. Let me start with a simpler example. Suppose I believe in a set of mathematical axioms. Since I don’t have infinite computing power, I won’t be able to know all the deductive consequences of those axioms. And that means I will necessarily fall prey to the conjunction fallacy, in the sense that you’ll present me with a theorem X that is a deductive consequence of my axioms, but which I don’t know to be a deductive consequence of my axioms, and you’ll ask me to assign a probability to X, and I’ll assign it 50% probability or something. Then you present me with a brilliant lemma Y, which clearly seems like a likely consequence of my mathematical axioms, and which also seems to imply X—once I see Y, the connection from my axioms to X, via Y, becomes obvious. So I assign P(X&Y) = 90%, or something like that. Well, that’s the conjunction fallacy—I assigned P(X&Y) > P(X). The thing is, if you then ask me P(X), after I’ve seen Y, I’ll reply that P(X) is 91% or at any rate something higher than P(X&Y). I’ll have changed my mind about what my prior beliefs logically imply, because I’m not logically omniscient, even if that looks like assigning probabilities over time which are incoherent in the Bayesian sense.”

Mugger: “And how does this work out to my not getting five dollars?”

Me: “In the scenario you’re asking me to imagine, you present me with evidence which I currently think Just Plain Shouldn’t Happen. And if that actually does happen, the sensible way for me to react is by questioning my prior assumptions and the reasoning which led me to assign such low probability. One way that I handle my lack of logical omniscience—my finite, error-prone reasoning capabilities—is by being willing to assign infinitesimal probabilities to non-privileged hypotheses so that my prior over all possibilities can sum to 1. But if I actually see strong evidence for something I previously thought was super-improbable, I don’t just do a Bayesian update, I should also question whether I was right to assign such a tiny probability in the first place—whether it was really as complex, or unnatural, as I thought. In real life, you are not ever supposed to have a prior improbability of 10^-100 for some fact distinguished enough to be written down in advance, and yet encounter strong evidence, say 10^10 to 1, that the thing has actually happened. If something like that happens, you don’t do a Bayesian update to a posterior of 10^-90. Instead you question both whether the evidence might be weaker than it seems, and whether your estimate of prior improbability might have been poorly calibrated, because rational agents who actually have well-calibrated priors should not encounter situations like that until they are ten billion days old. Now, this may mean that I end up doing some non-Bayesian updates: I say some hypothesis has a prior probability of a quadrillion to one, you show me evidence with a likelihood ratio of a billion to one, and I say ‘Guess I was wrong about that quadrillion to one thing’ rather than being a Muggle about it. And then I shut up and listen to what you have to say about how to estimate probabilities, because on my worldview, I wasn’t expecting to see you turn into a cat. But for me to make a super-update like that—reflecting a posterior belief that I was logically incorrect about the prior probability—you have to really actually show me the evidence, you can’t just ask me to imagine it. This is something that only logically incoherent agents ever say, but that’s all right because I’m not logically omniscient.”

At some point, we’re going to have to build some sort of actual prior into, you know, some sort of actual self-improving AI.

(Scary thought, right?)

So far as I can presently see, the logic requiring some sort of leverage penalty—not just so that we don’t pay \$5 to Pascal’s Mugger, but also so that our expected utility sums converge at all—seems clear enough that I can’t yet see a good alternative to it (feel welcome to suggest one), and Robin Hanson’s rationale is by far the best I’ve heard.

In fact, what we actually need is more like a combined leverage-and-complexity penalty, to avoid scenarios like this:

Mugger: “Give me \$5 and I’ll save 3↑↑↑3 people.”

You: “I assign probability exactly 1/3↑↑↑3 to that.”

Mugger: “So that’s one life saved for \$5, on average. That’s a pretty good bargain, right?”

You: “Not by comparison with x-risk reduction charities. But I also like to do good on a smaller scale now and then. How about a penny? Would you be willing to save 3↑↑↑3/500 lives for a penny?”

Mugger: “Eh, fine.”

You: “Well, the probability of that is 500/3↑↑↑3, so here’s a penny!” (Goes on way, whistling cheerfully.)
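The exploit in this exchange is plain arithmetic: with a leverage penalty alone, the probability assigned is exactly one over the number of lives claimed, so every offer has an expected value of exactly one life, and the cost per expected life is whatever price gets named. A quick check, using 10^100 as a stand-in for the unrepresentably larger 3↑↑↑3:

```python
N = 10 ** 100   # stand-in for 3↑↑↑3, which cannot be represented directly

def expected_lives(lives_claimed):
    # Pure leverage penalty, no complexity term: P = 1 / (lives claimed).
    p = 1 / lives_claimed
    return p * lives_claimed          # always exactly 1 expected life

for price, lives in [(5.00, N), (0.01, N // 500)]:
    rate = expected_lives(lives) / price
    print(f"price ${price:.2f}: {rate:.1f} expected lives per dollar")
```

Smaller offers always buy expected lives at a better rate, so the muggee ends up haggling the price down indefinitely, which is the behavior the combined complexity penalty is meant to rule out.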

Adding a complexity penalty and a leverage penalty is necessary, not just to avert this exact scenario, but so that we don’t get an infinite expected utility sum over a 1/3↑↑↑3 probability of saving 3↑↑↑3 lives, 1/(3↑↑↑3 + 1) probability of saving 3↑↑↑3 + 1 lives, and so on. If we combine the standard complexity penalty with a leverage penalty, the whole thing should converge.
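That convergence claim can be illustrated with a toy model. Below, the bit length of a payoff n stands in for its complexity (an assumption purely for illustration; real Kolmogorov complexity is uncomputable), with prior 2^(-2·bitlength) so that the prior itself normalizes. Without a leverage term the expected-utility partial sums grow without bound; multiplying each payoff n by the 1/n leverage penalty makes them converge:

```python
def expected_utility(num_classes, leverage):
    # Class ell holds the 2^(ell-1) integers whose bit length is ell,
    # each given prior 2^(-2*ell) (bit length as a toy complexity measure).
    total = 0.0
    for ell in range(1, num_classes + 1):
        count = 2 ** (ell - 1)
        first, last = 2 ** (ell - 1), 2 ** ell - 1
        sum_of_payoffs = count * (first + last) // 2
        prior = 2.0 ** (-2 * ell)
        if leverage:
            # payoff n times leverage penalty 1/n cancels to 1 per outcome
            total += prior * count
        else:
            total += prior * sum_of_payoffs
    return total

print(expected_utility(30, leverage=False))  # ~11.0
print(expected_utility(60, leverage=False))  # ~22.25: grows without bound
print(expected_utility(60, leverage=True))   # ~0.5: converges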

Probability penalties are epistemic features—they affect what we believe, not just what we do. Maps, ideally, correspond to territories. Is there any territory that this complexity+leverage penalty can correspond to—any state of a single reality which would make these the true frequencies? Or is it only interpretable as pure uncertainty over realities, with there being no single reality that could correspond to it? To put it another way, the complexity penalty and the leverage penalty seem unrelated, so perhaps they’re mutually inconsistent; can we show that the union of these two theories has a model?

As near as I can figure, the corresponding state of affairs to a complexity+leverage prior improbability would be a Tegmark Level IV multiverse in which each reality got an amount of magical-reality-fluid corresponding to the complexity of its program (1/2 to the power of its Kolmogorov complexity) and then this magical-reality-fluid had to be divided among all the causal elements within that universe—if you contain 3↑↑↑3 causal nodes, then each node can only get 1/3↑↑↑3 of the total realness of that universe. (As always, the term “magical reality fluid” reflects an attempt to demarcate a philosophical area where I feel quite confused, and try to use correspondingly blatantly wrong terminology so that I do not mistake my reasoning about my confusion for a solution.) This setup is not entirely implausible because the Born probabilities in our own universe look like they might behave like this sort of magical-reality-fluid—quantum amplitude flowing between configurations in a way that preserves the total amount of realness while dividing it between worlds—and perhaps every other part of the multiverse must necessarily work the same way for some reason. It seems worth noting that part of what’s motivating this version of the ‘territory’ is that our sum over all real things, weighted by reality-fluid, can then converge. In other words, the reason why complexity+leverage works in decision theory is that the union of the two theories has a model in which the total multiverse contains an amount of reality-fluid that can sum to 1 rather than being infinite.

(Though we need to suppose that either (a) only programs with a finite number of causal nodes exist, or (b) programs can divide finite reality-fluid among an infinite number of nodes via some measure that gives every experience-moment a well-defined relative amount of reality-fluid. Again see caveats about basic philosophical confusion—perhaps our map needs this property over its uncertainty but the territory doesn’t have to work the same way, etcetera.)

If an AI’s overall architecture is also such as to enable it to carry out the “You turned into a cat” effect—where if the AI actually ends up with strong evidence for a scenario it assigned super-exponential improbability, the AI reconsiders its priors and the apparent strength of evidence rather than executing a blind Bayesian update, though this part is formally a tad underspecified—then at the moment I can’t think of anything else to add in.

In other words: This is my best current idea for how a prior, e.g. as used in an AI, could yield decision-theoretic convergence over explosively large possible worlds.

However, I would still call this a semi-open FAI problem (edit: wide-open) because it seems quite plausible that somebody is going to kick holes in the overall view I’ve just presented, or come up with a better solution, possibly within an hour of my posting this—the proposal is both recent and weak even by my standards. I’m also worried about whether it turns out to imply anything crazy on anthropic problems. Over to you, readers.

• I don’t like to be a bearer of bad news here, but it ought to be stated. This whole leverage ratio idea is very obviously an intelligent kludge / patch / workaround, because you have two base-level theories that either don’t work together or don’t work individually.

You already know that something doesn’t work. That’s what the original post was about and that’s what this post tries to address. But this is a clunky, inelegant patch. That’s fine for a project or a website, but given belief in the rest of your writings on AI, this is high stakes. At those stakes, saying “we know it doesn’t work, but we patched the bugs we found” is not acceptable.

The combination of your best guess at picking the right decision theory and your best guess at epistemology produces absurd conclusions. Note that you already know this. This knowledge which you already have motivated this post.

The next step is to identify which is wrong, the decision theory or the epistemology. After that you need to find something that’s not wrong to replace it. That sucks, it’s probably extremely hard, and it probably sets you back to square one on multiple points. But you can’t know that one of your foundations is wrong and just keep going. Once you know you are wrong you need to act consistently with that.

• This whole leverage ratio idea is very obviously an intelligent kludge / patch / workaround

I’m not sure that the kludge works anyway, since there are still some “high impact” scenarios which don’t get kludged out. Let’s imagine the mugger’s pitch is as follows. “I am the Lord of the Matrix, and guess what—you’re in it! I’m in the process of running a huge number of simulations of human civilization, in series, and in each run of the simulation I am making a very special offer to some carefully selected people within it. If you are prepared to hand over \$5 to me, I will kindly prevent one dust speck from entering the eye of one person in each of the next googolplex simulations that I run! Doesn’t that sound like a great offer?”

Now, rather naturally, you’re going to tell him to get lost. And in the worlds where there really is a Matrix Lord, and he’s telling the truth, the approached subjects almost always tell him to get lost as well (the Lord is careful in whom he approaches), which means that googolplexes of preventable dust specks hit googolplexes of eyes. Each rejection of the offer causes a lower total utility than would be obtained from accepting it. And if those worlds have a measure > 1/googolplex, there is on the face of it a net loss in expected utility. More likely, we’re just going to get non-convergent expected utilities again.

The general issue is that the causal structure of the hypothetical world is highly linear. A reasonable proportion of nodes (perhaps 1 in a billion) do indeed have the ability to affect a colossal number of other nodes in such a world. So the high-utility outcome doesn’t get suppressed by a locational penalty.

• This whole leverage ratio idea is very obviously an intelligent kludge / patch / workaround because you have two base-level theories that either don’t work together or don’t work individually.

I’d be more worried about that if I couldn’t (apparently) visualize what a corresponding Tegmark Level IV universe looks like. If the union of two theories has a model, they can’t be mutually inconsistent. Whether this corresponding multiverse is plausible is a different problem.

• Why is decision/probability theory allowed to constrain the space of “physical” models? It seems that the proper theory should not depend on metaphysical assumptions.

If they are starting to require uncertain metaphysical assumptions, I think that counts as “not working together”.

• Metaphysical assumptions are one thing: this one involves normative assumptions. There is zero reason to think we evolved values that can make any sense at all of saving 3^^^3 people. The software we shipped with cannot take numbers like that in its domain. That we can think up thought experiments that confuse our ethical intuitions is already incredibly likely. Coming up with kludgey methods to make decisions that give intuitively correct answers to the thought experiments while preserving normal normative reasoning, and then—from there—concluding something about what the universe must be like, is a really odd epistemic position to take.

• I’m not familiar with any certain metaphysical assumptions. And the constraint here is along the lines of “things converge”, where it is at least plausible that reality has to converge too. (Small edit made to final paragraphs to reflect this.)

• It seems that the proper theory should not depend on metaphysical assumptions.

That’s the part that starts grating on me. Especially when Eliezer mentions Tegmark Level IV with a straight face. I assume that I do not grok his meaning in fullness. If he means what I think he means, it would be a great disappointment.

• shminux,

It’s just a fact that you endorse a very different theory of “reality” than Eliezer. Why disguise your reasonable disagreement with him by claiming that you don’t understand him?

You talk like you don’t notice when highly-qualified-physicist shminux is talking and when average-armchair-philosopher shminux is talking.

Which is annoying to me in particular because physicist shminux knows a lot more than I, and I should pay attention to what he says in order to be less wrong, while philosopher shminux is not entitled to the same weight. So I’d like some markers of which one is talking.

• I thought I was pretty clear re the “markers of which one is talking”. But let me recap.

Eliezer has thought about metaethics, decision theories and AI design for much, much longer and much, much more seriously than I have. I can see that when I read what he writes about the issues I have not even thought of. While I cannot tell if it is correct, I can certainly tell that there is a fair amount of learning I still have to do if I wanted to be interesting. This is the same feeling I used to get (and still get on occasion) when talking with an expert in, say, General Relativity, before I learned the subject in sufficient depth. Now that I have some expertise in the area, I see the situation from the other side, as well. I can often recognize a standard amateurish argument before the person making it has finished. I often know exactly what implicit false premises lead to this argument, because I had been there myself. If I am lucky, I can successfully point out the problematic assumptions to the amateur in question, provided I can simplify it to the proper level. If so, the reaction I get is “that’s so cool… so deep… I’ll go and ponder it. Thank you, Master!”, the same thing I used to feel when hearing an expert answer my amateurish questions.

As far as Eliezer’s area of expertise is concerned, I am on the wrong side of the gulf. Thus I am happy to learn what I can from him in this area and be gratified if my humble suggestions prove useful on occasion.

I am much more skeptical about his forays into Quantum Mechanics, Relativity and some other areas of physics I have more than passing familiarity with. I do not get the feeling that what he says is “deep”, and only occasionally that it is “interesting”. Hence I am happy to discount his musings about MWI as amateurish.

There is this grey area between the two, which could be thought of as philosophy of science. While I am far from an expert in the area, I have put in a fair amount of effort to understand what the leading edge is. What I find is warring camps of hand-waving “experts” with few interesting insights and no way to convince the rival school of anything. These interesting insights mostly happen in something more properly called math, linguistics or cognitive science, not philosophy proper. There is no feeling of awe you get from listening to a true expert in a certain field. Expert physicists who venture into philosophy, like Tegmark and Page, quickly lose their aura of expertise and seem mere mortals with little or no advantage over other amateurs.

When Eliezer talks about something metaphysical related to MWI and Tegmark IV, or any kind of anthropics, I suspect that he is out of his depth, because he sounds as such. However, knowing that he is an expert in a somewhat related area makes me think that I may well have missed something important, and so I give him the benefit of the doubt and try to figure out what I may have missed. If the only difference is that I “endorse a very different theory of ‘reality’ than Eliezer”, and if this is indeed only a matter of endorsement, and there is no way to tell experimentally who is right, now or in the far future, then his “theory of reality” becomes much less relevant to me and therefore much less interesting. Oh, and here I don’t mean realism vs instrumentalism, I mean falsifiable models of the “real external world”, as opposed to anything Everett-like or Barbour-like.

• Even if the field X is confused, to confidently dismiss subtheory Y you must know something confidently about Y from within this confusion, such as that Y is inconsistent or nonreductionist or something. I often occupy this mental state myself but I’m aware that it’s ‘arrogant’ and setting myself above everyone in field X who does think Y is plausible—for example, I am arrogant with respect to respected but elderly physicists who think single-world interpretations of QM are plausible, or anyone who thinks our confusion about the ultimate nature of reality can keep the God subtheory in the running. Our admitted confusion does not permit that particular answer to remain plausible.

I don’t think anyone I take seriously would deny that the field of anthropics / magical-reality-fluid is confused. What do you think you know about all computable processes, or all logical theories with models, existing, which makes that obviously impermitted? In case it’s not clear, I wasn’t endorsing Tegmark Level IV as the obvious truth the way I consider MWI obvious, nor yet endorsing it at all; rather, I was pointing out that with some further specification a version of T4 could provide a model in which frequencies would go as the probabilities assigned by the complexity+leverage penalty, which would not necessarily make it true. It is not clear to me what epistemic state you could occupy from which this would justly disappoint you in me, unless you considered T4 obviously forbidden even from within our confusion. And of course I’m fine with your being arrogant about that, so long as you realize you’re being arrogant and so long as you have the epistemological firepower to back it up.

• Even if the field X is confused, to confidently dismiss subtheory Y you must know something confidently about Y from within this confusion, such as that Y is inconsistent or nonreductionist or something.

Maybe I was unclear. I don’t dismiss Y=TL4 as wrong, I ignore it as untestable and therefore useless for justifying anything interesting, like how an AI ought to deal with tiny probabilities of enormous utilities. I agree that I am “arrogant” here, in the sense that I discount an opinion of a smart and popular MIT prof as misguided. The postulate “mathematical existence = physical existence” raises a category error exception for me, as one is, in your words, logic, the other is physics. In fact, I don’t understand why we should privilege math to begin with. Maybe the universe indeed does not run on math (man, I still chuckle every time I recall that omake). Maybe the trouble we have with understanding the world is that we rely on math too much (sorry, getting too Chopra here). Maybe the matrix lord was a sloppy programmer whose bugs and self-contradictory assumptions manifest themselves to us as black hole singularities, which are hidden from view only because the code maintainers did a passable job of acting on the QA reports. There are many ideas which are just as pretty and just as unjustifiable as TL4. I don’t pretend to fully grok the “complexity+leverage penalty” idea, except to say that your dark energy example makes me think less of it, as it seems to rely on considerations I find dubious (that any model with the potential of affecting gazillions of people in the far future if accurate is extremely unlikely despite being the currently best map available). Is it arrogant? Probably. Is it wrong? Not unless you prove the alternative right.

• Maybe I was unclear. I don’t dismiss Y=TL4 as wrong, I ignore it as untestable and therefore useless for justifying anything interesting, like how an AI ought to deal with tiny probabilities of enormous utilities.

He’s not saying that the leverage penalty might be correct because we might live in a certain type of Tegmark IV; he’s saying that the fact that the leverage penalty would be correct if we did live in Tegmark IV + some other assumptions shows (a) that it is a consistent decision procedure and¹ (b) that it is the sort of decision procedure that emerges reasonably naturally, and is thus a more reasonable hypothesis than if we didn’t know it comes up naturally like that.

It is possible that it is hard to communicate here since Eliezer is making analogies to model theory, and I would assume that you are not familiar with model theory.

¹ The word ‘and’ isn’t really correct here. It’s very likely that EY means one of (a) and (b), and possibly both.

• (Yep. More a than b, it still feels pretty un­nat­u­ral to me.)

• Huh. This whole ex­change makes me more cer­tain than I am miss­ing some­thing cru­cial, but read­ing and dis­sect­ing it re­peat­edly does not seem to help. And ap­par­ently it’s not the is­sue of not know­ing enough math. I guess the men­tal block I can’t get over is “why TL4?”. Or maybe “what other men­tal con­structs could one use in place of TL4 to make a similar ar­gu­ment?”

Maybe pa­per-ma­chine or some­one else on #less­wrong will be able to clar­ify this.

• Or maybe “what other men­tal con­structs could one use in place of TL4 to make a similar ar­gu­ment?”

Have you got one?

• Not sure why you are asking, but yes, I pointed some out 5 levels up. They clearly have a complexity penalty, but I am not sure how large it is compared to TL4’s. At least I know that the “sloppy programmer” construct is finite (though possibly circular). I am not sure how to even begin to estimate the Kolmogorov complexity of “everything mathematically possible exists physically”. What Turing machine would output all possible mathematical structures?

• What Tur­ing ma­chine would out­put all pos­si­ble math­e­mat­i­cal struc­tures?

“Loop infinitely, incrementing `count` from 1: [Let `steps` be `count`. Iterate over all legal programs `prog` until `steps` = 0: [Load submachine state from “cache tape”. Execute one step of `prog`, writing output to “output tape”. Save machine state onto “cache tape”. Decrement `steps`.] ]”

The output of every program is found on the output tape (albeit at intervals). I’m sure one could design the Turing machine so that it reordered the output tape with every piece of data written, so that the outputs are in order too, if you want that. Or make it copy-paste the entire output so far to the end of the tape, so that every number of evaluation steps for every Turing machine has its own tape location. That seemed a little wasteful, though.
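For concreteness, here is a minimal Python sketch of that dovetailing construction. The numbered generators are hypothetical stand-ins for an enumeration of all legal programs (the real thing would enumerate Turing machines); the point is only the interleaving schedule:

```python
import itertools

def program(k):
    """Stand-in for the k-th 'legal program': yields multiples of k forever."""
    n = 0
    while True:
        yield k * n
        n += 1

def dovetail():
    """On round c, admit program c and run one step of every program admitted
    so far. Every program gets infinitely many steps, so every output of every
    program eventually appears on the 'output tape'."""
    running = []
    for c in itertools.count():
        running.append(program(c))  # admit a new program each round
        for p in running:
            yield next(p)           # one step of each running program

# First 20 cells of the output tape: each program's outputs appear at intervals.
out = list(itertools.islice(dovetail(), 20))
```

Each program’s outputs show up more and more sparsely as further programs are admitted, but nothing is ever starved.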

edit: THANK YOU GWERN. This is indeed what I was thinking of :D

• Hey, don’t look at me. I’m with you on “Existence of TL4 is untestable, therefore boring.”

• It is pos­si­ble that it is hard to com­mu­ni­cate here since Eliezer is mak­ing analo­gies to model the­ory, and I would as­sume that you are not fa­mil­iar with model the­ory.

You are right, I am out of my depth math-wise. Maybe that’s why I can’t see the rele­vance of an untestable the­ory to AI de­sign.

• Maybe that’s why I can’t see the rele­vance of an untestable the­ory to AI de­sign.

It seems to be the problem that is relevant to AI design. How does an expected utility maximising agent handle edge cases and infinitesimals, given logical uncertainty and bounded capabilities? If you get that wrong then Rocks Fall and Everyone Dies. The relevance of any given theory of how such things can be modelled is then based on either its suitability for use in an AI design or, conceivably, the implications if an AI constructed and used said model.

• (Also yep.)

• TL4, or at least (TL4+some mea­sure the­ory that gives calcu­la­ble and sen­si­ble an­swers), is not en­tirely un­falsifi­able. For in­stance, it pre­dicts that a ran­dom ob­server (you) should live in a very “big” uni­verse. Since we have plau­si­ble rea­sons to be­lieve TL0-TL3 (or at least, I think we do), and I have a very hard time imag­in­ing spe­cific laws of physics that give “big­ger” causal webs than you get from TL0-TL3, that gives me some weak ev­i­dence for TL4; it could have been falsified but wasn’t.

It seems plau­si­ble that that’s the only ev­i­dence we’ll ever get re­gard­ing TL4. If so, I’m not sure that ei­ther of the terms “testable” or “untestable” ap­ply. “Testable” means “sus­cep­ti­ble to re­pro­ducible ex­per­i­ment”; “untestable” means “un­sus­cep­ti­ble to ex­per­i­ment”; so what do you call some­thing in be­tween, which is sus­cep­ti­ble only to limited and ir­re­pro­ducible ev­i­dence? Qu­a­sitestable?

Of course, you could still per­haps say “I ig­nore it as only qu­a­sitestable and there­fore use­less for jus­tify­ing any­thing in­ter­est­ing”.

• TL4 seems testable by ask­ing what a ‘ran­domly cho­sen’ ob­server would ex­pect to see. In fact, the sim­plest ver­sion seems falsified by the lack of ob­served dis­con­ti­nu­ities in physics (of the ‘clothes turn into a crocodile’ type).

Var­i­ants of TL4 that might hold seem untestable right now. But we could see them as ideas or di­rec­tions for grop­ing to­wards a the­ory, rather than com­plete hy­pothe­ses. Or it might hap­pen that when we un­der­stand an­throp­ics bet­ter, we’ll see an ob­vi­ous test. (Or the origi­nal hy­poth­e­sis might turn out to work, but I strongly doubt that.)

• If he means what I think he means, it would be a great dis­ap­point­ment.

’Splain yo’self.

• See my re­ply to TimS.

• Mug­ger: Give me five dol­lars, and I’ll save 3↑↑↑3 lives us­ing my Ma­trix Pow­ers.

Me: I’m not sure about that.

Mugger: So then, you think the probability I’m telling the truth is on the order of 1/3↑↑↑3?

Me: Actually no. I’m just not sure I care about your 3↑↑↑3 simulated people as much as you think I do.

Mugger: This should be good.

Me: There are only something like n = 10^10 neurons in a human brain, and the number of possible states of a human brain is exponential in n. This is stupidly tiny compared to 3↑↑↑3, so most of the lives you’re saving will be heavily duplicated. I’m not really sure that I care about duplicates that much.

Mug­ger: Well I didn’t say they would all be hu­mans. Haven’t you read enough Sci-Fi to know that you should care about all pos­si­ble sen­tient life?

Me: Of course. But the same sort of rea­son­ing im­plies that, ei­ther there are a lot of du­pli­cates, or else most of the peo­ple you are talk­ing about are in­com­pre­hen­si­bly large, since there aren’t that many small Tur­ing ma­chines to go around. And it’s not at all ob­vi­ous to me that you can de­scribe ar­bi­trar­ily large minds whose ex­is­tence I should care about with­out us­ing up a lot of com­plex­ity. More gen­er­ally, I can’t see any way to de­scribe wor­lds which I care about to a de­gree that vastly out­grows their com­plex­ity. My val­ues are com­pli­cated.
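The “stupidly tiny” comparison in this dialogue is easy to check with a small evaluator for Knuth’s up-arrow notation (a sketch; `brain_bits` is the rough 10^10-neuron figure above, and only tiny arguments are actually computable):

```python
import math

def up(a, n, b):
    """Knuth's a ↑^n b: n=1 is exponentiation, n=2 is tetration (a^^b), etc.
    Only evaluable for tiny arguments -- the values explode immediately."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up(a, n - 1, up(a, n, b - 1))

brain_bits = 10**10           # ~log2(number of brain states), per the dialogue
t3 = up(3, 2, 3)              # 3^^3 = 3^27 = 7,625,597,484,987
log2_t4 = t3 * math.log2(3)   # log2 of 3^^4 = 3^(3^^3): already ~1.2e13 bits
# So even 3^^4 dwarfs the count of distinct human brain states, and
# 3^^^3 = 3^^(3^^3) is an exponential tower 3^^3 levels tall.
```

Already one rung past 3^^3, the tower outruns the brain-state count; 3↑↑↑3 is incomparably further along.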

• I’m not re­ally sure that I care about du­pli­cates that much.

Bostrom would prob­a­bly try to ar­gue that you do. See Bostrom (2006).

• Am I crazy, or does Bostrom’s ar­gu­ment in that pa­per fall flat al­most im­me­di­ately, based on a bad moral ar­gu­ment?

His first, and seem­ingly most com­pel­ling, ar­gu­ment for Du­pli­ca­tion over Unifi­ca­tion is that, as­sum­ing an in­finite uni­verse, it’s cer­tain (with prob­a­bil­ity 1) that there is already an iden­ti­cal por­tion of the uni­verse where you’re tor­tur­ing the per­son in front of you. Given Unifi­ca­tion, it’s mean­ingless to dis­t­in­guish be­tween that por­tion and this por­tion, given their phys­i­cal iden­ti­cal­ness, so tor­tur­ing the per­son is morally blame­less, as you’re not in­creas­ing the num­ber of unique ob­servers be­ing tor­tured. Du­pli­ca­tion makes the two in­stances of the per­son dis­tinct due to their differ­ing spa­tial lo­ca­tions, even if ev­ery other phys­i­cal and men­tal as­pect is iden­ti­cal, so tor­tur­ing is still adding to the suffer­ing in the uni­verse.

How­ever, you can flip this over triv­ially and come to a ter­rible con­clu­sion. If Du­pli­ca­tion is true, you merely have to simu­late a per­son un­til they ex­pe­rience a mo­ment of pure he­do­nic bliss, in some eth­i­cally cor­rect man­ner that ev­ery­one agrees is morally good to ex­pe­rience and en­joy. Then, copy the frag­ment of the simu­la­tion cov­er­ing the ex­pe­rienc­ing of that emo­tion, and du­pli­cate it end­lessly. Each du­pli­cate is dis­tinct, and so you’re in­creas­ing the amount of joy in the uni­verse ev­ery time you make a copy. It would be a net win, in fact, if you kil­led ev­ery hu­man and re­placed the earth with a com­puter do­ing noth­ing but run­ning copies of that one per­son ex­pe­rienc­ing a mo­ment of bliss. Unifi­ca­tion takes care of this, by not­ing that du­pli­cat­ing some­one adds, at most, a sin­gle bit of in­for­ma­tion to the uni­verse, so spam­ming the uni­verse with copies of the happy mo­ment counts ei­ther the same as the sin­gle ex­pe­rience, or at most a triv­ial amount more.

Am I think­ing wrong here?

• How­ever, you can flip this over triv­ially and come to a ter­rible con­clu­sion. If Du­pli­ca­tion is true, you merely have to simu­late a per­son un­til they ex­pe­rience a mo­ment of pure he­do­nic bliss, in some eth­i­cally cor­rect man­ner that ev­ery­one agrees is morally good to ex­pe­rience and en­joy. Then, copy the frag­ment of the simu­la­tion cov­er­ing the ex­pe­rienc­ing of that emo­tion, and du­pli­cate it end­lessly.

True just if your sum­mum bonum is ex­actly an ag­gre­gate of mo­ments of hap­piness ex­pe­rienced.

I take the po­si­tion that it is not.

I don’t think one even has to re­sort to a po­si­tion like “only one copy counts”.

• True, but that’s then strik­ing more at the heart of Bostrom’s ar­gu­ment, rather than my counter-ar­gu­ment, which was just flip­ping Bostrom around. (Un­less your sum­mum malum is sig­nifi­cantly differ­ent, such that du­pli­cate tor­tures and du­pli­cate good-things-equiv­a­lent-to-tor­ture-in-emo­tional-effect still sum differ­ently?)

• His first, and seem­ingly most com­pel­ling, ar­gu­ment for Du­pli­ca­tion over Unifi­ca­tion is that, as­sum­ing an in­finite uni­verse, it’s cer­tain (with prob­a­bil­ity 1) that there is already an iden­ti­cal por­tion of the uni­verse where you’re tor­tur­ing the per­son in front of you. Given Unifi­ca­tion, it’s mean­ingless to dis­t­in­guish be­tween that por­tion and this por­tion, given their phys­i­cal iden­ti­cal­ness, so tor­tur­ing the per­son is morally blame­less, as you’re not in­creas­ing the num­ber of unique ob­servers be­ing tor­tured.

I’d ar­gue that the tor­ture por­tion is not iden­ti­cal to the not-tor­ture por­tion and that the differ­ence is caused by at least one event in the com­mon prior his­tory of both por­tions of the uni­verse where they di­verged. Unifi­ca­tion only makes coun­ter­fac­tual wor­lds real; it does not cause ev­ery agent to ex­pe­rience ev­ery coun­ter­fac­tual world. Agents are differ­en­ti­ated by the choices they make and agents who perform tor­ture are not the same agents as those who ab­stain from tor­ture. The differ­ence can be made ar­bi­trar­ily small, for in­stance by choos­ing an agent with a 50% prob­a­bil­ity of com­mit­ting tor­ture based on the out­come of a quan­tum coin flip, but the moral ques­tion in that case is why an agent would choose to be­come 50% likely to com­mit tor­ture in the first place. Some coun­ter­fac­tual agents will choose to be­come 50% likely to com­mit tor­ture, but they will be very differ­ent than the agents who are 1% likely to com­mit tor­ture.

• I think you’re in­ter­pret­ing Bostrom slightly wrong. You seem to be read­ing his ar­gu­ment (or per­haps just my short dis­til­la­tion of it) as ar­gu­ing that you’re not cur­rently tor­tur­ing some­one, but there’s an iden­ti­cal sec­tion of the uni­verse el­se­where where you are tor­tur­ing some­one, so you might as well start tor­tur­ing now.

As you note, that’s con­tra­dic­tory—if you’re not cur­rently tor­tur­ing, then your sec­tion of the uni­verse must not be iden­ti­cal to the sec­tion where the you-copy is tor­tur­ing.

In­stead, as­sume that you are cur­rently tor­tur­ing some­one. Bostrom’s ar­gu­ment is that you’re not mak­ing the uni­verse worse, be­cause there’s a you-copy which is tor­tur­ing an iden­ti­cal per­son el­se­where in the uni­verse. At most one of your copies is ca­pa­ble of tak­ing blame for this; the rest are just run­ning the same calcu­la­tions “a sec­ond time”, so to say. (Or at least, that’s what he’s ar­gu­ing that Unifi­ca­tion would say, and us­ing this as a rea­son to re­ject it and turn to Du­pli­ca­tion, so each copy is morally cul­pable for caus­ing new suffer­ing.)

• I think it not un­likely that if we have a suc­cess­ful in­tel­li­gence ex­plo­sion and sub­se­quently dis­cover a way to build some­thing 4^^^^4-sized, then we will figure out a way to grow into it, one step at a time. This 4^^^^4-sized su­per­tran­shu­man mind then should be able to dis­crim­i­nate “in­ter­est­ing” from “bor­ing” 3^^^3-sized things. If you could con­vince the 4^^^^4-sized thing to write down a list of all non­bor­ing 3^^^3-sized things in its spare time, then you would have a for­mal way to say what an “in­ter­est­ing 3^^^3-sized thing” is, with de­scrip­tion length (the de­scrip­tion length of hu­man­ity = the de­scrip­tion length of our ac­tual uni­verse) + (the ad­di­tional de­scrip­tion length to give hu­man­ity ac­cess to a 4^^^^4-sized com­puter—which isn’t much be­cause ac­cess to a uni­ver­sal Tur­ing ma­chine would do the job and more).

Thus, I don’t think that it needs a 3^^^3-sized de­scrip­tion length to pick out in­ter­est­ing 3^^^3-sized minds.

• Me: Actually no. I’m just not sure I care about your 3↑↑↑3 simulated people as much as you think I do.

Mugger: So then, you think the probability that you should care as much about my 3↑↑↑3 simulated people as I thought you did is on the order of 1/3↑↑↑3?

• After think­ing about it a bit more I de­cided that I ac­tu­ally do care about simu­lated peo­ple al­most ex­actly as the mug­ger thought I did.

• I’m not re­ally sure that I care about du­pli­cates that much.

Didn’t you feel sad when Yoona-939 was ter­mi­nated, or wish all hap­piness for Sonmi-451?

• All the other Yoona-939s were fine, right? And that Yoona-939 was ter­mi­nated quickly enough to pre­vent di­ver­gence, wasn’t she?

(my point is, you’re mak­ing it seem like you’re break­ing the de­gen­er­acy by la­bel­ing them. But their be­ing iden­ti­cal is deep)

• But now she’s… you know… now she’s… (wipes away tears) slightly less real.

• You hit pretty strong diminish­ing re­turns on ex­is­tence once you’ve hit the ‘at least one copy’ point.

• Clones aren’t du­pli­cates. They may have started out as du­pli­cates but they were not by the time the reader is in­tro­duced to them.

• de­scribe ar­bi­trar­ily large minds whose ex­is­tence I should care about with­out us­ing up a lot of complexity

Benja’s method is bet­ter and more clearly right, but here’s an­other in­ter­est­ing one. Start from me now. At ev­ery fu­ture mo­ment when there are two pos­si­ble valuable next ex­pe­riences for me, make two copies of me, have the first ex­pe­rience one and the sec­ond the other. Allow me to grow if it’s valuable. Con­tinue branch­ing and grow­ing un­til 3^^^3 be­ings have been gen­er­ated.

• it’s not at all ob­vi­ous to me that you can de­scribe ar­bi­trar­ily large minds whose ex­is­tence I should care about with­out us­ing up a lot of com­plex­ity.

“The kind of mind I would like to grow into if I had 3^^^3 years”

• I agree with most of this. I think it is plau­si­ble that the value of a sce­nario is in some sense up­per-bounded by its de­scrip­tion length, so that we need on the or­der of googol­plex bits to de­scribe a googol­plex of value.

We can sep­a­rately ask if this solves the prob­lem. One may want a the­ory which solves the prob­lem re­gard­less of util­ity func­tion; or, aiming lower, one may be satis­fied to find a class of util­ity func­tions which seem to cap­ture hu­man in­tu­ition well enough.

• Up­per-bound­ing util­ity by de­scrip­tion com­plex­ity doesn’t ac­tu­ally cap­ture the in­tu­ition, since a sim­ple uni­verse could give rise to many com­plex minds.

• This post has not at all mi­s­un­der­stood my sug­ges­tion from long ago, though I don’t think I thought about it very much at the time. I agree with the thrust of the post that a lev­er­age fac­tor seems to deal with the ba­sic prob­lem, though of course I’m also some­what ex­pect­ing more sce­nar­ios to be pro­posed to up­set the ap­par­ent re­s­olu­tion soon.

• A sim­plified ver­sion of the ar­gu­ment here:

• The utility function isn’t up for grabs (and our actual preferences appear to be unbounded).

• There­fore, we need un­bounded util­ity.

• Oops! If we al­low un­bounded util­ity, we can get non-con­ver­gence in our ex­pec­ta­tion.

• Since we’ve already es­tab­lished that the util­ity func­tion is not up for grabs, let’s try and mod­ify the prob­a­bil­ity to fix this!

My re­sponse to this is that the prob­a­bil­ity dis­tri­bu­tion is even less up for grabs. The util­ity, at least, is ex­plic­itly there to re­flect our prefer­ences. If we see that a util­ity func­tion is caus­ing our agent to take the wrong ac­tions, then it makes sense to change it to bet­ter re­flect the ac­tions we wish our agent to take.

The prob­a­bil­ity dis­tri­bu­tion, on the other hand, is a map that should re­flect the ter­ri­tory as well as pos­si­ble! It should not be mod­ified on ac­count of badly-be­haved util­ity com­pu­ta­tions.

This may be taken as an ar­gu­ment in fa­vor of mod­ify­ing the util­ity func­tion; Sniffnoy makes a case for bounded util­ity in an­other com­ment.

It could al­ter­na­tively be taken as a case for mod­ify­ing the de­ci­sion pro­ce­dure. Per­haps nei­ther the prob­a­bil­ity nor the util­ity are “up for grabs”, but how we use them should be mod­ified.

One (somewhat crazy) option is to take the median expectation rather than the mean expectation: we judge actions by computing the lowest utility score that we have a 50% chance of making or beating, rather than by computing the average. This makes the computation insensitive to extreme (high or low) outcomes with small probabilities. Unfortunately, it also makes the computation insensitive to extreme (high or low) outcomes with 49% probabilities: it would prefer a gamble with a 49% probability of utility −3^^^3 and 51% probability of utility +1 to a gamble with a 51% probability of utility 0 and a 49% probability of +3^^^3.
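That failure mode is easy to exhibit numerically. A sketch of the median-expectation rule (using 10^100 as a computable stand-in for 3^^^3):

```python
BIG = 10**100  # stand-in for 3^^^3; the real value wouldn't fit in memory

def median_utility(gamble):
    """gamble: list of (probability, utility) pairs. Returns the highest
    utility u with P(U >= u) >= 0.5 -- the lowest score we have at least a
    50% chance of making or beating."""
    best = None
    for _, u in sorted(gamble, key=lambda pu: pu[1]):
        if sum(p for p, v in gamble if v >= u) >= 0.5:
            best = u
    return best

def mean_utility(gamble):
    return sum(p * u for p, u in gamble)

a = [(0.49, -BIG), (0.51, 1)]  # 49% chance of utility -3^^^3, 51% of +1
b = [(0.51, 0), (0.49, BIG)]   # 51% chance of utility 0, 49% of +3^^^3
# The median rule prefers a (median 1 vs 0), while the mean wildly prefers b.
```

The 49% catastrophe in `a` is completely invisible to the median rule, exactly as described above.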

But per­haps there are more well-mo­ti­vated al­ter­na­tives.

• If we see that a util­ity func­tion is caus­ing our agent to take the wrong ac­tions, then it makes sense to change it to bet­ter re­flect the ac­tions we wish our agent to take.

If the agent defines its utility indirectly in terms of the designer’s preference, then a disagreement between the agent’s utility function and the designer’s preference in evaluating a decision doesn’t easily indicate that the designer’s evaluation is more accurate; and if it’s not, the designer should defer to the agent’s judgment instead of adjusting its utility.

The prob­a­bil­ity dis­tri­bu­tion, on the other hand, is a map that should re­flect the ter­ri­tory as well as pos­si­ble! It should not be mod­ified on ac­count of badly-be­haved util­ity com­pu­ta­tions.

Similarly, if the agent is good at building its map, it might have a better map than the designer, so a disagreement is not easily resolved in favor of the designer. On the other hand, there can be a bug in the agent’s world-modeling code, in which case it should be fixed! And similarly, if there is a bug in the agent’s indirect utility definition, it too should be fixed. The arguments seem analogous to me, so why would preference be more easily debugged than the world model?

• My re­sponse to this is that the prob­a­bil­ity dis­tri­bu­tion is even less up for grabs.

Really? In prac­tice I have a great deal of un­cer­tainty about both my util­ity func­tion and my prob­a­bil­ity es­ti­mates. Ac­cu­rate prob­a­bil­ity es­ti­mates re­quire the abil­ity to ac­cu­rately model the world, and this seems in­cred­ibly hard in gen­eral. It’s not at all clear to me that in­stru­men­tal ra­tio­nal­ity means trust­ing your cur­rent prob­a­bil­ity es­ti­mates if you have rea­son to be­lieve that fu­ture ev­i­dence will dras­ti­cally change them or that they’re cor­rupted for some other rea­son (even an oth­er­wise flawlessly de­signed AI has to worry about cos­mic rays flip­ping the bits in its mem­ory or, Omega for­bid, its source code).

• I am definitely not say­ing “trust your cur­rent prob­a­bil­ity es­ti­mates”.

What I’m say­ing is that prob­a­bil­ity should re­flect re­al­ity as closely as pos­si­ble, whereas util­ity should re­flect prefer­ences as closely as pos­si­ble.

Mod­ify­ing the prefer­ence func­tion in an ad-hoc way to get the right be­hav­ior is a bad idea, but mod­ify­ing our ex­pec­ta­tion about how re­al­ity ac­tu­ally might be is even worse. The prob­a­bil­ity func­tion should be mod­ified ex­clu­sively in re­sponse to con­sid­er­a­tions about how re­al­ity might be. The util­ity func­tion should be mod­ified ex­clu­sively in re­sponse to con­sid­er­a­tions about our prefer­ences.

• Hm, a linear “leverage penalty” sounds an awful lot like adding the complexity of locating you in the pool of possibilities to the total complexity.

Thing 2: con­sider the case of the other peo­ple on that street when the Pas­cal’s Mug­gle-ing hap­pens. Sup­pose they could over­hear what is be­ing said. Since they have no lev­er­age of their own, are they free to as­sign a high prob­a­bil­ity to the mug­gle helping 3^^^3 peo­ple? Do a few of them start for­ward to in­terfere, only to be held back by the cooler heads who re­al­ize that all who in­terfere will sud­denly have the prob­a­bil­ity of suc­cess re­duced by a fac­tor of 3^^^3?

• This is in­deed a good ar­gu­ment for view­ing the lev­er­age penalty as a spe­cial case of a lo­ca­tional penalty (which I think is more or less what Han­son pro­posed to be­gin with).

• Sup­pose we had a planet of 3^^^3 peo­ple (their uni­verse has novel phys­i­cal laws). There is a planet-wide lot­tery. Cather­ine wins. There was a 1/​3^^^3 chance of this hap­pen­ing. The lotto rep­re­sen­ta­tive comes up to her and asks her to hand over her ID card for ver­ifi­ca­tion.

All over the planet, as a fun prank, a small pro­por­tion of peo­ple have been dress­ing up as lotto rep­re­sen­ta­tives and run­ning away with peo­ples’ ID cards. This is very rare—only one per­son in 3^^3 does this to­day.

If the lottery prize is 3^^3 times better than getting your ID card stolen, should Catherine trust the lotto official? No, because there are 3^^^3/3^^3 pranksters and only 1 real official, and 3^^^3/3^^3 is still an exponential tower of essentially undiminished height, which is a whole lot of pranksters. She hangs on to her card, and doesn’t get the prize. Maybe if the reward were 3^^^3 times greater than the penalty, we could finally get some lottery winners to actually collect their winnings.
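The expected-value arithmetic behind Catherine’s choice can be sketched with scaled-down stand-in numbers (all hypothetical; the actual 3^^^3-scale figures are uncomputable):

```python
# One real lotto official; pranksters appear at a fixed rate in the population.
population = 10**12      # stand-in for 3^^^3
prank_rate = 1 / 10**3   # stand-in for the one-in-3^^3 prank rate
pranksters = population * prank_rate

card_loss = 1.0                  # utility lost if the ID card is stolen
prize = 10**3 * card_loss        # prize "3^^3 times better": same ratio as the rate

p_real = 1 / (1 + pranksters)    # chance this card-requester is the real official
ev_trust = p_real * prize - (1 - p_real) * card_loss
# ev_trust < 0: when the prize/loss ratio only matches the prank rate,
# handing over the card loses in expectation.

ev_huge_prize = p_real * population * card_loss - (1 - p_real) * card_loss
# Only when the prize scales like the whole population does trusting pay off.
```

With the ratios as given, the pranksters dominate; the prize has to outscale the prankster count before collecting the winnings becomes the rational move.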

All of which is to say, I don’t think there’s any locational penalty—the crowd near the muggle should have exactly the same probability assignments as her, just as the crowd near Catherine has the same probability assignments as her about whether this is a prankster or the real official. I think the penalty is the ratio of lotto officials to pranksters (conditional on a hypothesis like “the lottery has taken place”). If the hypothesis is clever, though, it could probably evade this penalty (hypothesize a smaller population with a reward of 3^^^3 years of utility-satisfaction, maybe, or 3^^^3 new people created), and so what intuitively seems like a defense against Pascal’s mugging may not be.

• Really? I was go­ing to say that the ar­gu­ment need not men­tion the mug­gle at all, since the mug­ger is also one per­son among 3^^^3.

• I have a prob­lem with call­ing this a “semi-open FAI prob­lem”, be­cause even if Eliezer’s pro­posed solu­tion turns out to be cor­rect, it’s still a wide open prob­lem to de­velop ar­gu­ments that can al­low us to be con­fi­dent enough in it to in­cor­po­rate it into an FAI de­sign. This would be true even if no­body can see any holes in it or have any bet­ter ideas, and dou­bly true given that some FAI re­searchers con­sider a differ­ent ap­proach (which as­sumes that there is no such thing as “re­al­ity-fluid”, that ev­ery­thing in the mul­ti­verse just ex­ists and as a mat­ter of prefer­ence we do not /​ can not care about all parts of it in equal mea­sure, #4 in this post) to be at least as plau­si­ble as Eliezer’s cur­rent ap­proach.

• You’re right. Edited.

• In my view, we could make act-based agents without answering this or any similar questions. So I’m much less interested in answering them than I used to be. (There are possible approaches that do have to answer all of these questions, but at this point they seem very much less promising to me.)

We’ve briefly dis­cussed this is­sue in the ab­stract, but I’m cu­ri­ous to get your take in a con­crete case. Does that seem right to you? Do you think that we need to un­der­stand is­sues like this one, and have con­fi­dence in that un­der­stand­ing, prior to build­ing pow­er­ful AI sys­tems?

• FAI de­signs that re­quire high con­fi­dence solu­tions to many philo­soph­i­cal prob­lems also do not seem very promis­ing to me at this point. I en­dorse look­ing for al­ter­na­tive ap­proaches.

I agree that act-based agents seem to re­quire fewer high con­fi­dence solu­tions to philo­soph­i­cal prob­lems. My main con­cern with act-based agents is that these de­signs will be in com­pe­ti­tion with fully au­tonomous AGIs (ei­ther al­ter­na­tive de­signs, or act-based agents that evolve into full au­ton­omy due to in­ad­e­quate care of their own­ers/​users) to colonize the uni­verse. The de­pen­dence on hu­mans and lack of full au­ton­omy in act-based agents seem likely to cause a sig­nifi­cant weak­ness in at least one cru­cial area of this com­pe­ti­tion, such as gen­eral speed/​effi­ciency/​cre­ativity, war­fare (con­ven­tional, cy­ber, psy­cholog­i­cal, biolog­i­cal, nano, etc.), co­op­er­a­tion/​co­or­di­na­tion, self-im­prove­ment, and space travel. So even if these agents turn out to be “safe”, I’m not op­ti­mistic that we “win” in the long run.

My own idea is to aim for FAI de­signs that can cor­rect their philo­soph­i­cal er­rors, au­tonomously, the same way that we hu­mans can. Ideally, we’d fully un­der­stand how hu­mans rea­son about philo­soph­i­cal prob­lems and how philos­o­phy nor­ma­tively ought to be done be­fore pro­gram­ming or teach­ing that to an AI. But re­al­is­ti­cally, due to time pres­sure, we might have to set­tle for some­thing sub­op­ti­mal like teach­ing through ex­am­ples of hu­man philo­soph­i­cal rea­son­ing. Of course there’s lots of ways for this kind of AI to go wrong as well, so I also con­sider it to be a long shot.

Do you think that we need to un­der­stand is­sues like this one, and have con­fi­dence in that un­der­stand­ing, prior to build­ing pow­er­ful AI sys­tems?

Let me ask you a re­lated ques­tion. Sup­pose act-based de­signs are as suc­cess­ful as you ex­pect them to be. We still need to un­der­stand is­sues like the one de­scribed in Eliezer’s post (or solve the meta-prob­lem of un­der­stand­ing philo­soph­i­cal rea­son­ing) at some point, right? When do you think that will be? In other words, how much time do you think suc­cess­fully cre­at­ing act-based agents buys us?

• Sup­pose act-based de­signs are as suc­cess­ful as you ex­pect them to be.

It’s not so much that I have con­fi­dence in these ap­proaches, but that I think (1) they are the most nat­u­ral to ex­plore at the mo­ment, and (2) is­sues that seem like they can be cleanly avoided for these ap­proaches seem less likely to be fun­da­men­tal ob­struc­tions in gen­eral.

We still need to un­der­stand is­sues like the one de­scribed in Eliezer’s post (or solve the meta-prob­lem of un­der­stand­ing philo­soph­i­cal rea­son­ing) at some point, right? When do you think that will be?

When­ever such is­sues bear di­rectly on our de­ci­sion-mak­ing in such a way that mak­ing er­rors would be re­ally bad. For ex­am­ple, when we en­counter a situ­a­tion where we face a small prob­a­bil­ity of a very large pay­off, then it mat­ters how well we un­der­stand the par­tic­u­lar trade­off at hand. The goal /​ best case is that the de­vel­op­ment of AI doesn’t de­pend on sort­ing out these kinds of con­sid­er­a­tions for its own sake, only in­so­far as the AI has to ac­tu­ally make crit­i­cal choices that de­pend on these con­sid­er­a­tions.

The de­pen­dence on hu­mans and lack of full au­ton­omy in act-based agents seem likely to cause a sig­nifi­cant weak­ness in at least one cru­cial area of this com­pe­ti­tion,

I wrote a lit­tle bit about effi­ciency here. I don’t see why an ap­proval-di­rected agent would be at a se­ri­ous dis­ad­van­tage com­pared to an RL agent (though I do see why an imi­ta­tion learner would be at a dis­ad­van­tage by de­fault, and why an ap­proval-di­rected agent may be un­satis­fy­ing from a safety per­spec­tive for non-philo­soph­i­cal rea­sons).

Ideally you would syn­the­size data in ad­vance in or­der to op­er­ate with­out ac­cess to coun­ter­fac­tual hu­man feed­back at run­time—it’s not clear if this is pos­si­ble, but it seems at least plau­si­ble. But it’s also not clear to me it is nec­es­sary, as long as we can tol­er­ate very mod­est (<1%) over­head from over­sight.

Of course if such a pe­riod goes on long enough then it will be a prob­lem, but that is a slow-burn­ing prob­lem that a su­per­in­tel­li­gent civ­i­liza­tion can ad­dress at its leisure. In terms of tech­ni­cal solu­tions, any­thing we can think of now will eas­ily be thought of in this fu­ture sce­nario. It seems like the only thing we re­ally lose is the op­tion of tech­nolog­i­cal re­lin­quish­ment or se­ri­ous slow-down, which don’t look very at­trac­tive/​fea­si­ble at the mo­ment.

• The goal /​ best case is that the de­vel­op­ment of AI doesn’t de­pend on sort­ing out these kinds of con­sid­er­a­tions for its own sake, only in­so­far as the AI has to ac­tu­ally make crit­i­cal choices that de­pend on these con­sid­er­a­tions.

Isn’t a cru­cial con­sid­er­a­tion here how soon af­ter the de­vel­op­ment of AI they will be faced with such choices? If the an­swer is “soon” then it seems that we should try to solve the prob­lems ahead of time or try to de­lay AI. What’s your es­ti­mate? And what do you think the first such choices will be?

• What’s your es­ti­mate? And what do you think the first such choices will be?

I think that we are fac­ing some is­sues all of the time (e.g. some of these ques­tions prob­a­bly bear on “how much should we pri­ori­tize fast tech­nolog­i­cal de­vel­op­ment?” or “how con­cerned should we be with physics dis­asters?” or so on), but that it will be a long time be­fore we face re­ally big ex­pected costs from get­ting these wrong. My best guess is that we will get to do many-cen­turies-of-cur­rent-hu­man­ity worth of think­ing be­fore we re­ally need to get any of these ques­tions right.

I don’t have a clear sense of what the first choices will be. My view is largely com­ing from not see­ing any se­ri­ous can­di­dates for crit­i­cal choices.

Anything to do with expansion into space looks like it will be very far away in subjective time (though perhaps not far in calendar time). Maybe there is some stuff with simulations, or value drift, but neither of those look very big in expectation for now. Maybe all of these issues together make a 5% difference in expectation over the next few hundred subjective years? (Though this is a pretty unstable estimate.)

• How did you ar­rive at the con­clu­sion that we’re not fac­ing big ex­pected costs with these ques­tions? It seems to me that for ex­am­ple the con­struc­tion of large nu­clear ar­se­nals and lack of suffi­cient safe­guards against nu­clear war has already caused a large ex­pected cost, and may have been based on one or more in­cor­rect philo­soph­i­cal un­der­stand­ings (e.g., to the ques­tion of, what is the right amount of con­cern for dis­tant strangers and fu­ture peo­ple). Similarly with “how much should we pri­ori­tize fast tech­nolog­i­cal de­vel­op­ment?” But this is just from in­tu­ition since I don’t re­ally know how to com­pute ex­pected costs when the un­cer­tain­ties in­volved have a large moral or nor­ma­tive com­po­nent.

My best guess is that we will get to do many-cen­turies-of-cur­rent-hu­man­ity worth of think­ing be­fore we re­ally need to get any of these ques­tions right.

Do you ex­pect tech­nolog­i­cal de­vel­op­ment to have plateaued by then (i.e., AIs will have in­vented es­sen­tially all tech­nolo­gies fea­si­ble in this uni­verse)? If so, do you think there won’t be any tech­nolo­gies among them that would let some group of peo­ple/​AIs unilat­er­ally al­ter the fu­ture of the uni­verse ac­cord­ing to their un­der­stand­ing of what is nor­ma­tive? (For ex­am­ple, in­ten­tion­ally or ac­ci­den­tally de­stroy civ­i­liza­tion, or win a de­ci­sive war against the rest of the world.) Or do you think some­thing like a world gov­ern­ment will have been cre­ated to con­trol the use of such tech­nolo­gies?

• How did you ar­rive at the con­clu­sion that we’re not fac­ing big ex­pected costs with these ques­tions?

There are lots of things we don’t know, and my de­fault pre­sump­tion is for er­rors to be non-as­tro­nom­i­cally-costly, un­til there are ar­gu­ments oth­er­wise.

I agree that philosophical problems have some stronger claim to causing astronomical damage, and so I am more scared of philosophical errors than of, e.g., our lack of effective public policy, our weak coordination mechanisms, global warming, or the dismal state of computer security.

But I don’t see re­ally strong ar­gu­ments for philo­soph­i­cal er­rors caus­ing great dam­age, and so I’m skep­ti­cal that we are fac­ing big ex­pected costs (big com­pared to the biggest costs we can iden­tify and in­ter­vene on, amongst them AI safety).

That is, there seems to be a pretty good case that AI may be built soon, and that we lack the un­der­stand­ing to build AI sys­tems that do what we want, that we will nev­er­the­less build AI sys­tems to help us get what we want in the short term, and that in the long run this will rad­i­cally re­duce the value of the uni­verse. The cases for philo­soph­i­cal er­rors caus­ing dam­age are over­all much more spec­u­la­tive, have lower stakes, and are less ur­gent.

the con­struc­tion of large nu­clear ar­se­nals and lack of suffi­cient safe­guards against nu­clear war has already caused a large ex­pected cost, and may have been based on one or more in­cor­rect philo­soph­i­cal understandings

I agree that philo­soph­i­cal progress would very slightly de­crease the prob­a­bil­ity of nu­clear trou­ble, but this looks like a very small effect. (Orders of mag­ni­tude smaller than the effects from say in­creased global peace and sta­bil­ity, which I’d prob­a­bly list as a higher pri­or­ity right now than re­solv­ing philo­soph­i­cal un­cer­tainty.) It’s pos­si­ble we dis­agree about the me­chan­ics of this par­tic­u­lar situ­a­tion.

Do you ex­pect tech­nolog­i­cal de­vel­op­ment to have plateaued by then (i.e., AIs will have in­vented es­sen­tially all tech­nolo­gies fea­si­ble in this uni­verse)?

No. I think that 200 years of subjective time probably amounts to 5-10 more doublings of the economy, and that technological change is a plausible reason that philosophical error would eventually become catastrophic.

I said “best guess” but this re­ally is a pretty wild guess about the rele­vant timescales.

in­ten­tion­ally or ac­ci­den­tally de­stroy civilization

As with the spe­cial case of nu­clear weapons, I think that philo­soph­i­cal er­ror is a rel­a­tively small in­put into world-de­struc­tion.

win a de­ci­sive war against the rest of the world

I don’t expect this to cause philosophical errors to become catastrophic. I guess the concern is that the war will be won by someone who doesn’t much care about the future, thereby increasing the probability that resources are controlled by someone who prefers not to undergo any further reflection? I’m willing to talk about this scenario more, but at face value the prospect of a decisive military victory wouldn’t bump philosophical error above AI risk as a concern for me.

I’m open to end­ing up with a more pes­simistic view about the con­se­quences of philo­soph­i­cal er­ror, ei­ther by think­ing through more pos­si­ble sce­nar­ios in which it causes dam­age or by con­sid­er­ing more ab­stract ar­gu­ments.

But if I end up with a view more like yours, I don’t know if it would change my view on AI safety. It still feels like the AI con­trol prob­lem is a differ­ent is­sue which can be con­sid­ered sep­a­rately.

• How does this style of rea­son­ing work on some­thing more like the origi­nal Pas­cal’s Wager prob­lem?

Sup­pose a (to all ap­pear­ances) perfectly or­di­nary per­son goes on TV and says “I am an avatar of the Dark Lords of the Ma­trix. Please send me \$5. When I shut down the simu­la­tion in a few months, I will sub­ject those who send me the money to [LARGE NUMBER] years of hap­piness, and those who do not to [LARGE NUMBER] years of pain”.

Here you can’t solve the problem by pointing out the very large number of people involved, because there isn’t a very large number of people involved. Your probability should depend only on your probability that this is a simulation, your probability that the simulators would make a weird request like this, and your probability that this person’s request is the specific weird request they would make. None of these numbers help you get down to a 1/​[LARGE NUMBER] level.

I’ve avoided say­ing 3^^^3, be­cause maybe there’s some fun­da­men­tal con­straint on com­put­ing power that makes it im­pos­si­ble for simu­la­tors to simu­late 3^^^3 years of hap­piness in any amount of time they might con­ceiv­ably be will­ing to ded­i­cate to the prob­lem. But they might be able to simu­late some num­ber of years large enough to out­weigh our prior against any given weird re­quest com­ing from the Dark Lords of the Ma­trix.

(also, it seems less than 3^^^3-level cer­tain that there’s no clever trick to get effec­tively in­finite com­put­ing power or effec­tively in­finite com­put­ing time, like the sub­strate­less com­pu­ta­tion in Per­mu­ta­tion City)

• When we jump to the ver­sion in­volv­ing causal nodes hav­ing Large lev­er­age over other nodes in a graph, there aren’t Large num­bers of dis­tinct peo­ple in­volved, but there’s Large num­bers of life-cen­turies in­volved and those mo­ments of thought and life have to be in­stan­ti­ated by causal nodes.

(also, it seems less than 3^^^3-level cer­tain that there’s no clever trick to get effec­tively in­finite com­put­ing power or effec­tively in­finite com­put­ing time, like the sub­strate­less com­pu­ta­tion in Per­mu­ta­tion City)

In­finity makes my calcu­la­tions break down and cry, at least at the mo­ment.

• Imag­ine some­one makes the fol­low­ing claims:

• I’ve in­vented an im­mor­tal­ity drug

• I’ve in­vented a near-light-speed spaceship

• The space­ship has re­ally good life sup­port/​recycling

• The space­ship is self-re­pairing and draws power from in­ter­stel­lar hydrogen

• I’ve dis­cov­ered the Uni­verse will last at least an­other 3^^^3 years

Then they threaten, un­less you give them \$5, to kid­nap you, give you the im­mor­tal­ity drug, stick you in the space­ship, launch it at near-light speed, and have you stuck (pre­sum­ably bound in an un­com­fortable po­si­tion) in the space­ship for the 3^^^3 years the uni­verse will last.

(okay, there are lots of con­tin­gent fea­tures of the uni­verse that will make this not work, but imag­ine some­thing bet­ter. Pocket di­men­sion, maybe?)

If their claims are true, then their threat seems cred­ible even though it in­volves a large amount of suffer­ing. Can you ex­plain what you mean by life-cen­turies be­ing in­stan­ti­ated by causal nodes, and how that makes the mad­man’s threat less cred­ible?

• Are you sure it wouldn’t be rational to pay up? I mean, if the guy looks like he could do that for \$5, I’d rather not take chances. If you pay, and it turns out he didn’t have all that torture equipment, you could just sue him and get the \$5 back, since he defrauded you. If he starts making up rules about how you can never ever tell anyone else about this, or never later check the validity of his claim, or else he’ll kidnap you, then for game-theoretic reasons you should not abide, since being the kind of agent that accepts those terms makes you a valid target for such frauds. The reasons for not abiding are the same as the reasons for one-boxing.

• If what he says is true, then there will be 3^^^3 years of life in the uni­verse. Then, as­sum­ing this an­thropic frame­work is cor­rect, it’s very un­likely to find your­self at the be­gin­ning rather than at any other point in time, so this pro­vides 3^^^3-sized ev­i­dence against this sce­nario.
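A toy sketch of the anthropic update being described (not part of the original comment; the numbers and the `window` parameter are invented for illustration): under the hypothesis that history lasts N years, the likelihood of finding yourself within the first `window` years is roughly `window / N`, so the posterior odds for very large N collapse.

```python
# Toy SSA-style update: hypothesis H says there will be n_years of life;
# observing that you are within the first `window` years has likelihood
# roughly window / n_years under H, versus ~1 under a short-history rival.
def posterior_odds(prior_odds, n_years, window=100.0):
    """Posterior odds for H after observing you are in the first `window` years."""
    likelihood_ratio = window / n_years  # P(obs | H) / P(obs | short history)
    return prior_odds * likelihood_ratio

# Even generous prior odds collapse for a long-history hypothesis;
# 10**12 years stands in here for a merely "astronomical" claim,
# since 3^^^3 is far too large to represent.
print(posterior_odds(1.0, 10.0**12))  # 1e-10
```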

• I’m not en­tirely sure that the dooms­day ar­gu­ment also ap­plies to differ­ent time slices of the same per­son, given that Eliezer in 2013 re­mem­bers be­ing Eliezer in 2012 but not vice versa.

• The spaceship has really good life support/​recycling

The spaceship is self-repairing and draws power from interstellar hydrogen

That requires an MTTF (mean time to failure) of 3^^^3 years, or a per-year probability of failure of roughly 1/​3^^^3.

I’ve dis­cov­ered the Uni­verse will last at least an­other 3^^^3 years

This im­plies that phys­i­cal prop­er­ties like the cos­molog­i­cal con­stant and the half-life of pro­tons can be mea­sured to a pre­ci­sion of roughly 1/​3^^^3 rel­a­tive er­ror.

To me it seems like both of those claims have prior prob­a­bil­ity ~ 1/​3^^^3. (How many space­ships would you have to build and how long would you have to test them to get an MTTF es­ti­mate as large as 3^^^3? How many mea­sure­ments do you have to make to get the stan­dard de­vi­a­tion be­low 1/​3^^^3?)

• Say the be­ing that suffers for 3^^^3 sec­onds is morally rele­vant but not in the same ob­server mo­ment refer­ence class as hu­mans for some rea­son. (IIRC putting all pos­si­ble ob­servers in the same refer­ence class leads to bizarre con­clu­sions...? I can’t im­me­di­ately re-de­rive why that would be.) But any­way it re­ally seems that the mag­i­cal causal juice is the im­por­tant thing here, not the an­thropic/​ex­pe­ri­en­tial na­ture or lack thereof of the highly-causal nodes, in which case the an­thropic solu­tion isn’t quite hug­ging the real query.

• IIRC putting all pos­si­ble ob­servers in the same refer­ence class leads to bizarre con­clu­sions...? I can’t im­me­di­ately re-de­rive why that would be.

The only rea­son that I have ever thought of is that our refer­ence class should in­tu­itively con­sist of only sen­tient be­ings, but that non­sen­tient be­ings should still be able to rea­son. Is this what you were think­ing of? Whether it ap­plies in a given con­text may de­pend on what ex­actly you mean by a refer­ence class in that con­text.

• If it can rea­son but isn’t sen­tient then it maybe doesn’t have “ob­server” mo­ments, and maybe isn’t it­self morally rele­vant—Eliezer seems to think that way any­way. I’ve been try­ing some­thing like, maybe mess­ing with the non-sen­tient ob­server has a 3^^^3 utilon effect on hu­man util­ity some­how, but that seems psy­cholog­i­cally-ar­chi­tec­turally im­pos­si­ble for hu­mans in a way that might end up be­ing fun­da­men­tal. (Like, you ei­ther have to make 3^^^3 hu­mans, which defeats the pur­pose of the ar­gu­ment, or make a sin­gle hu­man have a 3^^^3 times bet­ter life with­out length­en­ing it, which seems im­pos­si­ble.) Over­all I’m hav­ing a re­ally sur­pris­ing amount of difficulty think­ing up an ex­am­ple where you have a lot of causal im­por­tance but no an­thropic counter-ev­i­dence.

Any­way, does “an­thropic” even re­ally have any­thing to do with qualia? The way peo­ple talk about it it clearly does, but I’m not sure it even shows up in the defi­ni­tion—a non-sen­tient op­ti­mizer could to­tally make an­thropic up­dates. (That said I guess Hofs­tadter and other strange loop func­tion­al­ists would dis­agree.) Have I just been wrongly as­sum­ing that ev­ery­one else was in­clud­ing “qualia” as fun­da­men­tal to an­throp­ics?

• Yeah, this whole line of rea­son­ing fails if you can get to 3^^^3 utilons with­out cre­at­ing ~3^^^3 sen­tients to dis­tribute them among.

Over­all I’m hav­ing a re­ally sur­pris­ing amount of difficulty think­ing up an ex­am­ple where you have a lot of causal im­por­tance but no an­thropic counter-ev­i­dence.

I’m not sure what you mean. If you use an anthropic theory like what Eliezer is using here (e.g. SSA, UDASSA), then an amount of causal importance that is large compared to the rest of your reference class implies few similar members of the reference class, which is anthropic counter-evidence, so of course it would be impossible to think of an example. Even if nonsentients can contribute to utility, if I can create 3^^^3 utilons using nonsentients, then some other people probably can too, so I don’t have a lot of causal importance compared to them.

Any­way, does “an­thropic” even re­ally have any­thing to do with qualia? The way peo­ple talk about it it clearly does, but I’m not sure it even shows up in the defi­ni­tion—a non-sen­tient op­ti­mizer could to­tally make an­thropic up­dates.

This is the con­tra­pos­i­tive of the grand­par­ent. I was say­ing that if we as­sume that the refer­ence class is sen­tients, then non­sen­tients need to rea­son us­ing differ­ent rules i.e. a differ­ent refer­ence class. You are say­ing that if non­sen­tients should rea­son us­ing the same rules, then the refer­ence class can­not com­prise only sen­tients. I ac­tu­ally agree with the lat­ter much more strongly, and I only brought up the former be­cause it seemed similar to the ar­gu­ment you were try­ing to re­mem­ber.

There are re­ally two sep­a­rate ques­tions here, that of how to rea­son an­throp­i­cally and that of how magic re­al­ity-fluid is dis­tributed. Con­fus­ing these is com­mon, since the same sort of con­sid­er­a­tions af­fect both of them and since they are both badly un­der­stood, though I would say that due to UDT/​ADT, we now un­der­stand the former much bet­ter, while ac­knowl­edg­ing the pos­si­bil­ity of un­known un­knowns. (Our cur­rent state of knowl­edge where we con­fuse these ac­tu­ally feels a lot like peo­ple who have never learnt to sep­a­rate the de­scrip­tive and the nor­ma­tive.)

The way Eliezer presented things in the post, it is not entirely clear which of the two he meant to be responsible for the leverage penalty. It seems like he meant for it to be an epistemic consideration due to anthropic reasoning, but this seems obviously wrong given UDT. In the Tegmark IV model that he describes, the leverage penalty is caused by reality-fluid, but it seems like he only intended that as an analogy. The reality-fluid explanation seems a lot more probable to me, though, and it is possible that Eliezer would express uncertainty as to whether the leverage penalty is actually caused by reality-fluid, so that it is a bit more than an analogy. There is also a third, mathematically equivalent possibility where the leverage penalty is about values, and we just care less about individual people when there are more of them, but Eliezer obviously does not hold that view.

• I’m not sure what you mean. If you use an an­thropic the­ory like what Eliezer is us­ing here (e.g. SSA, UDASSA)

A com­ment: it is not clear to me that Eliezer is in­tend­ing to use SSA or UDASSA here. The “magic re­al­ity fluid” mea­sure looks more like SIA, but with a prior based on Levin com­plex­ity rather than Kol­mogorov com­plex­ity—see my com­ment here. Or—in an equiv­a­lent for­mu­la­tion—he’s us­ing Kol­mogorov + SSA but with an ex­tremely broad “refer­ence class” (the class of all causal nodes, most of which aren’t ob­servers in any an­thropic sense). This is still not UDASSA.

To get some­thing like UDASSA, we shouldn’t dis­tribute the weight 2^-#p of each pro­gram p uniformly among its ex­e­cu­tion steps. In­stead we should con­sider us­ing an­other pro­gram q to pick out an ex­e­cu­tion step or a se­quence of steps (i.e. a sub-pro­gram s) from p, and then give the com­bi­na­tion of q,p a weight 2^-(#p+#q). This means each sub-pro­gram s will get a to­tal prior weight of Sum {p, q: q(p) = s & s is a sub-pro­gram of p} 2^-(#p + #q).

When up­dat­ing on your ev­i­dence E, con­sider the class S(E) of all sub-pro­grams which cor­re­spond to an AI pro­gram hav­ing that ev­i­dence, and nor­mal­ize. The pos­te­rior prob­a­bil­ity you are in a par­tic­u­lar uni­verse p’ then be­comes pro­por­tional to Sum {q: q(p’) is a sub-pro­gram of p’ and a mem­ber of S(E)} 2^-(#p’ + #q).

This looks rather differ­ent to what I dis­cussed in my other com­ment, and it maybe han­dles an­thropic prob­lems a bit bet­ter. I can’t see there is any shift ei­ther to­wards very big uni­verses (no pre­sump­tu­ous philoso­pher) or to­wards dense com­pu­tro­n­ium uni­verses, where we are simu­la­tions. There does ap­pear to be a Great Filter or “Dooms­day” shift, since it is still a form of SSA, but this is miti­gated by the con­sid­er­a­tion that we may be part of a refer­ence class (pro­gram q) which prefer­en­tially se­lects pre-AI biolog­i­cal ob­servers, as op­posed to any old ob­servers.
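The weighting scheme described two comments up can be illustrated with a toy computation. Everything below is invented for the example: the program names, the bit-lengths #p and #q, and which sub-program each (q, p) pair picks out.

```python
from collections import defaultdict

# Toy version of the weighting scheme: universe-programs p and "locator"
# programs q are represented only by hypothetical bit-lengths #p, #q, and
# picks[(q, p)] = s records which sub-program s the pair selects.
p_bits = {"p1": 5, "p2": 7}   # hypothetical #p in bits
q_bits = {"q1": 2, "q2": 3}   # hypothetical #q in bits
picks = {("q1", "p1"): "s_a", ("q2", "p1"): "s_b",
         ("q1", "p2"): "s_b", ("q2", "p2"): "s_c"}

# Prior weight of each sub-program s:
# Sum over (p, q) pairs with q(p) = s of 2^-(#p + #q).
prior = defaultdict(float)
for (q, p), s in picks.items():
    prior[s] += 2.0 ** -(p_bits[p] + q_bits[q])

# Update on evidence E: keep only sub-programs in the class S(E)
# compatible with the evidence, then normalize over universes p.
S_E = {"s_a", "s_b"}
unnormalized = defaultdict(float)
for (q, p), s in picks.items():
    if s in S_E:
        unnormalized[p] += 2.0 ** -(p_bits[p] + q_bits[q])
total = sum(unnormalized.values())
posterior = {p: w / total for p, w in unnormalized.items()}
print(posterior)  # p1 gets 6/7 of the posterior, p2 gets 1/7
```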

• I agree with this; the ‘e.g.’ was meant to point to­ward the most similar the­o­ries that have names, not pin down ex­actly what Eliezer is do­ing here. I though that it would be bet­ter to re­fer to the class of similar the­o­ries here since there is enough un­cer­tainty that we don’t re­ally have de­tails.

• (As always, the term “mag­i­cal re­al­ity fluid” re­flects an at­tempt to de­mar­cate a philo­soph­i­cal area where I feel quite con­fused, and try to use cor­re­spond­ingly blatantly wrong ter­minol­ogy so that I do not mis­take my rea­son­ing about my con­fu­sion for a solu­tion.)

This seems like a re­ally use­ful strat­egy!

• Agreed—place­hold­ers and kludges should look like place­hold­ers and kludges. I be­came a hap­pier pro­gram­mer when I re­al­ised this, be­cause up un­til then I was always con­flicted about how much time I should spend mak­ing some un­satis­fy­ing piece of code look beau­tiful.

• Friendly neigh­bor­hood Ma­trix Lord check­ing in!

I’d like to apol­o­gize for the be­hav­ior of my friend in the hy­po­thet­i­cal. He likes to make illu­sory promises. You should re­al­ize that re­gard­less of what he may tell you, his choice of whether to hit the green but­ton is in­de­pen­dent of your choice of what to do with your \$5. He may hit the green but­ton and save 3↑↑↑3 lives, or he may not, at his whim. Your \$5 can not be re­li­ably ex­pected to in­fluence his de­ci­sion in any way you can pre­dict.

You are no doubt ac­cus­tomed to think­ing about en­force­able con­tracts be­tween par­ties, since those are a sta­ple of your game the­o­retic liter­a­ture as well as your sto­ry­tel­ling tra­di­tions. Often, your liter­a­ture omits the req­ui­site pre­con­di­tions for a bind­ing con­tract since they are im­plicit or taken for granted in typ­i­cal cases. Ma­trix Lords are highly atyp­i­cal coun­ter­par­ties, how­ever, and it would be a mis­take to carry over those as­sump­tions merely be­cause his state­ments re­sem­ble the syn­tac­tic form of an offer be­tween hu­mans.

Did my Ma­trix Lord friend (who you just met a few min­utes ago!) vol­un­teer to have his green save-the-mul­ti­tudes but­ton and your \$5 placed un­der the con­trol of a mu­tu­ally trust­wor­thy third party es­crow agent who will re­li­ably up­hold the stated bar­gain?

Alter­nately, if my Ma­trix Lord friend breaches his con­tract with you, is some­one Even More Pow­er­ful stand­ing by to forcibly rem­edy the non-perfor­mance?

Ab­sent ei­ther of the above con­di­tions, is my Ma­trix Lord friend par­ti­ci­pat­ing in an iter­ated trad­ing game wherein cheat­ing on to­day’s deal will sub­ject him to less at­trac­tive terms on fu­ture deals, such that the net pre­sent value of his fu­ture earn­ings would be diminished by more than the amount he can steal from you to­day?

Since none of these three crite­ria seem to ap­ply, there is no deal to be made here. The power asym­me­try en­ables him to do what­ever he feels like re­gard­less of your ac­tions, and he is just toy­ing with you! Do you re­ally think your \$5 means any­thing to him? He’ll spend it mak­ing 3↑↑↑3 pa­per­clips for all you know.

Your \$5 will not ex­ert any pre­dictable causal in­fluence on the fate of the hy­po­thet­i­cal 3↑↑↑3 Ma­trix Lord hostages. De­ci­sion the­ory doesn’t even be­gin to ap­ply.

You should stick to tak­ing boxes from Omega; at least she has an es­tab­lished rep­u­ta­tion for pay­ing out as promised.

• You should stick to tak­ing boxes from Omega; at least she has an es­tab­lished rep­u­ta­tion for pay­ing out as promised.

Caveat emp­tor, the boxes she gave me always were empty!

• I don’t at all think that this is cen­tral to the prob­lem, but I do think you’re equat­ing “bits” of sen­sory data with “bits” of ev­i­dence far too eas­ily. There is no law of prob­a­bil­ity the­ory that for­bids you from as­sign­ing prob­a­bil­ity 1/​3^^^3 to the next bit in your in­put stream be­ing a zero—so as far as prob­a­bil­ity the­ory is con­cerned, there is noth­ing wrong with re­ceiv­ing only one in­put bit and as a re­sult end­ing up be­liev­ing a hy­poth­e­sis that you as­signed prob­a­bil­ity 1/​3^^^3 be­fore.

Similarly, prob­a­bil­ity the­ory al­lows you to as­sign prior prob­a­bil­ity 1/​3^^^3 to see­ing the blue hole in the sky, and there­fore be­liev­ing the mug­ger af­ter see­ing it hap­pen any­way. This may not be a good thing to do on other prin­ci­ples, but prob­a­bil­ity the­ory does not for­bid it. ETA: In par­tic­u­lar, if you feel be­tween a rock and a bad place in terms of pos­si­ble solu­tions to Pas­cal’s Mug­gle, then you can at least con­sider as­sign­ing prob­a­bil­ities this way even if it doesn’t nor­mally seem like a good idea.

• There is no law of prob­a­bil­ity the­ory that for­bids you from as­sign­ing prob­a­bil­ity 1/​3^^^3 to the next bit in your in­put stream be­ing a zero

True, but it seems crazy to be that cer­tain about what you’ll see. It doesn’t seem that un­likely to hal­lu­ci­nate that hap­pen­ing. It doesn’t seem that un­likely for all the pho­tons and phonons to just hap­pen to con­verge in some pat­tern that makes it look and sound ex­actly like a Ma­trix Lord.

You’re ba­si­cally as­sum­ing that your sen­sory equip­ment is vastly more re­li­able than you have ev­i­dence to be­lieve, just be­cause you want to make sure that if you get a pos­i­tive, you won’t just as­sume it’s a false pos­i­tive.

• Actually, there is such a law. When you are born into this world naked, without any sensory experiences, you cannot reasonably start out expecting that the next bit you experience is much more likely to be 1 rather than 0. If you encounter one hundred zillion bits and they are all 1, you still wouldn’t assign 1/​3^^^3 probability to the next bit you see being 0, if you’re rational enough.

Of course, this is muddied by the fact that you’re not born into this world without priors and all kinds of stuff that weighs on your shoulders. Evolution has done billions of years’ worth of R&D on your priors, to get them straight. However, the gap these evolution-set priors would have to cross to get even close to that absurd 1/​3^^^3… It’s a theoretical possibility that’s by no stretch a realistic one.

• Just thought of some­thing:

How sure are we that P(there are N peo­ple) is not at least as small as 1/​N for suffi­ciently large N, even with­out a lev­er­age penalty? The OP seems to be ar­gu­ing that the com­plex­ity penalty on the prior is in­suffi­cient to gen­er­ate this low prob­a­bil­ity, since it doesn’t take much ad­di­tional com­plex­ity to gen­er­ate sce­nar­ios with ar­bi­trar­ily more peo­ple. Yet it seems to me that af­ter some suffi­ciently large num­ber, P(there are N peo­ple) must drop faster than 1/​N. This is be­cause our prior must be nor­mal­ized. That is:

Sum(all non-nega­tive in­te­gers N) of P(there are N peo­ple) = 1.

If there was some in­te­ger M such that for all n > M, P(there are n peo­ple) >= 1/​n, the above sum would not con­verge. If we are to have a nor­mal­ized prior, there must be a faster-than-1/​N fal­loff to the func­tion P(there are N peo­ple).

In fact, if one demands that my priors indicate that my expected average number of people in the universe/​multiverse is finite, then my priors must diminish faster than 1/​N^2 (so that the sum of N*P(there are N people) converges).

TL;DR: If your priors are such that the probability of there being 3^^^3 people is not smaller than 1/​(3^^^3), then you don’t have a normalized distribution of priors. If your priors are such that the probability of there being 3^^^3 people is not smaller than 1/​((3^^^3)^2), then your expected number of people in the multiverse is divergent/​infinite.
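The convergence facts this argument leans on are easy to check numerically (a quick sketch, not part of the original comment): partial sums of 1/N keep growing like log N, while partial sums of 1/N² settle toward π²/6.

```python
import math

def partial_sum(f, n):
    """Sum f(1) + f(2) + ... + f(n)."""
    return sum(f(k) for k in range(1, n + 1))

# P(N) proportional to 1/N cannot be normalized: the partial sums diverge.
print(partial_sum(lambda k: 1.0 / k, 10**6))     # ~14.39 and still growing
# P(N) proportional to 1/N^2 can be normalized: partial sums approach pi^2/6.
print(partial_sum(lambda k: 1.0 / k**2, 10**6))  # ~1.644933
# But the expected population under P(N) ~ 1/N^2 is a sum of N * P(N) ~ 1/N
# terms, which diverges again -- hence the "faster than 1/N^2" condition.
```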

• Hm. Technically, for EU differentials to converge we only need the expected number of people we affect to be finite, but having a finite expected number of people existing in the multiverse would certainly accomplish that.

• The prob­lem is that the Solomonoff prior picks out 3^^^3 as much more likely than most of the num­bers of the same mag­ni­tude be­cause it has much lower Kol­mogorov com­plex­ity.

• I’m not familiar with Kolmogorov complexity, but isn’t the apparent simplicity of 3^^^3 just an artifact of what notation we happen to have invented? I mean, “^^^” is not really a basic operation in arithmetic. We have a nice compact way of describing what steps are needed to get from a number we intuitively grok, 3, to 3^^^3, but I’m not sure it’s safe to say that makes it simple in any significant way. For one thing, what would make 3 a simple number in the first place?

• I’m not fa­mil­iar with Kol­mogorov com­plex­ity, but

In the nicest pos­si­ble way, shouldn’t you have stopped right there? Shouldn’t the ap­pear­ance of this un­fa­mil­iar and formidable-look­ing word have told you that I wasn’t ap­peal­ing to some in­tu­itive no­tion of com­plex­ity, but to a par­tic­u­lar for­mal­i­sa­tion that you would need to be fa­mil­iar with to challenge? If in­stead of com­ment­ing you’d Googled that term, you would have found the Wikipe­dia ar­ti­cle that an­swered this and your next ques­tion.

• As a rough estimate of the complexity of a number, you can take the number of lines of the shortest program that would compute the number from basic operations. More formally, substitute lines of a program with states of a Turing machine.

• But what num­bers are you al­lowed to start with on the com­pu­ta­tion? Why can’t I say that, for ex­am­ple, 12,345,346,437,682,315,436 is one of the num­bers I can do com­pu­ta­tion from (as a start­ing point), and thus it has ex­tremely small com­plex­ity?

• You could say this—do­ing so would be like de­scribing your own lan­guage in which things in­volv­ing 12,345,346,437,682,315,436 can be ex­pressed con­cisely.

So Kol­mogorov com­plex­ity is some­what lan­guage-de­pen­dent. How­ever, given two lan­guages in which you can de­scribe num­bers, you can com­pute a con­stant such that the com­plex­ity of any num­ber is off by at most that con­stant be­tween the two lan­guages. (The con­stant is more or less the com­plex­ity of de­scribing one lan­guage in the other). So things aren’t ac­tu­ally too bad.

But if we’re just talk­ing about Tur­ing ma­chines, we pre­sum­ably ex­press num­bers in bi­nary, in which case writ­ing “3” can be done very eas­ily, and all you need to do to spec­ify 3^^^3 is to make a Tur­ing ma­chine com­put­ing ^^^.
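The point can be made concrete with a sketch (not from the original thread): Knuth’s up-arrow recursion is only a few lines, and those few lines are a complete description of 3^^^3, even though the number itself is far too large to ever evaluate.

```python
def arrow(a, n, b):
    """Knuth's up-arrow: a followed by n arrows followed by b.
    n = 1 is ordinary exponentiation; each higher n iterates the level below."""
    if n == 1:
        return a ** b
    if b == 1:
        return a
    return arrow(a, n - 1, arrow(a, n, b - 1))

# arrow(3, 3, 3) would be 3^^^3: describable in these few lines, but
# utterly infeasible to compute.  Small cases do run:
print(arrow(3, 2, 2))  # 3^^2 = 3^3 = 27
print(arrow(3, 3, 2))  # 3^^^2 = 3^^3 = 3^(3^3) = 7625597484987
```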

• How­ever, given two lan­guages in which you can de­scribe num­bers, you can com­pute a con­stant such that the com­plex­ity of any num­ber is off by at most that con­stant be­tween the two lan­guages.

But can’t this con­stant it­self be ar­bi­trar­ily large when talk­ing about ar­bi­trary num­bers? (Of course, for any spe­cific num­ber, it is limited in size.)

• Well… Given any num­ber N, you can in prin­ci­ple in­vent a pro­gram­ming lan­guage where the pro­gram `do_it` out­puts N.

• The con­stant de­pends on the two lan­guages, but not on the num­ber. As army1987 points out, if you pick the num­ber first, and then make up lan­guages, then the differ­ence can be ar­bi­trar­ily large. (You could go in the other di­rec­tion as well: if your lan­guage speci­fies that no num­ber less than 3^^^3 can be en­tered as a con­stant, then it would prob­a­bly take ap­prox­i­mately log(3^^^3) bits to spec­ify even small num­bers like 1 or 2.)

But if you pick the lan­guages first, then you can com­pute a con­stant based on the lan­guages, such that for all num­bers, the op­ti­mal de­scrip­tion lengths in the two lan­guages differ by at most a con­stant.

• The con­text this in which this comes up here gen­er­ally re­quires some­thing like “there’s a way to com­pare the com­plex­ity of num­bers which always pro­duces the same re­sults in­de­pen­dent of lan­guage, ex­cept in a finite set of cases. Since that set is finite and my ar­gu­ment doesn’t de­pend on any spe­cific num­ber, I can always base my ar­gu­ment on a case that’s not in that set.”

If that’s how you’re us­ing it, then you don’t get to pick the lan­guages first.

• You do get to pick the lan­guages first be­cause there is a large but finite (say no more than 10^6) set of rea­son­able lan­guages-mod­ulo-triv­ial-de­tails that could form the ba­sis for such a mea­sure­ment.

• This is an awful lot of words to ex­pend to no­tice that

(1) So­cial in­ter­ac­tions need to be mod­eled in a game-the­o­retic set­ting, not straight­for­ward ex­pected payoff

(2) Distri­bu­tions of ex­pected val­ues mat­ter. (Hint: p(N) = 1/​N is a re­ally bad model as it doesn’t con­verge).

(3) Utility func­tions are nei­ther lin­ear nor sym­met­ric. (Hint: ex­tinc­tion is not sym­met­ric with dou­bling the pop­u­la­tion.)

(4) We don’t ac­tu­ally have an agreed-upon util­ity func­tion any­way; big num­bers plus a not-well-agreed-on fuzzy no­tion is a great way to pro­duce coun­ter­in­tu­itive re­sults. The de­tails don’t re­ally mat­ter; as fuzzy ap­proaches in­finity, you get non­in­tu­itive­ness.

It’s much more valuable to ad­dress some of these im­perfec­tions in the setup of the prob­lem than con­tin­u­ing to wade through the logic with bad as­sump­tions in hand.

• Re­lated: Would an AI con­clude it’s likely to be a Boltz­mann brain? ;)

• Every­one’s a Boltz­mann brain to some de­gree.

• Or, even if the AI experienced an intelligence explosion, the danger is that it would not believe it had really become so important, because the prior odds of being the most important thing that will probably ever exist are so low.

Edit: The AI could note that it uses a lot more computing power than any other sentient and so give itself an anthropic weight much greater than 1.

• With re­spect to this be­ing a “dan­ger,” don’t Boltz­mann brains have a de­ci­sion-the­o­retic weight of zero?

• Why zero? If you came to believe there was a 99.99999% chance you are currently dreaming, wouldn’t it affect your choices?

• Nick Beck­stead’s finished but as-yet un­pub­lished dis­ser­ta­tion has much to say on this topic. Here is Beck­stead’s sum­mary of chap­ters 6 and 7 of his dis­ser­ta­tion:

[My ar­gu­ment for the over­whelming im­por­tance of shap­ing the far fu­ture] asks us to be happy with hav­ing a very small prob­a­bil­ity of avert­ing an ex­is­ten­tial catas­tro­phe [or bring­ing about some other large, pos­i­tive “tra­jec­tory change”], on the grounds that the ex­pected value of do­ing so is ex­tremely enor­mous, even though there are more con­ven­tional ways of do­ing good which have a high prob­a­bil­ity of pro­duc­ing very good, but much less im­pres­sive, out­comes. Essen­tially, we’re asked to choose a long shot over a high prob­a­bil­ity of some­thing very good. In ex­treme cases, this can seem ir­ra­tional on the grounds that it’s in the same bal­l­park as ac­cept­ing a ver­sion of Pas­cal’s Wager.

In chap­ter 6, I make this worry more pre­cise and con­sider the costs and benefits of try­ing to avoid the prob­lem. When mak­ing de­ci­sions un­der risk, we make trade-offs be­tween how good out­comes might be and how likely it is that we get good out­comes. There are three gen­eral kinds of ways to make these trade­offs. On two of these ap­proaches, we try to max­i­mize ex­pected value. On one of the two ap­proaches, we hold that there are limits to how good (or bad) out­comes can be. On this view, no mat­ter how bad an out­come is, it could always get sub­stan­tially worse, and no mat­ter how good an out­come is, it could always get sub­stan­tially bet­ter. On the other ap­proach, there are no such limits, at least in one of these di­rec­tions. Either out­comes could get ar­bi­trar­ily good, or they could get ar­bi­trar­ily bad. On the third ap­proach, we give up on rank­ing out­comes in terms of their ex­pected value.

The main con­clu­sion of chap­ter 6 is that all of these ap­proaches have ex­tremely un­palat­able im­pli­ca­tions. On the ap­proach where there are up­per and lower limits, we have to be timid — un­will­ing to ac­cept ex­tremely small risks in or­der to enor­mously in­crease po­ten­tial pos­i­tive pay­offs. Im­plau­si­bly, this re­quires ex­treme risk aver­sion when cer­tain ex­tremely good out­comes are pos­si­ble, and ex­treme risk seek­ing when cer­tain ex­tremely bad out­comes are pos­si­ble, and it re­quires mak­ing one’s rank­ing of prospects de­pen­dent on how well things go in re­mote re­gions of space and time.

In the sec­ond case, we have to be reck­less — prefer­ring very low prob­a­bil­ities of ex­tremely good out­comes to very high prob­a­bil­ities of less good, but still ex­cel­lent, out­comes — or rank prospects non-tran­si­tively. I then show that, if a the­ory is reck­less, what it would be best to do, ac­cord­ing to that the­ory, de­pends al­most en­tirely upon what would be best in terms of con­sid­er­a­tions in­volv­ing in­finite value, no mat­ter how im­plau­si­ble it is that we can bring about any in­finitely good or bad out­comes, pro­vided it is not cer­tain. In this sense, there re­ally is some­thing deeply Pas­calian about the reck­less ap­proach.

Some might view this as a re­duc­tio of ex­pected util­ity the­ory. How­ever, I show that the only way to avoid be­ing both reck­less and timid is to rank out­comes in a cir­cle, claiming that A is bet­ter than B, which is bet­ter than C,. . . , which is bet­ter than Z, which is bet­ter than A. Thus, if we want to avoid these two other prob­lems, we have to give up not only on ex­pected util­ity the­ory, but we also have to give up on some very ba­sic as­sump­tions about how we should rank al­ter­na­tives. This makes it much less clear that we can sim­ply treat these prob­lems as a failure of ex­pected util­ity the­ory.

What does that have to do with the rough fu­ture-shap­ing ar­gu­ment? The prob­lem is that my for­mal­iza­tion of the rough fu­ture-shap­ing ar­gu­ment com­mits us to be­ing reck­less. Why? By Pe­riod In­de­pen­dence [the as­sump­tion that “By and large, how well his­tory goes as a whole is a func­tion of how well things go dur­ing each pe­riod of his­tory”], ad­di­tional good pe­ri­ods of his­tory are always good, how good it is to have ad­di­tional pe­ri­ods does not de­pend on how many you’ve already had, and there is no up­per limit (in prin­ci­ple) to how many good pe­ri­ods of his­tory there could be. There­fore, there is no up­per limit to how good out­comes can be. And that leaves us with reck­less­ness, and all the at­ten­dant the­o­ret­i­cal difficul­ties.

At this point, we are left with a challeng­ing situ­a­tion. On one hand, my for­mal­iza­tion of the rough fu­ture-shap­ing ar­gu­ment seemed plau­si­ble. How­ever, we have an ar­gu­ment that if its as­sump­tions are true, then what it is best to do de­pends al­most en­tirely on in­finite con­sid­er­a­tions. That’s a very im­plau­si­ble con­clu­sion. At the same time, the con­clu­sion does not ap­pear to be easy to avoid, since the al­ter­na­tives are the so-called timid ap­proach and rank­ing al­ter­na­tives non-tran­si­tively.

In chap­ter 7, I dis­cuss how im­por­tant it would be to shape the far fu­ture given these three differ­ent pos­si­bil­ities (reck­less­ness, timidity, and non-tran­si­tive rank­ings of al­ter­na­tives). As we have already said, in the case of reck­less­ness, the best de­ci­sion will be the de­ci­sion that is best in terms of in­finite con­sid­er­a­tions. In the first part of the chap­ter, I high­light some difficul­ties for say­ing what would be best with re­spect to in­finite con­sid­er­a­tions, and ex­plain how what is best with re­spect to in­finite con­sid­er­a­tions may de­pend on whether our uni­verse is in­finitely large, and whether it makes sense to say that one of two in­finitely good out­comes is bet­ter than the other.

In the sec­ond part of the chap­ter, I ex­am­ine how a timid ap­proach to as­sess­ing the value of prospects bears on the value of shap­ing the far fu­ture. The an­swer to this ques­tion de­pends on many com­pli­cated is­sues, such as whether we want to ac­cept some­thing similar to Pe­riod In­de­pen­dence in gen­eral even if Pe­riod In­de­pen­dence must fail in ex­treme cases, whether the uni­verse is in­finitely large, whether we should in­clude events far out­side of our causal con­trol when ag­gre­gat­ing value across space and time, and what the up­per limit for the value of out­comes is.

In the third part of the chap­ter, I con­sider the pos­si­bil­ity of us­ing the reck­less ap­proach in con­texts where it seems plau­si­ble and us­ing the timid ap­proach in the con­texts where it seems plau­si­ble. This ap­proach, I ar­gue, is more plau­si­ble in prac­tice than the al­ter­na­tives. I do not ar­gue that this mixed strat­egy is ul­ti­mately cor­rect, but in­stead ar­gue that it is the best available op­tion in light of our cog­ni­tive limi­ta­tions in effec­tively for­mal­iz­ing and im­prov­ing our pro­cesses for think­ing about in­finite ethics and long shots.

• Two quick thoughts:

• Any two theories can be made compatible by allowing for some additional correction factor (e.g. a “leverage penalty”) designed to make them compatible. As such, all the work rests on the question “is the leverage penalty justified?”

• For said justification, there has to be some sort of justifiable territory-level reasoning: “does it carve reality at its joints?”, “is this the world we live in?”, and so on.

The problem I see with the leverage penalty is that there is no Bayesian-updating way that will get you to such a low prior. It’s the mirror image of “can never process enough bits to get away from such a low prior”, namely “can never process enough bits to get to assigning such a low prior” (the blade cuts both ways).

The reason for that is, in part, that the entire level of confidence you have in the governing laws of physics, and in the causal structure and dependency graphs and such, is predicated on the sensory bitstream of your life so far; no more, it’s a strict upper bound. You could gain confidence that the prior for affecting a googolplex people is that low only by using that lifetime bitstream you have accumulated. But then the trap shuts: just as you can’t get out of such a low prior, you cannot use any confidence gained in the current system by way of your lifetime sensory input to get to such a low prior in the first place. You can be very sure you can’t affect that many people, based on your understanding of how causal nodes are interconnected, but you can’t be that sure (since you base your understanding on a comparatively much smaller number of bits of evidence):

It’s a prior ex machina, with little more justification than just saying “I don’t deal with numbers that large/small in my decision making”.
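The asymmetry above can be made concrete with a quick back-of-the-envelope calculation (my sketch, not the commenter’s; the bit-rate figure for a sensory stream is an assumed generous bound):

```python
import math

def bits_to_even_odds(log10_prior: float) -> float:
    """Bits of evidence needed to raise a hypothesis from the given prior
    to even odds: the likelihood ratio must be ~1/prior, i.e. log2(1/prior).
    log10_prior is the base-10 log of the prior, e.g. -20 for 10**-20."""
    return -log10_prior * math.log2(10)

# A leverage-penalty prior of 10^-(10^100) on "I can affect a googolplex
# people" would need roughly 3.3 * 10^100 bits of evidence to reach even odds.
bits_needed = bits_to_even_odds(-(10.0 ** 100))

# Assumed generous bound on a lifetime sensory bitstream:
# ~10^9 bits/second for ~3 * 10^9 seconds (roughly a century).
lifetime_bits = 1e9 * 3e9

# The same bound cuts both ways: the lifetime stream can neither overcome
# such a prior nor justify assigning it in the first place.
print(bits_needed / lifetime_bits)
```

The point of the sketch is just the order of magnitude: the gap between the bits needed and the bits available is itself superexponential.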

• Is it just me, or is everyone here overly concerned with coming up with patches for this specific case rather than the more general problem? If utilities can grow vastly faster than the prior probability of the situation that contains them diminishes, then an expected utility system becomes almost useless: it will act on situations with probabilities as tiny as can possibly be represented in the system, since their expected utility would vastly outweigh the expected utility of acting on anything else.

I’ve heard people come up with apparent resolutions to this problem, like counterbalancing every possible situation with an equally improbable situation that has vast negative utility. There are a lot of problems with this, though. What if the utilities don’t exactly counterbalance? An extra bit to represent a negative utility, for example, might add to the complexity and therefore lower the prior probability. And even a tiny amount of evidence for one scenario over the other would completely upset the balance.

And even if that isn’t the case, your utility function might not have a negative range. Maybe you only value the number of paperclips in the universe. The worst that can happen is you end up in a universe with no paperclips; you can’t have negative paperclips, so the lowest utility you can have is 0. Or maybe your positive and negative values don’t exactly match up. Fear is a better motivator than reward, for example. The fear of having people suffer may carry more negative utility than the positive utility of the opposite scenario of just as many people living happy lives (and since these are different scenarios, with more differences between them than a single number, they would have different prior probabilities to begin with).

Resolutions that involve tweaking the probability of different events are just cheating, since the probability shouldn’t change if the universe hasn’t. It’s how you act on those probabilities that we should be concerned about. And changing the utility function is pretty much cheating too. You can make all sorts of arbitrary tweaks that would solve the problem, like having a maximum utility or something. But if you really found out you lived in a universe where 3^^^3 lives existed (perhaps aliens have been breeding extensively, or we really do live in a simulation, etc.), are you just supposed to stop caring about all life once it exceeds your maximum amount of caring?

I apologize if I’m only reiterating arguments that have already been gone over. But it concerns me that people are focusing on extremely sketchy patches to a specific case of this problem, and not on the more general problem: that any expected utility function becomes apparently worthless in a probabilistic universe like ours.

EDIT: I think I might have a solu­tion to the prob­lem and posted it here.
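The general problem above can be illustrated with a toy sketch (my numbers, purely illustrative): an expected-utility maximizer gets captured by the tiniest-probability hypothesis it can represent whenever the utility grows faster than the prior shrinks.

```python
from fractions import Fraction

def expected_utility(outcomes):
    """Expected utility of an act, given (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# A mundane act: near-certain, modest payoff.
mundane = [(Fraction(99, 100), 100)]

# A far-fetched act: probability as tiny as 2^-1000, but with a payoff
# (2^2000) that grows much faster than the probability shrinks.
far_fetched = [(Fraction(1, 2 ** 1000), 2 ** 2000)]

# The far-fetched act wins by a factor of roughly 2^993, so the maximizer
# ignores everything mundane.
print(expected_utility(far_fetched) > expected_utility(mundane))
```

Exact rationals are used so the comparison is not an artifact of floating-point underflow; the numbers themselves are arbitrary stand-ins for “utility outruns improbability”.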

• What if the util­ities don’t ex­actly coun­ter­bal­ance?

The idea is that it’d be great to have a for­mal­ism where they do by con­struc­tion.

Also, when there’s no third party, it’s not distinct enough from Pascal’s Wager to demand extra terminology that focuses on the third party, such as “Pascal’s Mugging”. If it is just an agent doing contemplations by itself, that’s the agent making a wager on its hypotheses, not getting mugged by someone.

I’ll just go ahead and use “Pascal Scam” to describe a situation where an undistinguished agent promises an unusually huge payoff, and the mark erroneously gives in due to some combination of bad priors and bad utility evaluation. The common errors seem to be: 1. omitting the consequence of keeping the money for a more distinguished agent; 2. assigning too high a prior; and 3. when picking between approaches, ignoring the huge cost of acting in a manner which encourages disinformation. All those errors act in favour of the scammer (and some are optional), while non-erroneous processing would assign huge negative utility to paying up even given high priors.

• The idea is that it’d be great to have a for­mal­ism where they do by con­struc­tion.

There is no real way of doing that without changing your probability function or your utility function. However, you can’t change those. The real problem is with the expected utility function, and I don’t see any way of fixing it, though perhaps I missed something.

Also, when there’s no third party, it’s not distinct enough from Pascal’s Wager to demand extra terminology that focuses on the third party, such as “Pascal’s Mugging”. If it is just an agent doing contemplations by itself, that’s the agent making a wager on its hypotheses, not getting mugged by someone.

• Any agent subject to Pascal’s Mugging would fall prey to this problem first, and it would be far worse. While the mugger is giving his scenario, the agent could imagine an even more unlikely scenario, say one where the mugger actually gives him 3^^^^^^3 units of utility for some arbitrary task, instead of 3^^^3. This possibility immediately carries so much expected utility that it far outweighs anything the mugger has to say after that. Then the agent may imagine an even more unlikely scenario where it gets 3^^^^^^^^^^3 units of utility, and so on.

I don’t really know what an agent would do if the expected utility of every action approached infinity. Perhaps it would generally work out, as some things would approach infinity faster than others. I admit I didn’t consider that, but I don’t know that it would necessarily be the case. Even if it is, it seems “wrong” for the expected utilities of everything to be infinite and for only tiny probabilities to matter for anything. And if it does work out, then it would work out for the Pascal’s mugging scenario too, I think.

• There is no real way of doing that without changing your probability function or your utility function. However, you can’t change those.

• Last time I checked, priors were fairly subjective even here. We don’t know what is the best way to assign priors. Things like “Solomonoff induction” depend on an arbitrary choice of machine.

Any agent subject to Pascal’s Mugging would fall prey to this problem first, and it would be far worse.

• Nope: people who end up 419-scammed, or waste a lot of money investing in someone like Randel L Mills or Andrea Rossi, live through their lives OK until they read a harmful string in a harmful set of circumstances (a bunch of other believers around, for example).

• Last time I checked, priors were fairly subjective even here. We don’t know what is the best way to assign priors. Things like “Solomonoff induction” depend on an arbitrary choice of machine.

• Priors are indeed up for grabs, but a set of priors about the universe ought to be consistent with itself, no? A set of priors based only on complexity may indeed not be the best set of priors; that’s what all the discussions about “leverage penalties” and the like are about, enhancing Solomonoff induction with something extra. But what you seem to suggest is a set of priors about the universe designed for the express purpose of making human utility calculations balance out? Wouldn’t such a set of priors require the anthropomorphization of the universe, and effectively mean sacrificing all sense of epistemic rationality?

• The best “priors” about the universe are 1 for the universe right around you, and 0 for everything else. Other priors are a compromise, an engineering decision.

What I am think­ing is that

• there is a considerably better way to assign priors which we do not know of yet: the way which will assign equal probabilities to each side of a die if it has no reason to prefer one over the other, the way that corresponds to symmetries in the evidence.

• We don’t know that there will still be the same problem once we have a non-stupid way to assign priors (especially as the non-stupid way ought to be considerably more symmetric). And it may be that some value systems are intrinsically incoherent. Suppose you wanted to maximize blerg without knowing what blerg even really is. That wouldn’t be possible; you can’t maximize something without having a measure of it. But I could still tell you I’d give you 3^^^^3 blergs for a dollar, without either of us knowing what blerg is supposed to be or whether 3^^^^3 blergs even make sense (if a blerg is a unique good book of up to 1000 pages’ length, it doesn’t, because duplicates aren’t blerg).

• Last time I checked, priors were fairly subjective even here. We don’t know what is the best way to assign priors. Things like “Solomonoff induction” depend on an arbitrary choice of machine.

True, but the goal of a prob­a­bil­ity func­tion is to rep­re­sent the ac­tual prob­a­bil­ity of an event hap­pen­ing as closely as pos­si­ble. The map should cor­re­spond to the ter­ri­tory. If your map is good, you shouldn’t change it un­less you ob­serve ac­tual changes in the ter­ri­tory.

Nope: people who end up 419-scammed, or waste a lot of money investing in someone like Randel L Mills or Andrea Rossi, live through their lives OK until they read a harmful string in a harmful set of circumstances (a bunch of other believers around, for example).

I don’t know if those things have such extremes of low probability vs. high utility as to be called Pascal’s mugging. But even so, the human brain doesn’t operate on anything like Solomonoff induction, Bayesian probability theory, or expected utility maximization.

• The ac­tual prob­a­bil­ity is ei­ther 0 or 1 (ei­ther hap­pens or doesn’t hap­pen). Values in-be­tween quan­tify ig­no­rance and par­tial knowl­edge (e.g. when you have no rea­son to pre­fer one side of the die to the other), or, at times, are cho­sen very ar­bi­trar­ily (what is the prob­a­bil­ity that a physics the­ory is “cor­rect”).

I don’t know if those things have such extremes of low probability vs. high utility as to be called Pascal’s mugging.

New names for the same things are kind of annoying, to be honest, especially ill-chosen ones… if it happens by your own contemplation, I’d call it Pascal’s Wager. Mugging implies someone making threats; scam is more general and can involve promises of reward. Either way, the key is the high-payoff proposition wreaking some havoc, whether through its prior probability being too high, other propositions having been omitted, or the like.

But even so, the hu­man brain doesn’t op­er­ate on any­thing like Solomonoff in­duc­tion, Bayesian prob­a­bil­ity the­ory, or ex­pected util­ity max­i­miza­tion.

Peo­ple are still agents, though.

• The ac­tual prob­a­bil­ity is ei­ther 0 or 1 (ei­ther hap­pens or doesn’t hap­pen).

Yes, but the goal is to assign the outcome that will actually happen as high a probability as possible, using whatever information we have. The fact that some outcomes result in ridiculously huge utility gains does not imply anything about how likely they are to happen, so that should not be taken into account (unless it actually does imply something, in which case it should).

New names for the same things are kind of annoying, to be honest, especially ill-chosen ones… if it happens by your own contemplation, I’d call it Pascal’s Wager. Mugging implies someone making threats; scam is more general and can involve promises of reward. Either way, the key is the high-payoff proposition wreaking some havoc, whether through its prior probability being too high, other propositions having been omitted, or the like.

Pascal’s mugging was an absurd scenario with absurd rewards that approach infinity. What you are talking about is just normal, everyday scams. Most scams do not promise such huge rewards or have such low probabilities (if you didn’t know any better, it would be feasible that someone could have an awesome invention or genuinely need your help with transaction fees).

And the problem with scams is that people overestimate their probability. If they were to consider how many emails in the world are actually from Nigerian princes vs. scammers, or how many people promise awesome inventions without any proof that they will actually work, they would reconsider. In Pascal’s mugging, you fall for it even after having considered the probability of it happening in detail.

Your probability estimate could be absolutely correct. Maybe 1 out of a trillion times a person meets someone claiming to be a Matrix Lord, they are actually telling the truth. And they still end up getting scammed, so that the one-in-a-trillion counterfactual version of themselves gets infinite reward.

But even so, the hu­man brain doesn’t op­er­ate on any­thing like Solomonoff in­duc­tion, Bayesian prob­a­bil­ity the­ory, or ex­pected util­ity max­i­miza­tion.

Peo­ple are still agents, though.

They are agents, but they aren’t sub­ject to this spe­cific prob­lem be­cause we don’t re­ally use ex­pected util­ity max­i­miza­tion. At best maybe some kind of poor ap­prox­i­ma­tion of it. But it is a prob­lem for build­ing AIs or any kind of com­puter sys­tem that makes de­ci­sions based on prob­a­bil­ities.

• Maybe 1 out of a trillion times a person meets someone claiming to be a Matrix Lord, they are actually telling the truth

I think you’re con­sid­er­ing a differ­ent prob­lem than Pas­cal’s Mug­ging, if you’re tak­ing it as a given that the prob­a­bil­ities are in­deed 1 in a trillion (or for that mat­ter 1 in 10). The origi­nal prob­lem doesn’t make such an as­sump­tion.

What you have in mind, the case of definitely known probabilities, seems to me more like the Lifespan Dilemma, where e.g. “an unbounded utility on lifespan implies willingness to trade an 80% probability of living some large number of years for a 1/(3^^^3) probability of living some sufficiently longer lifespan”.

• The wiki page on it seems to sug­gest that this is the prob­lem.

If an agent’s util­ities over out­comes can po­ten­tially grow much faster than the prob­a­bil­ity of those out­comes diminishes, then it will be dom­i­nated by tiny prob­a­bil­ities of hugely im­por­tant out­comes; spec­u­la­tions about low-prob­a­bil­ity-high-stakes sce­nar­ios will come to dom­i­nate his moral de­ci­sion mak­ing… The agent would always have to take those kinds of ac­tions with far-fetched re­sults, that have low but non-neg­ligible prob­a­bil­ities but ex­tremely high re­turns.

This is seen as an un­rea­son­able re­sult. In­tu­itively, one is not in­clined to ac­quiesce to the mug­ger’s de­mands—or even pay all that much at­ten­tion one way or an­other—but what kind of prior does this im­ply?

Also this

Peter de Blanc has proven[1] that if an agent as­signs a finite prob­a­bil­ity to all com­putable hy­pothe­ses and as­signs un­bound­edly large finite util­ities over cer­tain en­vi­ron­ment in­puts, then the ex­pected util­ity of any out­come is un­defined.

which is pretty con­cern­ing.

I’m curious what you think the problem with Pascal’s Mugging is, though. That you can’t easily estimate the probability of such a situation? That is true of anything and isn’t really unique to Pascal’s Mugging. But we can still approximate probabilities: a necessary evil of living in a probabilistic world without the ability to do perfect Bayesian updates on all available information, or unbiased priors.
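De Blanc’s divergence result, quoted above, is easy to see in a toy model (a sketch with assumed numbers, not his actual construction): give hypothesis n a complexity-style prior of 2^-n and a utility that grows faster, and the partial expected-utility sums grow without bound.

```python
def partial_expected_utility(n_terms, utility):
    """Sum p_n * utility(n) over hypotheses n = 1..n_terms,
    with a complexity-style prior p_n = 2**-n."""
    return sum(2.0 ** -n * utility(n) for n in range(1, n_terms + 1))

# Utility 4^n outgrows the prior 2^-n: each term contributes 2^n,
# so the partial sums explode and the full expectation is undefined.
sums = [partial_expected_utility(k, lambda n: 4.0 ** n) for k in (10, 20, 30)]
print(sums)
```

This is the St. Petersburg pattern: no matter how far out you truncate, the tail still dominates everything computed so far.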

• I ab­hor us­ing un­nec­es­sary novel jar­gon.

I’m cu­ri­ous what you think the prob­lem with Pas­cal’s Mug­ging is though.

Bad math being internally bad, that’s the problem. Nothing to do with any worlds, real or imaginary; just a case of internally bad math: utilities are undefined, it is undefined whether you pay up or not, the actions chosen are undefined. Akin to maximizing blerg without any definition of what blerg even is: maximizing “expected utility” without having defined it.

The speed prior works, for example (it breaks some of de Blanc’s assumptions; namely, under it the probability is not bounded from below by any computable function of the length of the hypothesis).

• utilities are undefined, it is undefined whether you pay up or not, the actions chosen are undefined. Akin to maximizing blerg without any definition of what blerg even is: maximizing “expected utility” without having defined it.

Call it undefined if you like, but I’d still prefer that 3^^^3 people not suffer. It would be pretty weird to argue that human lives decay in utility based on how many there are. If you found out that the universe was bigger than you thought, that there really were far more humans in the universe somehow, would you just stop caring about human life?

It would also be pretty hard to argue that some small amount of money isn’t worth giving in order to save a human life, or that giving a small amount of money isn’t worth a small probability of saving enough lives to make up for that small probability.

• It would be pretty weird to ar­gue that hu­man lives de­cay in util­ity based on how many there are.

Well, suppose there are mind uploads, and one upload is very worried about himself, so he runs himself multiply redundant with 5 exact copies. Should this upload be a minor utility monster?

3^^^3 is far more than there are pos­si­ble peo­ple.

If you found out that the uni­verse was big­ger than you thought, that there re­ally were far more hu­mans in the uni­verse some­how, would you just stop car­ing about hu­man life?

Bounded doesn’t mean the utility just hits a cap and stays there. Also, if you scale down all the utilities that you can affect, it changes nothing about actions (another confusion: mapping the utility to how much one cares).

And yes, there are definitely cases where money is worth a small probability of saving lives, and everyone agrees on such; e.g. if we found out that an asteroid had a certain chance of hitting Earth, we’d give money to space agencies, even when the chance is rather minute (we’d not give money to cold-fusion crackpots, though). There’s nothing fundamentally wrong with spending a bit to avert a small probability of something terrible happening. The problem arises when the probability is overestimated, when the consequences are poorly evaluated, etc. It is actively harmful, for example, to encourage boys to cry wolf needlessly. I’m thinking people sort of innately feel that if they are giving money away (losing), some giant fairness fairy is going to make the result more likely good than bad for everyone. The world doesn’t work like this; all those naive folks who jump at the opportunity to give money to someone promising to save the world, no matter how ignorant, uneducated, or crackpotty that person is in the fields where correctness can be checked at all, are increasing risk, not decreasing it.

• It would be pretty weird to ar­gue that hu­man lives de­cay in util­ity based on how many there are.

Maybe not as weird as all that. Given a forced choice between killing A and B where I know nothing about them, I flip a coin; but add the knowledge that A is a duplicate of C and B is not a duplicate of anyone, and I choose A quite easily. I conclude from this that I value unique human lives quite a lot more than I value non-unique human lives. As others have pointed out, the number of unique human lives is finite, and the number of lives I consider worth living necessarily even lower, so the more people there are living lives worth living, the less unique any individual is, and therefore the less I value any individual life. (Insofar as my values are consistent, anyway. Which of course they aren’t, but this whole “let’s pretend” game of utility calculation that we enjoy playing depends on treating them as though they were.)

• There is no evidence for the actual existence of neatly walled-off and unupdateable utility functions or probability functions, any more than there is for a luz’.

• Utility and probability functions are not perfect or neatly walled off. But that doesn’t mean you should change them to fix a problem with your expected utility function. The goal of a probability function is to represent the actual probability of an event happening as closely as possible. And the goal of a utility function is to represent what states you would prefer the universe to be in. This also shouldn’t change unless you’ve actually changed your preferences.

• And the goal of a utility function is to represent what states you would prefer the universe to be in. This also shouldn’t change unless you’ve actually changed your preferences.

There’s plenty of ev­i­dence of peo­ple chang­ing their prefer­ences over sig­nifi­cant pe­ri­ods of time: it would be weird not to. And I am well aware that the the­ory of sta­ble util­ity func­tions is stan­dardly patched up with a fur­ther the­ory of ter­mi­nal val­ues, for which there is also no di­rect ev­i­dence.

• There’s plenty of ev­i­dence of peo­ple chang­ing their prefer­ences over sig­nifi­cant pe­ri­ods of time: it would be weird not to.

Of course peo­ple can change their prefer­ences. But if your prefer­ences are not con­sis­tent you will likely end up in situ­a­tions that are less prefer­able than if you had the same prefer­ences the en­tire time. It also makes you a po­ten­tial money pump.

And I am well aware that the the­ory of sta­ble util­ity func­tions is stan­dardly patched up with a fur­ther the­ory of ter­mi­nal val­ues, for which there is also no di­rect ev­i­dence.

What? Terminal values are not a patch for utility functions. It’s basically another word that means the same thing: what state you would prefer the world to end up in. And how can there be evidence for a decision theory?

• Ter­mi­nal val­ues are not a patch for util­ity func­tions.

Well, I’ve cer­tainly seen dis­cus­sions here in which the ob­served in­con­sis­tency among our pro­fessed val­ues is treated as a non-prob­lem on the grounds that those are mere in­stru­men­tal val­ues, and our ter­mi­nal val­ues are pre­sumed to be more con­sis­tent than that.

In­so­far as sta­ble util­ity func­tions de­pend on con­sis­tent val­ues, it’s not un­rea­son­able to de­scribe such dis­cus­sions as posit­ing con­sis­tent ter­mi­nal val­ues in or­der to sup­port a be­lief in sta­ble util­ity func­tions.

• Well, how is this differ­ent from chang­ing our prefer­ences to util­ity func­tions to fix prob­lems with our naive prefer­ences?

• I don’t know what you mean. All I’m say­ing is that you shouldn’t change your prefer­ences be­cause of a prob­lem with your ex­pected util­ity func­tion. Your prefer­ences are just what you want. Utility func­tions are just a math­e­mat­i­cal way of ex­press­ing that.

• Hu­man prefer­ences don’t nat­u­rally satisfy the VNM ax­ioms, thus by ex­press­ing them as a util­ity func­tion you’ve already changed them.

• I don’t see why our preferences can’t be expressed by a utility function even as they are. The only reason it wouldn’t work out is if there were circular preferences, and I don’t think most people’s preferences would work out to be truly circular if they were to think about the specific occurrence and decide what they really preferred.

Though mapping out which outcomes are preferred over others is not enough to assign them an actual utility; you’d somehow have to estimate quantitatively how much more preferable one outcome is than another. But even then, I think most people could if they thought about it enough. The problem is that our utility functions are complex and we don’t really know what they are, not that they don’t exist.

• I don’t see why our preferences can’t be expressed by a utility function even as they are. The only reason it wouldn’t work out is if there were circular preferences, and I don’t think most people’s preferences would work out to be truly circular if they were to think about the specific occurrence and decide what they really preferred.

Or they might violate the independence axiom. But in any case, what do you mean by “think about the specific occurrence and decide what they really preferred”, since the result of such thinking is likely to depend on the exact order they thought about things in?

• I think the simpler solution is just to use a bounded utility function. There are several things suggesting we do this, and I really don’t see any reason not to, instead of going through contortions to make unbounded utility work.

Con­sider the pa­per of Peter de Blanc that you link—it doesn’t say a com­putable util­ity func­tion won’t have con­ver­gent util­ities, but rather that it will iff said func­tion is bounded. (At least, in the re­stricted con­text defined there, though it seems fairly gen­eral.) You could try to es­cape the con­di­tions of the the­o­rem, or you could just con­clude that util­ity func­tions should be bounded.
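A toy gamble makes the convergence point concrete (this is my own illustration, not de Blanc’s construction): outcome n occurs with probability 2^-n, and we compare an unbounded utility u(n) = 2^n against a bounded one.

```python
def expected_utility(utility, n_terms):
    """Partial expected utility of a gamble in which outcome n
    has probability 2^-n, summed over the first n_terms outcomes."""
    return sum(2.0 ** -n * utility(n) for n in range(1, n_terms + 1))

unbounded = lambda n: 2.0 ** n       # u(n) = 2^n: every term equals 1
bounded = lambda n: 1.0 - 2.0 ** -n  # u(n) stays below 1

print(expected_utility(unbounded, 100))  # 100.0: partial sums grow without limit
print(expected_utility(bounded, 100))    # ~0.6667: converges to 2/3
```

With the unbounded utility the partial sums grow linearly in the number of terms and diverge; bounding the utility forces every such sum to converge, which is the direction of de Blanc’s result.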

Let’s go back and ask the ques­tion of why we’re us­ing prob­a­bil­ities and util­ities in the first place. Is it be­cause of Sav­age’s The­o­rem? But the util­ity func­tion out­put by Sav­age’s The­o­rem is always bounded.

OK, maybe we don’t ac­cept Sav­age’s ax­iom 7, which is what forces util­ity func­tions to be bounded. But then we can only be sure that com­par­ing ex­pected util­ities is the right thing to do for finite gam­bles, not for in­finite ones, so talk­ing about sums con­verg­ing or not—well, it’s some­thing that shouldn’t even come up. Or al­ter­na­tively, if we do en­counter a situ­a­tion with in­finitely many choices, each of differ­ing util­ity, we sim­ply don’t know what to do.

Maybe we’re not bas­ing this on Sav­age’s the­o­rem at all—maybe we sim­ply take prob­a­bil­ity for granted (or just take for granted that it should be a real num­ber and ground it in some­thing like Cox’s the­o­rem—af­ter all, like Sav­age’s the­o­rem, Cox’s the­o­rem only re­quires that prob­a­bil­ity be finitely ad­di­tive) and are then de­riv­ing util­ity from the VNM the­o­rem. The VNM the­o­rem doesn’t pro­hibit un­bounded util­ities. But the VNM the­o­rem once again only tells us how to han­dle finite gam­bles—it doesn’t tell us that in­finite gam­bles should also be han­dled via ex­pected util­ity.

OK, well, maybe we don’t care about the particular grounding—we’re just going to use probability and utility because it’s the best framework we know, and we’ll make the probability countably additive and use expected utility in all cases; hey, why not, seems natural, right? (In that case, the AI may want to eventually reconsider whether probability and utility really is the best framework to use, if it is capable of doing so.) But even if we throw all that out, we still have the problem de Blanc raises. And, um, all the other problems that have been raised with unbounded utility. (And if we’re just using probability and utility to make things nice, well, we should probably use bounded utility to make things nicer.)

I re­ally don’t see any par­tic­u­lar rea­son util­ity has to be un­bounded ei­ther. Eliezer Yud­kowsky seems to keep us­ing this as­sump­tion that util­ity should be un­bounded, or just not nec­es­sar­ily bounded, but I’ve yet to see any jus­tifi­ca­tion for this. I can find one dis­cus­sion where, when the ques­tion of bounded util­ity func­tions came up, Eliezer re­sponded, “[To avert a cer­tain prob­lem] the bound would also have to be sub­stan­tially less than 3^^^^3.”—but this in­di­cates a mi­s­un­der­stand­ing of the idea of util­ity, be­cause util­ity func­tions can be ar­bi­trar­ily (pos­i­tively) rescaled or re­cen­tered. In­di­vi­d­ual util­ity “num­bers” are not mean­ingful; only ra­tios of util­ity differ­ences. If a util­ity func­tion is bounded, you can as­sume the bounds are 0 and 1. Talk about the value of the bound is as mean­ingless as any­thing else us­ing ab­solute util­ity num­bers; they’re not amounts of fun or some­thing.
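The rescaling claim is easy to verify numerically. A minimal sketch (the outcome labels, utilities, and gambles below are invented for illustration): applying any positive affine map u → a·u + b leaves every expected-utility comparison unchanged.

```python
def expected(u, gamble):
    """Expected utility of a gamble given as (outcome, probability) pairs."""
    return sum(p * u[outcome] for outcome, p in gamble)

u = {"A": 0.0, "B": 10.0, "C": 25.0}
rescaled = {k: 4.0 * v + 7.0 for k, v in u.items()}  # a = 4, b = 7

g1 = [("A", 0.5), ("C", 0.5)]  # 50/50 between worst and best outcome
g2 = [("B", 1.0)]              # middle outcome for sure

# The preference between g1 and g2 is identical under both scalings,
# even though every individual utility number changed.
assert (expected(u, g1) > expected(u, g2)) == \
       (expected(rescaled, g1) > expected(rescaled, g2))
```

This is why only ratios of utility differences are meaningful: the absolute numbers, and hence the numeric value of any bound, can be moved anywhere by a positive affine map.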

Sure, if you’re taking a total-utilitarian viewpoint, then your (decision-theoretic) utility function has to be unbounded, because you’re summing a quantity over an arbitrarily large set. (I mean, I guess physical limitations impose a bound, but they’re not logical limitations, so we want to be able to assign values to situations where they don’t hold.) (As opposed to the individual “utility” functions that you’re summing, which is a different sort of “utility” that isn’t actually well-defined at present.) But total utilitarianism—or utilitarianism in general—is on much shakier ground than decision-theoretic utility functions and what we can do with them or prove about them. To insist that utility be unbounded based on total utilitarianism (or any form of utilitarianism) while ignoring the solid things we can say seems backwards.

Not everything has to scale linearly, after all. There seems to be this idea out there that utility must be unbounded because there are constants C_1 and C_2 such that adding to the world a person of “utility” (in the utilitarian sense) C_1 must increase your utility (in the decision-theoretic sense) by C_2, but this doesn’t need to be so. This to me seems a lot like insisting “Well, no matter how fast I’m going, I can always toss a baseball forward in my direction at 1 foot per second relative to me; so it will be going 1 foot per second faster than me, so the set of possible speeds is unbounded.” As it turns out, the set of possible speeds is bounded, velocities don’t add linearly, and if you toss a baseball forward in your direction at 1 foot per second relative to you, it will not be going 1 foot per second faster.
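The baseball analogy is literal special relativity: velocities combine as (u + v)/(1 + uv/c²), so you can always add a small boost relative to yourself and still never reach c. A small sketch:

```python
C = 299_792_458.0  # speed of light in m/s

def add_velocities(u, v, c=C):
    """Relativistic velocity addition: (u + v) / (1 + u*v/c^2)."""
    return (u + v) / (1.0 + u * v / c ** 2)

boost = 0.3048  # 1 foot per second, in m/s
speed = 0.99 * C
for _ in range(1000):
    speed = add_velocities(speed, boost)

# A thousand "1 foot per second" boosts later, we are still below c.
assert 0.99 * C < speed < C
```

The point of the analogy: an operation that locally looks like “always add a fixed increment” is perfectly compatible with a bounded range, for speeds and, by the same logic, for utilities.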

My own intuition is more in line with earthwormchuck163’s comment—I doubt I would be that joyous about making that many more people when so many are going to be duplicates or near-duplicates of one another. But even if you don’t agree with this, things don’t have to add linearly, and utilities don’t have to be unbounded.

• I can find one dis­cus­sion where, when the ques­tion of bounded util­ity func­tions came up, Eliezer re­sponded, “[To avert a cer­tain prob­lem] the bound would also have to be sub­stan­tially less than 3^^^^3.”—but this in­di­cates a mi­s­un­der­stand­ing of the idea of util­ity, be­cause util­ity func­tions can be ar­bi­trar­ily (pos­i­tively) rescaled or re­cen­tered. In­di­vi­d­ual util­ity “num­bers” are not mean­ingful; only ra­tios of util­ity differ­ences.

I think he was assuming a natural scale. After all, you can just pick some everyday-sized utility difference to use as your unit, and measure everything on that scale. It wouldn’t really matter what utility difference you pick as long as it is a natural size, since multiplying by 3^^^3 is easily enough for the argument to go through.

• I think the sim­pler solu­tion is just to use a bounded util­ity func­tion. There are sev­eral things sug­gest­ing we do this, and I re­ally don’t see any rea­son to not do so, in­stead of go­ing through con­tor­tions to make un­bounded util­ity work.

Using a bounded utility function is what you do if and only if your preferences happen to be bounded in that way. The utility function is not up for grabs. You don’t change the utility function because it makes decision making more convenient (well, unless you have done a lot of homework).

As it happens, I don’t make (hypothetical) decisions as if I assign linear value to person-lives. That is because, as best as I can tell, my actual preference really does assign less value to the 3^^^3rd person-life created than to the 5th person-life. However, someone who does care just as much about each additional person would be making an error if they acted as if they had a bounded utility function.

• Your ar­gu­ment proves a bit too much, I think. I could equally well re­ply, “Us­ing a util­ity func­tion is what you do if and only if your prefer­ences are de­scribed by a util­ity func­tion. Ter­mi­nal val­ues are not up for grabs. You don’t re­duce your ter­mi­nal val­ues to a util­ity func­tion just be­cause it makes de­ci­sion mak­ing more con­ve­nient.”

The fact of the mat­ter is that our prefer­ences are not nat­u­rally de­scribed by a util­ity func­tion; so if we’ve agreed that the AI should use a util­ity func­tion, well, there must be some rea­son for that other than “it’s a cor­rect de­scrip­tion of our prefer­ences”, i.e., we’ve agreed that such rea­sons are worth con­sid­er­a­tion. And I don’t see any such rea­son that doesn’t also im­me­di­ately sug­gest we should use a bounded util­ity func­tion (at least, if we want to be able to con­sider in­finite gam­bles).

So I’m hav­ing trou­ble be­liev­ing that your po­si­tion is con­sis­tent. If you said we should do away with util­ity func­tions en­tirely to bet­ter model hu­man ter­mi­nal val­ues, that would make sense. But why would you throw out the bounded part, and then keep the util­ity func­tion part? I’m hav­ing trou­ble see­ing any line of rea­son­ing that would sup­port both of those si­mul­ta­neously. (Well un­less you want to throw out in­finite gam­bles, which does seems like a con­sis­tent po­si­tion. Note, though, that in that case we also don’t have to do con­tor­tions like in this post.)

Edit: Added notes about finite vs. in­finite gam­bles.

• Your ar­gu­ment proves a bit too much, I think. I could equally well re­ply, “Us­ing a util­ity func­tion is what you do if and only if your prefer­ences are de­scribed by a util­ity func­tion. Ter­mi­nal val­ues are not up for grabs. You don’t re­duce your ter­mi­nal val­ues to a util­ity func­tion just be­cause it makes de­ci­sion mak­ing more con­ve­nient.”

If you were to gen­er­al­ise it would have to be to some­thing like “only if your prefer­ences can be rep­re­sented with­out loss as a util­ity func­tion”. Even then there are ex­cep­tions. How­ever the in­tri­ca­cies of re­solv­ing com­plex and in­ter­nally in­con­sis­tent agents seems rather or­thog­o­nal to the is­sue of how a given agent would be­have in the counter-fac­tual sce­nario pre­sented.

So I’m hav­ing trou­ble be­liev­ing that your po­si­tion is con­sis­tent. If you said we should do away with util­ity func­tions en­tirely to bet­ter model hu­man ter­mi­nal val­ues, that would make sense. But why would you throw out the bounded part, and then keep the util­ity func­tion part? I’m hav­ing trou­ble see­ing any line of rea­son­ing that would sup­port both of those si­mul­ta­neously.

Mean­while, I eval­u­ate your solu­tion to this prob­lem (throw away the util­ity func­tion and re­place it with a differ­ent one) to be equiv­a­lent to, when en­coun­ter­ing New­comb’s Prob­lem, choos­ing the re­sponse “Self mod­ify into a pa­per­clip max­imiser, just for the hell of it, then choose whichever box choice max­imises pa­per­clips”. That it seems to be per­sua­sive to read­ers makes this thread all too sur­real for me. Tap­ping out be­fore can­did­ness causes difficul­ties.

• If you were to gen­er­al­ise it would have to be to some­thing like “only if your prefer­ences can be rep­re­sented with­out loss as a util­ity func­tion”

It’s not clear to me what dis­tinc­tion you are at­tempt­ing to draw be­tween “Can be de­scribed by a util­ity func­tion” and “can be rep­re­sented with­out loss as a util­ity func­tion”. I don’t think any such dis­tinc­tion can sen­si­bly be drawn. They seem to sim­ply say the same thing.

Even then there are ex­cep­tions.

I’d ask you to ex­plain, but, well, I guess you’re not go­ing to.

(throw away the util­ity func­tion and re­place it with a differ­ent one)

I’m not throw­ing out the util­ity func­tion and re­plac­ing it with a differ­ent one, be­cause there is no util­ity func­tion. What there is is a bunch of prefer­ences that don’t satisfy Sav­age’s ax­ioms (or the VNM ax­ioms or whichever for­mu­la­tion you pre­fer) and as such can­not ac­tu­ally be de­scribed by a util­ity func­tion. Again—ev­ery­thing you’ve said works perfectly well as an ar­gu­ment against util­ity func­tions gen­er­ally. (“You’re toss­ing out hu­man prefer­ences and us­ing a util­ity func­tion? So, what, when pre­sented with New­comb’s prob­lem, you self-mod­ify into a pa­per­clip­per and then pick the pa­per­clip-max­i­miz­ing box?”)

Per­haps I should ex­plain in more de­tail how I’m think­ing about this.

We want to im­ple­ment an AI, and we want it to be ra­tio­nal in cer­tain senses—i.e. obey cer­tain ax­ioms—while still im­ple­ment­ing hu­man val­ues. Hu­man prefer­ences don’t satisfy these ax­ioms. We could just give it hu­man prefer­ences and not worry about the in­tran­si­tivity and the dy­namic in­con­sis­ten­cies and such, or, we could force it a bit.

So we imag­ine that we have some (as yet un­known) pro­ce­dure that takes a gen­eral set of prefer­ences and con­verts it to one satis­fy­ing cer­tain re­quire­ments (spe­cific to the pro­ce­dure). Ob­vi­ously some­thing is lost in the pro­cess. Are we OK with this? I don’t know. I’m not mak­ing a claim ei­ther way about this. But you are go­ing to lose some­thing if you ap­ply this pro­ce­dure.

OK, so we feed in a set of prefer­ences and we get out one satis­fy­ing our re­quire­ments. What are our re­quire­ments? If they’re Sav­age’s ax­ioms, we get out some­thing that can be de­scribed by a util­ity func­tion, and a bounded one at that. If they’re Sav­age’s ax­ioms with­out ax­iom 7, or (if we take prob­a­bil­ity as a prim­i­tive) the VNM ax­ioms, then we get out some­thing that for finite gam­bles can be de­scribed by a util­ity func­tion (not nec­es­sar­ily bounded), but which can­not nec­es­sar­ily be eas­ily de­scribed for in­finite gam­bles.

If I’m un­der­stand­ing you cor­rectly, you’re read­ing me as sug­gest­ing a two-step pro­cess: First we take hu­man val­ues and force them into a util­ity func­tion, then take that util­ity func­tion and force it to be bounded. I am not sug­gest­ing that. Rather, I am say­ing, we take hu­man val­ues and force them to satisfy cer­tain prop­er­ties, and the re­sult can then nec­es­sar­ily be de­scribed by a bounded util­ity func­tion.

People on this site seem to often just assume that being rational means using a utility function, not remembering that a utility function is just how we describe sets of preferences satisfying certain axioms. It’s not whether you use a utility function that’s important; it’s questions like: are your preferences transitive? Do they obey the sure-thing principle? And so forth. Now, sure, the only way to obey all those requirements is to use a utility function, but it’s important to keep the reason in mind.

If we re­quire the out­put of our pro­ce­dure to obey Sav­age’s ax­ioms, it can be de­scribed by a bounded util­ity func­tion. That’s just a fact. If we leave out ax­iom 7 (or use the VNM ax­ioms), then it can kind of be de­scribed by a util­ity func­tion—for finite gam­bles it can be de­scribed by a util­ity func­tion, and it’s not clear what hap­pens for in­finite gam­bles.

So do you in­clude ax­iom 7 or no? (Well, OK, you might just use a differ­ent set of re­quire­ments en­tirely, but let’s as­sume it’s one of these two sets of re­quire­ments for now.) If yes, the out­put of your pro­ce­dure will be a bounded util­ity func­tion, and you don’t run into these prob­lems with non­con­ver­gence. If no, you also don’t run into these prob­lems with non­con­ver­gence—the pro­ce­dure is re­quired to out­put a co­her­ent set of prefer­ences, af­ter all! -- but for a differ­ent rea­son: Be­cause the set of prefer­ences it out­put can only be mod­eled by a util­ity func­tion for finite gam­bles. So if you start tak­ing in­finite weighted sums of util­ities, the re­sult doesn’t nec­es­sar­ily tell you any­thing about which one to choose.

So at no point should you be tak­ing in­finite sums with an un­bounded util­ity func­tion, be­cause there is no un­der­ly­ing rea­son to do so. The only rea­son to do so that I can see is that, for your re­quire­ments, you’ve sim­ply de­clared, “We’re go­ing to re­quire that the out­put of the pro­ce­dure can be de­scribed by a util­ity func­tion (in­clud­ing for in­finite gam­bles).” But that’s just a silly set of re­quire­ments. As I said above—it’s not failing to use a util­ity func­tion we should be avoid­ing; it’s the ac­tual prob­lems this causes we should be avoid­ing. Declar­ing at the out­set we’re go­ing to use a util­ity func­tion, in­stead of that we want to avoid par­tic­u­lar prob­lems, is silly. I don’t see why you’d want to run hu­man val­ues through such a poorly mo­ti­vated pro­ce­dure.

So again, I’m not claiming you want to run your val­ues through the ma­chine and force them into a bounded util­ity func­tion; but rather just that, if you want to run them through this one ma­chine, you will get a bounded util­ity func­tion; and if in­stead you run them through this other ma­chine, you will get a util­ity func­tion, kind of, but it won’t nec­es­sar­ily be valid for in­finite gam­bles. Eliezer seems to want to run hu­man val­ues through the ma­chine. Which one will he dis­pre­fer less? Well, he always seems to as­sume that com­par­ing the ex­pected util­ities of in­finite gam­bles is a valid op­er­a­tion, so I’m in­fer­ring he’d pre­fer the first one, and that one only out­puts bounded util­ity func­tions. Maybe I’m wrong. But in that case he should stop as­sum­ing that com­par­ing the ex­pected util­ities of in­finite gam­bles is a valid op­er­a­tion.

• You still get a prob­a­bil­ity func­tion with­out Sav­age’s P6 and P7, you just don’t get a util­ity func­tion with codomain the re­als, and you don’t get ex­pec­ta­tions over in­finite out­come spaces. If we add real-val­ued prob­a­bil­ities, for ex­am­ple by as­sum­ing Sav­age’s P6′, you even get finite ex­pec­ta­tions, as­sum­ing I haven’t made an er­ror.

• You don’t change the util­ity func­tion be­cause it makes de­ci­sion mak­ing more con­ve­nient [..] some­one who does care just as much about each ad­di­tional per­son would be mak­ing an er­ror if they acted as if they had a bounded util­ity func­tion.

True.

That said, given some state­ment P about my prefer­ences, such as “I as­sign lin­ear value to per­son-lives,” such that P be­ing true makes de­ci­sion-mak­ing in­con­ve­nient, if I cur­rently have C con­fi­dence in P then de­pend­ing on C it may be more worth­while to de­vote my time to gath­er­ing ad­di­tional ev­i­dence for and against P than to de­vel­op­ing a de­ci­sion pro­ce­dure that works in the in­con­ve­nient case.

On the other hand, if I keep gath­er­ing ev­i­dence about P un­til I con­clude that P is false and then stop, that also has an ob­vi­ous as­so­ci­ated failure mode.

• I think the simpler solution is just to use a bounded utility function. There are several things suggesting we do this, and I really don’t see any reason to not do so, instead of going through contortions to make unbounded utility work.

But that’s es­sen­tially already the case. Just con­sider the bound to be 3^^^^3 utilons, or even an illimited num­ber of them. Those are not in­finite, but still al­low all the situ­a­tions and ar­gu­ments made above.

Para­doxes of in­finity weren’t the is­sue in this case.

• Again, in­di­vi­d­ual util­ity num­bers are not mean­ingful.

I’m not sure which “situ­a­tions and ar­gu­ments” you’re say­ing this still al­lows. It doesn’t al­low the di­ver­gent sum that started all this.

• If an AI’s over­all ar­chi­tec­ture is such as to en­able it to carry out the “You turned into a cat” effect—where if the AI ac­tu­ally ends up with strong ev­i­dence for a sce­nario it as­signed su­per-ex­po­nen­tial im­prob­a­bil­ity, the AI re­con­sid­ers its pri­ors and the ap­par­ent strength of ev­i­dence rather than ex­e­cut­ing a blind Bayesian up­date, though this part is for­mally a tad un­der­speci­fied—then at the mo­ment I can’t think of any­thing else to add in.

Ex ante, when the AI as­signs in­finites­i­mal prob­a­bil­ity to the real thing, and mean­ingful prob­a­bil­ity to “hal­lu­ci­na­tion/​my sen­sors are be­ing fed false in­for­ma­tion,” why doesn’t it self-mod­ify/​self-bind to treat fu­ture ap­par­ent cat trans­for­ma­tions as hal­lu­ci­na­tions?

“Now, in this sce­nario we’ve just imag­ined, you were tak­ing my case se­ri­ously, right? But the ev­i­dence there couldn’t have had a like­li­hood ra­tio of more than 10^10^26 to 1, and prob­a­bly much less. So by the method of imag­i­nary up­dates, you must as­sign prob­a­bil­ity at least 10^-10^26 to my sce­nario, which when mul­ti­plied by a benefit on the or­der of 3↑↑↑3, yields an uni­mag­in­able bo­nanza in ex­change for just five dol­lars—”

Me: “Nope.”

I don’t buy this. Con­sider the fol­low­ing com­bi­na­tion of fea­tures of the world and ac­count of an­thropic rea­son­ing (brought up by var­i­ous com­menters in pre­vi­ous dis­cus­sions), which is at least very im­prob­a­ble in light of its spe­cific fea­tures and what we know about physics and cos­mol­ogy, but not cos­mi­cally so.

• A world small enough not to con­tain lu­dicrous num­bers of Boltz­mann brains (or Boltz­mann ma­chin­ery)

• Where it is pos­si­ble to cre­ate hy­per­com­put­ers through com­plex ar­tifi­cial means

• Where hy­per­com­put­ers are used to com­pute ar­bi­trar­ily many happy life-years of an­i­mals, or hu­man­like be­ings with epistemic en­vi­ron­ments clearly dis­tinct from our own (YOU ARE IN A HYPERCOMPUTER SIMULATION tags float­ing in front of their eyes)

• And the hy­per­com­puted be­ings are not less real or valuable be­cause of their num­bers and long addresses

Treat­ing this as in­finites­i­mally likely, and then jump­ing to mea­surable prob­a­bil­ity on re­ceipt of (what?) ev­i­dence about hy­per­com­put­ers be­ing pos­si­ble, etc, seems pretty un­rea­son­able to me.

The be­hav­ior you want could be ap­prox­i­mated with a bounded util­ity func­tion that as­signed some weight to achiev­ing big pay­offs/​achiev­ing a sig­nifi­cant por­tion (on one of sev­eral scales) of pos­si­ble big pay­offs/​etc. In the ab­sence of ev­i­dence that the big pay­offs are pos­si­ble, the bounded util­ity gain is mul­ti­plied by low prob­a­bil­ity and you won’t make big sac­ri­fices for it, but in the face of lots of ev­i­dence, and if you have satis­fied other terms in your util­ity func­tion pretty well, big pay­offs could be­come a larger fo­cus.

Ba­si­cally, I think such a bounded util­ity func­tion could bet­ter track the emo­tional re­sponses driv­ing your in­tu­itions about what an AI should do in var­i­ous situ­a­tions than jury-rig­ging the prior. And if you don’t want to track those re­sponses then be care­ful of those in­tu­itions and look to em­piri­cal sta­bi­liz­ing as­sump­tions.

• Treat­ing this as in­finites­i­mally likely, and then jump­ing to mea­surable prob­a­bil­ity on re­ceipt of (what?) ev­i­dence about hy­per­com­put­ers be­ing pos­si­ble, etc, seems pretty un­rea­son­able to me.

It seems rea­son­able to me be­cause on the stated as­sump­tions—the float­ing tags seen by vast num­bers of other be­ings but not your­self—you’ve man­aged to gen­er­ate sen­sory data with a vast like­li­hood ra­tio. The vast up­date is as rea­son­able as this vast ra­tio, no more, no less.
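In odds form the claim is just multiplication: posterior odds equal prior odds times the likelihood ratio, so a vast ratio licenses exactly one correspondingly vast update. (The magnitudes below are illustrative, far smaller than the thread’s 10^10^26.)

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior = prior * likelihood ratio."""
    return prior_odds * likelihood_ratio

prior = Fraction(1, 10 ** 30)  # hypothesis starts at 1 : 10^30 against
ratio = Fraction(10 ** 40, 1)  # data is 10^40 times likelier if it's true

print(posterior_odds(prior, ratio))  # 10000000000, i.e. 10^10 : 1 in favor
```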

• The prob­lem is that you seem to be in­tro­duc­ing one du­bi­ous piece to deal with an­other. Why is the hy­poth­e­sis that those bul­let points hold in­finites­i­mally un­likely rather than very un­likely in the first place?

• I think the bul­let points as a whole are “very un­likely” (the uni­verse as a whole has some Kol­mogorov com­plex­ity, or equiv­a­lent com­plex­ity of log­i­cal ax­ioms, which de­ter­mines this); within that uni­verse your be­ing one of the non-hy­per­com­puted sen­tients is in­finites­i­mally un­likely, and then there’s a vast up­date when you don’t see the tag. How would you rea­son in this situ­a­tion?

• OK, but if you’re will­ing to buy all that, then the ex­pected pay­off in some kind of stuff for al­most any ac­tion (set­ting aside op­por­tu­nity costs and em­piri­cal sta­bi­liz­ing as­sump­tions) is also go­ing to be cos­mi­cally large, since you have some prior prob­a­bil­ity on con­di­tions like those in the bul­let pointed list block­ing the lev­er­age con­sid­er­a­tions.

• Hm. That does sound like a problem. I hadn’t considered the problem of finite axioms giving you unboundedly large likelihood ratios over your exact situation. It seems like this ought to violate the Hansonian principle somehow, but I’m not sure how to articulate it...

Maybe not see­ing the tag up­dates against the prob­a­bil­ity that you’re in a uni­verse where non-tags are such a tiny frac­tion of ex­is­tence, but this sounds like it also ought to repli­cate Dooms­day type ar­gu­ments and such? Hm.

• I hadn’t con­sid­ered the prob­lem of finite ax­ioms giv­ing you un­bound­edly large like­li­hood ra­tios over your ex­act situ­a­tion.

Really? Peo­ple have been rais­ing this (wor­lds with big pay­offs and in which your ob­ser­va­tions are not cor­re­spond­ingly com­mon) from the very be­gin­ning. E.g. in the com­ments of your origi­nal Pas­cal’s Mug­ging post in 2007, Michael Vas­sar raised the point:

The guy with the but­ton could threaten to make an ex­tra-pla­nar fac­tory farm con­tain­ing 3^^^^^3 pigs in­stead of kil­ling 3^^^^3 hu­mans. If util­ities are ad­di­tive, that would be worse.

and you replied:

Con­grat­u­la­tions, you made my brain as­plode.

Wei Dai and Rolf Nel­son dis­cussed the is­sue fur­ther in the com­ments there, and from differ­ent an­gles. And it is the ob­vi­ous pat­tern-com­ple­tion for “this ar­gu­ment gives me nigh-in­finite cer­tainty given its as­sump­tions—now do I have nigh-in­finite cer­tainty in the as­sump­tions?” i.e. Prob­ing the Im­prob­a­ble is­sues. This is how I ex­plained the un­bounded pay­offs is­sue to Steven Kaas when he asked for feed­back on ear­lier drafts of his re­cent post about ex­pected value and ex­treme pay­offs (note how he talks about our un­cer­tainty re an­throp­ics and the other con­di­tions re­quired for Han­son’s an­thropic ar­gu­ment to go through).

• It seems like this ought to violate the Hansonian principle somehow, but I’m not sure how to articulate it...

Han­son en­dorses SIA. So he would mul­ti­ply the pos­si­ble wor­lds by the num­ber of copies of his ob­ser­va­tions therein. A world with 3^^^3 copies of him would get a 3^^^3 an­thropic up­date. A world with only one copy of his ob­ser­va­tions that can af­fect 3^^^^3 crea­tures with differ­ent ob­ser­va­tions would get no such prob­a­bil­ity boost.

Or if one was a one-boxer on New­comb one might think of the util­ity of or­di­nary pay­offs in the first world as mul­ti­plied by the 3^^^3 copies who get them.

• As near as I can figure, the cor­re­spond­ing state of af­fairs to a com­plex­ity+lev­er­age prior im­prob­a­bil­ity would be a Teg­mark Level IV mul­ti­verse in which each re­al­ity got an amount of mag­i­cal-re­al­ity-fluid cor­re­spond­ing to the com­plex­ity of its pro­gram (1/​2 to the power of its Kol­mogorov com­plex­ity) and then this mag­i­cal-re­al­ity-fluid had to be di­vided among all the causal el­e­ments within that uni­verse—if you con­tain 3↑↑↑3 causal nodes, then each node can only get 1/​3↑↑↑3 of the to­tal re­al­ness of that uni­verse.

This re­minds me a lot of Levin’s uni­ver­sal search al­gorithm, and the as­so­ci­ated Levin com­plex­ity.

To formalize, I think you will want to assign each program p, of length #p, a prior weight 2^-#p (as in usual Solomonoff induction), and then divide that weight among the execution steps of the program (each execution step corresponding to some sort of causal node). So if program p executes for t steps before stopping, then each individual step gets a prior weight 2^-#p/t. The connection to universal search is as follows: Imagine dovetailing all possible programs on one big computer, giving each program p a share 2^-#p of all the execution steps. (If the program stops, then start it again, so that the computer doesn’t have idle steps.) In the limit, the computer will spend a proportion 2^-#p/t of its resources executing each particular step of p, so this is an intuitive sense of the step’s prior “weight”.

You’ll then want to con­di­tion on your ev­i­dence to get a pos­te­rior dis­tri­bu­tion. Most steps of most pro­grams won’t in any sense cor­re­spond to an in­tel­li­gent ob­server (or AI pro­gram) hav­ing your ev­i­dence, E, but some of them will. Let nE(p) be the num­ber of steps in a pro­gram p which so-cor­re­spond (for a lot of pro­grams nE(p) will be zero) and then pro­gram p will get pos­te­rior weight pro­por­tional to 2^-#p x (nE(p) /​ t). Nor­mal­ize, and that gives you the pos­te­rior prob­a­bil­ity you are in a uni­verse ex­e­cuted by a pro­gram p.
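A toy numerical version of the measure (the program lengths, step counts, and example worlds below are invented for illustration): each program p contributes weight 2^-#p × nE(p)/t, and normalizing gives the posterior probability of being in p’s universe.

```python
def posterior(programs):
    """programs: list of (bit_length, total_steps, steps_matching_E).
    Weight each program by 2^-bits * (n_E / t), then normalize."""
    weights = [2.0 ** -bits * (n_e / t) for bits, t, n_e in programs]
    total = sum(weights)
    return [w / total for w in weights]

toy = [
    (3, 100, 10),  # short, sparse universe: 10% of steps look like E
    (3, 100, 0),   # short universe with no observers matching E
    (5, 10, 10),   # longer but dense universe: every step matches E
]
probs = posterior(toy)

# The dense universe dominates despite its length penalty, because
# all of its execution steps correspond to observers with evidence E.
assert probs[2] == max(probs)
```

Even at this scale the density term nE(p)/t, not total size, is what the normalized weight rewards.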

You asked if there are any an­thropic prob­lems with this mea­sure. I can think of a few:

1. Should “gi­ant” ob­servers (cor­re­spond­ing to lots of ex­e­cu­tion steps) count for more weight than “midget” ob­servers (cor­re­spond­ing to fewer steps)? They do in this mea­sure, which seems a bit counter-in­tu­itive.

2. The pos­te­rior will tend to fo­cus weight on pro­grams which have a high pro­por­tion (nE(p) /​ t) of their ex­e­cu­tion steps cor­re­spond­ing to ob­servers like you. If you take your ob­ser­va­tions at face value (i.e. you are not in a simu­la­tion), then this leads to the same sort of “Great Filter” is­sues that Katja Grace no­ticed with the SIA. There is a shift to­wards uni­verses which have a high den­sity of hab­it­able planets, oc­cu­pied by ob­servers like us, but where very few or none of those ob­servers ever ex­pand off their home wor­lds to be­come su­per-ad­vanced civ­i­liza­tions, since if they did they would take the ex­e­cu­tions steps away from ob­servers like us.

3. There also seems to be a good rea­son in this mea­sure NOT to take your ob­ser­va­tions at face value. The term nE(p) /​ t will tend to be max­i­mized in uni­verses very un­like ours: ones which are built of dense “com­pu­tro­n­ium” run­ning lots of differ­ent ob­server simu­la­tions, and you’re one of them. Our own uni­verse is very “sparse” in com­par­i­son (very few ex­e­cu­tion steps cor­re­spond­ing to ob­servers).

4. Even if you deal with simulations, there appears to be a “cyclic history” problem. The density nE(p)/t will tend to be maximized if civilizations last for a long time (large number of observers), but go through periodic “resets”, wiping out all traces of the prior cycles (so leading to lots of observers in a state like us). Maybe there is some sort of AI guardian in the universe which interrupts civilizations before they create their own (rival) AIs, but is not so unfriendly as to wipe them out altogether. So it just knocks them back to the stone age from time to time. That seems highly unlikely a priori, but it does get magnified a lot in posterior probability.

On the plus side, note that there is no par­tic­u­lar rea­son in this mea­sure to ex­pect you are in a very big uni­verse or mul­ti­verse, so this de­fuses the “pre­sump­tu­ous philoso­pher” ob­jec­tion (as well as some tech­ni­cal prob­lems if the weight is dom­i­nated by in­finite uni­verses). Large uni­verses will tend to cor­re­spond to many copies of you (high nE(p)) but also to a large num­ber of ex­e­cu­tion steps t. What mat­ters is the den­sity of ob­servers (hence the com­pu­tro­n­ium prob­lem) rather than the to­tal size.

• You probably shouldn’t let super-exponentials into your probability assignments, but you also shouldn’t let super-exponentials into the range of your utility function. I’m really not a fan of having a discontinuous bound anywhere, but I think it’s important to acknowledge that when you throw a trip-up (^^^) into the mix, important assumptions start breaking down all over the place. The VNM independence assumption no longer looks convincing, or straightforward. Normally my preferences in a Tegmark-style multiverse would reflect a linear combination of my preferences for its subcomponents; but throw a 3^^^3 into the mix, and this is no longer the case, so suddenly you have to introduce new distinctions between logical uncertainty and at least one type of reality fluid.

My short-term hack for Pascal’s Muggle is to recognize that my consequentialism module is just throwing exceptions, and fall back on math-free pattern matching, including low-weighted deontological and virtue-ethical values that I’ve kept around for just such an occasion. I am very unhappy with this answer, but the long-term solution seems to require fully figuring out how I value different kinds of reality fluid.

• I get the sense you’re starting from the position that rejecting the Mugging is correct, and then looking for reasons to support that predetermined conclusion. Doesn’t this attitude seem dangerous? I mean, in the hypothetical world where accepting the Mugging is actually the right thing to do, wouldn’t this sort of analysis reject it anyway? (This is a feature of debates about Pascal’s Mugging in general, not just this post in particular.)

• That’s just how it is when you reason about reason; Neurath’s boat must be repaired while on the open sea. In this case, our instincts strongly suggest that what the decision theory seems to say we should do must be wrong, and we have to turn to the rest of our abilities and beliefs to adjudicate between them.

• Well, besides that thing about wanting expected utilities to converge, from a rationalist-virtue perspective it seems relatively less dangerous to start from a position of rejecting something with no priors or evidence in its favor, and relatively more dangerous to start from a position of rejecting something that has strong priors or evidence.

• Has the following reply to Pascal’s Mugging been discussed on LessWrong?

1. Almost any ordinary good thing you could do has some positive expected downstream effects.

2. These positive expected downstream effects include lots of things like, “Humanity has a slightly higher probability of doing awesome thing X in the far future.” Possible values of X include: create 3^^^^3 great lives, create infinite value through some presently unknown method, or, in a scenario where the future would have been really awesome anyway, make it one part in 10^30 better.

3. Given all the possible values of X whose probability is raised by doing ordinary good things, the expected value of doing any ordinary good thing is higher than the expected value of paying the mugger.

4. Therefore, almost any ordinary good thing you could do is better than paying the mugger. [I take it this is the conclusion we want.]

The most obvious complaint I can think of is that this response doesn’t solve selfish versions of Pascal’s Mugging very well, and may need to be combined with other tools in that case. But I don’t remember people talking about this, and I don’t currently see what’s wrong with it as a response to the altruistic version of Pascal’s Mugging. (I don’t mean to suggest I would be very surprised if someone quickly and convincingly shot this down.)

• It’s in Nick Bostrom’s Infinite Ethics paper, which has been discussed repeatedly here, and has been floating around in various versions since 2003. He uses the term “empirical stabilizing assumption.”

I bring this up routinely in such discussions because of the misleading intuitions you elicit by using an example like a mugging, which sets off many “no-go heuristics” that track chances of payoffs, large or small. But just because ordinary things may have a higher chance of producing huge payoffs than paying off a Pascal’s Mugger (who doesn’t do demonstrations) doesn’t mean your activities will be completely unchanged by taking huge payoffs into account.

• The obvious problem with this is that your utility is not defined if you are willing to accept muggings, so you can’t use the framework of expected utility maximization at all. The point of the mugger is just to illustrate this; I don’t think anyone thinks you should actually pay them (after all, you might encounter a more generous mugger tomorrow, or any number of more realistic opportunities to do astronomical amounts of good...)

• Part of the issue is that I am coming at this problem from a different perspective than maybe you or Eliezer are. I believe that paying the mugger is basically worthless in the sense that doing almost any old good thing is better than paying the mugger. I would like to have a satisfying explanation of this. In contrast, Eliezer is interested in reconciling a view about complexity priors with a view about utility functions, and the mugger is an illustration of the conflict.

I do not have a proposed reconciliation of complexity priors and unbounded utility functions. Instead, the above comment is recommended as an explanation of why paying the mugger is basically worthless in comparison with ordinary things you could do. So this hypothesis would say that if you set up your priors and your utility function in a reasonable way, the expected utility of the downstream effects of ordinary good actions would greatly exceed the expected utility of paying the mugger.

Even if you decided that the expected utility framework somehow breaks down in cases like this, I think various related claims would still be plausible. E.g., rather than saying that doing ordinary good things has higher expected utility, it would be plausible that doing ordinary good things is “better relative to your uncertainty” than paying the mugger.

On a different note, another thing I find unsatisfying about the downstream effects reply is that it doesn’t seem to match up with why ordinary people think it is dumb to pay the mugger. The ultimate reason I think it is dumb to pay the mugger is strongly related to why ordinary people think it is dumb to pay the mugger, and I would like to be able to thoroughly understand the most plausible common-sense explanation of why paying the mugger is dumb. The proposed relationship between ordinary actions and their distant effects seems too far removed from why common sense would say that paying the mugger is dumb. I guess this is ultimately pretty close to one of Nick Bostrom’s complaints about empirical stabilizing assumptions.

• I believe that paying the mugger is basically worthless in the sense that doing almost any old good thing is better than paying the mugger.

I think we are all in agreement with this (modulo the fact that all of the expected values end up being infinite and so we can’t compare in the normal way; if you e.g. proposed a cap of 3^^^^^^^3 on utilities, then you certainly wouldn’t pay the mugger).

On a different note, another thing I find unsatisfying about the downstream effects reply is that it doesn’t seem to match up with why ordinary people think it is dumb to pay the mugger.

It seems very likely to me that ordinary people are best modeled as having bounded utility functions, which would explain the puzzle.

So it seems like there are two issues:

1. You would never pay the mugger in any case, because other actions are better.

2. If you object to the fact that the only thing you care about is a very small probability of an incredibly good outcome, then that’s basically the definition of having a bounded utility function.

And then there is the third issue Eliezer is dealing with, where he wants to be able to have an unbounded utility function even if that doesn’t describe anyone’s preferences (since it seems like boundedness is an unfortunate restriction to randomly impose on your preferences for technical reasons), and formally it’s not clear how to do that. At the end of the post he seems to suggest giving up on that, though.

• Obviously, to really put the idea of people having bounded utility functions to the test, you have to forget about it solving problems of small probabilities and incredibly good outcomes and focus on its most unintuitive consequences. For one, having a bounded utility function means caring arbitrarily little about differences between the goodness of different sufficiently good outcomes. And all the outcomes could be certain, too. You could come up with all kinds of thought experiments involving purchasing huge numbers of years of happy life, or some other good, for a few cents. You know all of this, so I wonder why you don’t talk about it.

Also, I believe that Eliezer thinks an unbounded utility function describes at least his preferences. I remember he made a comment about caring about new happy years of life no matter how many he’d already been granted.

(I haven’t read most of the discussion in this thread, or might just be missing something, so this might be irrelevant.)

• As far as I know, the strongest version of this argument is Benja’s, here (which incidentally seems to deserve many more upvotes than it got).

Benja’s scenario isn’t a problem for normal people, though, who are not reflectively consistent and whose preferences manifestly change over time.

Beyond that, it seems like people’s preferences regarding the lifespan dilemma are somewhat confusing and probably inconsistent, much like their preferences regarding the repugnant conclusion. But that seems mostly orthogonal to Pascal’s Mugging, and the basic point stands: having unbounded utility by definition means you are willing to accept negligible chances of sufficiently good outcomes against probability nearly 1 of any fixed bad outcome, so if you object to the latter, you are just objecting to unbounded utility.

I agree I was being uncharitable towards Eliezer. But it is true that at the end of this post he was suggesting giving up on unbounded utility, and that everyone in this crowd seems to ultimately take that route.

• I think we are all in agreement with this (modulo the fact that all of the expected values end up being infinite and so we can’t compare in the normal way; if you e.g. proposed a cap of 3^^^^^^^3 on utilities, then you certainly wouldn’t pay the mugger).

Sorry, I didn’t mean to suggest otherwise. The “different perspective” part was supposed to be about the “in contrast” part.

It seems very likely to me that ordinary people are best modeled as having bounded utility functions, which would explain the puzzle.

I agree with yli that this has other unfortunate consequences. And, like Holden, I find it unfortunate to have to say that saving N lives with probability 1/N is worse than saving 1 life with probability 1. I also recognize that the things I would like to say about this collection of cases are inconsistent with each other. It’s a puzzle. I have written about this puzzle at reasonable length in my dissertation. I tend to think that bounded utility functions are the best consistent solution I know of, but that continuing to operate with inconsistent preferences (in a tasteful way) may be better in practice.

• Maybe the answer to this reply is that if there is a downstream multiplier for ordinary good accomplished, there is also a downstream multiplier for good accomplished by the mugger in the scenario where he is telling the truth. And multiplying each by a constant doesn’t change the bottom line.

• Why on earth would you expect the downstream utilities to exactly cancel the mugging utility?

• The hypothesis is not that they exactly cancel the mugging utility, but that the downstream utilities exceed the mugging utility. I was actually thinking that these downstream effects would be much greater than paying the mugger.

• That’s probably true in many cases, but the “mugger” scenario is really designed to test our limits. If 3^^^3 doesn’t work, then probably 3^^^^3 will. To be logically coherent, there has to be some crossover point where the mugger provides exactly enough evidence to decide that yes, it’s worth paying the \$5, despite our astoundingly low priors.

The proposed priors have one of two problems:

1. You can get mugged too easily, by your mugger simply being sophisticated enough to pick a high enough number to overwhelm your prior.

2. We’ve got a prior that is highly resistant to mugging but, unfortunately, is also resistant to being convinced by evidence. If there is any positive probability that we really could encounter a Matrix Lord able to do what they claim, who would offer some kind of Pascal’s-Mugging-like deal, then there should be some amount of evidence that would convince us to take the deal. We would like the amount of necessary evidence to be within the bounds of what it is possible for our brain to receive and update on in a lifetime, but that is not necessarily the case with the priors which we know will be able to avoid specious muggings.

I’m not actually certain that a prior which avoids both of these problems even exists.
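The second problem can be made concrete with a back-of-the-envelope comparison (toy numbers of my own, not from the thread): how many bits of likelihood-ratio evidence would it take to lift a given prior to even odds, versus a generous made-up budget for the evidence a brain could receive in a lifetime?

```python
import math

LIFETIME_BITS = 1e15  # generous, made-up budget for lifetime sensory evidence

def bits_to_even_odds(prior_log10):
    """Bits of likelihood-ratio evidence needed to raise a prior of
    10**prior_log10 to 50-50 credence."""
    return -prior_log10 / math.log10(2)

# A merely-astronomical prior of 10**-20 needs only ~66 bits: easy.
print(bits_to_even_odds(-20))                      # ~66.4
# A leverage-penalized prior of 10**-(10**100) needs ~3.3e100 bits,
# far beyond anything receivable in a lifetime:
print(bits_to_even_odds(-1e100) > LIFETIME_BITS)   # True
```

Under these assumptions, the first failure mode corresponds to priors whose bit cost is too low relative to claimed payoffs, and the second to priors whose bit cost exceeds any receivable evidence.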

I also agree with Eliezer’s general principle that when we see convincing evidence of things that we previously considered effectively impossible (priors of 10^-googol or so), we need to update the whole map on which that prior was based, not just the specific point. When you watch a person turn into a small cat, either your own sense data or pretty much your whole map of how things work must come into question. You can’t just say “Oh, people can turn into cats” and move on as if that doesn’t affect almost everything you previously thought you knew about how the world worked.

It’s much more likely, based on what I know right now, that I am having an unusually convincing dream or hallucination than that people can turn into cats. And if I manage to collect enough evidence to actually make my probability of “people can turn into cats” higher than “my sensory data is not reliable”, then the whole framework of physics, chemistry, biology, and basic experience which caused me to assign such a low probability to “people can turn into cats” in the first place has to be reconsidered.

• That’s probably true in many cases, but the “mugger” scenario is really designed to test our limits. If 3^^^3 doesn’t work, then probably 3^^^^3 will.

The probability that humans will eventually be capable of creating x utility, given that the mugger is capable of creating x utility, probably converges to some constant as x goes to infinity. (Of course, this still isn’t a solution, as expected utility still doesn’t converge.)

• you can get mugged too easily, by your mugger simply being sophisticated enough to pick a high enough number to overwhelm your prior.

That assumes that the number is independent of the prior. I wouldn’t make that assumption.

• It seems to me like the whistler is saying that the probability of saving knuth people for \$5 is exactly 1/knuth after updating on the Matrix Lord’s claim, not before the claim, which seems surprising. Also, it’s not clear that we need to make an FAI resistant to very, very unlikely scenarios.

• I’m a lot more worried about making an FAI behave correctly if it encounters a scenario which we thought was very, very unlikely.

• Also, if the AI spreads widely and is around for a long time, it will eventually run into very unlikely scenarios. Not 1/3^^^3 unlikely, but pretty unlikely.

• 2 Sep 2013 23:23 UTC

I enjoyed this really a lot, and while I don’t have anything insightful to add, I gave five bucks to MIRI to encourage more of this sort of thing.

(By “this sort of thing” I mean detailed descriptions of the actual problems you are working on as regards FAI research. I gather that you consider a lot of it too dangerous to describe in public, but then I don’t get to enjoy reading about it. So I would like to encourage you to share some of the fun problems sometimes. This one was fun.)

• Not ‘a lot’, and present-day non-sharing imperatives are driven by an (obvious) strategy to accumulate a long-term advantage for FAI projects over AGI projects, which is impossible if all lines of research are shared at all points when they are not yet imminently dangerous. No present-day knowledge is imminently dangerous, AFAIK.

• present-day non-sharing imperatives are driven by an (obvious) strategy to accumulate a long-term advantage for FAI projects over AGI projects

Do you believe this to be possible? In modern times, with high mobility of information and people, I have strong doubts that a gnostic approach would work. You can hide small, specific, contained “trade secrets”; you can’t hide a large body of knowledge that needs to be actively developed.

• I can’t help but remember HPJEV talking about plausible deniability and how that relates to telling people whether there is dangerous knowledge out there.

• 3 Sep 2013 0:29 UTC

Thanks for the clarification!

I thought this was an engaging, well-written summary targeted at a general audience, and I’d like to encourage more articles along these lines. So as a follow-up question: How much income for MIRI would it take, per article, for the beneficial effects of sharing non-dangerous research to outweigh the negatives?

(Gah, the editor in me WINCES at that sentence. Is it clear enough, or should I rewrite? I’m asking how much I-slash-we should kick in per article to make the whole thing generally worth your while.)

• Given how many underpaid science writers are out there, I’d have to say that ~50k/year would probably do it for a pretty good one, especially given the ‘good cause’ bonus to happiness that any qualified individual would understand and value. But is even 1k/week in donations realistic? What are the page-view numbers? I’d pay \$5 for a good article on a valuable topic; how many others would as well? I suspect the numbers don’t add up, but I don’t even have an order-of-magnitude estimate of current or potential readers, so I can’t say myself.

• You need not only a good science writer, but one who either already groks the problem, or can be made to do so with a quick explanation.

Furthermore, they need to have the above qualifications without being capable of doing primary research on the problem (this is the issue with Eliezer—he would certainly be capable of doing it, but his time is better spent elsewhere).

• Well, \$100K/year would probably pay someone to write things up full time, if only we had the right candidate to hire for it—I’m not sure we do. The issue is almost never danger; it’s just that writing stuff up is hard.

• Apropos the above conversation: Do you know Annalee Newitz? (Of io9.) If not, would you like to? I think you guys would get on like a house on fire.

• 3 Sep 2013 1:05 UTC

I can certainly see that people who can both understand these issues and write them up for a general audience would be rare. Working in your favor is the fact that writers in general are terribly underpaid, and a lot of smart tech journalists have been laid off in recent years. (I used to be the news editor for Dr. Dobb’s Journal, and although I am not looking for a job right now, I have contacts who could probably fill the position for you.)

But I did some back-of-the-envelope calculations, and it doesn’t seem like this effort would pay for itself. I doubt you have enough questions like this to cover a daily article, and for a weekly one you’d need to take in over \$2K in donations (counting taxes) to cover your writer’s salary. And that seems... unlikely.

Sad! But I get it.

• Just gonna jot down some thoughts here. First, a layout of the problem.

1. Expected utility is the product of two numbers: the probability of the event times the utility generated by the event.

2. Traditionally speaking, when the event is claimed to affect 3^^^3 people, the utility generated is on the order of 3^^^3.

3. Traditionally speaking, there’s nothing about the 3^^^3 people that requires a super-exponentially large extension to the complexity of the system (the universe/multiverse/etc.). So the probability of the event does not scale like 1/(3^^^3).

4. Thus the expected payoff becomes enormous, and you should pay the dude \$5.

5. If you actually follow this, you’ll be mugged by random strangers offering to save 3^^^3 people, or whatever super-exponential numbers they can come up with.
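Steps 1–4 can be sketched in log space (toy numbers of mine; 3^^^3 itself will not fit in any numeric type, so a stand-in exponent makes the point):

```python
import math

def log10_expected_payoff(log10_lives, complexity_bits):
    # log10 of (2**-complexity_bits * lives_saved): a complexity
    # penalty on the prior, but no leverage penalty.
    return log10_lives - complexity_bits * math.log10(2)

# Even a very complex hypothesis (1000 bits, prior ~10**-301) claiming
# a mere 10**400 lives still has an expected payoff around 10**99:
print(log10_expected_payoff(400, 1000))   # ~98.97
```

Since the claimed payoff can grow without raising the complexity bits, the product is unbounded, which is exactly the mugging in step 5.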

In order to avoid being mugged, your suggestion is to apply a scale penalty (leverage penalty) to the probability. You then notice that this has some very strange effects on your epistemology—you become incapable of ever believing the \$5 will actually help, no matter how much evidence you’re given, even though evidence can make the expected payoff large. You then respond to this problem with what appears to be an excuse to be illogical and/or non-Bayesian at times (due to finite computing power).

It seems to me that an alternative would be to rescale the utility value instead of the probability. This way, you wouldn’t run into any epistemic issues anywhere, because you aren’t messing with the epistemics.

I’m not proposing we rescale Utility(save X people) by a factor of 1/X, as that would make Utility(save X people) = Utility(save 1 person) all the time, which is obviously problematic. Rather, my idea is to make utility a per capita quantity. That way, when the random hobo tells you he’ll save 3^^^3 people, he’s making a claim that requires there to be at least 3^^^3 people to save. If this does turn out to be true, keeping your utility as a per capita quantity will require a rescaling on the order of 1/(3^^^3) to account for the now-much-larger population. This gives you a small expected payoff without requiring problematically small prior probabilities.

It seems we humans may already do a rescaling of this kind anyway. We tend to value rare things more than we would if they were common, tend to protect an endangered species more than we would if it weren’t endangered, and so on. But I’ll be honest and say that I haven’t really thought the consequences of this utility rescaling through very much. It just seems that if you need to rescale a product of two numbers and rescaling one of the numbers causes problems, we may as well try rescaling the other and see where it leads.
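A minimal formalization of the per capita proposal (my own toy sketch, not the commenter’s math): divide the payoff by the population the claim itself implies must exist, so the claimed number cancels out of the expected value.

```python
def per_capita_expected_payoff(prior, lives_saved, implied_population):
    # Utility measured per capita: the payoff is diluted by the
    # population the hypothesis itself implies exists.
    return prior * (lives_saved / implied_population)

# Claiming to save everyone in a world of N people yields an expected
# payoff of at most the prior itself, no matter how large N is:
for n in (1e6, 1e12, 1e300):
    print(per_capita_expected_payoff(1e-6, n, n))   # 1e-06 each time
```

This mirrors the leverage penalty numerically, but the 1/N factor lives in the utility function rather than in the prior, which is the commenter’s point about leaving the epistemics untouched.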

Any thoughts?

• There is likely a broader-scoped discussion on this topic that I haven’t read, so please point me to such a thread if my comment is addressed elsewhere—but it seems to me that there is a simpler resolution to this issue (as well as an obvious limitation to this way of thinking), namely that there’s an almost immediate stage (in the context of highly abstract hypotheticals) where probability assessment breaks down completely.

For example, there are an uncountably infinite number of different parent universes we could have. There are even an uncountably infinite number of possible laws of physics that could govern our universe. And it’s literally impossible to have all these scenarios “possible” in the sense of a well-defined measure, simply because if you want an uncountable sum of real numbers to add up to 1, only countably many terms can be nonzero.
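The countability claim here is a standard measure-theory fact, and the one-line argument is worth spelling out:

```latex
\text{Suppose } p_i \ge 0 \text{ for } i \in I \text{ and } \sum_{i \in I} p_i = 1.
\text{ For each } n, \text{ the set } I_n = \{\, i : p_i > 1/n \,\}
\text{ has fewer than } n \text{ elements, since otherwise the sum would exceed } 1.
\text{ Hence } \{\, i : p_i > 0 \,\} = \bigcup_{n=1}^{\infty} I_n
\text{ is a countable union of finite sets, and therefore countable.}
```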

This is highly related to the axiomatic problem of cause and effect, a famous example being the question “why is there something rather than nothing”—you have to have an axiomatic foundation before you can make calculations, but the sheer act of adopting that foundation excludes a lot of very interesting material. In this case, if you want to make probabilistic expectations, you need a solid axiomatic framework to stipulate how calculations are made.

Just like the laws of physics, this framework should agree with empirically derived probabilities, but just like physics there will be seemingly well-formulated questions that the current laws cannot address. In cases like hobos who make claims to special powers, the framework may be ill-equipped to make a definitive prediction. More generally, it will have a scope that is limited of mathematical necessity, and many hypotheses about spirituality, religion, and other universes, where we would want to assign positive but marginal probabilities, will likely be completely outside its light cone.

• If the AI actually ends up with strong evidence for a scenario it assigned super-exponential improbability, the AI reconsiders its priors and the apparent strength of evidence rather than executing a blind Bayesian update, though this part is formally a tad underspecified.

I would love to have a conversation about this. Is the “tad” here hyperbole, or do you actually have something mostly worked out that you just don’t want to post? On a first reading (and admittedly without much serious thought—it’s been a long day), it seems to me that this is where the real heavy lifting has to be done. I’m always worried that I’m missing something, but I don’t see how to evaluate the proposal without knowing how the super-updates are carried out.

Really interesting, though.

• That hyperbole one. I wasn’t intending the primary focus of this post to be on the notion of a super-update—I’m not sure if that part needs to make it into AIs, though it seems to me to be partially responsible for my humanlike foibles in the Horrible LHC Inconsistency. I agree that this notion is actually very underspecified, but so is almost all of bounded logical uncertainty.

• That hyperbole one.
...
I agree that this notion is actually very underspecified

Using “a tad” to mean “very” is understatement, not hyperbole.

• One point I don’t see mentioned here that may be important is that someone is saying this to you.

I encounter lots of people. Each of them has lots of thoughts. Most of those thoughts, they do not express to me (for which I am grateful). How do they decide which thoughts to express? To a first approximation, they express thoughts which are likely, important, and/or amusing. Therefore, when I hear a thought that is highly important or amusing, I expect it had less of a likelihood barrier to being expressed, and assign it a proportionally lower probability.

Note that this doesn’t apply to arguments in general—only to ones that other people say to me.

• If someone suggests to me that they have the ability to save 3^^^3 lives, and I assign this a 1/3^^^3 probability, and then they open a gap in the sky at billions-to-one odds, I would conclude that it is still extremely unlikely that they can save 3^^^3 lives. However, it is possible that their original statement is false and yet it would be worth giving them five dollars because they would save a billion lives. Of course, this would require further assumptions about whether people are likely to do things that they have not said they would do, but which are weaker versions of things they did say they would do but are not capable of.

Also, I would assign lower probabilities when they claim they could save more people, for reasons that have nothing to do with complexity. For instance, “the more powerful a being is, the less likely he would be interested in five dollars”, or “a fraudster would wish to specify a large number to increase the chance that his fraud succeeds when used on ordinary utility maximizers, so the larger the number, the greater the comparative likelihood that the person is fraudulent”.

the phrase “Pascal’s Mugging” has been completely bastardized to refer to an emotional feeling of being mugged that some people apparently get when a high-stakes charitable proposition is presented to them, regardless of whether it’s supposed to have a low probability.

1) Sometimes what you may actually be seeing is disagreement about whether the hypothesis has a low probability.

2) Some of the arguments against Pascal’s Wager and Pascal’s Mugging don’t depend on the probability. For instance, Pascal’s Wager has the “worshipping the wrong god” problem—what if there’s a god who prefers that he not be worshipped and damns worshippers to Hell? Even if there’s a 99% chance of a god existing, this is still a legitimate objection (unless you want to say there’s a 99% chance specifically of one type of god).

3) In some cases, it may be technically true that there is no low probability involved, but there may be some other small number that the size of the benefit is multiplied by. For instance, most people discount events that happen far in the future. A highly beneficial event that happens far in the future would have its benefit multiplied by a very small number when discounting is considered.
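For example (toy numbers of mine, not from the comment), standard exponential discounting attaches exactly such a small multiplier to far-future benefits:

```python
def present_value(benefit, annual_rate, years):
    # Standard exponential discounting: benefit / (1 + r)**years
    return benefit * (1 + annual_rate) ** -years

# At a 3% annual discount rate, a benefit 10,000 years out is
# multiplied by less than 10**-128:
print(present_value(1.0, 0.03, 10_000) < 1e-128)   # True
```

So even without any improbability, the discount factor can play the same mathematical role as a tiny probability.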

Of course, in cases 2 and 3 that is not technically Pascal’s Mugging by the original definition, but I would suggest the definition should be extended to include such cases. Even if not, they should at least be called something that acknowledges the similarity, like “Pascal-like muggings”.

• 1) It’s been applied to cryonic preservation, fer crying out loud. It’s reasonable to suspect that the probability of that working is low, but anyone who says with current evidence that the probability is beyond astronomically low is being too silly to take seriously.

• The benefit of cryonic preservation isn’t astronomically high, though, so you don’t need a probability that is beyond astronomically low. First of all, even an infinitely long life after being revived only has a finite present value, and possibly a very low one, because of discounting. Second, the benefit from cryonics is the benefit you’d gain from being revived after being cryonically preserved, minus the benefit you’d gain from being revived after not being cryonically preserved. (A really advanced society might be able to simulate us. If simulations count as us, simulating us counts as reviving us without the need for cryonic preservation.)

• I do not think that you have gotten Luke’s point. He was addressing your point #1, not trying to make a substantive argument in favor of cryonics.

• I don’t think that either Pascal’s Wager or Pascal’s Mugging requires a probability that is astronomically low. It just requires that the size of the purported benefit be large enough that it overwhelms the low probability of the event.

• No, otherwise taking good but long-shot bets would be a case of Pascal’s Mugging.

It needs to involve a breakdown in the math, because you’re basically trying to evaluate infinity/infinity.

• Even if not, they should at least be called something that acknowledges the similarity, like “Pascal-like muggings”.

Any similarities are arguments for giving them a maximally different name to avoid confusion, not a similar one. Would the English language really be better if rubies were called diyermands?

• Chem­istry would not be im­proved by pro­vid­ing com­pletely differ­ent names to chlo­rate and perchlo­rate (e.g. chlo­rate and sneblobs). Also, I think English might be bet­ter if ru­bies were called diy­er­mands. If all of the gem­stones were named some­thing that fol­lowed a scheme similar to di­a­monds, that might be an im­prove­ment.

• I dis­agree. Com­mu­ni­ca­tion can be noisy, and if a bit of noise re­places a word with a word in a to­tally differ­ent se­man­tic class the er­ror can be re­cov­ered, whereas if it re­places it with a word in the similar class it can’t. See the last para­graph in myl’s com­ment to this com­ment.

• Humans have neither perfect learning nor perfect recall. In general, I find that my ability to learn and to recall words is much more limiting than noisy communication channels. I think that there are other sources of redundancy in human communication that make noise less of an issue. For example, if I’m not sure whether someone said “chlorate” or “perchlorate”, often the ambiguity would be obvious, such as if it is clear that they had mumbled so I wasn’t quite sure what they said. In the case of the written word, chemistry and context provide a model for things which acts as a layer of redundancy, similar to the language model described in the post you linked to.

It would take me at least twice as long to mem­o­rize ran­dom/​unique al­ter­na­tives to hypochlo­rite, chlo­rite, chlo­rate, perchlo­rate, mul­ti­plied by all the other oxyan­ion se­ries. It would take me many times as long to mem­o­rize unique names for ev­ery acetyl com­pound, al­though I ob­vi­ously ac­knowl­edge that Chem­istry is the best case sce­nario for my ar­gu­ment and worst case sce­nario for yours. In the case of philos­o­phy, I still think there are ad­van­tages to learn­ing and re­call for similar things to be named similarly. Even in the case of “Pas­cal’s mug­ging” vs. “Pas­cal’s wa­ger”, I be­lieve that it is eas­ier to re­call and thus eas­ier to have cog­ni­tion about in part be­cause of the nam­ing con­nec­tion be­tween the two, de­spite the fact that these are two differ­ent things.

Note that I am not saying I am in favor of calling any particular thing “Pascal-like muggings,” which draws an explicit similarity between the two. All I’m saying is that choosing a “maximally different name to avoid confusion” strikes me as less than ideal, and that if you called it a Jiro’s mugging or something, that would provide more than enough semantic distance between the ideas.

• Chem­istry would not be im­proved by pro­vid­ing com­pletely differ­ent names to chlo­rate and perchlo­rate (e.g. chlo­rate and sneblobs).

• Okay, that’s actually a good example. This caused me to re-think my position. After thinking, I’m still not sure that the analogy is actually valid, though.

In chemistry, we have a systematic naming scheme. Systematic naming schemes are good, because they let us guess word meanings without having to learn them. In a difficult field which most people learn only as adults, if at all, this is a very good thing. I’m no chemist, but I’d guess that the words chlorate and perchlorate do cause confusion sometimes, and that this price is overall worth paying for a systematic naming scheme.

For gemstones, we do not currently have a systematic naming scheme. I’m not entirely sure that bringing one in would be good; there aren’t so many common gemstones that we’re likely to forget them, and frankly, if it ain’t broke, don’t fix it. But I’m not sure it would be bad either.

What would not be good would be to simply rename rubies to diyermands without changing anything else. This would not only result in misunderstandings, but generate the false impression that rubies and diamonds have something special in common, as distinct from sapphires and emeralds (I apologise for my ignorance if this is in fact the case).

But at least in the case of gemstones we do not already have a serious problem; I do not know of any major epistemic failures floating around to do with the diamond-ruby distinction.

In the case of Pascal’s mugging, we have a complete epistemic disaster: a very specific, very useful term has been turned into a useless, bloated red-giant word, laden with piles of negative connotations and no actual meaning beyond ‘offer of lots of utility that I need an excuse to ignore’.

I know of al­most no­body who has se­ri­ous prob­lems notic­ing the similar­i­ties be­tween these situ­a­tions, but tons of peo­ple seem not to re­al­ise there are any differ­ences. The pri­or­ity with ter­minol­ogy must be to sep­a­rate the mean­ings and make it ab­solutely clear that these are not the same thing and need not be treated in the same way. Giv­ing them similar names is nearly the worst thing that could be done, sec­ond only to leav­ing the situ­a­tion as it is.

If you were to propose a systematic terminology for decision-theoretic dilemmas, that would be a different matter. I think I would still disagree with you: the field is young and we don’t have a good enough picture of the space of possible problems, so a systematic scheme risks reducing our ability to think beyond it.

But that is not what is be­ing sug­gested, what is be­ing sug­gested is cre­at­ing an ad-hoc con­fu­sion gen­er­a­tor by mak­ing de­liber­ately similar terms for differ­ent situ­a­tions.

This might all be rationalisation, but that’s my best guess for why the situations feel different to me.

• I agree with your analysis regarding the difference between systematic naming systems and merely similar naming. That said, the justification for more clearly separating Pascal’s mugging and this other unnamed situation does strike me as a political decision or rationalization. If the real-world impact of people’s misunderstanding were beneficial for the AI-friendly cause, I doubt anyone here would be making much ado about it. I would be in favor of renaming moissanite to diamand if this would help avert our ongoing malinvestment in clear glittery rocks to the tune of billions of dollars and numerous lives, so political reasons can perhaps be justified in some situations.

• I would agree that it is to some extent political. I don’t think it’s very dark-artsy, though, because it seems to be a case of getting rid of an anti-FAI misunderstanding rather than creating a pro-FAI misunderstanding.

• Would the English lan­guage re­ally be bet­ter if ru­bies were called diy­er­mands?

I suspect it would be. The first time you encounter the word “ruby”, you have only context to go on. But if the word sounded like “diamond”, then you could also make a tentative guess that the referent is similar.

• Do you re­ally think this!? I ad­mit to be­ing ex­tremely sur­prised to find any­one say­ing this.

If ru­bies were called diy­er­mands it seems to me that peo­ple wouldn’t guess what it was when they heard it, they would sim­ply guess that they had mis­heard ‘di­a­mond’, es­pe­cially since it would al­most cer­tainly be a con­text where that was plau­si­ble, most peo­ple would prob­a­bly still have to have the word ex­plained to them.

Furthermore, once we had the definition, we would be endlessly mixing them up, given that they come up in exactly the same contexts. Words are used many times but only need to be learned once, so keeping the former unambiguous is far more important.

The word ‘ruby’ exists primarily to distinguish rubies from things like diamonds; you can usually guess that they’re not cows from context. Replacing it with diyermand causes it to fail at its main purpose.

EDIT:

To give an example from my own field: in maths we have the terms ‘compact’ and ‘sequentially compact’ for types of topological space. The meanings are similar but not the same: you can find spaces satisfying one but not the other, though most ‘nice’ spaces have both or neither.

If your theory is correct, this situation is good, because it will allow people to form a plausible guess at what ‘compact’ means if they already know ‘sequentially compact’ (this is almost always the order in which a student meets them). Indeed, they do always form a plausible guess, and that guess is ‘the two terms mean the same thing’. This guess seems so plausible that they never question it and go off believing the wrong thing. In my case this lasted about 6 months before someone undeluded me; even after I learned the real definition of compactness, I assumed they were provably equivalent.

Had their names been totally different, I would have actually asked what ‘compact’ meant when I first heard it, and would never have had any misunderstandings; several others I know would have avoided them as well. This seems unambiguously better.

• Hm, that’s a good point, I’ve changed my opinion about this case.

When I wrote my com­ment, I was think­ing pri­mar­ily of words that share a com­mon pre­fix or suffix, which tends to im­ply that they re­fer to things that share the same cat­e­gory but are not the same thing. “English” and “Span­ish”, for ex­am­ple.

But yeah, “diyer” is too close to “die” to be eas­ily dis­t­in­guish­able. Maybe “rube­mond”?

• But yeah, “diyer” is too close to “die” to be eas­ily dis­t­in­guish­able. Maybe “rube­mond”?

I could see the ar­gu­ment for that, pro­vided we also had saph­monds, em­monds etc… Other­wise you run the risk of claiming a spe­cial con­nec­tion that doesn’t ex­ist.

• 2) Some of the ar­gu­ments against Pas­cal’s Wager and Pas­cal’s Mug­ging don’t de­pend on the prob­a­bil­ity. For in­stance, Pas­cal’s Wager has the “wor­ship­ping the wrong god” prob­lem—what if there’s a god who prefers that he not be wor­shipped and damns wor­ship­pers to Hell? Even if there’s a 99% chance of a god ex­ist­ing, this is still a le­gi­t­i­mate ob­jec­tion (un­less you want to say there’s a 99% chance speci­fi­cally of one type of god).

That ar­gu­ment is iso­mor­phic to the one dis­cussed in the post here:

“Hmm...” she says. “I hadn’t thought of that. But what if these equa­tions are right, and yet some­how, ev­ery­thing I do is ex­actly bal­anced, down to the googolth dec­i­mal point or so, with re­spect to how it im­pacts the chance of mod­ern-day Earth par­ti­ci­pat­ing in a chain of events that leads to cre­at­ing an in­ter­galac­tic civ­i­liza­tion?”

“How would that work?” you say. “There’s only seven billion peo­ple on to­day’s Earth—there’s prob­a­bly been only a hun­dred billion peo­ple who ever ex­isted to­tal, or will ex­ist be­fore we go through the in­tel­li­gence ex­plo­sion or what­ever—so even be­fore an­a­lyz­ing your ex­act po­si­tion, it seems like your lev­er­age on fu­ture af­fairs couldn’t rea­son­ably be less than a one in ten trillion part of the fu­ture or so.”

Essentially, it’s hard to argue that the probabilities you assign should be balanced so exactly, and thus (if you’re an altruist) Pascal’s Wager exhorts you to devote your entire existence either to proselytizing for some god or to proselytizing for atheism, depending on which type of deity seems to you to have the slightest edge in probability (maybe with some weighting for the awesomeness of their heavens and awfulness of their hells).

So that’s why you still need a math­e­mat­i­cal/​epistemic/​de­ci­sion-the­o­retic rea­son to re­ject Pas­cal’s Wager and Mug­ger.

• What you have is a divergent sum whose sign will depend on the order of summation, so maybe some sort of re-normalization can be applied to make it balance itself out in the absence of evidence.

• Actually, there is no order of summation in which the sum will converge, since the terms get arbitrarily large. The theorem you are thinking of applies to conditionally convergent series, not all divergent series.

• Strictly speaking, you don’t always need the sums to converge. To choose between two actions you merely need the sign of the difference between the utilities of the two actions, which you can represent with a divergent sum. The issue is that it is not clear how to order such a sum, or whether its sign is even meaningful in any way.
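The distinction in this sub-thread can be made concrete. Riemann’s rearrangement theorem lets a conditionally convergent series be reordered to reach any target value, but it says nothing about series whose terms grow without bound; those diverge in every order. A minimal sketch (the function name and term counts are illustrative, not from the thread):

```python
import math

def rearranged_alternating_harmonic(target, n_terms=100_000):
    """Greedily reorder the conditionally convergent series
    1 - 1/2 + 1/3 - 1/4 + ... so its partial sums chase `target`.
    By Riemann's rearrangement theorem this works for ANY target."""
    pos = (1.0 / k for k in range(1, 10**7, 2))    # +1, +1/3, +1/5, ...
    neg = (-1.0 / k for k in range(2, 10**7, 2))   # -1/2, -1/4, -1/6, ...
    total = 0.0
    for _ in range(n_terms):
        # Take a positive term while below target, a negative one while above.
        total += next(pos) if total < target else next(neg)
    return total

# In its natural order the series sums to ln(2) ~ 0.693, but a
# rearrangement can be steered toward any value whatsoever:
print(rearranged_alternating_harmonic(math.log(2)))  # ~ 0.693
print(rearranged_alternating_harmonic(3.0))          # ~ 3.0
```

If the terms instead grow without bound, as with the utilities under discussion here, every ordering diverges, so no reordering or re-normalization trick of this kind is available.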

• Without dis­cussing the mer­its of your pro­posal, this is some­thing that clearly falls un­der “math­e­mat­i­cal/​epistemic/​de­ci­sion-the­o­retic rea­son to re­ject Pas­cal’s Wager and Mug­ger”, so I don’t un­der­stand why you left that com­ment here.

• In­deed, you can’t ever pre­sent a mor­tal like me with ev­i­dence that has a like­li­hood ra­tio of a googol­plex to one—ev­i­dence I’m a googol­plex times more likely to en­counter if the hy­poth­e­sis is true, than if it’s false—be­cause the chance of all my neu­rons spon­ta­neously re­ar­rang­ing them­selves to fake the same ev­i­dence would always be higher than one over googol­plex. You know the old say­ing about how once you as­sign some­thing prob­a­bil­ity one, or prob­a­bil­ity zero, you can never change your mind re­gard­less of what ev­i­dence you see? Well, odds of a googol­plex to one, or one to a googol­plex, work pretty much the same way.”

On the other hand, if I am dream­ing, or drugged, or crazy, then it DOESN’T MATTER what I de­cide to do in this situ­a­tion. I will still be trapped in my dream or delu­sion, and I won’t ac­tu­ally be five dol­lars poorer be­cause you and I aren’t re­ally here. So I may as well dis­count all prob­a­bil­ity lines in which the ev­i­dence I’m see­ing isn’t a valid rep­re­sen­ta­tion of an un­der­ly­ing re­al­ity. Here’s your \$5.

• I will still be trapped in my dream or delusion

Are you sure? I would ex­pect that it’s pos­si­ble to re­cover from that, and some ac­tions would make you more likely to re­cover than oth­ers.

• If all of my ex­pe­riences are dream­ing/​drugged/​crazy/​etc. ex­pe­riences then what de­ci­sion I make only mat­ters if I value hav­ing one set of dream­ing/​drugged/​crazy ex­pe­riences over a differ­ent set of such ex­pe­riences.

The thing is, I sure do seem to value hav­ing one set of ex­pe­riences over an­other. So if all of my ex­pe­riences are dream­ing/​drugged/​crazy/​etc. ex­pe­riences then it seems I do value hav­ing one set of such ex­pe­riences over a differ­ent set of such ex­pe­riences.

So, given that, do I choose the dream­ing/​drugged/​crazy/​etc. ex­pe­rience of giv­ing you \$5 (and what­ever con­se­quences that has?). Or of re­fus­ing to give you \$5 (and what­ever con­se­quences that has)? Or some­thing else?

• So I may as well dis­count all prob­a­bil­ity lines in which the ev­i­dence I’m see­ing isn’t a valid rep­re­sen­ta­tion of an un­der­ly­ing re­al­ity.

But that would de­stroy your abil­ity to deal with op­ti­cal illu­sions and mis­di­rec­tion.

• Per­haps I should say …in which I can’t rea­son­ably ex­pect to GET ev­i­dence en­tan­gled with an un­der­ly­ing re­al­ity.

• This is probably obvious, but if this problem persisted, a Pascal-Mugging-vulnerable AI would immediately get mugged even without external offers or influence. The possibility alone, however remote, of a certain sequence of characters unlocking a hypothetical control console which could potentially access an above-Turing computing model which could influence (insert sufficiently high number) amounts of matter/energy would suffice. If an AI had to decide “until what length do I utter strange tentative passcodes in the hope of unlocking some higher level of physics”, it would get mugged by the shadow of a Matrix Lord every time.

• It sounds like what you’re describing is something that Iain M. Banks calls an “Outside Context Problem”. It doesn’t seem like a ‘leverage penalty’ is the proper way to conceptualize what you’re applying, so much as a ‘privilege penalty’.

In other words, when the sky suddenly opens up and blue fire pours out, the entire context for your previous set of priors needs to be re-evaluated, and the very question of “should I give this man \$5” rests on a foundation of those now-devalued priors.

Is there a for­mal­ized tree or mesh model for Bayesian prob­a­bil­ities? Be­cause I think that might be fruit­ful.

• There’s something very counterintuitive about the notion that Pascal’s Muggle is perfectly rational. But I think we need to do a lot more intuition-pump research before we’ll have finished picking apart where that counterintuitiveness comes from. I take it your suggestion is that Pascal’s Muggle seems unreasonable because he’s overly confident in his own logical consistency and ability to construct priors that accurately reflect his credence levels. But he also seems unreasonable because he doesn’t take into account that the likeliest explanations for the Hole In The Sky datum either trivialize the loss from forking over \$5 (e.g., ‘It’s All A Dream’) or provide much more credible generalized reasons to fork over the \$5 (e.g., ‘He Really Is A Matrix Lord, So You Should Do What He Seems To Want You To Do Even If Not For The Reasons He Suggests’). Your response to the Hole In The Sky seems more safe and pragmatic because it leaves open that the decision might be made for those reasons, whereas the other two muggees were explicitly concerned only with whether the Lord’s claims were generically right or generically wrong.

Not­ing these com­pli­ca­tions doesn’t help solve the un­der­ly­ing prob­lem, but it does sug­gest that the in­tu­itively right an­swer may be overde­ter­mined, com­pli­cat­ing the task of iso­lat­ing our rele­vant in­tu­itions from our ir­rele­vant ones.

• Edit: for­mat­ting fixed. Thanks, wedrifid.

My re­sponse to the mug­ger:

• You claim to be able to simu­late 3^^^^3 unique minds.

• It takes log(3^^^^3) bits just to count that many things, so my ab­solute up­per bound on the prior for an agent ca­pa­ble of do­ing this is 1/​3^^^^3.

• My brain is un­able to pro­cess enough ev­i­dence to over­come this, so un­less you can use your ma­trix pow­ers to give me ac­cess to suffi­cient com­put­ing power to change my mind, get lost.

My re­sponse to the sci­en­tist:

• Why yes, you do have suffi­cient ev­i­dence to over­turn our cur­rent model of the uni­verse, and if your model is suffi­ciently ac­cu­rate, the com­pu­ta­tional ca­pac­ity of the uni­verse is vastly larger than we thought.

• Let’s try build­ing a com­puter based on your model and see if it works.

• Edit: I seem to have mi­s­un­der­stood the for­mat­ting ex­am­ples un­der the help but­ton.

Try an ad­di­tional line­break be­fore the first bul­let point.

• It takes log(3^^^^3) bits just to count that many things, so my ab­solute up­per bound on the prior for an agent ca­pa­ble of do­ing this is 1/​3^^^^3

Why does that prior fol­low from the count­ing difficulty?

• I was think­ing that us­ing (length of pro­gram) + (mem­ory re­quired to run pro­gram) as a penalty makes more sense to me than (length of pro­gram) + (size of im­pact). I am as­sum­ing that any pro­gram that can simu­late X minds must be able to han­dle num­bers the size of X, so it would need more than log(X) bits of mem­ory, which makes the prior less than 2^-log(X).

I wouldn’t be overly sur­prised if there were some other situ­a­tion that breaks this idea too, but I was just post­ing the first thing that came to mind when I read this.
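This commenter’s (length of program) + (memory required) penalty can be sketched numerically: a program that simulates X minds needs at least log2(X) bits of memory just to index them, giving a prior factor of at most 2^-log2(X) = 1/X. Since 3^^^^3 itself is far too large to compute with, the sketch below uses small stand-in numbers, and the function name is hypothetical:

```python
import math

def prior_upper_bound(num_minds):
    """Upper bound on the prior probability of 'this agent can simulate
    `num_minds` minds' under a (program length + memory) penalty: the
    program needs at least log2(num_minds) bits of memory just to index
    that many minds, so the prior gets a penalty factor of at least
    2**log2(num_minds), i.e. the prior is at most 1/num_minds."""
    bits_needed = math.log2(num_minds)   # bits to count/index the minds
    return 2.0 ** (-bits_needed)         # = 1 / num_minds

print(prior_upper_bound(1024))   # 0.0009765625  (= 1/1024)
print(prior_upper_bound(2**30))  # ~ 9.3e-10     (= 1/2**30)
```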

• You’re try­ing to ital­i­cize those long state­ments? It’s pos­si­ble that you need to get rid of the spaces around the as­ter­isks.

But you’re prob­a­bly bet­ter off just us­ing quote boxes with “>” in­stead.

• One scheme with the prop­er­ties you want is Wei Dai’s UDASSA, e.g. see here. I think UDASSA is by far the best for­mal the­ory we have to date, al­though I’m un­der no delu­sions about how well it cap­tures all of our in­tu­itions (I’m also un­der no delu­sions about how con­sis­tent our in­tu­itions are, so I’m re­signed to ac­cept­ing a scheme that doesn’t cap­ture them).

I think it would be more fair to call this al­lo­ca­tion of mea­sure part of my prefer­ences, in­stead of “mag­i­cal re­al­ity fluid.” Think­ing that your prefer­ences are ob­jec­tive facts about the world seems like one of the old­est er­rors in the book, which is only pos­si­bly jus­tified in this case be­cause we are still con­fused about the hard prob­lem of con­scious­ness.

As other com­menters have ob­served, it seems clear that you should never ac­tu­ally be­lieve that the mug­ger can in­fluence the lives of 3^^^^3 other folks and will do so at your sug­ges­tion, whether or not you’ve made any spe­cial “lev­er­age ad­just­ment.” Nev­er­the­less, even though you never be­lieve that you have such in­fluence, you would still need to pass to some bounded util­ity func­tion if you want to use the nor­mal frame­work of ex­pected util­ity max­i­miza­tion, since you need to com­pare the good­ness of whole wor­lds. Either that, or you would need to make quite sig­nifi­cant mod­ifi­ca­tions to your de­ci­sion the­ory.

• A note—it looks like what Eliezer is sug­gest­ing here is not the same as UDASSA. See my anal­y­sis here—and en­do­self’s re­ply—and here.

The big differ­ence is that UDASSA won’t im­pose the same lo­ca­tional penalty on nodes in ex­treme situ­a­tions, since the mea­sure is shared un­equally be­tween nodes. There are pro­grams q with rel­a­tively short length that can se­lect out such ex­treme nodes (par­ties get­ting gen­uine offers from Ma­trix Lords with the power of 3^^^3) and so give them much higher rel­a­tive weight than 1/​3^^^3. Com­bine this with an un­bounded util­ity, and the mug­ger prob­lem is still there (as is the di­ver­gence in ex­pected util­ity).

• I agree that what Eliezer de­scribed is not ex­actly UDASSA. At first I thought it was just like UDASSA but with a speed prior, but now I see that that’s wrong. I sus­pect it ends up be­ing within a con­stant fac­tor of UDASSA, just by con­sid­er­ing uni­verses with tiny lit­tle demons that go around du­pli­cat­ing all of the ob­servers a bunch of times.

If you are us­ing UDT, the role of UDASSA (or any an­thropic the­ory) is in the defi­ni­tion of the util­ity func­tion. We define a mea­sure over ob­servers, so that we can say how good a state of af­fairs is (by look­ing at the to­tal good­ness un­der that mea­sure). In the case of UDASSA the util­ity is guaran­teed to be bounded, be­cause our mea­sure is a prob­a­bil­ity mea­sure. Similarly, there doesn’t seem to be a mug­ging is­sue.

• As lukeprog says here, this really needs to be written up. It’s not clear to me that the expected utility is bounded just because the measure over observers (or observer moments) sums to one.

Here’s a stab. Let’s use s to de­note a sub-pro­gram of a uni­verse pro­gram p, fol­low­ing the no­ta­tion of my other com­ment. Each s gets a weight w(s) un­der UDASSA, and we nor­mal­ize to en­sure Sum{s} w(s) = 1.

Then, pre­sum­ably, an ex­pected util­ity looks like E(U) = Sum{s} U(s) w(s), and this is clearly bounded pro­vided the util­ity U(s) for each ob­server mo­ment s is bounded (and U(s) = 0 for any sub-pro­gram which isn’t an “ob­server mo­ment”).

But why is U(s) bounded? It doesn’t seem ob­vi­ous to me (per­haps ob­server mo­ments can be ar­bi­trar­ily bliss­ful, rather than sat­u­rat­ing at some state of pure bliss). Also, what hap­pens if U bears no re­la­tion­ship to ex­pe­riences/​ob­server mo­ments, but just counts the num­ber of pa­per­clips in the uni­verse p? That’s not go­ing to be bounded, is it?
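The boundedness claim in the derivation above can be checked with a toy numerical sketch: if the weights w(s) form a probability measure and each U(s) is capped, then E(U) = Sum{s} U(s) w(s) is capped too, no matter how many observer-moments a hypothesis postulates. All names and numbers below are illustrative, not from the comment:

```python
import math

U_MAX = 100.0  # assumed per-observer-moment utility cap

def expected_utility(weights, utilities):
    """E(U) = Sum{s} U(s) w(s) when the care-measure over observer-moments
    sums to 1; bounded by U_MAX whenever every U(s) <= U_MAX."""
    assert abs(math.fsum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return math.fsum(w * u for w, u in zip(weights, utilities))

# A 'mugger world' postulating many observers still gets total weight 1:
n = 10**4
weights = [1.0 / n] * n    # unit of care split across n observer-moments
utilities = [U_MAX] * n    # every moment maximally good
print(expected_utility(weights, utilities))  # 100.0: capped at U_MAX, not n * U_MAX
```

This also shows where the two failure modes in the comment bite: let individual U(s) grow without bound, or count paperclips instead of weighting them by the measure, and the cap disappears.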

• I agree it would be nice if things were bet­ter writ­ten up; right now there is the de­scrip­tion I linked and Hal Fin­ney’s.

If in­di­vi­d­ual mo­ments can be ar­bi­trar­ily good, then I agree you have un­bounded util­ities again.

If you count the number of paperclips you would again get into trouble; the analogous thing to do would be to count the measure of paperclips.

• Yeah, I like this solu­tion too. It doesn’t have to be based on the uni­ver­sal dis­tri­bu­tion, any dis­tri­bu­tion will work. You must have some way of dis­tribut­ing your sin­gle unit of care across all crea­tures in the mul­ti­verse. What mat­ters is not the large num­ber of crea­tures af­fected by the mug­ger, but their to­tal weight ac­cord­ing to your care func­tion, which is less than 1 no mat­ter what out­landish num­bers the mug­ger comes up with. The “lev­er­age penalty” is just the mea­sure of your care for not los­ing \$5, which is prob­a­bly more than 1/​3^^^^3.

• Who might have the time, de­sire, and abil­ity to write up UDASSA clearly, if MIRI pro­vides them with re­sources?

• A few thoughts:

I haven’t strongly con­sid­ered my prior on be­ing able to save 3^^^3 peo­ple (more on this to fol­low). But re­gard­less of what that prior is, if ap­proached by some­body claiming to be a Ma­trix Lord who claims he can save 3^^^3 peo­ple, I’m not only faced with the prob­lem of whether I ought to pay him the \$5 - I’m also faced with the ques­tion of whether I ought to walk over to the next beg­gar on the street, and pay him \$0.01 to save 3^^^3 peo­ple. Is this per­son 500 times more likely to be able to save 3^^^3 peo­ple? From the out­set, not re­ally. And giv­ing money to ran­dom peo­ple has no prior prob­a­bil­ity of be­ing more likely to save lives than any­thing else.

Now sup­pose that the said “Ma­trix Lord” opens the sky, splits the Red Sea, demon­strates his du­pli­ca­tor box on some fish and, sure, cre­ates a hu­manoid Pa­tronus. Now do I have more rea­son to be­lieve that he is a Time Lord? Per­haps. Do I have rea­son to think that he will save 3^^^3 lives if I give him \$5? I don’t see con­vinc­ing rea­son to be­lieve so, but I don’t see ei­ther view as prob­le­matic.

Ob­vi­ously, once you’re not tak­ing Han­son’s ap­proach, there’s no prob­lem with be­liev­ing you’ve made a ma­jor dis­cov­ery that can save an ar­bi­trar­ily large num­ber of lives.

But here’s where I noticed a bit of a problem in your analogy. In the dark energy case you say “if these equations are actually true, then our descendants will be able to exploit dark energy to do computations, and according to my back-of-the-envelope calculations here, we’d be able to create around a googolplex people that way.”

Well, obviously the odds here of creating exactly a googolplex people are no greater than one in a googolplex. Why? Because those back-of-the-envelope calculations are going to get us (at best, say) an interval from 0.5 x 10^(10^100) to 2 x 10^(10^100), an interval containing more than a googolplex distinct integers. Hence, the odds of any specific one will be very low, but the sum might be very high. (This is worth contrasting with the single-integer case above, where presumably your probability of saving 3^^^3 + 1 people is no higher than it was before.)
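The interval point can be made concrete with small stand-in numbers (the specific figures below are hypothetical; the real interval, 0.5 x 10^(10^100) to 2 x 10^(10^100), is far too large to enumerate):

```python
# A back-of-the-envelope estimate pins the outcome inside an interval
# containing many distinct integers, so the probability of any one exact
# value is tiny even though the probability of landing somewhere in the
# interval is high.
interval = range(500, 2001)           # stand-in estimate: 500..2000 people
p_interval = 0.9                      # credence that the estimate is right at all
p_exact = p_interval / len(interval)  # spread uniformly over the interval

print(len(interval))                  # 1501 distinct outcomes
print(p_exact)                        # ~0.0006 for any single exact count
print(p_exact * len(interval))        # 0.9: the whole interval's mass
```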

Here’s the main prob­lem I have with your solu­tion:

“But if I ac­tu­ally see strong ev­i­dence for some­thing I pre­vi­ously thought was su­per-im­prob­a­ble, I don’t just do a Bayesian up­date, I should also ques­tion whether I was right to as­sign such a tiny prob­a­bil­ity in the first place—whether it was re­ally as com­plex, or un­nat­u­ral, as I thought. In real life, you are not ever sup­posed to have a prior im­prob­a­bil­ity of 10^-100 for some fact dis­t­in­guished enough to be writ­ten down, and yet en­counter strong ev­i­dence, say 10^10 to 1, that the thing has ac­tu­ally hap­pened.”

Sure you do. As you pointed out, dice rolls. The se­quence of rolls in a game of Risk will do this for you, and you have strong rea­son to be­lieve that you played a game of Risk and the dice landed as they did.

We do probability estimates because we lack information. Your example of a mathematical theorem is a good one: Theorem X is true or false from the get-go. But whenever you give me new information, even if that information is framed in the form of a question, it makes sense for me to do a Bayesian update. That’s why a lot of so-called knowledge paradoxes are silly: if you ask me whether I know who the president is, I can answer with 99%+ probability that it’s Obama; if you ask me whether Obama is still breathing, I have to do an update based on my consideration of what prompted the question. I’m not committing a fallacy by saying 95%; I’m doing a Bayesian update, as I should.

You’ll of­ten find your­self up­dat­ing your prob­a­bil­ities based on the knowl­edge that you were com­pletely in­cor­rect about some­thing (even some­thing math­e­mat­i­cal) to be­gin with. That doesn’t mean you were wrong to as­sign the ini­tial prob­a­bil­ities: You were as­sign­ing them based on your knowl­edge at the time. That’s how you as­sign prob­a­bil­ities.

In your case, you’re not even up­dat­ing on an “un­known un­known”—that is, some­thing you failed to con­sider even as a pos­si­bil­ity—though that’s the rea­son you put all prob­a­bil­ities at less than 100%, be­cause your knowl­edge is limited. You’re up­dat­ing on some­thing you con­sid­ered be­fore. And I see ab­solutely no rea­son to la­bel this a spe­cial non-Bayesian type of up­date that some­how dodges the prob­lem. I could be miss­ing some­thing, but I don’t see a co­her­ent ar­gu­ment there.

As an aside, the repeated references to how people misunderstood previous posts are distracting, to say the least. Couldn’t you just include a single link to Aaronson’s Large Numbers paper (or anything on up-arrow notation; I mention Aaronson’s paper because it’s fun)? After all, if you can’t understand tetration (and up), you’re not going to understand the article to begin with.

• Now sup­pose that the said “Ma­trix Lord” opens the sky, splits the Red Sea, demon­strates his du­pli­ca­tor box on some fish and, sure, cre­ates a hu­manoid Pa­tronus. Now do I have more rea­son to be­lieve that he is a Time Lord? Per­haps. Do I have rea­son to think that he will save 3^^^3 lives if I give him \$5? I don’t see con­vinc­ing rea­son to be­lieve so, but I don’t see ei­ther view as prob­le­matic.

• Honestly, at this point, I would strongly update in the direction that I am being deceived in some manner. Possibly I am dreaming, or drugged, or the person in front of me has some sort of perception-control device. I do not see any reason why someone who could open the sky, split the Red Sea, and so on, would need \$5; and if he did, why not make it himself? Or sell the fish?

The only rea­sons I can imag­ine for a gen­uine Ma­trix Lord pul­ling this on me are very bad for me. Either he’s a sadist who likes peo­ple to suffer—in which case I’m doomed no mat­ter what I do—or there’s some­thing that he’s not tel­ling me (per­haps do­ing what he says once sur­ren­ders my free will, al­low­ing him to con­trol me for­ever?), which im­plies that he be­lieves that I would re­ject his de­mand if I knew the truth be­hind it, which strongly prompts me to re­ject his de­mand.

Or he’s in­sane, fol­low­ing no dis­cern­able rules, in which case the only thing to do is to try to evade no­tice (some­thing I’ve clearly already failed at).

• Either he’s a sadist who likes peo­ple to suffer—in which case I’m doomed no mat­ter what I do—or there’s some­thing that he’s not tel­ling me (per­haps do­ing what he says once sur­ren­ders my free will, al­low­ing him to con­trol me for­ever?), which im­plies that he be­lieves that I would re­ject his de­mand if I knew the truth be­hind it, which strongly prompts me to re­ject his de­mand.

That your uni­verse is con­trol­led by a sadist doesn’t sug­gest that ev­ery pos­si­ble ac­tion you could do is equiv­a­lent. Maybe all your pos­si­ble fates are mis­er­able, but some are far more mis­er­able than oth­ers. More im­por­tantly, a be­ing might be sadis­tic in some re­spects/​situ­a­tions but not in oth­ers.

I also have to as­sign a very, very low prior to any­one’s be­ing able to figure out in 5 min­utes what the Ma­trix Lord’s ex­act mo­ti­va­tions are. Your op­tions are too sim­plis­tic even to de­scribe minds of hu­man-level com­plex­ity, much less ones of the com­plex­ity re­quired to de­sign or over­see physics-break­ingly large simu­la­tions.

I think indifference to our preferences (except as incidental to some other goal, e.g., paperclipping) is more likely than either sadism or beneficence. Only very small portions of the space of values focus on human-style suffering or joy, even in hypotheticals that seem designed to play with human moral intuitions. Eliezer’s decision theory conference explanation makes as much sense as any.

• That your uni­verse is con­trol­led by a sadist doesn’t sug­gest that ev­ery pos­si­ble ac­tion you could do is equiv­a­lent. Maybe all your pos­si­ble fates are mis­er­able, but some are far more mis­er­able than oth­ers.

You are right. How­ever, I can see no way to de­cide which course of ac­tion is best (or least mis­er­able). My own de­ci­sion pro­cess be­comes ques­tion­able in such a situ­a­tion; I can’t imag­ine any strat­egy that is con­vinc­ingly bet­ter than tak­ing ran­dom ac­tions.

When I say “doomed no mat­ter what I do”, I do not mean doomed with cer­tainty. I mean that I have a high prob­a­bil­ity of doom, for any given ac­tion, and I can­not find a way to min­imise that prob­a­bil­ity through my own ac­tions.

I think in­differ­ence to our prefer­ences (ex­cept as in­ci­den­tal to some other goal, e.g., pa­per­clip­ping) is more likely than ei­ther sadism or benefi­cence.

Think­ing about this, I think that you are right. I still con­sider sadism more likely than benefi­cence, but I had been set­ting the prior for in­differ­ence too low. This im­plies that the Ma­trix Lord has prefer­ences, but these prefer­ences are un­known and pos­si­bly un­know­able (per­haps he wants to max­imise slood).

...

This makes the question of which action to best take even more difficult to answer. I do not know anything about slood; I cannot, because it only exists outside the Matrix. The only source of information from outside the Matrix is the Matrix Lord. This implies that, before reaching any decision, I should spend a long time interviewing the Matrix Lord, in an attempt to better be able to model him.

• How­ever, I can see no way to de­cide which course of ac­tion is best (or least mis­er­able). My own de­ci­sion pro­cess be­comes ques­tion­able in such a situ­a­tion; I can’t imag­ine any strat­egy that is con­vinc­ingly bet­ter than tak­ing ran­dom ac­tions.

Well, this Ma­trix Lord seems very in­ter­ested in de­ci­sion the­ory and util­i­tar­i­anism. Sadis­tic or not, I ex­pect such a be­ing to re­spond more fa­vor­ably to at­tempts to take the dilem­mas he raised se­ri­ously than to an epistemic melt­down. Tak­ing the guy at his word and try­ing to rea­son your way through the prob­lem is likely to give him more use­ful data than at­tempts to rebel or go crazy, and if you’re use­ful then it’s less likely that he’ll pun­ish you or pull the plug on your uni­verse’s simu­la­tion.

• It seems rea­son­ably likely that this will lead to a re­sponse of ”...alright, I’ve got the data that I wanted, no need to keep this simu­la­tion run­ning any longer...” and then pul­ling the plug on my uni­verse. While it is true that this strat­egy is likely to lead to a hap­pier Ma­trix Lord (es­pe­cially if the data that I give him co­in­cides with the data he ex­pects), I’m not con­vinced that it leads to a longer ex­is­tence for my uni­verse.

• That may be true too. It de­pends on the pri­ors we have for generic su­per­hu­man agents’ rea­sons for keep­ing a simu­la­tion run­ning (e.g., hav­ing some other sci­ence ex­per­i­ments planned, want­ing to re­ward you for pro­vid­ing data...) vs. for shut­ting it down (e.g., vin­dic­tive­ness, en­ergy con­ser­va­tion, be­ing in­ter­ested only in one data point per simu­la­tion...).

We do have some data to work with here, since we have ex­pe­rience with the differ­en­tial effects of power, in­tel­li­gence, cu­ri­os­ity, etc. among hu­mans. That data is only weakly ap­pli­ca­ble to such an ex­otic agent, but it does play a role, so our un­cer­tainty isn’t ab­solute. My main point was that un­usual situ­a­tions like this don’t call for com­plete de­ci­sion-the­o­retic de­spair; we still need to make choices, and we can still do so rea­son­ably, though our con­fi­dence that the best de­ci­sion is also a win­ning de­ci­sion is greatly diminished.

• Well, if I’m go­ing to free-form spec­u­late about the sce­nario, rather than use it to ex­plore the ques­tion it was in­tro­duced to ex­plore, the most likely ex­pla­na­tion that oc­curs to me is that the en­tity is do­ing the Ma­trix Lord equiv­a­lent of free-form spec­u­lat­ing… that is, it’s won­der­ing “what would hu­mans do, given this choice and that in­for­ma­tion?” And, it be­ing a Ma­trix Lord, its act of won­der­ing cre­ates a hu­man mind (in this case, mine) and gives it that choice and in­for­ma­tion.

Which makes it likely that I haven’t ac­tu­ally lived through most of the life I re­mem­ber, and that I won’t con­tinue to ex­ist much longer than this in­ter­ac­tion, and that most of what I think is in the world around me doesn’t ac­tu­ally ex­ist.

That said, I’m not sure what use free-form spec­u­lat­ing about such bizarre and un­der­speci­fied sce­nar­ios re­ally is, though I’ll ad­mit it’s kind of fun.

• That said, I’m not sure what use free-form spec­u­lat­ing about such bizarre and un­der­speci­fied sce­nar­ios re­ally is, though I’ll ad­mit it’s kind of fun.

It’s kind of fun. Isn’t that rea­son enough?

Look­ing at the origi­nal ques­tion—i.e. how to han­dle very large util­ities with very small prob­a­bil­ity—I find that I have a men­tal safety net there. The safety net says that the situ­a­tion is a lie. It does not mat­ter how much util­ity is claimed, be­cause any­one can state any ar­bi­trar­ily large num­ber, and a num­ber has been cho­sen (in this case, by the Ma­trix Lord) in a spe­cific at­tempt to over­whelm my util­ity func­tion. The small prob­a­bil­ity is cho­sen (a) be­cause I would not be­lieve a larger prob­a­bil­ity and (b) so that I have no re­course when it fails to hap­pen.

I am re­luc­tant to fid­dle with my men­tal safety nets be­cause, well, they’re safety nets—they’re there for a rea­son. And in this case, the rea­son is that such a fan­tas­ti­cally un­likely event is un­likely enough that it’s not likely to hap­pen ever, to any­one. Not even once in the whole his­tory of the uni­verse. If I (out of all the hun­dreds of billions of peo­ple in all of his­tory) do ever run across such a situ­a­tion, then it’s so in­cred­ibly over­whelm­ingly more likely that I am be­ing de­ceived that I’m far more likely to gain by im­me­di­ately jump­ing to the con­clu­sion of ‘de­ceit’ than by as­sum­ing that there’s any chance of this be­ing true.

• (nods) Sure. My re­ply here ap­plies here as well.

• In real life, you are not ever sup­posed to have a prior im­prob­a­bil­ity of 10^-100 for some fact dis­t­in­guished enough to be writ­ten down, and yet en­counter strong ev­i­dence, say 10^10 to 1, that the thing has ac­tu­ally hap­pened.

Sure you do. As you pointed out, dice rolls. The sequence of rolls in a game of Risk.

Those aren’t “dis­t­in­guished enough to be writ­ten down” be­fore the game is played. I’ll edit to make this slightly clearer hope­fully.

• What if the mug­ger says he will give you a sin­gle mo­ment of plea­sure that is 3^^^3 times more in­tense than a stan­dard good ex­pe­rience? Wouldn’t the lev­er­age penalty not ap­ply and thus make the prob­a­bil­ity of the mug­ger tel­ling the truth much higher?

I think the real rea­son the mug­ger shouldn’t be given money is that peo­ple are more likely to be able to at­tain 3^^^3 utils by donat­ing the five dol­lars to an ex­is­ten­tial risk-re­duc­ing char­ity. Even though the cur­rent uni­verse pre­sum­ably couldn’t sup­port 3^^^3 utils, there is a chance of be­ing able to cre­ate or travel to vast num­bers of other uni­verses, and I think this chance is greater than the chance of the mug­ger be­ing hon­est.

Am I miss­ing some­thing? Th­ese points seem too ob­vi­ous to miss, so I’m as­sign­ing a fairly large prob­a­bil­ity to me ei­ther be­ing con­fused or that these were already men­tioned.

• I don’t think you can give me a mo­ment of plea­sure that in­tense with­out us­ing 3^^^3 worth of atoms on which to run my brain, and I think the lev­er­age penalty still ap­plies then. You definitely can’t give me a mo­ment of worth­while hap­piness that in­tense with­out 3^^^3 units of back­ground com­pu­ta­tion.

• The ar­ti­cle said the lev­er­age penalty “[pe­nal­izes] hy­pothe­ses that let you af­fect a large num­ber of peo­ple, in pro­por­tion to the num­ber of peo­ple af­fected.” If this is all the lev­er­age penalty does, then it doesn’t mat­ter if it takes 3^^^3 atoms or units of com­pu­ta­tion, be­cause atoms and com­pu­ta­tions aren’t peo­ple.

That said, the ar­ti­cle doesn’t pre­cisely define what the lev­er­age penalty is, so there could be some­thing I’m miss­ing. So, what ex­actly is the lev­er­age penalty? Does it pe­nal­ize how many units of com­pu­ta­tion, rather than peo­ple, you can af­fect? This sounds much less ar­bi­trary than the vague defi­ni­tion of “per­son” and sounds much eas­ier to define: sim­ply di­vide the prior of a hy­poth­e­sis by the num­ber of bits flipped by your ac­tions in it and then nor­mal­ize.
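The proposal in that last sentence can be sketched in a few lines. The hypothesis names, the prior masses, and the node count standing in for 3^^^3 are all made up for illustration; only the divide-and-normalize procedure comes from the comment:

```python
# Toy sketch of the proposed leverage penalty: divide each hypothesis's
# complexity prior by the number of causal nodes it lets you affect,
# then renormalize. All numbers here are hypothetical.
hypotheses = {
    "mundane offer":     {"complexity_prior": 0.9, "nodes_affected": 1},
    "matrix lord offer": {"complexity_prior": 0.1, "nodes_affected": 10**12},  # stand-in for 3^^^3
}

penalized = {name: h["complexity_prior"] / h["nodes_affected"]
             for name, h in hypotheses.items()}
total = sum(penalized.values())
leveraged_prior = {name: p / total for name, p in penalized.items()}

# The high-leverage hypothesis lands around 1e-13 no matter how simple it
# was to state, which is exactly the behavior the penalty is meant to produce.
print(leveraged_prior["matrix lord offer"])
```

Note that on this version the penalty scales with the size of the claimed effect, not with the length of the hypothesis's description, so it is independent of the ordinary complexity prior.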

• [We] can even strip out the part about agents and carry out the reasoning on pure causal nodes; the chance of a randomly selected causal node being in a unique₁₀₀ position on a causal graph with respect to 3↑↑↑3 other nodes ought to be at most 100/3↑↑↑3 for finite causal graphs.

• This would mean that all our decisions were dominated by tiny-seeming probabilities (on the order of 2^-100 and less) of scenarios where our lightest action affected 3↑↑4 people… which would in turn be dominated by even more remote probabilities of affecting 3↑↑5 people...

I’m pretty ignorant of quantum mechanics, but I gather there was a similar problem, in that the probability function for some path appeared to be dominated by an infinite number of infinitesimally-unlikely paths, and Feynman solved the problem by showing that those paths cancelled each other out.

• Ran­dom thoughts here, not highly con­fi­dent in their cor­rect­ness.

Why is the leverage penalty seen as something that needs to be added? Isn’t it just the obviously correct way to do probability?

Sup­pose I want to calcu­late the prob­a­bil­ity that a race of aliens will de­scend from the skies and ran­domly de­clare me Over­lord of Earth some time in the next year. To do this, I nat­u­rally go to Delphi to talk to the Or­a­cle of Perfect Pri­ors, and she tells me that the chance of aliens de­scend­ing from the skies and declar­ing an Over­lord of Earth in the next year is 0.0000007%.

If I then declare this to be my probability of becoming Overlord of Earth in an alien-backed coup, this is obviously wrong. Clearly I should multiply it by the probability that the aliens pick me, given that the aliens are doing this. There are about 7 billion people on Earth, and updating on the existence of Overlord-Declaring aliens doesn’t have much effect on that estimate, so my probability of being picked is about 1 in 7 billion, meaning my probability of being overlorded is about 0.0000000000000001%. Taking the former estimate rather than the latter is simply wrong.
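Written out, the arithmetic in that paragraph is just one division, using the oracle's figure and the world population:

```python
# The overlord arithmetic from the paragraph above.
p_overlord_event = 7e-9          # the oracle's 0.0000007%, as a fraction
population = 7_000_000_000       # rough population of Earth

# Probability that the aliens both show up AND happen to pick me.
p_me_overlorded = p_overlord_event / population
print(p_me_overlorded)           # 1e-18, i.e. 0.0000000000000001%
```

The nine-orders-of-magnitude gap between the two estimates is the whole point: conditioning on "an overlord is declared" is not the same as conditioning on "I am the overlord".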

Pas­cal’s mug­ging is a similar situ­a­tion, only this time when we up­date on the mug­ger tel­ling the truth, we rad­i­cally change our es­ti­mate of the num­ber of peo­ple who were ‘in the lot­tery’, all the way up to 3^^^^3. We then mul­ti­ply 1/​3^^^^3 by the prob­a­bil­ity that we live in a uni­verse where Pas­cal’s mug­gings oc­cur (which should be very small but not su­per-ex­po­nen­tially small). This gives you the lev­er­age penalty straight away, no need to think about Teg­mark mul­ti­verses. We were sim­ply mis­taken to not in­clude it in the first place.

• only this time when we up­date on the mug­ger tel­ling the truth, we rad­i­cally change our es­ti­mate of the num­ber of peo­ple who were ‘in the lot­tery’, all the way up to 3^^^^3. We then mul­ti­ply 1/​3^^^^3 by the prob­a­bil­ity that we live in a uni­verse where Pas­cal’s mug­gings oc­cur

How does this work with Clippy (the only pa­per­clip­per in known ex­is­tence) be­ing tempted with 3^^^^3 pa­per­clips?

That’s part of why I dis­like Robin Han­son’s origi­nal solu­tion. That the tempt­ing/​black­mailing offer in­volves 3^^^^3 other peo­ple, and that you are also a per­son should be merely in­ci­den­tal to one par­tic­u­lar illus­tra­tion of the prob­lem of Pas­cal’s Mug­ging—and as such it can’t be part of a solu­tion to the core prob­lem.

To re­place this with some­thing like “causal nodes”, as Eliezer men­tions, might per­haps solve the prob­lem. But I wish that we started talk­ing about Clippy and his pa­per­clips in­stead, so that the origi­nal illus­tra­tion of the prob­lem which in­volves in­ci­den­tal sym­me­tries doesn’t mis­lead us into a “solu­tion” over­re­li­ant on sym­me­tries.

• How does this work with Clippy (the only pa­per­clip­per in known ex­is­tence) be­ing tempted with 3^^^^3 pa­per­clips?

Clippy has some sort of prior over the num­ber of pa­per­clips that could pos­si­bly ex­ist. Let this num­ber be P. Con­di­tioned on each value of P, Clippy eval­u­ates the util­ity of the offer and the prob­a­bil­ity that it comes true.

In particular, for P < 3^^^^3, the conditional probability that the offer of 3^^^^3 paperclips is legit is 0. If some large number of paperclips exists, e.g. P = 2*3^^^^3, the offer might actually be viable with non-negligible probability, while its utility would be given by 3^^^^3/P. Note that this is always at most 1.

How­ever, un­less Clippy lives in a very strange uni­verse, it thinks that P >= 3^^^^3 is very un­likely. So the ex­pected util­ity will be bounded by Pr[P >= 3^^^^3] and will end up be­ing very small.
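That bound can be sketched numerically. A stand-in ceiling N replaces 3^^^^3 (which is far too large to represent), and the prior over P and the conditional chance that the offer is real are both made up for illustration:

```python
# Sketch of the bound described above: Clippy's expected utility from the
# offer is capped by Pr[P >= N], whatever the offer's size.
N = 10**12                        # hypothetical stand-in for 3^^^^3

# Toy prior over P, the number of paperclips that could possibly exist:
# almost certainly small, very rarely huge.
prior = {10**6: 0.99, N: 0.009, 2 * N: 0.001}

expected_utility = 0.0
for P, pr in prior.items():
    if P < N:
        continue                  # offer of N clips cannot be legit here
    utility = N / P               # at most 1, as the comment notes
    p_legit = 0.5                 # assumed conditional chance the offer is real
    expected_utility += pr * p_legit * utility

# Bounded by Pr[P >= N] = 0.01, regardless of how large N is.
print(expected_utility)
```

Because the utility term is normalized by P, raising the size of the offer never raises the bound; only raising Pr[P >= N] could do that.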

• How does this work with Clippy (the only pa­per­clip­per in known ex­is­tence) be­ing tempted with 3^^^^3 pa­per­clips?

First thought, I’m not at all sure that it does. Pas­cal’s mug­ging may still be a prob­lem. This doesn’t seem to con­tra­dict what I said about the lev­er­age penalty be­ing the only cor­rect ap­proach, rather than a ‘fix’ of some kind, in the first case. Wor­ry­ingly, if you are cor­rect it may also not be a ‘fix’ in the sense of not ac­tu­ally fix­ing any­thing.

I no­tice I’m cur­rently con­fused about whether the ‘causal nodes’ patch is jus­tified by the same ar­gu­ment. I will think about it and hope­fully find an an­swer.

• How does this work with Clippy (the only pa­per­clip­per in known ex­is­tence) be­ing tempted with 3^^^^3 pa­per­clips?

This sounds a lit­tle bit like it might de­pend on the choice of SSA vs. SIA.

• [ ]
[deleted]
• Okay, that makes sense. In that case, though, where’s the prob­lem? Claims in the form of “not only is X a true event, with de­tails A, B, C, …, but also it’s the great­est event by met­ric M that has ever hap­pened” should have low enough prob­a­bil­ity that a hu­man writ­ing it down speci­fi­cally in ad­vance as a hy­poth­e­sis to con­sider, with­out be­ing prompted by some spe­cific ev­i­dence, is do­ing re­ally badly episte­molog­i­cally.

Also, I’m con­fused about the re­la­tion­ship to MWI.

• [ ]
[deleted]
• Many of the con­spir­acy the­o­ries gen­er­ated have some sig­nifi­cant over­lap (i.e. are not mu­tu­ally ex­clu­sive), so one shouldn’t ex­pect the sum of their prob­a­bil­ities to be less than 1. It’s per­mit­ted for P(Cube A is red) + P(Sphere X is blue) to be greater than 1.

• This system does seem to lead to the odd effect that you would probably be more willing to pay Pascal’s Mugger to save 10^10^100 people than to save 10^10^101 people, since the leverage penalties make them about equal, but the latter has a higher complexity cost. In fact the leverage penalty effectively means that you cannot distinguish between any events that provide more utility than you can match with an appropriate amount of evidence.

• It’s not that odd. If some­one asked to bor­row ten dol­lars, and said he’d pay you back to­mor­row, would you be­lieve him? What if he said he’d pay back \$20? \$100? \$1000000? All the money in the world?

At some point, the prob­a­bil­ity goes down faster than the price goes up. That’s why you can’t just get a loan and keep rais­ing the in­ter­est to make up for the fact that you prob­a­bly won’t ever pay it back.

• Is there any par­tic­u­lar rea­son an AI wouldn’t be able to self-mod­ify with re­gards to its prior/​al­gorithm for de­cid­ing prior prob­a­bil­ities? A ba­sic Solomonoff prior should in­clude a non-neg­ligible chance that it it­self isn’t perfect for find­ing pri­ors, if I’m not mis­taken. That doesn’t an­swer the ques­tion as such, but it isn’t ob­vi­ous to me that it’s nec­es­sary to an­swer this one to de­velop a Friendly AI.

• A ba­sic Solomonoff prior should in­clude a non-neg­ligible chance that it it­self isn’t perfect for find­ing pri­ors, if I’m not mis­taken.

You are mis­taken. A prior isn’t some­thing that can be mis­taken per se. The clos­est it can get is as­sign­ing a low prob­a­bil­ity to some­thing that is true. How­ever, any prior sys­tem will say that the prob­a­bil­ity it gives of some­thing be­ing true is ex­actly equal to the prob­a­bil­ity of it be­ing true, there­fore it is well-cal­ibrated. It will oc­ca­sion­ally give low prob­a­bil­ities for things that are true, but only to the ex­tent that un­likely things some­times hap­pen.

• As near as I can figure, the cor­re­spond­ing state of af­fairs to a com­plex­ity+lev­er­age prior im­prob­a­bil­ity would be a Teg­mark Level IV mul­ti­verse in which each re­al­ity got an amount of mag­i­cal-re­al­ity-fluid cor­re­spond­ing to the com­plex­ity of its pro­gram (1/​2 to the power of its Kol­mogorov com­plex­ity) and then this mag­i­cal-re­al­ity-fluid had to be di­vided among all the causal el­e­ments within that uni­verse—if you con­tain 3↑↑↑3 causal nodes, then each node can only get 1/​3↑↑↑3 of the to­tal re­al­ness of that uni­verse.

The differ­ence be­tween this and av­er­age util­i­tar­i­anism is that we di­vide the prob­a­bil­ity by the hy­poth­e­sis size, rather than di­vid­ing the util­ity by that size. The close­ness of the two seems a bit sur­pris­ing.

Robin Hanson has suggested that the logic of a leverage penalty should stem from the general improbability of individuals being in a unique position to affect many others (which is why I called it a leverage penalty). At most 10 out of 3↑↑↑3 people can ever be in a position to be “solely responsible” for the fate of 3↑↑↑3 people if “solely responsible” is taken to imply a causal chain that goes through no more than 10 people’s decisions; i.e. at most 10 people can ever be solely₁₀ responsible for any given event.

This both­ers me be­cause it seems like fre­quen­tist an­thropic rea­son­ing similar to the Dooms­day ar­gu­ment. I’m not say­ing I know what the cor­rect ver­sion should be, but as­sum­ing that we can use a uniform dis­tri­bu­tion and get nice re­sults feels like the same mis­take as the prin­ci­ple of in­differ­ence (and more so­phis­ti­cated vari­a­tions that of­ten worked sur­pris­ingly well as an epistemic the­ory for finite cases). Things like Solomonoff dis­tri­bu­tions are more flex­ible...

(As for in­finite causal graphs, well, if prob­lems arise only when in­tro­duc­ing in­finity, maybe it’s in­finity that has the prob­lem.)

The problem goes away if we try to employ a universal distribution for the reality fluid, rather than a uniform one. (This does not make that a good idea, necessarily.)

This setup is not en­tirely im­plau­si­ble be­cause the Born prob­a­bil­ities in our own uni­verse look like they might be­have like this sort of mag­i­cal-re­al­ity-fluid—quan­tum am­pli­tude flow­ing be­tween con­figu­ra­tions in a way that pre­serves the to­tal amount of re­al­ness while di­vid­ing it be­tween wor­lds—and per­haps ev­ery other part of the mul­ti­verse must nec­es­sar­ily work the same way for some rea­son.

If we try to use uni­ver­sal-dis­tri­bu­tion re­al­ity-fluid in­stead, we would ex­pect to con­tinue to see the same sort of dis­tri­bu­tion we had seen in the past: we would be­lieve that we went down a path where the re­al­ity fluid con­cen­trated into the Born prob­a­bil­ities, but other quan­tum paths which would be very im­prob­a­ble ac­cord­ing to the Born prob­a­bil­ities may get high prob­a­bil­ity from some other rule.

• similar to the Dooms­day ar­gu­ment.

Just to jump in here—the solu­tion to the dooms­day ar­gu­ment is that it is a low-in­for­ma­tion ar­gu­ment in a high-in­for­ma­tion situ­a­tion. Ba­si­cally, once you know you’re the 10 billionth zor­blax, your prior should in­deed put you in the mid­dle of the group of zor­blaxes, for 20 billion to­tal, no mat­ter what a zor­blax is. This is cor­rect and makes sense. The trou­ble comes if you open your eyes, col­lect ad­di­tional data, like pop­u­la­tion growth pat­terns, and then never use any of that to up­date the prior. When peo­ple put pop­u­la­tion growth pat­terns and the dooms­day prior to­gether in the same calcu­la­tion for the “dooms­day date,” that’s just blatantly hav­ing data but not up­dat­ing on it.

• How confident are you of “Probability penalties are epistemic features—they affect what we believe, not just what we do. Maps, ideally, correspond to territories.”? That seems to me to be a strong heuristic, even a very very strong heuristic, but I don’t think it’s strong enough to carry the weight you’re placing on it here. More technically, the map corresponds to some relationship between the territory and the map-maker’s utility function. Nodes on a causal graph are, after all, probabilistic, and thus are features of maps, not of territories; they are features of the map-maker’s utility function, not just summaries of evidence about the territory.
I sus­pect that this for­mal­ism mixes el­e­ments of di­vi­sion of mag­i­cal re­al­ity fluid be­tween maps with el­e­ments of di­vi­sion of mag­i­cal re­al­ity fluid be­tween ter­ri­to­ries.

• 1 Feb 2016 11:08 UTC
0 points

...at what point are you over­think­ing this?

• The link la­bel­led “the prior prob­a­bil­ity of hy­pothe­ses diminishes ex­po­nen­tially with their com­plex­ity” is malformed.

• Is there any justification for the leverage penalty? I understand that it would apply if there were a finite number of agents, but if there’s an infinite number of agents, couldn’t all agents have an effect on an arbitrarily large number of other agents? Shouldn’t the prior probability instead be P(event A | n agents will be affected) = (1/n) + P(there being infinite entities)? If this is the case, then it seems the leverage penalty won’t stop one from being mugged.

• If our math has to han­dle in­fini­ties we have big­ger prob­lems. Un­less we use mea­sures, and then we have the same is­sue and seem­ingly forced solu­tion as be­fore. If we don’t use mea­sures, things fail to add up the mo­ment you imag­ine “in­finity”.

• Then this solution just assumes the probability of infinite people is 0. If this solution is based on premises that are probably false, then how is it a solution at all? I understand that infinity makes even bigger problems, so we should instead just call your solution a pseudo-solution-that’s-probably-false-but-is-still-the-best-one-we-have, and dedicate more efforts to finding a real solution.

• 7 May 2014 15:29 UTC
0 points

This setup is not en­tirely im­plau­si­ble be­cause the Born prob­a­bil­ities in our own uni­verse look like they might be­have like this sort of mag­i­cal-re­al­ity-fluid—quan­tum am­pli­tude flow­ing be­tween con­figu­ra­tions in a way that pre­serves the to­tal amount of re­al­ness while di­vid­ing it be­tween wor­lds—and per­haps ev­ery other part of the mul­ti­verse must nec­es­sar­ily work the same way for some rea­son.

I should like to point out that if re­al­ness were not pre­served, i.e., if some wor­lds at time t were more real than oth­ers, their in­hab­itants would have no way of dis­cern­ing that fact.

• Just a di­gres­sion that has no bear­ing on the main point of the post:

I would be will­ing to as­sign a prob­a­bil­ity of less than 1 in 10^18 to a ran­dom per­son be­ing a Ma­trix Lord.

The prob­a­bil­ity that we’re in a simu­la­tion, times the num­ber of ex­pected Ma­trix Lords at one mo­ment per simu­la­tion, di­vided by pop­u­la­tion, should be a lower bound on that prob­a­bil­ity. I would think it would be at least 1 /​ pop­u­la­tion.

• I ex­pect far less than 1 Ma­trix Lord per simu­lated pop­u­la­tion. I ex­pect the vast ma­jor­ity of simu­la­tions are within UFAIs try­ing to gain cer­tain types of in­for­ma­tion through veridi­cal simu­la­tion, no Ma­trix Lords there.

• The usual analy­ses of Pas­cal’s Wager, like many lab ex­per­i­ments, priv­ileges the hy­poth­e­sis and doesn’t look for al­ter­na­tive hy­pothe­ses.

Why would any­one as­sume that the Mug­ger will do as he says? What do we know about the char­ac­ter of all pow­er­ful be­ings? Why should they be truth­ful to us? If he knows he could save that many peo­ple, but re­frains from do­ing so be­cause you won’t give him five dol­lars, he is by hu­man stan­dards a psy­cho. If he’s a psy­cho, maybe he’ll kill all those peo­ple if I give him 5 dol­lars. That ac­tu­ally seems more likely be­hav­ior from such a dick.

The situation you are in isn’t the experimental hypothetical of knowing what the mugger will do depending on what your actions are. It’s a situation where you observe X, Y, and Z, and are free to make inferences from them. If he has the power, I infer the mugger is a sadistic dick who likes toying with creatures. I expect him to renege on the bet, and likely invert it. “Ha Ha! Yes, I saved those beings, knowing that each would go on to torture a zillion zillion others.”

This is a mis­take the­ists make all the time. They think hy­poth­e­siz­ing an all pow­er­ful be­ing al­lows them to ac­count for all mys­ter­ies, and as­sume that once the power is there, the priv­ileged hy­poth­e­sis will be fulfilled. But you get no in­creased prob­a­bil­ity of any event from hy­poth­e­siz­ing power un­less you also es­tab­lish a prior on be­hav­ior. From the lit­tle I’ve seen of the mug­ger, if he has the power to do what he claims, he is malev­olent. If he doesn’t have the power, he is im­po­tent to de­liver and de­luded or dishon­est be­sides. Either way, I have no ex­pec­ta­tion of gain by ap­peal­ing to such a per­son.

• The usual analy­ses of Pas­cal’s Wager, like many lab ex­per­i­ments, priv­ileges the hy­poth­e­sis and doesn’t look for al­ter­na­tive hy­pothe­ses.

Yes, privileging a hypothesis isn’t discussed in great detail, but the alternatives you mention in your post don’t resolve the dilemma. Even if you think that the probabilities of the “good” and “bad” alternatives balance each other out to the quadrillionth decimal point, the utilities you get in your calculation are astronomical. If you think there’s a 0.0000[quadrillion zeros]1 greater chance that the beggar will do good than harm, the expected utility of your \$5 donation is inconceivably greater than a trillion years of happiness. If you think there’s at least a 0.0000[quadrillion zeros]1 chance that \$5 will cause the mugger to act malevolently, your \$5 donation is inconceivably worse than a trillion years of torture. Both of these expectations seem off.

You can’t just say “the prob­a­bil­ities bal­ance out”. You have to ex­plain why the prob­a­bil­ities bal­ance out to a bignum num­ber of dec­i­mal points.

• You have to ex­plain why the prob­a­bil­ities bal­ance out to a bignum num­ber of dec­i­mal points.

Actually, I don’t. I say the probabilities are within my margin of error, which is a lot larger than “0.0000[quadrillion zeros]1”. I can’t discern differences of “0.0000[quadrillion zeros]1”.

• OK, but now decreasing your margin of error until you can make a determination is the most important ethical mission in history. Governments should spend billions of dollars to assemble the brightest teams to calculate which of your two options is better—more lives hang in the balance (on expectation) than would ever live if we colonized the universe with people the size of atoms.

Suppose a trustworthy Omega tells you “This is a once in a lifetime opportunity. I’m going to cure all residents of a country of all diseases, in a benevolent way (no ironic or evil catches). I’ll leave the choice of country up to you. Give me \$5 and the country will be Zimbabwe, or give me nothing and the country will be Tanzania. I’ll give you a couple of minutes to come up with a decision.” You would not think to yourself “Well, I’m not sure which is bigger. My estimates don’t differ by more than my margin of error, so I might as well save the \$5 and go with Tanzania”. At least I hope that’s not how you’d make the decision.

• Then you pre­sent me with a brilli­ant lemma Y, which clearly seems like a likely con­se­quence of my math­e­mat­i­cal ax­ioms, and which also seems to im­ply X—once I see Y, the con­nec­tion from my ax­ioms to X, via Y, be­comes ob­vi­ous.

Seems a lot like learn­ing a proof of X. It shouldn’t sur­prise us that learn­ing a proof of X in­creases your con­fi­dence in X. The mug­ger ge­nie has lit­tle ground to ac­cuse you of in­con­sis­tency for be­liev­ing X more af­ter learn­ing a proof of it.

Granted the anal­ogy isn’t ex­act; what is learned may fall well short of rigor­ous proof. You may have only learned a good ar­gu­ment for X. Since you as­sign only 90% pos­te­rior like­li­hood I pre­sume that’s in­tended in your nar­ra­tive.

Nev­er­the­less, analo­gous rea­son­ing seems to ap­ply. The mug­ger ge­nie has lit­tle ground to ac­cuse you of in­con­sis­tency for be­liev­ing X more af­ter learn­ing a good ar­gu­ment for it.

• Is it rea­son­able to take this as ev­i­dence that we shouldn’t use ex­pected util­ity com­pu­ta­tions, or not only ex­pected util­ity com­pu­ta­tions, to guide our de­ci­sions?

If I un­der­stand the con­text, the rea­son we be­lieved an en­tity, ei­ther a hu­man or an AI, ought to use ex­pected util­ity as a prac­ti­cal de­ci­sion mak­ing strat­egy, is be­cause it would yield good re­sults (a sim­ple, gen­eral ar­chi­tec­ture for de­ci­sion mak­ing). If there are fully gen­eral at­tacks (mug­gings) on all en­tities that use ex­pected util­ity as a prac­ti­cal de­ci­sion mak­ing strat­egy, then per­haps we should re­vise the origi­nal hy­poth­e­sis.

Utility as a the­o­ret­i­cal con­struct is charm­ing, but it does have to pay its way, just like any­thing else.

P.S. I think the rea­son­ing from “bounded ra­tio­nal­ity ex­ists” to “non-Bayesian mind changes ex­ist” is good stuff. Per­haps we could call this “on see­ing this, I be­come will­ing to re­vise my model” phe­nomenon some­thing like “sur­prise”, and dis­t­in­guish it from merely new in­for­ma­tion.

• Con­tin­u­ing from what I said in my last com­ment about the more gen­eral prob­lem with Ex­pected Utility Max­i­miz­ing, I think I might have a solu­tion. I may be en­tirely wrong, so any crit­i­cism is wel­come.

Instead of calculating Expected Utility, calculate the probability that an action will result in a higher utility than another action. Choose the one that is more likely to end up with the higher utility. For example, if giving Pascal's mugger the money has only a one-in-a-trillion chance of ending up with a higher utility than not giving him your money, you wouldn't give it.

Now there is an apparent inconsistency with this system. If there is a lottery in which you have a 1/100 chance of winning, you would never buy a ticket, even if the reward is \$200 and the cost of a ticket only \$1, or indeed no matter how big the reward is. However, if you are offered the chance to buy a lot of tickets all at once, you would do so, since the chance of winning becomes large enough to outgrow the chance of not winning.

How­ever I don’t think that this is a prob­lem. If you ex­pect to play the lot­tery a bunch of times in a row, then you will choose to buy the ticket, be­cause mak­ing that choice in this one in­stance also means that you will make the same choice in ev­ery other in­stance. Then the prob­a­bil­ity of end­ing up with more money at the end of the day is higher.

So if you ex­pect to play the lot­tery a lot, or do other things that have low chances of end­ing up with high util­ities, you might par­ti­ci­pate in them. Then when all is done, you are more likely to end up with a higher util­ity than if you had not done so. How­ever if you get in a situ­a­tion with an ab­surdly low chance of win­ning, it doesn’t mat­ter how large the re­ward is. You wouldn’t par­ti­ci­pate, un­less you ex­pect to end up in the same situ­a­tion an ab­surdly large num­ber of times.

This method is con­sis­tent, it seems to “work” in that most agents that fol­low it will end up with higher util­ities than agents that don’t fol­low it, and Ex­pected Utility is just a spe­cial case of it that only hap­pens when you ex­pect to end up in similar situ­a­tions a lot. It also seems closer to how hu­mans ac­tu­ally make de­ci­sions. So can any­one find some­thing wrong with this?
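The proposed rule can be sketched numerically, using the lottery from the comment above (a \$1 ticket with a 1/100 chance at \$200). This is a hypothetical illustration; the function names are mine, not from the thread.

```python
from itertools import product

def prob_a_beats_b(outcomes_a, outcomes_b):
    """P(utility(A) > utility(B)) for independent lists of (utility, probability) pairs."""
    return sum(pa * pb
               for (ua, pa), (ub, pb) in product(outcomes_a, outcomes_b)
               if ua > ub)

# One lottery play, in net utilities: a $1 ticket with a 1/100 chance at a $200 prize.
buy = [(199, 0.01), (-1, 0.99)]
keep = [(0, 1.0)]

expected_utility_buy = sum(u * p for u, p in buy)  # 0.01*199 - 0.99*1 = 1.0 > 0
p_buy_beats_keep = prob_a_beats_b(buy, keep)       # 0.01 -- so the rule refuses one play

def p_ahead_after(n, p_win=0.01):
    """P(net > 0 after n plays). For n <= 199, a single win (+199) outweighs
    all the losing tickets, so this is just P(at least one win)."""
    return 1 - (1 - p_win) ** n
```

Expected utility says buy (EV = +\$1 per ticket), while the probability-of-coming-out-ahead rule refuses a single play (it comes out ahead only 1% of the time) but flips once roughly 69 plays are bundled together, which is the inconsistency-and-repair described in the comment.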

• So if I’m get­ting what you’re say­ing cor­rectly, it would not sac­ri­fice a sin­gle cent for a 49% chance to save a hu­man life?

And on the other hand it could be tempted to a game where it’d have 51% chance of win­ning a cent, and 49% chance of be­ing de­stroyed?

If the solution for the problem of infinitesimal probabilities is to effectively ignore every probability under 50%, that's a solution that's worse than the problem...

• I stupidly didn’t con­sider that kind of situ­a­tion for some rea­son… Back to the draw­ing board I guess.

Though to be fair it would still come out ahead 51% of the time, and in a real world ap­pli­ca­tion it would prob­a­bly choose to spend the penny, since it would ex­pect to make choices similarly in the fu­ture, and that would help it come out ahead an even higher per­cent of the time.

But yes, a 51% chance of los­ing a penny for noth­ing prob­a­bly shouldn’t be worth more than a 49% chance at sav­ing a life for a penny. How­ever al­low­ing a large enough re­ward to out­weigh a small enough prob­a­bil­ity means the sys­tem will get stuck in situ­a­tions where it is pretty much guaran­teed to lose, on the slim, slim chance that it could get a huge re­ward.

Car­ing only about the per­cent of the time you “win” seemed like a more ra­tio­nal solu­tion but I guess not.

Though an­other benefit of this sys­tem could be that you could have weird util­ity func­tions. Like a rule that says any out­come where one life is saved is worth more than any amount of money lost. Or Asi­mov’s three laws of robotics, which wouldn’t work un­der an Ex­pected Utility func­tion since it would only care about the first law. This is al­lowed be­cause in the end all that mat­ters is which out­comes you pre­fer to which other out­comes. You don’t have to turn util­ities into num­bers and do math on them.

• Here’s a ques­tion, if we had the abil­ity to in­put a sen­sory event with a likely­hoodra­tio of 3^^^^3:1 this whole prob­lem would be solved?

• Here’s a ques­tion, if we had the abil­ity to in­put a sen­sory event with a likely­hoodra­tio of 3^^^^3:1 this whole prob­lem would be solved?

As­sum­ing the rest of our cog­ni­tive ca­pac­ity is im­proved com­men­su­rably then yes, prob­lem solved. Mind you we would then be left with the prob­lem if a Ma­trix Lord ap­pears and starts talk­ing about 3^^^^^3.

• Why are you not a sci­ence fic­tion writer?

• This seems like an ex­er­cise in scal­ing laws.

The odds of being a hero who saves 100 lives are less than 1% of the odds of being a hero who saves 1 life. So in the absence of good data about being a hero who saves 10^100 lives, we should assume that the odds are much, much less than 1/(10^100).

In other words, for cer­tain claims, the size of the claim it­self low­ers the prob­a­bil­ity.

More pedes­trian ex­am­ple: ISTR your odds of be­com­ing a mu­si­cian earn­ing over \$1 mil­lion a year are much, much less than 1% of your odds of be­com­ing a mu­si­cian who earns over \$10,000 a year.

• I don’t know of any set of ax­ioms that im­ply that you should take ex­pected util­ities when con­sid­er­ing in­finite sets of pos­si­ble out­comes that do not also im­ply that the util­ity func­tion is bounded. If we think that our util­ity func­tions are un­bounded and we want to use the Solomonoff prior, why are we still tak­ing ex­pec­ta­tions?

(I sup­pose be­cause we don’t know how else to ag­gre­gate the util­ities over pos­si­ble wor­lds. Last week, I tried to see how far I could get if I weak­ened a few of the usual as­sump­tions. I couldn’t re­ally get any­where in­ter­est­ing be­cause my ax­ioms weren’t strong enough to tell you how to de­cide in many cases, even when the gen­er­al­ized prob­a­bil­ities and gen­er­al­ized util­ities are known.)

• Isn’t this more of so­cial recog­ni­tion of a scam?

While there are de­ci­sion-the­o­retic is­sues with the Origi­nal Pas­cal’s Wager, one of the main prob­lems is that it is a scam (“You can’t af­ford not to do it! It’s an offer you can’t re­fuse!”). It seems to me that you can con­struct plenty of ar­gu­ments like you just did, and many peo­ple wouldn’t take you up on the offer be­cause they’d rec­og­nize it as a scam. Once some­thing has a high chance of be­ing a scam (like tak­ing the form of Pas­cal’s Wager), it won’t get much more of your at­ten­tion un­til you lower the like­li­hood that it’s a scam. Is that a weird form of Con­fir­ma­tion Bias?

But nonethe­less, couldn’t the AI just func­tion in the same way as that? I would think it would need to learn how to iden­tify what is a trick and what isn’t a trick. I would just try to think of it as a Bad Guy AI who is try­ing to ma­nipu­late the de­ci­sion mak­ing al­gorithms of the Good Guy AI.

• The con­cern here is that if I re­ject all offers that su­perfi­cially pat­tern-match to this sort of scam, I run the risk of turn­ing down valuable offers as well. (I’m re­minded of a TV show decades ago where they had some guy dress like a bum and wan­der down the street offer­ing peo­ple \$20, and ev­ery­one ig­nored him.)

Of course, if I’m not smart enough to ac­tu­ally eval­u­ate the situ­a­tion, or don’t feel like spend­ing the en­ergy, then su­perfi­cial pat­tern-match­ing and re­jec­tion is my safest strat­egy, as you sug­gest.

But the ques­tion of what anal­y­sis a suffi­ciently smart and at­ten­tive agent could do, in prin­ci­ple, to take ad­van­tage of rare valuable op­por­tu­ni­ties with­out be­ing suck­ered by scam artists is of­ten worth ask­ing any­way.

• But wouldn’t you just be suck­ered by suffi­ciently smart and at­ten­tive scam artists?

• It de­pends on the na­ture of the anal­y­sis I’m do­ing.

I mean, sure, if the scam artist is smart enough to, for ex­am­ple, com­pletely en­cap­su­late my sen­so­rium and provide me with an en­tirely simu­lated world that it up­dates in real time and perfect de­tail, then all bets are off… it can make me be­lieve any­thing by ma­nipu­lat­ing the ev­i­dence I ob­serve. (Similarly, if the scam artist is smart enough to di­rectly ma­nipu­late my brain/​mind.)

But if my rea­son­ing is re­li­able and I ac­tu­ally have ac­cess to ev­i­dence about the real world, then the bet­ter I am at eval­u­at­ing that ev­i­dence, the harder I am to scam about things re­lat­ing to that ev­i­dence, even by a scam artist far smarter than me.

• I disagree. All the scam artist has to know is your method of coming to your conclusions. Once he knows that, he can probably exploit you, depending on his cleverness (and then it becomes an arms race). If anything, trying to defend yourself from being manipulated in that way would probably be extremely difficult in and of itself. Either way, my initial guess is that your methodology would still be superficial pattern-matching, but it would just be a deeper, more complex level of it.

This seems to be what Eliezer is do­ing with all the var­i­ous sce­nar­ios. He’s test­ing his method­ol­ogy against differ­ent at­tacks and differ­ent sce­nar­ios. I’m just sug­gest­ing is to change your view­point to the Bad Guy. Rather than talk about your re­li­able rea­son­ing, talk about the bad guy and how he can ex­ploit your rea­son­ing.

• my ini­tial guess is that your method­ol­ogy would still be su­perfi­cial pat­tern-match­ing, but it would just be a deeper, more com­plex level of it.

Fair enough. If I ac­cept that guess as true, I agree with your con­clu­sion.

I also agree that adopting the enemy's perspective is an important—for humans, indispensable—part of strategic thinking.

• I also think that the var­i­ant of the prob­lem fea­tur­ing an ac­tual mug­ger is about scam recog­ni­tion.

Sup­pose you get an un­so­lic­ited email claiming that a Nige­rian prince wants to send you a Very Large Re­ward worth \$Y. All you have to do is send him a cash ad­vance of \$5 first …

I an­a­lyze this as a straight­for­ward two-player game tree via the usual min­i­max pro­ce­dure. Player one goes first, and can ei­ther pay \$5 or not. If player one chooses to pay, then player two goes sec­ond, and can ei­ther pay Very Large Re­ward \$Y to player one, or he can run away with the cash in hand. Un­der the usual min­i­max as­sump­tions, player 2 is ob­vi­ously not go­ing to pay out! Cru­cially, this anal­y­sis does not de­pend on the value for Y.

The anal­y­sis for Pas­cal’s mug­ger is equiv­a­lent. A de­ci­sion pro­ce­dure that needs to in­tro­duce ad hoc cor­rec­tive fac­tors based on the value of Y seems flawed to me. This type of situ­a­tion should not re­quire an un­usual de­gree of math­e­mat­i­cal so­phis­ti­ca­tion to an­a­lyze.
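The minimax analysis described above can be written out as a tiny backward induction. The payoff encoding (a purely self-interested player 2, dollar-denominated utilities) is my assumption, not something specified in the comment.

```python
def mugger_game(Y, advance=5):
    """Backward induction on the two-move tree described above.
    Payoffs are (player 1, player 2); player 2 is assumed purely self-interested."""
    pay_out = (Y - advance, advance - Y)   # player 2 honors the promised reward Y
    run_away = (-advance, advance)         # player 2 keeps the $5 advance
    # Player 2 moves last, maximizing its own payoff:
    p2_best = max(pay_out, run_away, key=lambda payoff: payoff[1])
    # Player 1 anticipates that, and compares paying with walking away:
    walk_away = (0, 0)
    return max(p2_best, walk_away, key=lambda payoff: payoff[0])
```

For any promised Y, player 2's best reply to being paid is to run away, so player 1's best move is to walk away with payoff (0, 0); as the comment says, the conclusion does not depend on the value of Y.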

When I list out the most rele­vant facts about this sce­nario, they in­clude the fol­low­ing: (1) we re­ceived an un­so­lic­ited offer (2) from an un­known party from whom we won’t be able to seek re­dress if any­thing goes wrong (3) who can take our money and run with­out giv­ing us any­thing ver­ifi­able in re­turn.

That’s all we need to know. The value of Y doesn’t mat­ter. If the mug­ger performs a cool and im­pres­sive magic trick we may want to tip him for his skil­lful street perfor­mance. We still shouldn’t ex­pect him to pay­out Y.

I gen­er­ally learn a lot from the posts here, but in this case I think the rea­son­ing in the post con­fuses rather than en­light­ens. When I look back on my own life ex­pe­riences, there are cer­tainly times when I got scammed. I un­der­stand that some in the Less Wrong com­mu­nity may also have fallen vic­tim to scams or fraud in the past. I ex­pect that many of us will likely be sub­ject to dis­in­gen­u­ous offers by unFriendly par­ties in the fu­ture. I re­spect­fully sug­gest that know­ing about com­mon scams is a helpful part of a ra­tio­nal­ist’s train­ing. It may offer a large benefit rel­a­tive to other in­vest­ments.

If my anal­y­sis is flawed and/​or I’ve missed the point of the ex­er­cise, I would ap­pre­ci­ate learn­ing why. Thanks!

• When you say that player 2 “is ob­vi­ously not go­ing to pay out” that’s an ap­prox­i­ma­tion. You don’t know that he’s not go­ing to pay off. You know that he’s very, very, very, un­likely to pay off. (For in­stance, there’s a very slim chance that he sub­scribes to a kind of hon­esty which leads him to do things he says he’ll do, and there­fore doesn’t fol­low min­i­max.) But in Pas­cal’s Mug­ging, “very, very, very, un­likely” works differ­ently from “no chance at all”.

• That does not matter. If you think it is a scam, then the size of the promised reward does not matter. 100? Googol? Googolplex? 3^^^3? Infinite? It just does not enter the calculation in the first place, since it is made up anyway.

Determining "is this a scam?" would probably have to rely on things other than the size of the reward. That avoids the whole "but there is no 1 in 3^^^3 probability because I say so" business.

• There’s a prob­a­bil­ity of a scam, you’re not cer­tain that it is a scam. The small prob­a­bil­ity that you are wrong about it be­ing a scam is mul­ti­plied by the large amount.

• What if the prob­a­bil­ity of it be­ing a scam is a func­tion of the amount offered?

• There seems to be this idea on LW that the probability of it not being a scam can only decrease with the Kolmogorov complexity of the offer. If you accept this idea, then the probability being a function of the amount doesn't help you.

If you ac­cept that the prob­a­bil­ity can de­crease faster than that, then of course that’s a solu­tion.
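A toy calculation of why the decrease rate matters. The 8-bits-per-character encoding is an arbitrary stand-in for Kolmogorov complexity, chosen only to exhibit the growth rates, and the function names are mine.

```python
def description_bits(expr: str) -> float:
    """Crude upper bound on K(n): 8 bits per character of an expression naming n."""
    return 8.0 * len(expr)

def complexity_only_ev(n_lives: float, expr: str, payment: float = 5.0) -> float:
    """EV of paying when the prior shrinks only with description complexity:
    prior ~ 2**-K(n), utility ~ n_lives."""
    return 2.0 ** (-description_bits(expr)) * n_lives - payment

def leverage_penalized_ev(n_lives: float, expr: str, payment: float = 5.0) -> float:
    """EV with an extra 1/n leverage factor multiplied into the prior."""
    prior = 2.0 ** (-description_bits(expr)) / n_lives
    return prior * n_lives - payment
```

"10**100" names a googol in 7 characters (56 bits), so a complexity-only prior of about 2^-56, roughly 1.4e-17, is utterly swamped by a 10^100 payoff, while the extra 1/n leverage factor exactly cancels the payoff and leaves the \$5 cost decisive no matter how large n gets.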

• There seems to be this idea on LW that the probability of it not being a scam can only decrease with the Kolmogorov complexity of the offer.

I can’t come up with any rea­sons why that should be so.

• I sup­pose that peo­ple who talk about Kol­mogorov com­plex­ity in this set­ting are think­ing of AIXI or some similar de­ci­sion pro­ce­dure.
Too bad that AIXI doesn’t work with un­bounded util­ity, as ex­pec­ta­tions may di­verge or be­come un­defined.

• I think this comes down to bounded computation.

I've heard some people try to claim silly things like, the probability that you're telling the truth is counterbalanced by the probability that you'll kill 3↑↑↑3 people instead, or something else with a conveniently equal and opposite utility. But there's no way that things would balance out exactly in practice.

With a hu­man’s bounded com­pu­ta­tional re­sources, maybe as­sum­ing that it bal­ances out is the best you can do. You have to make sim­plify­ing as­sump­tions if you want to rea­son about num­bers as large as 3↑↑↑3.

• But we can see why the probability isn't counterbalanced without having a visceral grasp on the quantities involved. There may be some uncertainty in our view that there aren't enough counterbalancing forces we've taken into account, but in practice uncertainty almost always places you some nontrivial distance away from .5 credence. We still have to have credence levels, even about quantities that are very uncertain or beyond our ability to compute. Meta-uncertainty won't drag your confidence to .5 unless your uncertainty disproportionately supports 'I'm underestimating the likelihood that there are counterbalancing risks' over 'I'm overestimating the likelihood that there are counterbalancing risks'.

• “Robin Han­son has sug­gested that the logic of a lev­er­age penalty should stem from the gen­eral im­prob­a­bil­ity of in­di­vi­d­u­als be­ing in a unique po­si­tion to af­fect many oth­ers (which is why I called it a lev­er­age penalty).”

As I mentioned in a recent discussion post, I have difficulty accepting Robin's solution as valid. For starters, it has the semblance of possibly working in the case of people who care about people, because that's a case that seems as if it should be symmetrical; but how would it e.g. work for a Clippy who is tempted with the creation of paperclips? There's no symmetry here, because paperclips don't think and Clippy knows paperclips don't think.

And how would it work if the AI in question is asked to evaluate whether such a hypothetical offer should be accepted by a random individual or not? Robin's anthropic solution says that the AI should judge that someone else ought hypothetically to take the offer, but it would judge the probabilities differently if it had to judge things in actual life. That sounds as if it ought to violate basic principles of rationality?

My effort to steel­man Robin’s ar­gu­ment at­tempted to effec­tively re­place “lives” with “struc­tures of type X that the ob­server cares about and will be im­pacted”, and “unique po­si­tion to af­fect” with “unique po­si­tion of not di­rectly ob­serv­ing”—hence Law of Visi­ble Im­pact.

• I think this is cap­tured by the no­tion that a causal node should only im­prob­a­bly oc­cupy a unique po­si­tion on a causal graph?

• Yeah, that’s prob­a­bly gen­er­al­ized enough that it works, though I sup­pose it didn’t re­ally quite click for me at first be­cause I was fo­cus­ing on Robin’s “abil­ity to af­fect” as cor­re­spond­ing to the term “unique po­si­tion”, and I was in­stead think­ing of “in­abil­ity to per­ceive”—but that’s also a unique po­si­tion, so I sup­pose the causal node ver­sion you men­tion cov­ers that in­deed. Thanks.

• The prior prob­a­bil­ity of us be­ing in a po­si­tion to im­pact a googol­plex peo­ple is on the or­der of one over googol­plex, so your equa­tions must be wrong

That’s not at all how val­idity of phys­i­cal the­o­ries is eval­u­ated. Not even a lit­tle bit.

By that logic, you would have to re­ject most cur­rent the­o­ries. For ex­am­ple, Rel­a­tivity re­stricted the max­i­mum speed of travel, thus re­veal­ing that countless fu­ture gen­er­a­tions will not be able to reach the stars. Archimedes’s dis­cov­ery of the buoy­ancy laws en­abled fu­ture naval bat­tles and ocean far­ing, im­pact­ing billions so far (which is not a googol­plex, but the day is still young). The dis­cov­ery of fis­sion and fu­sion still has the po­ten­tial to de­stroy all those po­ten­tial fu­ture lives. Same with com­puter re­search.

The only thing that mat­ters in physics is the old mun­dane “fits cur­rent data, makes valid pre­dic­tions”. Or at least has the po­ten­tial to make testable pre­dic­tions some time down the road. The only time you might want to bleed (mis)an­thropic con­sid­er­a­tions into physics is when you have no way of eval­u­at­ing the pre­dic­tive power of var­i­ous mod­els and need to de­cide which one is worth pur­su­ing. But that is not physics, it’s de­ci­sion the­ory.

Once you have a testable work­ing the­ory, your an­thropic con­sid­er­a­tions are ir­rele­vant for eval­u­at­ing its val­idity.

• Rel­a­tivity re­stricted the max­i­mum speed of travel, thus re­veal­ing that countless fu­ture gen­er­a­tions will not be able to reach the stars

That’s perfectly cred­ible since it im­plies a lack of lev­er­age.

Archimedes’s dis­cov­ery of the buoy­ancy laws en­abled fu­ture naval bat­tles and ocean far­ing, im­pact­ing billions so far

10^10 is not a sig­nifi­cant fac­tor com­pared to the sen­sory ex­pe­rience of see­ing some­thing float in a bath­tub.

The only thing that mat­ters in physics is the old mun­dane “fits cur­rent data, makes valid pre­dic­tions”.

To build an AI one must be a tad more for­mal than this, and once you start try­ing to be for­mal, you will soon find that you need a prior.

• That’s perfectly cred­ible since it im­plies a lack of lev­er­age.

Oh, I as­sumed that nega­tive lev­er­age is still lev­er­age. Given that it might amount to an equiv­a­lent of kil­ling a googol­plex of peo­ple, as­sum­ing you equate never be­ing born with kil­ling.

To build an AI one must be a tad more for­mal than this, and once you start try­ing to be for­mal, you will soon find that you need a prior.

I see. I can­not com­ment on any­thing AI-re­lated with any con­fi­dence. I thought we were talk­ing about eval­u­at­ing the like­li­hood of a cer­tain model in physics to be ac­cu­rate. In that lat­ter case an­thropic con­sid­er­a­tions seem ir­rele­vant.

• It's likely that anything around today has a huge impact on the state of the future universe. As I understood the article, the leverage penalty requires considering how unique your opportunity to have the impact would be, too. So Archimedes had a massive impact, but there have also been a massive number of people through history who would have had the chance to come up with the same theories had they not already been discovered, so you have to offset Archimedes' leverage penalty by the fact that he wasn't uniquely capable of having that leverage.

• so you have to offset Archimedes' leverage penalty by the fact that he wasn't uniquely capable of having that leverage.

Neither was any other scientist in history ever, including the one in Eliezer's dark energy example. Personally, I take a very dim view of applying anthropics to calculating probabilities of future events, and this is what Eliezer is doing.

• I considered this, and I'm not sure if I am considering the mugging from the right perspective.

For instance, in the case of a mugger who is willing to talk with you, even if the actual amount of evidence was mathematically indeterminate (say the amount is defined as ‘a finite number higher than any number that could fit in your brain’ and the probability is defined as ‘closer to 0 than any positive number you can fit in your brain that isn't 0’), you might still attempt to figure out the direction that talking made the evidence about the mugger go, and use that for decision making.

If as you talk to him, the mug­ger pro­vides more and more ev­i­dence that he is a ma­trix lord, you could say “Sure, Here’s 5 dol­lars.”

Or If as you talk to him, the mug­ger pro­vides more and more ev­i­dence that he is a mug­ger, you could say “No, go away.”

(Note: I’m NOT say­ing the above is cor­rect or in­cor­rect yet! Among other things, you could also use the SPEED at which the mug­ger was giv­ing you ev­i­dence as an aid to de­ci­sion mak­ing. You might say yes to a Mug­ger who offers a mil­lion bits of ev­i­dence all at once, and no to a Mug­ger who offers ev­i­dence one bit at a time.)

How­ever, in the case be­low, you can’t even do that—Or you could at­tempt to, but with the worry that even talk­ing about it it­self makes a de­ci­sion:

Cruel Mug­ger: “Give me 5 dol­lars and I use my pow­ers to save a shit­load of lives. Do any­thing else, like talk­ing about ev­i­dence or walk­ing away, and they die.”

So, to con­sider the prob­lem from the right per­spec­tive, should I be at­tempt­ing to solve the Mug­ging, the Cruel Mug­ging, both sep­a­rately, or both as if they are the same prob­lem?

• Someone who reacts to a gap in the sky with “it's most likely a hallucination” may, with incredibly low probability, encounter the described hypothetical where it is not a hallucination, and lose out. Yet this person would perform much more optimally when their drink got spiked with LSD or if they naturally developed an equivalent fault.

And of course the is­sue is that max­i­mum or even typ­i­cal im­pact of faulty be­lief pro­cess­ing which is de­scribed here could be far larger than \$5 - the hy­poth­e­sis could have re­quired you to give away ev­ery­thing, to work harder than you nor­mally would and give away in­come, or worse, to kill some­one. And if it is pro­cessed with dis­re­gard for prob­a­bil­ity of a fault, such dan­ger­ous failure modes are ren­dered more likely.

• This is true, but the real ques­tion here is how to fix a non-con­ver­gent util­ity calcu­la­tion.

• One of the points in the post was a dramatically non-Bayesian dismissal of updates on the possibility of hallucination. An agent of finite reliability faces a tradeoff between its behaviour under failure and its behaviour in unlikely circumstances.

With regards to fixing up probabilities, there is an issue that early in its life, an agent is uniquely positioned to influence its future. Every elderly agent goes through early life; while the probability of finding your atheist variation on the theme of immaterial soul in the early-age agent is low, the probability that an agent will be making decisions at an early age is 1, and it's not quite clear that we could use this low probability. (It may be more reasonable to assign low probability to an incredibly long lifespan, though, in the manner similar to the speed prior.)

• Someone who reacts to a gap in the sky with “it's most likely a hallucination” may, with incredibly low probability, encounter the described hypothetical where it is not a hallucination, and lose out. Yet this person would perform much more optimally when their drink got spiked with LSD or if they naturally developed an equivalent fault.

What Eliezer is ac­tu­ally say­ing about this kind of hal­lu­ci­na­tion:

I mean, in prac­tice, I would tend to try and take cer­tain ac­tions in­tended to do some­thing about the rather high pos­te­rior prob­a­bil­ity that I was hal­lu­ci­nat­ing and be par­tic­u­larly wary of ac­tions that sound like the sort of thing psy­chotic pa­tients hal­lu­ci­nate, but this is an ar­ti­fact of the odd con­struc­tion of the sce­nario and wouldn’t ap­ply to the more re­al­is­tic and likely-to-be-ac­tu­ally-en­coun­tered case of the physics the­ory which im­plied we could use dark en­ergy for com­pu­ta­tion or what­ever.

The kind of ‘hallucination’ that is discussed in the posts is more about the issue of being forced to believe you are a Boltzmann brain, or a descendant human who is seamlessly hallucinating being an ‘ancestor’, before being able to believe that it is likely that there will be many humans in the future. This is an entirely different kind of issue.

• Even if the prior prob­a­bil­ity of your sav­ing 3↑↑↑3 peo­ple and kil­ling 3↑↑↑3 peo­ple, con­di­tional on my giv­ing you five dol­lars, ex­actly bal­anced down to the log(3↑↑↑3) dec­i­mal place, the like­li­hood ra­tio for your tel­ling me that you would “save” 3↑↑↑3 peo­ple would not be ex­actly 1:1 for the two hy­pothe­ses down to the log(3↑↑↑3) dec­i­mal place.

The scenario is already so outlandish that it seems unwarranted to assume that the mugger is telling the truth with more than 0.5 certainty. The motives of such a being to engage in this kind of prank, if it truly were in such a powerful position, would have to be very convoluted. Isn't it at least as likely that the opposite will happen if I hand over the five dollars?

Okay, I guess if that’s my an­swer, I’ll have to hand over the money if the mug­ger says “don’t give me five dol­lars!” Or do I?

• This is one of many rea­sons that the “dis­cover novel physics that im­plies the abil­ity to af­fect (re­ally big num­ber) lives” ver­sion of this thought ex­per­i­ment works bet­ter than the “en­counter su­per­hu­man per­son who as­serts the abil­ity to af­fect (re­ally big num­ber) lives”. That said, if I’m look­ing for rea­sons for in­cre­dulity and pre­pared to stop think­ing about the sce­nario once I’ve found them, I can find them eas­ily enough in both cases.

• Well, one of my re­sponses to the su­per­hu­man sce­nario is that my prior de­pends on the num­ber, so you can’t ex­ceed my prior just by rais­ing the num­ber.

The reasons I gave for having my prior depend on the number no longer apply to the physics scenario, but there are new reasons that would. For instance, the human mind is not good at estimating or comprehending very small probabilities and very large numbers; if I had to pay \$5 for research that had a very tiny probability of producing a breakthrough that would improve lives by a very large amount of utility, I would have little confidence in my ability to properly compute those numbers, and the more extreme the numbers the less my confidence would be.

(And “I have no confidence” also means I don't know how my own errors are distributed, so you can't easily fix this up by factoring my confidence into the expected value calculation.)

• Yes, agreed, a re­searcher say­ing “give me \$5 to re­search tech­nol­ogy with im­plau­si­ble pay­off” is just some guy say­ing “give me \$5 to use my im­plau­si­ble pow­ers” with differ­ent paint and has many of the same prob­lems.

The sce­nario I’m think­ing of is “I have, af­ter do­ing a bunch of re­search, dis­cov­ered some novel physics which, given my un­der­stand­ing of it and the ex­per­i­men­tal data I’ve gath­ered, im­plies the abil­ity to im­prove (re­ally big num­ber) lives,” which raises the pos­si­bil­ity that I ought to re­ject the re­sults of my own ex­per­i­ments and my own the­o­riz­ing, be­cause the con­clu­sion is just so bloody im­plau­si­ble (at least when ex­pressed in hu­man terms; EY loses me when he starts talk­ing about quan­tify­ing the im­plau­si­bil­ity of the con­clu­sion in terms of bits of ev­i­dence and/​or bits of sen­sory in­put and/​or bits of cog­ni­tive state).

And in par­tic­u­lar, the “you could just as eas­ily harm (re­ally big num­ber) lives!” ob­jec­tion sim­ply dis­ap­pears in this case; it’s no more likely than any­thing else, and van­ishes into un­con­sid­er­abil­ity when com­pared to “noth­ing ter­ribly in­ter­est­ing will hap­pen,” un­less I posit that I ac­tu­ally do know what I’m do­ing.

• Re­v­ersed stu­pidity is not in­tel­li­gence. You are not so con­fused as to guess the op­po­site of what will hap­pen more of­ten than what will ac­tu­ally hap­pen. All your con­fu­sion means is that it is al­most as likely that the op­po­site will hap­pen.

• Sup­pose you could con­ceive of what the fu­ture will be like if it were ex­plained to you.

Are there more or fewer than a googolplex differentiable futures which are conceivable to you? If there are more, then selecting a specific one of those conceivable futures requires more bits than posited as possible. If fewer, then...?

• the vast ma­jor­ity of the im­prob­a­ble-po­si­tion-of-lev­er­age in any x-risk re­duc­tion effort comes from be­ing an Earth­ling in a po­si­tion to af­fect the fu­ture of a hun­dred billion galax­ies,

Why does “Earth­ling” im­ply suffi­cient ev­i­dence for the rest of this (given a lev­er­age ad­just­ment)? Don’t we have in­de­pen­dent rea­son to think oth­er­wise, eg the Great Filter ar­gu­ment?

Mind you, the re­cent MIRI math pa­per and fol­low-up seem (on their face) to dis­prove some clever rea­sons for call­ing seed AGI ac­tu­ally im­pos­si­ble and thereby re­ject­ing a sce­nario in which Earth will “af­fect the fu­ture of a hun­dred billion galax­ies”. There may be a les­son there.

• Typo:

op­por­tu­ni­ties to af­fect small large num­bers of sen­tient beings

• No, it’s sup­posed to say that. 10^80 is ear­lier defined as a small large num­ber.

• I missed that. It’s bad enough no­ta­tion that I ex­pect oth­ers to stum­ble over it, too.

• I think it’s good bad no­ta­tion.

• Maybe “small­ishly large”? That makes it clearer that you are say­ing “(small-kind-of-large)-kind-of num­ber”, not “num­ber that is small and large”

• So it looks like the Pas­cal’s mug­ger prob­lem can be re­duced to two prob­lems that need to be solved any­way for an FAI: how to be op­ti­mally ra­tio­nal given a finite amount of com­put­ing re­sources, and how to as­sign prob­a­bil­ities for math­e­mat­i­cal state­ments in a rea­son­able way.

Does that sound right?

• I’m not sure I agree with that one—where does the ques­tion of an­thropic pri­ors fit in? The ques­tion is how to as­sign prob­a­bil­ities to phys­i­cal state­ments in a rea­son­able way.

• You may be aware of the use of negative probabilities in machine learning and quantum mechanics and, of course, economics. For the last, the existence of a Matrix Lord has such a large negative probability that it swamps his proffer (perhaps because it is altruistic?) and no money changes hands. In other words, there is nothing interesting here; it’s just that some types of decision theory haven’t incorporated negative probabilities yet. The reverse situation, Job’s complaint against God, is more interesting. It shows why variables with negative probabilities tend to disappear out of discourse, to be replaced by the difference between two independent ‘normal’ variables; in this case Cosmic Justice is replaced by the I-Thou relationship of ‘God’ & ‘Man’.

• Can you give me an ex­am­ple of some­thing with nega­tive prob­a­bil­ity?

I will offer you a bet: if it doesn’t hap­pen, you have to give me a dol­lar, but if it does hap­pen, you have to give me ev­ery­thing you own. I find it hard to be­lieve that there’s any­thing where that’s con­sid­ered good odds.

For the last, the ex­is­tence of a Ma­trix Lord has such a large nega­tive prob­a­bil­ity that it swamps his proffer (per­haps be­cause it is al­tru­is­tic?)

If it has such a large nega­tive prob­a­bil­ity, wouldn’t you try to avoid ever giv­ing some­one five dol­lars, since they anti-might be a Ma­trix Lord, and you can’t risk a nega­tive prob­a­bil­ity of them spar­ing 3^^^3 peo­ple?

Also, when you men­tion quan­tum me­chan­ics, I think you’re con­fus­ing wave­form den­sity and prob­a­bil­ity den­sity. The wave­form can be any com­plex num­ber, but the prob­a­bil­ity is pro­por­tional to the square of the mag­ni­tude of the wave­form. If the wave­form den­sity is 1, −1, i, or -i, the prob­a­bil­ity of see­ing the par­ti­cle there is the same.
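The amplitude-versus-probability point is easy to check numerically (a minimal sketch of the Born rule, added for illustration): the probability is the squared magnitude of the amplitude, so the amplitudes 1, −1, i, and −i all yield the same probability.

```python
# Born rule: probability is the squared magnitude of the complex amplitude,
# so amplitudes that differ only by a phase give identical probabilities.
amplitudes = [1 + 0j, -1 + 0j, 1j, -1j]
probabilities = [abs(a) ** 2 for a in amplitudes]
print(probabilities)  # [1.0, 1.0, 1.0, 1.0]
```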

• Quantum mechanics actually has led to some study of negative probabilities, though I’m not familiar with the details. I agree that they don’t come up in the standard sort of QM and that they don’t seem helpful here.

• I find it truly bizarre that no­body here seems to be tak­ing MWI se­ri­ously. That is, it’s not 1 per­son hand­ing over \$5 or not, it’s all the branch­ing pos­si­ble fu­tures of those pos­si­bil­ities. In other words, I hand over \$5, then de­pend­ing how my head ra­di­ates heat for the next sec­ond there are now many copies of me ex­pe­rienc­ing \$5-less-ness.

How many? Well, answering that question may require a theory of magical reality fluid (or “measure”), but naively speaking it seems that it should be something more akin to 3^^^3 (or googolplex) than to 3^^^^3. So the problem may still exist; but this MWI issue certainly deserves consideration, and the fact that Eliezer didn’t apparently consider it makes me suspicious that he hasn’t thought as deeply about this as he claims. Even if throwing this additional factor of 3^^^3 into the mix doesn’t dissolve the problem entirely, it may well put it into the range where further arguments, such as earthwormchuck163’s “there aren’t 3^^^^3 different people”, could solve it.

• Any rea­son­ably use­ful de­ci­sion the­ory ought to work in New­to­nian wor­lds as well.

• Damn right! I wish I could trade some of my karma for ex­tra up­votes.

• (This com­ment was origi­nally writ­ten in re­sponse to shminux be­low, but it’s more di­rectly ad­dress­ing nshep­perd’s point, so I’m mov­ing it to here)

I un­der­stand that you’re ar­gu­ing that a good de­ci­sion the­ory should not rely on MWI. I ac­cept that if you can build one with­out that re­li­ance, you should; and, in that case, MWI is a red her­ring here.

But what if you can’t make a good de­ci­sion the­ory that works the same with or with­out MWI? I think that in that case there are an­thropic rea­sons that we should priv­ilege MWI. That is:

1. The fact that the uni­verse ap­par­ently ex­ists, and is ap­par­ently con­sis­tent with MWI, seems to in­di­cate that an MWI uni­verse is at least pos­si­ble.

2. If this uni­verse hap­pens to be “smaller than MWI” for some rea­son (for in­stance, we dis­cover a bet­ter the­ory to­mor­row; or, we’re ac­tu­ally in­side a sim that’s fak­ing it some­how), there is some prob­a­bil­ity that “MWI or larger” does ac­tu­ally ex­ist some­where else. (You can mo­ti­vate this by var­i­ous kinds of hand­wav­ing: from Teg­mark-Level-4 philoso­phiz­ing; to the ques­tion of how a smaller-than-MWI simu­la­tor could have de­cided that a pseudo-MWI sim would be in­ter­est­ing; and prob­a­bly other ar­gu­ments).

3. If in­tel­li­gence ex­ists in both “smaller than MWI” do­mains and “MWI or larger” do­mains, an­thropic ar­gu­ments strongly sug­gest that we should as­sume we’re in one of the lat­ter.

(And to sum­ma­rize, in di­rect re­sponse to nshep­perd:)

That’s prob­a­bly true. But it’s not a good ex­cuse to ig­nore how things would change if you are in an MWI world, as we seem to be.

• If your de­ci­sion the­ory doesn’t work in­de­pen­dently of whether MWI is true or not, then what do you use to de­cide if MWI is true?

And if your de­ci­sion the­ory does al­low for both pos­si­bil­ities (and even if MWI some­how solved Pas­cal’s Mug­ging, which I also dis­agree with) then you would still only win if you as­sign some­where around 1 in 3^^^3 prob­a­bil­ity to MWI be­ing false. On what grounds could you pos­si­bly make such a claim?

• I’m not say­ing I have a de­ci­sion the­ory at all. I’m say­ing that what­ever your de­ci­sion the­ory, MWI be­ing true or not could in prin­ci­ple change the an­swers it gives.

And if there is some chance that MWI is true, and some chance that it is false, the MWI pos­si­bil­ities have a fac­tor of ~3^^^3 in them. They dom­i­nate even if the chance of MWI is small, and far more so if the chance of it be­ing false is small.

• Wait, so you’re say­ing that if MWI is true, then keep­ing \$5 is not only as good as, but out­weighs sav­ing 3^^^3 lives by a huge fac­tor?

Does this also ap­ply to reg­u­lar mug­gers? You know, the gun-in-the-street, your-money-or-your-life kind? If not, what’s the differ­ence?

• No. I’m say­ing that if there’s (say) a 50% chance that MWI is true, then you can ig­nore the pos­si­bil­ity that it isn’t; un­less your de­ci­sion the­ory some­how nor­mal­izes for the to­tal quan­tity of peo­ple.

If you’ve decided MWI is true, and that measure is not conserved (i.e., as the universe splits, there’s more total reality fluid to go around), then keeping \$5 means keeping \$5 in something like 3^^^3 or a googolplex or so universes. If Omega or Matrix Lord threatens to steal \$5 from 3^^^3 people in individual, non-MWI sim-worlds, then that would … well, of course, not actually balance things out, because there’s a huge handwavy error in the exponent here, so one or the other is going to massively dominate, but you’d have to actually do some heavy calculation to try to figure out which side it is.

If there’s an or­di­nary mug­ger, then you have MWI go­ing on (or not) in­de­pen­dently of how you choose to re­spond, so it can­cels out, and you can treat it as just a sin­gle in­stance.

• If you’ve decided MWI is true, and that measure is not conserved (i.e., as the universe splits, there’s more total reality fluid to go around), then keeping \$5 means keeping \$5 in something like 3^^^3 or a googolplex or so universes.

But if Pas­cal’s Mug­ger de­cides to tor­ture 3^^^3 peo­ple be­cause you kept \$5, he also does this in “some­thing like 3^^^3 or a google­plex or some­thing” uni­verses. In other words, I don’t see why it doesn’t always can­cel out.

• I ex­plic­itly said that mug­ger steal­ing \$5 hap­pens “in in­di­vi­d­ual, non-MWI sim-wor­lds”. I be­lieve that a given de­ter­minis­tic al­gorithm, even if it hap­pens to be run­ning in 3^^^3 iden­ti­cal copies, counts as an in­di­vi­d­ual world. You can stir in quan­tum noise ex­plic­itly, which effec­tively be­comes part of the al­gorithm and thus splits it into many sep­a­rate sims each with its own unique noise; but you can’t do that nearly fast enough to keep up with the quan­tum noise that’s be­ing stirred into real phys­i­cal hu­mans.

• Philos­o­phy ques­tions of what counts as a world aside, who told you that the mug­ger is run­ning some al­gorithm (de­ter­minis­tic or oth­er­wise)? How do you know the mug­ger doesn’t sim­ply have 3^^^3 phys­i­cal peo­ple stashed away some­where, ready to tor­ture, and prone to all the quan­tum branch­ing that en­tails? How do you know you’re not just con­fused about the im­pli­ca­tions of quan­tum noise?

If there’s even a 1-in-a-googol­plex chance you’re wrong about these things, then the di­su­til­ity of the mug­ger’s threat is still pro­por­tional to the 3^^^3-tor­tured-peo­ple, just di­vided by a mere googol­plex (I will be gen­er­ous and say that if we as­sume you’re right, the di­su­til­ity of the mug­ger’s threat is effec­tively zero). That still dom­i­nates ev­ery calcu­la­tion you could make...

...and even if it didn’t, the mug­ger could just threaten 3^^^^^^^3 peo­ple in­stead. Any counter-ar­gu­ment that re­mains valid has to scale with the num­ber of peo­ple threat­ened. Your ar­gu­ment does not so scale.

• At this point, we’re mostly both work­ing with differ­ent im­plic­itly-mod­ified ver­sions of the origi­nal prob­lem, and so if we re­ally wanted to get any­where we’d have to be a lot more spe­cific.

My origi­nal point was that a fac­tor of MWI in the origi­nal prob­lem might be non-neg­ligible, and should have been con­sid­ered. I am act­ing as the Devil’s Con­cern Troll, a po­si­tion which I claim is use­ful even though it bears a pretty low bur­den of proof. I do not deny that there are gap­ing holes in my ar­gu­ment as it re­lates to this post (though I think I am on sig­nifi­cantly firmer ground if you were fac­ing Galaxy Of Com­pu­tro­n­ium Wo­man rather than Ma­trix Lord). But I think that if you look at what you your­self are ar­gu­ing with the same skep­ti­cal eye, you’ll see that it is far from bul­let­proof.

Ad­mit it: when you read my ob­jec­tion, you knew the con­clu­sion (I am wrong) be­fore you’d fully con­structed the ar­gu­ment. That kind of goal-di­rected think­ing is ir­re­place­able for bridg­ing large gaps. But when it leads you to dis­miss fac­tors of 3^^^3 or a googol­plex as petty mat­ters, that’s mighty dan­ger­ous ter­ri­tory.

For in­stance, if MWI means some­one like you is le­gion, and the an­thropic ar­gu­ment means you are more likely to be that some­one rather than a non-MWI simu­lated pseudo-copy thereof, then you do have a per­ti­nent ques­tion to ask the Ma­trix Lord: “You’re ask­ing me to give you \$5, but what if some copies of me do and oth­ers don’t?” If it an­swers, for in­stance, “I’ve turned off MWI for the du­ra­tion of this challenge”, then the an­thropic im­prob­a­bil­ity of the situ­a­tion just sky­rock­eted; not by any­thing like enough to out­weigh the 3^^^^3 threat, but eas­ily by enough to out­weigh the im­prob­a­bil­ity that you’re just hal­lu­ci­nat­ing this (or that you’re just a fig­ment of the imag­i­na­tion of the Ma­trix Lord as it idly con­sid­ers whether to pose this prob­lem for real, to the real you).

Again: if you look for the weak­est, or worse, the most poorly-ex­pressed part of what I’m say­ing, you can eas­ily knock it down. But it’s bet­ter if you steel-man it; I don’t see where the cor­rect re­sponse could pos­si­bly be “Fac­tor of 3^^^3? Hadn’t con­sid­ered that ex­actly, but it’s prob­a­bly ir­rele­vant, let’s see how.”

On an even more gen­eral level, my larger point is that I find that mul­ti­plic­ity (both MWI and Teg­mark level 4) is a fruit­ful in­spira­tion for morals and de­ci­sion the­ory; more fruit­ful, in my ex­pe­rience, than simu­la­tions, Omega, Ma­trix Lords, and GOCW. Note that MWI and TL4, like Omega and GOCW, don’t have to be true or falsifi­able in or­der to be use­ful as in­spira­tion. My ex­pe­rience in­cludes think­ing about these mat­ters more than most, but cer­tainly less than peo­ple like Eliezer. Take that as you will.

• I think we’re talk­ing past each other, and fu­ture dis­cus­sion will not be pro­duc­tive, so I’m tap­ping out now.

• (Moved my re­ply, too)

But what if you can’t make a good de­ci­sion the­ory that works the same with or with­out MWI?

This con­tra­dicts the premise that MWI is untestable ex­per­i­men­tally, and is only a Bayesian ne­ces­sity, the point of view Eliezer seems to hold. In­deed, if an MWI-based DT sug­gests a differ­ent course of ac­tion than a sin­gle-world one, then you can test the ac­cu­racy of each and find out whether MWI is a good model of this world. If fur­ther­more one can show that no sin­gle-world DT is as ac­cu­rate as a many-world one, I will be con­vinced.

The fact that the uni­verse ap­par­ently ex­ists, and is ap­par­ently con­sis­tent with MWI, seems to in­di­cate that an MWI uni­verse is at least pos­si­ble.

It is also con­sis­tent with Chris­ti­an­ity and in­visi­ble pink uni­corns, why do you pre­fer to be MWI-mugged rather than Christ-mugged or uni­corn-mugged?

• This con­tra­dicts the premise that MWI is untestable experimentally

No it doesn’t. DT is about what you should do, es­pe­cially when we’re in­vok­ing Omega and Ma­trix Lords and the like. Which DT is bet­ter is not em­piri­cally testable.

It is also con­sis­tent with Chris­ti­an­ity and in­visi­ble pink unicorns

Yes, ex­cept that MWI is the best the­ory cur­rently available to ex­plain moun­tains of ex­per­i­men­tal ev­i­dence, while Chris­ti­an­ity is em­piri­cally dis­proven (“Look, wine, not blood!”) and in­visi­ble pink uni­corns (and in­visi­ble, pink ver­sions of Chris­ti­an­ity) are in­co­her­ent and un­falsifi­able.

(Later edit: “best the­ory cur­rently available to ex­plain moun­tains of ex­per­i­men­tal ev­i­dence” de­scribes QM in gen­eral, not MWI. I have a hard time imag­in­ing a ver­sion of QM that doesn’t in­clude some form of MWI, though, as shminux points out down­thread, the de­tails are far from be­ing set­tled. Cer­tainly I don’t think that there’s a lot to be gained by com­par­ing MWI to in­visi­ble pink uni­corns. Both have a p value that is nei­ther 0 nor 1, but the similar­ity pretty much ends there.)

• Which DT is bet­ter is not em­piri­cally testable.

You ought to no­tice your con­fu­sion by now.

MWI is the best the­ory cur­rently available to ex­plain moun­tains of ex­per­i­men­tal evidence

What is your level of un­der­stand­ing QM? Con­sider read­ing this post.

• Re DT: OK, I no­tice I am con­fused.

Re MWI: My un­der­stand­ing of QM is quite good for some­one who has never done the ac­tual math. I re­al­ize that there are oth­ers whose un­der­stand­ing is vastly bet­ter. How­ever, this de­bate is not about the equa­tions of QM per se, but about the mea­sure the­ory that tells you how “real” the differ­ent parts of them are. That is also an area where I’m no more than an ad­vanced am­a­teur, but it is also an area in which no­body in this dis­cus­sion has the hal­l­marks of an ex­pert. Which is why we’re us­ing terms like “re­al­ity fluid”.

• My un­der­stand­ing of QM is quite good for some­one who has never done the ac­tual math

And my vi­o­lin skills are quite good for some­one who has never done the ac­tual play­ing.

How­ever, this de­bate is not about the equa­tions of QM per se, but about the mea­sure the­ory that tells you how “real” the differ­ent parts of them are.

Differ­ent parts of what? Of equa­tions? They are all equally real: to­gether they form math­e­mat­i­cal mod­els nec­es­sary to de­scribe ob­served data.

Which is why we’re us­ing terms like “re­al­ity fluid”.

Eliezer is prob­a­bly the only one who uses that and the full term is “mag­i­cal re­al­ity fluid” or some­thing similar, named this way speci­fi­cally to re­mind him that he is con­fused about it.

• I have ac­tu­ally done the math for sim­ple toy cases like Bell’s in­equal­ity. But yeah, you’re right, I’m no ex­pert.

(Out of curiosity, are you?)

Differ­ent parts of what?

ψ

• (Out of curiosity, are you?)

I have a re­lated de­gree, if that’s what you are ask­ing.

ψ

I’m yet to see any­one writ­ing down any­thing more than a hand­wav­ing of this in MWI. Zurek’s ideas of eins­e­lec­tion and en­var­i­ance go some ways to­ward show­ing why only the eigen­states sur­vive when de­co­her­ence hap­pens, and there is some ex­per­i­men­tal sup­port for this, though the is­sue is far from set­tled.

• Pre­cisely; the is­sue is far from set­tled. That clearly doesn’t mean “any hand­wavy spec­u­la­tion is as good as any other” but it also doesn’t mean “spec­u­la­tion can be dis­missed out of hand be­cause we already un­der­stand this and you’re just wrong”.

• Suppose 3^^^3 copies of you are generated in the first second after you decide. Each one will have \$5 less as a result of your decision. (For the sake of argument, let’s say your responsibility ends there.) Let’s take a dollar as a utility unit, and say that by giving the Matrix Lord \$5 you produce 5×3^^^3 disutility points across future worlds. But since everyone is producing copies at roughly the same rate (I think), any utility gained or lost is always multiplied by 3^^^3. This means that you can just cancel the 3^^^3 business out: for everyone you benefit, the positive utility points are also multiplied by 3^^^3, and so the result is the same.

• Why was this down­voted? Be­cause ev­ery­one knows that Ma­trix Lord simu­la­tions don’t ac­tu­ally fol­low MWI, they just seem to for the poor de­luded sci­en­tists trapped in­side? Sure, I know that. But I was just say­ing, what if they did. Rid­dle me that, down­voter per­son!

Se­ri­ously: I’ve now posted var­i­ants of this idea (that MWI means we are all le­gion, which makes threats/​promises in­volv­ing simu­la­tions sig­nifi­cantly less scary/​en­tic­ing) at least 5 or 6 times, be­tween here and Quora. And it’s down­voted to oblivion ev­ery time. Now, ob­vi­ously, this makes me ques­tion whether there’s some­thing stupid about the idea. But though I’m gen­er­ally ac­knowl­edged to be not a stupid guy, I can’t see the fatal flaw. It’s very tempt­ing to think that you cats are all just too main­stream to see the light, man. That kind of think­ing has to over­come a large self-serv­ing­ness penalty, which is why I state it in ridicu­lous terms, but un­less some­one can talk me down here, I’m close to em­brac­ing it.

So: what is so very wrong about this thought? Aside from the fact that it em­braces two premises which are too un­con­ven­tional for non-LW’ers, but reaches a con­clu­sion that’s too main­stream for LW’ers?

And please, don’t down­vote this com­ment with­out re­spond­ing. I’m happy to take the karma penalty if I learn some­thing, but if all you get for be­ing wrong is down­voted, that’s just a dead end. So, to sweeten the pot: I will up­vote any even-min­i­mally-thought­ful re­sponse to this com­ment or to the one above.

• I didn’t down­vote, but I couldn’t see what MWI ac­tu­ally changed about the prob­lem. The simu­la­tions are also sub­ject to MWI, so you’re mul­ti­ply­ing both sides of the com­par­i­son by the same large num­ber. Hmm. Un­less the simu­la­tions are im­ple­mented on quan­tum com­put­ers, which would min­i­mize the branch­ing. It’s not clear to me that you can mimic the al­gorithm with­out hav­ing the same de­gree of to­tal de­co­her­ence.

• No, the simu­la­tions are not sub­ject to MWI. I mean, we don’t know what “ma­trix lord physics” is, but we have his word that there are 3^^^^3 in­di­vi­d­u­als in­side those simu­la­tions, and pre­sum­ably that’s af­ter any MWI effects are fac­tored in.

If in­stead of Ma­trix Lord, we were just fac­ing Galaxy Of Com­pu­tro­n­ium Wo­man, we’d be even bet­ter off. She can pre­sum­ably shift any given bit of her galaxy be­tween quan­tum and nor­mal com­pu­ta­tion mode, but it doesn’t help her. If GOCW is in nor­mal com­pu­ta­tion mode, her com­pu­ta­tions are de­ter­minis­tic and thus not mul­ti­plied by MWI. And if she’s in quan­tum mode, she only gets a mul­ti­plier pro­por­tional to an ex­po­nen­tial of the num­ber of qubits she’s us­ing. In or­der to get the full mul­ti­plier that or­di­nary made-of-mat­ter you are get­ting nat­u­rally, she has to simu­late ev­ery­thing about the quan­tum wave func­tion of ev­ery par­ti­cle in you and your en­vi­ron­ment. We don’t know how effi­cient her al­gorithms are for do­ing so, but pre­sum­ably it takes her more than a gram of com­pu­tro­n­ium to simu­late a gram of nor­mal mat­ter at that level of de­tail, and ar­guably much more. Ob­vi­ously she can do hy­brid quan­tum/​con­ven­tional tricks, but there’s noth­ing about the hy­bridiza­tion it­self that in­creases her mul­ti­plier.

• So you’re say­ing, what if MWI is just a lo­cal phe­nomenon to our world, and doesn’t ap­ply to these 3^^^^3 other simu­la­tions that the ma­trix lords are work­ing with, be­cause they aren’t quan­tum in the first place?

I agree that in the case of a mere galaxy of com­pu­tro­n­ium, it’s much less cred­ible that one can simu­late an ex­tremely high num­ber of peo­ple com­plex enough that we wouldn’t be able to prove that we aren’t them. In the former case, we’ve got much less in­for­ma­tion.

• Un­like Eliezer, I very pub­li­cly do not priv­ilege MWI on this site, but let’s as­sume that it’s “true” for the sake of ar­gu­ment. How many (sub­tly differ­ent) copies of you got offered the same deal? No way to tell. How many ac­cept or re­ject it? Who knows. If there are 3^...^^3 copies of you who ac­cepted, then the ma­trix lord has a lot of money (as­sum­ing they care for money) to do what it promised. But what if there are only 3^^^3 (or some other con­ve­niently “small” num­ber) of you who ac­cept? Then you are back to the origi­nal prob­lem. Un­til you have a be­liev­able model of this “mag­i­cal re­al­ity fluid”, adding MWI into the mix gives you noth­ing.

• (Note: this com­ment now moved to re­spond to nshep­perd above)


• Isn’t the thought that even if only one Ho­munq is offered the deal and ac­cepts, the next few sec­onds will gen­er­ate [in­sert some large num­ber] of wor­lds in which Ho­munq copies have \$5 less be­cause of that one origi­nal Ho­munq’s de­ci­sion? I don’t think Ho­munq means to re­fer to pre­ex­ist­ing other wor­lds (which couldn’t be af­fected by his ac­tions), but to the wor­lds that will be gen­er­ated just af­ter his de­ci­sion.

• They aren’t generated. The one world would be split up among the resulting worlds. The magical reality fluid (a.k.a. squared amplitude) is conserved.

• I strongly dis­agree that you can make that as­sump­tion; see my com­ment on your larger ex­pla­na­tion for why.

• Okay, thanks. But I don’t know what mag­i­cal re­al­ity fluid is, so I don’t re­ally un­der­stand you.

• Before I answer, I’d like to know how much you do understand, so I can answer at an appropriate level. Is this an ‘I don’t know what’s going on here’ question, or is it a statement that you understand the system well enough that the basics no longer are convincingly basic?

• The former, mostly. I’ve read the se­quences on this point and done a lit­tle side read­ing on my own, but I don’t un­der­stand the math and I have no real ed­u­ca­tion in quan­tum physics. In other words, I would re­ally ap­pre­ci­ate an ex­pla­na­tion, but I will also en­tirely un­der­stand if this is more work than you’re pre­pared to put in.

• To con­dense to a near-ab­surd de­gree:

QM in­di­cates that if you take any old state of the uni­verse, you can split it up any way you feel like. Take any state, and you can split it up as a sum of 2 or more other states (A = B + C + D+ E, say). If you then ‘run’ each of the parts sep­a­rately (i.e. calcu­late what the fu­ture state would be, yield­ing B’, C’, D’, E’) and then com­bine the re­sults by adding, it’s the same as if you ran the origi­nal (A’ = B’ + C’ + D’ + E’).

This is be­cause QM is a lin­ear the­ory. You can add and sub­tract and rescale en­tire states and those op­er­a­tions pass right through into the out­comes.

This doesn’t mean that you won’t get any sur­prises if you make pre­dic­tions based on just B, C, D, and E in­di­vi­d­u­ally, then add those to­gether. In gen­eral, with ar­bi­trary B, C, D, and E, com­bin­ing them can yield things that just don’t hap­pen when you’d ex­pect that they would based on the parts in­di­vi­d­u­ally (and other things that hap­pen more than you’d ex­pect, to com­pen­sate).

De­co­her­ence tells you how and when you can pick these B, C, D, and E so that you in fact won’t get any such sur­prises. That this is pos­si­ble is how we can per­ceive a clas­si­cal world made of the quan­tum world.

One tiny and in no way sufficient part of the technique of decoherence is to require that B, C, D, and E are all perpendicular to each other. What does that do? It lets you apply the Pythagorean theorem. When working with vectors in general, with A being the hypotenuse and B, C, D, and E the perpendicular vector components, we get AA = BB + CC + DD + EE. (Try doing this with three vectors near the corner of a room: have a point suspended in air, drop a line to the floor, and construct a right triangle from that point to one of the walls. You’ll get AA = WW + ZZ; then split W into X and Y, for AA = XX + YY + ZZ.)

Any­way, what the Pythagorean the­o­rem says is that if you take a vec­tor and split it up into per­pen­dicu­lar com­po­nents, one thing that stays the same is the sum of the squared mag­ni­tudes.

And it turns out that if you do the math, the math­e­mat­i­cal struc­ture that works like prob­a­bil­ity in QM-with-de­co­her­ence is pro­por­tional to this squared mag­ni­tude. This is the ba­sis of call­ing this square mag­ni­tude ‘re­al­ity fluid’. It seems to be the mea­sure of how much some­thing ac­tu­ally hap­pens—how real it is.
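Both pieces of the explanation above, the linearity of the evolution and the Pythagorean bookkeeping of squared magnitudes, can be checked in a toy model (my own sketch using a 2-state system and a Hadamard-like unitary; nothing here is specific to the thread):

```python
# A 2x2 unitary evolution (a Hadamard-like "quantum coin flip").
s = 1 / 2 ** 0.5
U = [[s, s], [s, -s]]

def evolve(state):
    # Apply U to a 2-component complex state vector.
    return [sum(U[i][j] * state[j] for j in range(2)) for i in range(2)]

def add(v, w):
    return [x + y for x, y in zip(v, w)]

def norm2(v):
    # Sum of squared magnitudes: the "reality fluid" measure.
    return sum(abs(x) ** 2 for x in v)

# Split state A into perpendicular components B and C.
A = [0.6, 0.8j]
B = [0.6, 0.0]
C = [0.0, 0.8j]

# Linearity: running the whole equals running the parts and adding.
assert all(abs(x - y) < 1e-12 for x, y in zip(evolve(A), add(evolve(B), evolve(C))))

# Pythagoras: squared magnitudes of perpendicular parts sum to the whole,
# and unitary evolution preserves that total.
assert abs(norm2(A) - (norm2(B) + norm2(C))) < 1e-12
assert abs(norm2(evolve(A)) - norm2(A)) < 1e-12
print("linearity and norm conservation hold")
```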

• Thanks, that’s really quite helpful. I take it then that the problem with Homunq’s objection is that all the subsequent ‘worlds’ would have the same total reality fluid as the one in which he made the decision, and so the ‘splitting’ wouldn’t have any real effect on the total utility: \$5 less for one person with reality R is the same disutility as \$5 less for a [large number of] people with reality R/[large number]?

But maybe that’s not right. At the end, you talked about ‘how much re­al­ity fluid some­thing has’ as be­ing a mat­ter of how much some­thing hap­pens. This makes sense as a way of talk­ing about events, but what about sub­stances? I gather that sub­stances like peo­ple don’t see much play in the math of QM (and have no role in physics at all re­ally), but in this case the ques­tions seems rele­vant.

• Your first para­graph is cor­rect.

As for the sec­ond, well, sub­stances are kind of made of colos­sal num­bers of events in a con­ve­nient pat­tern such that it’s use­ful to talk about the pat­tern. Like, I’m not fal­ling through my chair over and over and over again, and I an­ti­ci­pate this con­tin­u­ing to be the case… that and a bunch of other things lead me to think of the chair as sub­stan­tial.

• sub­stances are kind of made of colos­sal num­bers of events...

Right, but I’m not something that happens. The continuation of me into the next second might be something that happens, and so we might say that this continuation has more or less reality fluid, but I don’t know that the same can be said of me simpliciter. You might think that I am in fact something that happens, a series or pattern of events, but I think this is a claim that would at least need some working out: one implication of this claim is that it takes time (in the way a motion takes time) to be me. But this is off the QM (maybe off the scientific) path, and I should say I very much appreciate your time thus far. I can’t take it personally if you don’t want to join me in some armchair speculation.

• Your thoughts are things that hap­pen. What­ever’s do­ing those is you. I don’t see the prob­lem.


But it seems problematic to say that I am my thoughts. I seem to persist in time despite changes in what I think, for example. A few days ago, I thought worlds were ‘generated’ on the MWI view. I now no longer think that. I’m different as a result, but I’m not a different person. I wasn’t destroyed, or remade. (I don’t mean this to be a point specifically about human personal identity; this should apply to animals and plants and maybe blocks of wood too.)

To re­it­er­ate my con­cern in the grand­par­ent, if my thoughts are a pro­cess that takes time (as they seem to be), and I am my thoughts, then it takes time to be me. Be­ing me would then be some­thing in­ter­rupt­ible, so that I could only get half way to be­ing me. This is at least odd.

I don’t mean to suggest that this is a knock-down argument or anything; it’s not. It’s little more than an armchair objection on the basis of natural language. But it’s the sort of thing for which this theory should have an answer. We might just discover that the temporal persistence or identity of macroscopic objects is a physically incoherent idea (like identity based on having a certain set of atoms). But if we do discover something radical like that, we should have something to say to ward off the idea that we’ve just misunderstood the question or changed the topic. Again, thanks for your indulgence.

• You are a 4-di­men­sional re­gion of space­time. What you nor­mally call ‘you’ is a mu­tu­ally-spacelike-sep­a­rated cut of this 4-di­men­sional re­gion, but the whole rea­son for call­ing this slice spe­cial is be­cause of causal chains that have ex­tent in time. For in­stance, your hand is con­sid­ered yours be­cause your brain can tell it what to do*. That causal chain takes time to roll out.

• If each of us had a partner and could control the other’s hands, the terms would probably soon switch so that “your hands” are the pair on their body, not the pair on your own body.

• Do you think there is a meaningful distinction to be drawn between the kinds of things I can talk about via mutually-spacelike cuts (like arrangements, shapes, trombones, maybe dogs) versus the kinds of things I cannot talk about via mutually-spacelike cuts, like the motion of a fastball, Beethoven’s Ode to Joy, or the life of a star? Processes that take time versus... I dunno, things?

I ask be­cause nat­u­ral lan­guage and my ev­ery­day ex­pe­rience of the world (un­re­li­able or ir­rele­vant though they may be to the ques­tion of phys­i­cal re­al­ity) makes a great deal of fuss over this dis­tinc­tion.

• There is a dis­tinc­tion, and you just gave it—some things are defined by their pro­cesses, and some things are not. Imag­ine in­stan­ta­neously re­duc­ing some­thing to an ar­bi­trar­ily low tem­per­a­ture and leav­ing it that way for­ever as a sub­sti­tute for stop­ping time, and see if the thing still counts as the same thing (this rule of thumb is not guaran­teed to ap­ply in all cases).

A frozen hu­man body is not a hu­man. It’s the corpse of a now-de­funct hu­man (will stay this way for­ever, so no cry­onic restora­tion). So, the life—a pro­cess—is part of the defi­ni­tion of ‘hu­man’. BUT since it was done in­stan­ta­neously you could say it’s a corpse with a par­tic­u­lar ter­mi­nal men­tal state.

A trom­bone or tri­an­gle that’s re­duced to ep­silon kelv­ins is just a cold trom­bone or tri­an­gle.

A computer remains a computer, but it ceases to have any role-based identities like ‘www.lesswrong.com’ or 230.126.52.85 (to name a random IP address). But, like the corpse, you can say it has a memory state corresponding to such roles.

• Very in­ter­est­ing an­swer, thank you. So, for those things not defined by pro­cesses, is it un­prob­le­matic to talk about their be­ing more or less real in terms of re­al­ity fluid?

• Well, we haven’t ex­actly nailed down the ul­ti­mate na­ture of this mag­i­cal re­al­ity fluid, but I don’t think that whether you define an ob­ject by shape or pro­cess changes how the mag­i­cal re­al­ity fluid con­cept ap­plies.

• Alright, thanks for your time, and for cor­rect­ing me on the MWI point. I found this very in­ter­est­ing and helpful.

• if my thoughts are a pro­cess that takes time (as they seem to be), and I am my thoughts, then it takes time to be me. Be­ing me would then be some­thing in­ter­rupt­ible, so that I could only get half way to be­ing me.

What’s this “me” thing? Your thoughts are most likely re­ducible to an ar­range­ment of neu­rons, their con­nec­tions and elec­tric po­ten­tials and chem­i­cal pro­cesses (ion chan­nels open­ing and clos­ing, Cal­cium and other ions go­ing in and out of den­drites, elec­tric po­ten­tial ris­ing and fal­ling, elec­tric im­pulses trav­el­ing back and forth, pro­teins and other sub­stances be­ing cre­ated, de­posited and re­moved, etc.) Some of these pro­cesses are com­pletely de­ter­minis­tic, oth­ers are chaotic, yet oth­ers are quan­tum-ran­dom (for ex­am­ple, ion chan­nel open­ing and clos­ing is due to quan­tum-me­chan­i­cal tun­nel­ing effects). In that sense, your thoughts do take time, as it takes time for chem­i­cal and elec­tri­cal effects to run their course. But what do you mean by “it takes time to be me”?

• Let’s drop the talk of peo­ple, that’s too com­pli­cated. Really, I’m just ask­ing about how ‘re­al­ity fluid’ talk gets ap­plied to ev­ery­day things as op­posed to ‘hap­pen­ings’. The claim on the table is that ev­ery­day things (in­clud­ing peo­ple) are hap­pen­ings, and I’m wor­ried about that.

Suppose ‘being a combustion engine’ meant actually firing a piston and rotating the drive shaft 360 degrees. If that’s what it meant to be a combustion engine, then if I interrupted the action of the piston after it had only rotated the drive shaft 180 degrees, the thing before me wouldn’t be a combustion engine. At best it would be sort of halfway there. The reason being that on this account of combustion engines, it takes time to be a combustion engine (specifically, the time it takes for the drive shaft to rotate 360 degrees).

If we did talk about com­bus­tion en­g­ines this way, for ex­am­ple, it wouldn’t be pos­si­ble to point to a com­bus­tion en­g­ine in a pho­to­graph. We could point to some­thing that might be a sort of tem­po­ral part of a com­bus­tion en­g­ine, but a pho­to­graph (which shows us only a mo­ment of time) couldn’t cap­ture a com­bus­tion en­g­ine any more than it could cap­ture a piece of mu­sic, or the ro­ta­tion of a ball, or a free throw or any­thing that con­sists in be­ing a kind of mo­tion.

But, at least so far as I know, a com­bus­tion en­g­ine, un­like a mo­tion, is not di­visi­ble into tem­po­ral parts. If all hap­pen­ings take time and are di­visi­ble into tem­po­ral parts, and if com­bus­tion en­g­ines are not so di­visi­ble, then com­bus­tion en­g­ines are not hap­pen­ings. If they’re not hap­pen­ings, how does ‘re­al­ity fluid’ talk ap­ply to them?

EDIT:

yet oth­ers are quan­tum-ran­dom (for ex­am­ple, ion chan­nel open­ing and clos­ing is due to quan­tum-me­chan­i­cal tun­nel­ing effects).

Really? That’s fas­ci­nat­ing, I have to look that up.

• a com­bus­tion en­g­ine, un­like a mo­tion, is not di­visi­ble into tem­po­ral parts

A com­bus­tion en­g­ine is de­ter­minis­tic. The be­hav­ior of a com­bus­tion en­g­ine is defined by the un­der­ly­ing physics. If prop­erly de­signed, tuned and started as pre­scribed, it will cause the drive shaft to ro­tate a num­ber of turns. A com­plete speci­fi­ca­tion of the en­g­ine is enough to pre­dict what it will do. If you de­sign some­thing that gets stuck af­ter half a turn, it’s not what most peo­ple would con­sider a proper com­bus­tion en­g­ine, de­spite out­ward ap­pear­ances. If you want to use the term “re­al­ity fluid”, then its flow is de­ter­mined by the ini­tial con­di­tions. You can call this flow “mo­tion” if you like.

• I think you think I’m say­ing some­thing much more com­pli­cated than what I’m try­ing to say. Noth­ing I’m say­ing has any­thing to do with pre­dic­tion, de­sign, de­ter­minism, (not that I know of, any­way) and I’m cer­tainly not say­ing that ‘re­al­ity fluid’ moves. By ‘mo­tion’ I mean what hap­pens when you throw a base­ball.

The dis­tinc­tion I’m try­ing to draw is this: on the one hand, some things take time and have tem­po­ral parts (like a piece of mu­sic, a walk in the park, the life-cy­cle of a star, or the elec­tro­chem­i­cal pro­cesses in a neu­ron). Call these pro­cesses. Th­ese are op­posed, on the other hand, to things which so far as I can see, don’t have tem­po­ral parts, like a trom­bone, a dog, an in­ter­nal com­bus­tion en­g­ine, or a star. Call these fubs (I don’t have a good name).

If re­al­ity fluid is a way of talk­ing about de­co­her­ence, and de­co­her­ence talk always in­volves dis­tinc­tions of time, then can we use re­al­ity fluid talk to talk about how real fubs are? We could if all fubs were re­ducible to pro­cesses. That would be a sur­pris­ing re­sult. Are all fubs re­ducible to pro­cesses? If so, is this an elimi­na­tive re­duc­tion (fun­da­men­tally, there are no fubs)? If not...well, if not I have some other, even weirder ques­tions.

• You seem to have a philo­soph­i­cal ap­proach to this, while I pre­fer in­stru­men­tal re­duc­tion­ism. If a col­lec­tion of “fubs” plus the rules of their be­hav­ior pre­dict what these fubs do at any point in time, why do you need to worry about some “tem­po­ral parts”? If you take an MP3 file and a mu­sic player and press “start”, you will have mu­sic play­ing. If this time stuff sounds mys­te­ri­ous, con­sider Eliezer’s time­less pic­ture, where these fubs are slices of the flow. You can gen­er­al­ize it some­what to quan­tum things, but there will be gaps (de­nied by hand­wav­ing MWIers, ex­plicit in shut-up-and-calcu­late), hence the prob­a­bil­is­tic na­ture of it.

• You seem to have a philo­soph­i­cal ap­proach to this, while I pre­fer in­stru­men­tal re­duc­tion­ism.

We share the im­pres­sion that the right an­swer will be a re­duc­tive, em­piri­cally grounded one. We might differ on the in­stru­men­tal­ism part: I re­ally do want to know what the fur­ni­ture of the uni­verse is. I have no in­tended use for such knowl­edge, and its pre­dic­tive power is not so im­por­tant. So far as I un­der­stand in­stru­men­tal­ism, you might just re­ply that I’m bark­ing up the wrong tree. But in case I’m not...

But let me ask this ques­tion again di­rectly, be­cause I think I need an an­swer to un­der­stand where you’re com­ing from: are fubs (ev­ery­day ob­jects like ta­bles and chairs and peo­ple, or if you like el­e­men­tary par­ti­cles or what­ever) re­ducible to pro­cesses at some level of phys­i­cal ex­pla­na­tion? Or is the whole idea of a fub in­co­her­ent? Is the ques­tion some­how in­co­her­ent? Or would you guess that when we ar­rive at the right phys­i­cal the­ory, it will in­clude refer­ence to both pro­cesses (like de­co­her­ence, mo­tion, heat­ing, etc.) and fubs?

• are fubs (ev­ery­day ob­jects like ta­bles and chairs and peo­ple, or if you like el­e­men­tary par­ti­cles or what­ever) re­ducible to pro­cesses at some level of phys­i­cal ex­pla­na­tion?

Hmm, I’m not sure how to avoid repeating myself. I’ve already said, and so has Luke_A_Somers, that “fubs” are 3d spatial slices of 4d spacetime regions. If this statement does not make sense to you, we can try to dissect it further. Is there a particular part of it that is problematic?

• I’ve already said, and so has Luke_A_Somers, that “fubs” are 3d spa­tial slices of 4d space­time re­gions.

Ah! I didn’t catch that. Thanks. Suppose a man-made satellite (Fubly 1) is released into (non-geosynchronous) orbit around the earth directly over Phoenix, Arizona. Each time it orbits the earth, it passes over Phoenix, and we can count its orbits this way. One orbit of Fubly 1 is extended in time in the sense that it takes one month (say) to get around the whole planet. In any time less than one month, the orbit is incomplete. So the orbit of Fubly 1 is temporally divisible in the sense that if I divide it in half, I get two things neither of which is an orbit of Fubly 1, but both of which are parts of an orbit of Fubly 1.

Now, Fubly 1 itself seems different. Suppose Fubly 1 only completes one orbit and then is destroyed. Supposing it’s assembled and then immediately released, the spatiotemporal region that is Fubly 1 and the spatiotemporal region that is the orbit of Fubly 1 have the same extension in time. If I divide the spatiotemporal region of the orbit in half, time-wise, I get two halves of an orbit. If I divide the spatiotemporal region of Fubly 1 itself, I don’t get two halves of a satellite. Fubly 1 can’t be divided time-wise in the way its orbit and its lifespan can. Does that make any sense? My question, in case it does, is this: is the distinction I’ve just made likely to be meaningful in the correct physics, or is it a mere artifact of intuition and natural language?

• Fubly 1 can’t be di­vided time-wise in the way its or­bit and its lifes­pan can.

It’s already the re­sult of such a di­vi­sion. As for or­bits and lifes­pans, they are not phys­i­cal ob­jects but rather log­i­cal ab­strac­tions, just like lan­guage is (as op­posed to the air re­leased from the mouth of the speaker and the pres­sure waves hit­ting the ear of the listener).

• It’s already the re­sult of such a di­vi­sion.

If you mean that Fubly 1 is a given 3d slice, can Fubly 1 per­sist through time? I mean that if we take two tem­po­rally differ­ent 3d slices (one at noon, the other at 1:00PM), would they be the same Fubly 1? I sup­pose if we were to call them ‘the same’ it would be in virtue of a same­ness of their 3d prop­er­ties, ab­stracted from their tem­po­ral po­si­tions.

• I don’t know what same­ness is, sorry. It’s not a defi­ni­tion I have en­coun­tered in physics, and SEP is silent on the is­sue, as well. I sort of un­der­stand it in­tu­itively, but I am not sure how you for­mal­ize it. Maybe you can think about it in terms of the non-con­ser­va­tion of the coarse grained area around the evolved dis­tri­bu­tion func­tion, similar to the way Eliezer dis­cussed the Liou­ville the­o­rem in his Quan­tum Se­quence. Maybe similar ar­eas cor­re­spond to more same­ness, or some­thing. But this is a wild spec­u­la­tion, I haven’t tried to work through this.

• Well, thanks for dis­cussing it, I ap­pre­ci­ate the time you took. I’ll look over that se­quence post.

• Good ex­pla­na­tion. But you’re as­sum­ing a the­ory in which “re­al­ity fluid” is con­served. To me, that seems ob­vi­ously wrong (and thus even more ob­vi­ously un­proven). I mean, if that were true, my ex­pe­riences would be get­ting rapidly and ex­po­nen­tially less real as time pro­gresses and I de­co­here with more and more parts of the wave func­tion.

I ac­knowl­edge that it is difficult to make prob­a­bil­ity work right in MWI. I have an in­tu­itive un­der­stand­ing which feels as if it works to me, that does not con­serve “re­al­ity fluid”; but I’m not so un­wise as to imag­ine that a solid in­tu­ition is worth a hill of beans in these do­mains. But again, your the­ory where “re­al­ity fluid” is equal to squared am­pli­tude seems to me prob­a­bly prov­ably wrong, and definitely not proven right. And it was not the as­sump­tion I was work­ing un­der.

• But you’re as­sum­ing a the­ory in which “re­al­ity fluid” is con­served.

Well, yes, I’m as­sum­ing that QM is cor­rect. That’s kind of the point: we’re talk­ing about pre­dic­tions of QM.

I mean, if that were true, my ex­pe­riences would be get­ting rapidly and ex­po­nen­tially less real as time pro­gresses and I de­co­here with more and more parts of the wave func­tion.

No… why do you think that you would be able to feel it? It seems to me rather like the ar­gu­ment that the Earth can’t be mov­ing since we don’t feel a strong wind.

An im­por­tant part of QM be­ing a lin­ear the­ory is that it is 100% in­de­pen­dent of over­all am­pli­tude. Scale ev­ery­thing up or down by an ar­bi­trary (finite nonzero) fac­tor and all the bits on the in­side work ex­actly the same.

So, whether some­thing likely hap­pens or some­thing un­likely hap­pens, the only differ­ence be­tween those two out­comes is a mat­ter of scale and what­ever it was that hap­pened differ­ently.

• QM has no “re­al­ity fluid”. The whole point of call­ing it “re­al­ity fluid” is to re­mind your­self that it’s stand­ing in for some as­sump­tions about mea­sure the­ory which are fuzzy and un­proven.

My own (equally fuzzy and unproven) notion about measure theory is that anything which has nonzero amplitude exists. Yes, you can then ask why probabilistic predictions seem to work, while my measure theory would seem to suggest that everything should be 50/50 (“maybe it happens, maybe it doesn’t; that’s 50/50”). But I believe that there is some form of entropy in the wave function, and that probable outcomes are high-entropy outcomes. No, I obviously don’t have the math on this worked out; but neither do you on the “reality fluid”.

I could eas­ily be wrong. So could you. Prob­a­bly, we both are. Mea­sure the­ory is not a solved prob­lem.

• QM may not have ‘re­al­ity fluid’, but the thing we’re tongue-in-cheek call­ing ‘re­al­ity fluid’ is con­served un­der QM!

• I don’t think Ho­munq means to re­fer to pre­ex­ist­ing other wor­lds (which couldn’t be af­fected by his ac­tions), but to the wor­lds that will be gen­er­ated just af­ter his de­ci­sion.

Right, I should have been clearer. What I meant is that s/​he is priv­ileg­ing one as­pect of MWI from uni­mag­in­ably many, and I sim­ply pointed out an­other one just as valid, but one that s/​he over­looked. Once you start spec­u­lat­ing about the struc­ture of Many Wor­lds, you can come up with as many points and coun­ter­points as you like, all on the same foot­ing (of the same com­plex­ity).

• I don’t think I had over­looked the point you brought up: I said ”...naively speak­ing it seems that [MWI] should be some­thing more akin to 3^^^3 (or googol­plex) than to 3^^^^3. So the prob­lem may still ex­ist...”

As to the idea that ev­ery­thing is just a hope­less mess once you bring MWI into it: that may in­deed be a rea­son that this en­tire dis­cus­sion is ir­re­solv­able and pointless, or it may be that the “MWI” fac­tors pre­cisely bal­ance out on ei­ther side of the ar­gu­ment; but there’s no rea­son to as­sume that ei­ther of those is true un­til you’ve ex­plored the is­sue care­fully.

• As I said, I don’t think MWI leads to really large numbers of copies; back-of-the-envelope calculations suggest it should be “closer to” 3^^^3 or a googolplex than to 3^^^^3. So yes: I tried to indicate that this idea does NOT solve the dilemma on its own. However, even if 3^^^^3 is so big as to make 3^^^3 look tiny, the latter is still not negligible, and deserves at least a mention. If Eliezer had mentioned it and dismissed it, I would have no objection. But I think it is notable that he did not.

For instance: Say that earthwormchuck163 is right and there are fewer than 3^^^^3 intelligent beings possible before you start to duplicate. For instance, say it’s (x^^^x)^y, and that due to MWI there are (x^^^x) copies of a regular human spawned per fortnight. So MWI is reducing Matrix Lord’s threat from (x^^^x)^y to (x^^^x)^(y-1). That doesn’t seem like a big change; but if you suppose that only one of them is decisive for this particular Matrix Lord threat, you’ve just changed the cost/benefit ratio from order-of-1 to order-of-1/(x^^^x), which is a big shift.

I know that there are a num­ber of pos­si­ble ob­jec­tions to that spe­cific ar­gu­ment. For in­stance, it’s rely­ing on the sym­me­try of in­tel­li­gence; if Ma­trix Lord were offer­ing 3^^^^3 pa­per­clips to clippy, it wouldn’t help figure out the clip­per­ific thing to do. The in­tent is not to make a con­vinc­ing ar­gu­ment, but sim­ply to demon­strate that a fac­tor on the or­der of x^^^x can in prin­ci­ple be sig­nifi­cant, even when the threat is on the or­der of 3^^^^3.

• TL;DR

If a poorly-dressed street person offers to save 10^(10^100) lives (a googolplex lives) for \$5 using their Matrix Lord powers, and you claim to assign this scenario less than 10^-(10^100) probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky.

I don’t see why this is nec­es­sar­ily the case. What am I miss­ing here?

Here is a summary of what I understand so far

A “cor­rect” episte­mol­ogy would satisfy our in­tu­ition that we should ig­nore the Pas­cal’s Mug­ger who doesn’t show any ev­i­dence, and pay the Ma­trix Lord, who snaps his fingers and shows his power.

The prob­lem is that no mat­ter how low a prob­a­bil­ity we as­sign to the mug­ger tel­ling the truth, the mug­ger can name an ar­bi­trar­ily large num­ber of peo­ple to save, and thus make it worth it to pay him any­way. If we weigh the mug­ger’s claim at in­finites­i­mally small, how­ever, we won’t be suffi­ciently con­vinced by the Ma­trix Lord’s ev­i­dence.

The mat­ter is fur­ther com­pli­cated by the fact that the num­ber of peo­ple Ma­trix Lord claims to save sug­gests a uni­verse which is so com­plex that it gets a ma­jor com­plex­ity penalty.

Here is my attempt at a solution

Here is the set of all pos­si­ble uni­verses

Each possible universe has a probability. They all add up to one. Since there are infinitely many possible universes, many of these universes have infinitesimally low probability. Bayes’ theorem adjusts the probability of each.

The Matrix Lord / person-turning-into-a-cat scenario is such that a universe which previously had an infinitesimally low prior probability now has a rather large posterior probability.

What hap­pens when a per­son turns into a cat?

All of the most likely hypotheses are suddenly eliminated, and everything changes.

Work­ing through some ex­am­ples to demon­strate that this is a solution

You have mod­els U1, U2, U3...and so on. P(Un) is the prob­a­bil­ity that you live in Uni­verse n. Your cur­rent pri­ors:

P(U1) = 60%

P(U2) = 30%

P(U3) = epsilon

P(U4) = delta

...and so on.

Mr. Ma­trix turns into a cat or some­thing. Now our hy­poth­e­sis space is as fol­lows:

P(U1) = 0

P(U2) = 0

P(U3) = 5% (pre­vi­ously Ep­silon)

P(U4) = delta

In essence, the utter elimination of all the remotely likely hypotheses suddenly makes several universes which were previously epsilon/delta/arbitrarily small in probability much more convincing.
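This renormalization can be sketched directly. The sketch below uses the made-up numbers from the example, with epsilon and delta replaced by small concrete values, and invented likelihoods under which the cat-transformation is (near-)impossible in the mundane universes but plausible in the exotic ones; nothing here is more than an illustration of the Bayesian bookkeeping.

```python
# Hypothetical priors over universes, matching the made-up numbers above
# (epsilon and delta stand in as small concrete values).
priors = {"U1": 0.60, "U2": 0.30, "U3": 1e-9, "U4": 1e-12}

def update(priors, likelihoods):
    """Bayes: multiply each prior by the likelihood of the observed
    evidence under that hypothesis, then renormalize to sum to one."""
    unnormalized = {h: p * likelihoods[h] for h, p in priors.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# A person turning into a cat has zero likelihood under the mundane
# hypotheses U1 and U2, but appreciable likelihood under the exotic
# hypotheses U3 and U4. These numbers are invented for the sketch.
likelihoods = {"U1": 0.0, "U2": 0.0, "U3": 0.5, "U4": 0.01}
posterior = update(priors, likelihoods)
```

Note that U3 wins not because anything favors it in absolute terms, but because every hypothesis that previously dominated assigns the observed evidence zero likelihood, leaving it almost no competition.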

Basically, if the scenario with the Matrix Lord happened to us, we ought to act in approximately the same way that the idealized “rational agent” would act if it were given no information whatsoever (so all prior probabilities are assigned using complexity alone), and then a voice from the sky suddenly specifies a hypothesis of arbitrarily high complexity from the space of possible universes and claims that it is true.

Come to think of it, you might even think of your cur­rent mem­o­ries as play­ing the role of the “voice from the sky”. There is no meta-prior say­ing you should trust your mem­o­ries, but you have noth­ing else. Similarly, when Mr. Ma­trix turned into a cat, he elimi­nated all your non-ex­tremely-un­likely hy­pothe­ses, so you have noth­ing to go on but his word.

• Eliezer:

But to con­clude some­thing whose prior prob­a­bil­ity is on the or­der of one over googol­plex, I need on the or­der of a googol bits of ev­i­dence, and you can’t pre­sent me with a sen­sory ex­pe­rience con­tain­ing a googol bits. In­deed, you can’t ever pre­sent a mor­tal like me with ev­i­dence that has a like­li­hood ra­tio of a googol­plex to one—ev­i­dence I’m a googol­plex times more likely to en­counter if the hy­poth­e­sis is true, than if it’s false—be­cause the chance of all my neu­rons spon­ta­neously re­ar­rang­ing them­selves to fake the same ev­i­dence would always be higher than one over googol­plex. You know the old say­ing about how once you as­sign some­thing prob­a­bil­ity one, or prob­a­bil­ity zero, you can never change your mind re­gard­less of what ev­i­dence you see? Well, odds of a googol­plex to one, or one to a googol­plex, work pretty much the same way.

• But to con­clude some­thing whose prior prob­a­bil­ity is on the or­der of one over googol­plex, I need on the or­der of a googol bits of ev­i­dence, and you can’t pre­sent me with a sen­sory ex­pe­rience con­tain­ing a googol bits.

Huh? You don’t need to con­clude any­thing whose prior prob­a­bil­ity was “on the or­der of one over googol­plex.”

You just need to believe it enough that it out-competes the suggested actions of any of the other hypotheses... and nearly all the hypotheses which had, prior to the miraculous event, non-negligible likelihood just got falsified, so there is very little competition...

Even if the probability of the Matrix Lord telling the truth is 1%, you’re still going to give him the five dollars, because there are infinitely many ways in which he could lie.

In fact, even if the uni­verses in which the Ma­trix Lord is ly­ing are all sim­pler than the one in which he is tel­ling the truth, the ac­tions pro­posed by the var­i­ous kinds of lie-uni­verses can­cel each other out. (In one lie-uni­verse, he ac­tu­ally saves only one per­son, in an­other equally likely lie-verse, he ac­tu­ally kills one per­son, and so on)

When a ra­tio­nal agent makes the de­ci­sion, it calcu­lates the ex­pected value of the in­tended ac­tion over ev­ery pos­si­ble uni­verse, weighted by prob­a­bil­ity.

By anal­ogy:

If I tell you I’m going to pick a random natural number, additionally tell you that there is a 1% chance that I pick “42”, and ask you to make a bet about which number comes up, you are going to bet “42”, because the chance that I pick any other particular number is arbitrarily small... you can even try giving larger numbers a complexity penalty; it won’t change the problem. Any evidence for any number that brings it up above “arbitrarily small” will do.

the chance of all my neu­rons spon­ta­neously re­ar­rang­ing them­selves to fake the same ev­i­dence would always be higher than one over googol­plex.

Anal­ogy still holds. Just pre­tend that there is a 99% chance that you mis­heard me when I said “42”, and I might have said any other num­ber. You still end up bet­ting on 42.
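The “42” bet reduces to a line of arithmetic. A minimal sketch, under the toy assumption that the remaining 99% of probability mass is spread evenly over N alternative numbers (the even spread and the cutoff N are simplifications not in the original analogy):

```python
def best_single_bet(n_alternatives):
    """Compare betting on 42 against betting on any one other number,
    when 1% goes to 42 and 99% is split evenly over n_alternatives
    other numbers."""
    p_42 = 0.01
    p_each_other = 0.99 / n_alternatives
    return 42 if p_42 > p_each_other else "other"
```

With 50 alternatives, each rival individually beats 42; past 99 alternatives, 42 is the best single bet. As the alternatives run toward “any natural number”, each rival’s individual probability goes to zero while 42’s stays fixed at 1%, which is the analogy’s point.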

• In­for­mally speak­ing, it seems like su­per­ex­po­nen­tial num­bers of peo­ple shouldn’t be pos­si­ble. If a per­son is some par­tic­u­lar type of com­pu­ta­tion, and ex­actly iden­ti­cal copies of a per­son should only count once, then num­ber of peo­ple is bounded by num­ber of unique com­pu­ta­tions (ex­po­nen­tial). It does not seem like the raw Kol­mogorov com­plex­ity of the num­ber will be the right com­plex­ity penalty if each per­son has to be a differ­ent com­pu­ta­tion.
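The bound gestured at above can be made concrete. Under the (strong) assumption that each distinct person corresponds to a distinct computation specified by at most n bits, the count of distinct people is bounded by the count of distinct bit-strings of length at most n, which grows only exponentially in n:

```python
def max_distinct_people(n_bits):
    """Distinct bit-strings of length 0..n_bits:
    2^0 + 2^1 + ... + 2^n_bits = 2^(n_bits + 1) - 1."""
    return 2 ** (n_bits + 1) - 1
```

Even for n on the order of a googol this is only a googolplex-scale number, nowhere near 3^^^3 or 3^^^^3; that is the sense in which superexponential numbers of (non-duplicate) people would seem to be ruled out.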

• I ques­tion whether keep­ing prob­a­bil­ities sum­ming to one is a valid jus­tifi­ca­tion for act­ing as if the mug­ger be­ing hon­est has a prob­a­bil­ity of roughly 1/​3^^^3. Since we know that due to our im­perfect rea­son­ing, the prob­a­bil­ity is greater than 1/​3^^^3, we know that the ex­pected value of giv­ing the mug­ger \$5 is uni­mag­in­ably large. Of course, ac­knowl­edg­ing this fact causes our prob­a­bil­ities to sum to above one, but this seems like a small price to pay.

Edit: Could some­one ex­plain why I’ve lost points for this?

• You lost points be­cause noth­ing you said even be­gins to ad­dress the prob­lem. You seem to be ar­gu­ing that con­tra­dict­ing our­selves isn’t that bad, which might be defen­si­ble if we ob­served that some par­tic­u­lar type of im­proper prior got good re­sults in prac­tice. (Though Eliezer would still ar­gue against us­ing it un­less you’ve tried and failed to find a bet­ter way.) But here we want to know:

• whether or not we have a rea­son to act on bizarre claims like the mug­ger’s—which we pre­sum­ably don’t if the ar­gu­ment for do­ing so is incoherent

• what prin­ci­ple we could use to re­ject the mug­ger’s un­helpful and in­tu­itively ridicu­lous de­mand with­out caus­ing prob­lems el­se­where.

On a side-note, we don’t care whether this seem­ingly crazy per­son is “hon­est”, but whether his claim is cor­rect (or whether pay­ing him has higher ex­pected value than not).