In the previous article in this sequence, I conducted a thought experiment in which simple probability was not sufficient to choose how to act. Rationality required reasoning about meta-probabilities, the probabilities of probabilities.

Relatedly, lukeprog has a brief post that explains how this matters; a long article by HoldenKarnofsky makes meta-probability central to utilitarian estimates of the effectiveness of charitable giving; and Jonathan_Lee, in a reply to that, has used the same framework I presented.

In my previous article, I ran thought experiments that presented you with various colored boxes you could put coins in, gambling with uncertain odds.

The last box I showed you was blue. I explained that it had a fixed but unknown probability of a twofold payout, uniformly distributed between 0 and 0.9. The overall probability of a payout was 0.45, so the expected return on a $1 gamble was $0.90, a bad bet. Yet your optimal strategy was to gamble a bit to figure out whether the odds were good or bad.
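As a sanity check, the blue box's numbers can be reproduced in a few lines. This is only a sketch of the setup described above; the function name is mine:

```python
# Blue box from the setup above: the payout probability p is fixed but
# unknown, distributed uniformly between 0 and 0.9. A $1 coin returns $2
# with probability p, otherwise nothing.

def mean_payout_probability(lo=0.0, hi=0.9):
    # Mean of a uniform distribution on [lo, hi].
    return (lo + hi) / 2

p_payout = mean_payout_probability()   # 0.45 overall payout probability
expected_return = 2.0 * p_payout       # $0.90 back per $1 coin: a bad bet on average
```

The gap between the average case being a loss and exploration still being worthwhile is exactly the point of the original thought experiment: the expectation hides the spread of possible payout probabilities.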

Let’s continue the experiment. I hand you a black box, shaped rather differently from the others. Its sealed faceplate is carved with runic inscriptions and eldritch figures. “I find this one particularly interesting,” I say.

What is the payout probability? What is your optimal strategy?

In the framework of the previous article, you have no knowledge about the insides of the box. So, as with the “sportsball” case I analyzed there, your meta-probability curve is flat from 0 to 1.

The blue box also has a flat meta-probability curve; but these two cases are very different. For the blue box, you know that the curve really is flat. For the black box, you have no clue what the shape of even the meta-probability curve is.

The relationship between the blue and black boxes is the same as that between the coin flip and sportsball, except at the meta level!

So if we’re going on in this style, we need to look at the distribution of probabilities of probabilities of probabilities. The blue box has a sharp peak in its meta-meta-probability (around flatness), whereas the black box has a flat meta-meta-probability.

You ought now to be a little uneasy. We are putting epicycles on epicycles. An infinite regress threatens.

Maybe at this point you suddenly reconsider the blue box… I told you that its meta-probability was uniform. But perhaps I was lying! How reliable do you think I am?

Let’s say you think there’s a 0.8 probability that I told the truth. That’s the meta-meta-probability of a flat meta-probability. In the worst case, the actual payout probability is 0, so the average plain probability is 0.8 × 0.45 = 0.36. You can feed that worst case into your decision analysis. It won’t drastically change the optimal policy; you’ll just quit a bit earlier than if you were entirely confident that the meta-probability distribution was uniform.
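The worst-case arithmetic above, spelled out as a two-component mixture (the 0.8 and 0.45 come straight from the text; treating a lie as a guaranteed zero payout is the stated worst case, not a general model):

```python
# Worst-case mixture: with probability 0.8 the meta-probability curve
# really is uniform (mean payout probability 0.45); with probability 0.2
# I lied and, in this worst case, the payout probability is 0.
p_truthful = 0.8
p_payout_if_truthful = 0.45
p_payout_if_lying = 0.0  # worst case only; a liar might still give odds > 0

avg_payout_prob = (p_truthful * p_payout_if_truthful
                   + (1 - p_truthful) * p_payout_if_lying)  # 0.36
```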

To get this really right, you ought to make a best guess at the meta-meta-probability curve. It’s not just 0.8 of a uniform probability distribution and 0.2 of zero payout. That’s the worst case. Even if I’m lying, I might give you better than zero odds. How much better? What’s your confidence in your meta-meta-probability curve? Ought you to draw a meta-meta-meta-probability curve? Yikes!

Meanwhile… that black box is rather sinister. Seeing it makes you wonder. What if I rigged the blue box so there is a small probability that when you put a coin in, it jabs you with a poison dart, and you die horribly?

Apparently a zero payout is not the worst case, after all! On the other hand, this seems paranoid. I’m odd, but probably not that evil.

Still, what about the black box? You realize now that it could do anything.

• It might spring open to reveal a collection of fossil trilobites.

• It might play Corvus Corax’s Vitium in Opere at ear-splitting volume.

• It might analyze the trace DNA you left on the coin and use it to write you a personalized love poem.

• It might emit a strip of paper with a recipe for dundun noodles written in Chinese.

• It might sprout six mechanical legs and jump into your lap.

What is the probability of its giving you $2?

That no longer seems quite so relevant. In fact… it might be utterly meaningless! This is now a situation of radical uncertainty.

I’ll answer that later in this sequence. You might like to figure it out for yourself now, though.

The black box is an instance of Knightian uncertainty. That’s a catch-all category for any type of uncertainty that can’t usefully be modeled in terms of probability (or meta-probability!), because you can’t make meaningful probability estimates. Calling it “Knightian” doesn’t help solve the problem, because there are many different sources of non-probabilistic uncertainty. However, it’s useful to know that there’s a literature on this.

The blue box is closely related to Ellsberg’s paradox, which combines probability with Knightian uncertainty. Interestingly, it was invented by the same Daniel Ellsberg who released the Pentagon Papers in 1971. I wonder how his work in decision theory might have affected his decision to leak the Papers?

• Instead of metaprobabilities, the black box might be better thought of in terms of hierarchically partitioning possibility space.

  • It could dispense money under some conditions

    • It could be a peg-and-wheel box like from the previous post

      • With zero pegs

      • One peg

      • ...

    • Those conditions could be temperature-dependent

    • ...

  • It could be a music box

    • Opera

    • Country

    • Yodeling

    • ...

  • It could be a bomb

  • ...

Each sublist’s probabilities should add up to the probability of the heading above it, and the top-level headings should add up to 1. Given how long the list is, all the probabilities are very small, though we might be able to organize them into high-level categories with reasonable probabilities and then tack on a “something else” category. Categories are map, not territory, so we can rewrite them to our convenience.
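The two coherence constraints in that paragraph are easy to mechanize. Here is a minimal sketch: the category names are borrowed from the list above, but every number is an invented placeholder, chosen only so the sums work out:

```python
# A hierarchical partition of possibility space as a nested dict.
# Each entry maps a category to (probability mass, sub-partition).
tree = {
    "dispenses money":  (0.10, {
        "peg-and-wheel, 0 pegs": (0.02, {}),
        "peg-and-wheel, 1 peg":  (0.02, {}),
        "other mechanisms":      (0.06, {}),
    }),
    "music box":        (0.05, {
        "opera":       (0.01, {}),
        "yodeling":    (0.01, {}),
        "other music": (0.03, {}),
    }),
    "bomb":             (0.01, {}),
    "something else":   (0.84, {}),
}

def coherent(subtree, total=1.0, tol=1e-9):
    # Children must sum to their parent's mass; the top level must sum to 1.
    if not subtree:
        return True
    if abs(sum(p for p, _ in subtree.values()) - total) > tol:
        return False
    return all(coherent(children, p) for p, children in subtree.values())
```

Note how much mass ends up in “something else”: that catch-all is doing almost all the work, which is the discussion's point about radical uncertainty.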

It’s useful to call the number of pegs the “probability,” which makes the probability of 45 pegs a “meta-probability.” It isn’t useful to call opera or yodeling a “probability,” so calling the probability that a music box plays opera a “meta-probability” is really weird, even though it’s basically the same sort of thing being discussed.

• This is interesting: it seems like the project here would be to construct a universal, hierarchical ontology of every possible thing a device could do? This seems like a very big job… how would you know you hadn’t left out important possibilities? How would you go about assigning probabilities?

(The approach I have in mind is simpler...)

• how would you know you hadn’t left out important possibilities?

At least one of the top-level headings should be a catch-all “None of the above,” which represents your estimated probability that you left something out.

• That’s good, yes!

How would you assign a probability to that?

• Ideally, by looking at the number of times that I’ve experienced out-of-context problems in the past. You can optimize further by creating models that predict the base amount of novelty in your current environment: if you have reason to believe that your current environment is more unusual or novel than normal, increase your assigned “none of the above” proportionally. (And conversely, whenever evidence triggers the creation of a new top-level heading, that top-level heading’s probability should get sliced out of the “none of the above,” but the fact that you had to create a top-level heading should be used as evidence that you’re in a novel environment, thus slightly increasing ALL “none of the above” categories. If you’re using hard-coded heuristics instead of actually computing probability tables, this might come out as a form of hypervigilance and/or curiosity triggered by novel stimulus.)
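The bookkeeping in that parenthetical can be sketched in a few lines. All numbers here are invented placeholders: the 5% novelty bump in particular is mine, not anything from the comment, and renormalization of the other headings is omitted:

```python
# When evidence forces a new top-level heading, its mass is sliced out of
# the "none of the above" catch-all; the surprise itself is then treated
# as evidence of a novel environment, nudging the catch-all back up.

def update_catch_all(catch_all, new_heading_mass, novelty_factor=1.05):
    # Slice out the new heading's mass, then scale up what remains.
    remaining = catch_all - new_heading_mass
    return min(1.0, remaining * novelty_factor)
```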

• “How often do listing sorts of problems with some reasonable considerations result in an answer of ‘None of the above’ for me?”

If “reasonable considerations” are not available, then we can still ask:

“How often did listing sorts of problems with no other information available result in an answer of ‘None of the above’ for me?”

Even if we suppose that maybe this problem bears no resemblance to any previously encountered problem, we can still ask (because the fact that it bears no resemblance is itself a signifier):

“How often did problems I’d encountered for the first time have an answer I never thought of?”

• which represents your estimated probability that you left something out.

The probability assigned to “none of the above” should be smaller than your probability that you left something out, since “none of the above is true” is a strict subset of “I left out a possibility.”

(It’s possible I misinterpreted you, so apologies if I’m stating the obvious.)

• A universal ontology is intractable, no argument there. As is a tree of (meta)*-probabilities. My point was about how to regard the problem.

As for an actual solution, we start with propositions like “this box has a nontrivial potential to kill, injure, or madden me.” I can find a probability for that based on my knowledge of you and on what you’ve said. If the probability is small enough, I can subdivide that by considering another proposition.

• One aspect of what I consider the correct solution is that the only question that needs to be answered is “do I think putting a coin in the box has positive or negative utility,” and one can answer that without any guess about what it is actually going to do.

What is your base rate for boxes being able to drive you mad if you put a coin in them?

Can you imagine any mechanism whereby a box would drive you mad if you put a coin in it? (I can’t.)

• Given that I’m inside a hypothetical situation proposed on LessWrong, the likelihood of being inside a Lovecraft crossover or something similar is about .001. Assuming a Lovecraft crossover, the likelihood of a box marked in eldritch runes containing some form of Far Realm portal is around .05. So say .0005 from that method, which is what was on my mind when I wrote that.

• Can you imagine any mechanism whereby a box would drive you mad if you put a coin in it? (I can’t.)

Perhaps sticking a coin in it triggers the release of some psychoactive gas or aerosol?

• Are there any psychoactive gases or aerosols that drive you mad?

I suppose a psychedelic might push someone over the edge if they were sufficiently psychologically fragile. I don’t know of any substances that specifically make people mad, though.

• I’m not a psychiatrist. Maybe? It looks like airborne transmission of prions might be possible, and along an unrelated path the box could go the Phineas Gage route.

• Alternatively, aerosolized agonium, for adequate values of sufficiently long-lived and finely-tuned agonium.

• I’m currently mostly wondering how I get the black box to do anything at all, and particularly how I can protect myself against the dangerous things it might be feasible for an eldritch box to do.

• This is now a situation of radical uncertainty.

The Bayesian Universalist answer to this would be that there is no separate meta-probability. You have a universal prior over all possible hypotheses, and mutter a bit about Solomonoff induction and AIXI.

I am putting it this way, distancing myself from the concept, because I don’t actually believe it, but it is the standard answer to draw out from the LessWrong meme space, and it has not yet been posted in this thread. Is there anyone who can make a better fist of expounding it?

• You can give a meta-probability if you want. However, this makes no difference in your final result. If you are 50% certain that a box has a diamond in it with 20% probability, and you are 50% certain that it has a diamond with 30% probability, then you are 50% sure that it has an expected value of 0.2 diamonds and 50% sure that it has an expected value of 0.3 diamonds, so it has an expected expected value of 0.25 diamonds. Why not just be 25% sure from the beginning?
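The comment's arithmetic in two lines, with its own numbers:

```python
# 50% chance the diamond probability is 0.2, 50% chance it is 0.3.
weights = [0.5, 0.5]
diamond_probs = [0.2, 0.3]

# The mixture collapses to a single "expected expected value" of 0.25,
# which for a one-shot decision is indistinguishable from a flat 25%.
collapsed = sum(w * p for w, p in zip(weights, diamond_probs))
```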

Supposedly, David gave an example of meta-probability being necessary in the earlier post he references. However, using conditional probabilities gives you the right answer. There is a difference between a gambling machine having independent 50% chances of giving out two coins when you put in one, and one that has a 50% chance the first time, but a 100% chance of giving out two coins the nth time given that it did the first time, and a 0% chance given it did not. Since there are times where you need conditional probabilities and meta-probabilities won’t suffice, you need to have conditional probabilities anyway, so why bother with meta-probabilities?
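The distinction between those two machines can be made concrete. A sketch, with function names of my own choosing; `history` is the list of observed payouts so far:

```python
# Machine A: every play pays out independently with probability 0.5.
def predict_independent(history):
    return 0.5

# Machine B: 50% chance on the first play; afterwards, guaranteed to
# repeat whatever the first play did.
def predict_correlated(history):
    if not history:
        return 0.5
    return 1.0 if history[0] else 0.0
```

Before the first coin the two machines are observationally identical; a single observation separates them completely, which is exactly the information a single marginal probability throws away.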

That’s not to say that meta-probabilities can’t be useful. If the probability of A depends on B, and all you care about is A, meta-probabilities will model this perfectly, and will be much simpler to use than conditional probabilities. A good example of a successful use of meta-probabilities is Student’s t-distribution, which can be thought of as a distribution of normal distributions, in which the standard deviation itself has a probability distribution.

• Yes, I’m not at all committed to the metaprobability approach. In fact, I concocted the black box example specifically to show its limitations!

Solomonoff induction is extraordinarily unhelpful, I think… that it is uncomputable is only one reason.

I think there’s a fairly simple and straightforward strategy to address the black box problem, which has not been mentioned so far...

• Solomonoff induction is extraordinarily unhelpful, I think… that it is uncomputable is only one reason.

Because its output is not human-readable being the other?

I mean, even if I’ve got a TARDIS to use as a halting oracle, an Inductive Turing Machine isn’t going to output something I can actually use to make predictions about specific events such as “The black box gives you money under X, Y, and Z circumstances.”

• Well, the problem I was thinking of is “the universe is not a bit string.” And any unbiased representation we can make of the universe as a bit string is going to be extremely large, much too large to do even sane sorts of computation with, never mind Solomonoff.

Maybe that’s saying the same thing you did? I’m not sure...

• Can you please give us a top-level post at some point, be it in Discussion or Main, arguing that “the universe is not a bit string”? I find that very interesting, relevant, and plausible.

• Thanks for the encouragement! I have way too many half-completed writing projects, but this does seem an important point.

• Going back to the basic question about the black box:

What is the probability of its giving you $2?

Too small to be worth considering. I might as well ask, what’s the probability that I’ll find $2 hidden halfway up the nearest tree? Nothing has been claimed about the black box to specifically draw “it will pay you $2 for $1” out of hypothesis space.

• Hmm… given that the previous several boxes have either paid $2 or done nothing, it seems like that primes the hypothesis that the next in the series also pays $2 or does nothing. (I’m not actually disagreeing, but doesn’t that argument seem reasonable?)

• it seems like that primes the hypothesis that the next in the series also pays $2 or does nothing

Priming a hypothesis merely draws it to attention; it does not make it more likely. Every piece of spam, every con game, “primes the hypothesis” that it is genuine. It also “primes the hypothesis” that it is not. “Priming the hypothesis” is no more evidence than a purple giraffe is evidence of the blackness of crows.

Explicitly avoiding saying that it does pay $2, and saying instead that it is “interesting,” well, that pretty much stomps the “priming” into a stain on the sidewalk.

• … purple giraffes are evidence of the blackness of crows, though. Just really, really terrible evidence.

• Well, yes. As is the mere presence of the idea of $2 for $1 terrible evidence that the black box will do any such thing.

Eliezer speaks in the Twelve Virtues of letting oneself be as light as a leaf, blown unresistingly by the wind of evidence, but evidence of this sort is on the level of the individual molecules and Brownian motion of that leaf.

• It depends on your priors.

• I don’t have a full strategy, but I have an idea for a data-gathering experiment:

I hand you a coin and try to get you to put it in the box for me. If you refuse, I update in the direction of the box harming people who put coins in it. If you comply, I watch and see what happens.

• Excellent! This is very much pointing in the direction of what I consider the correct general approach. I hadn’t thought of what you suggest specifically, but it’s an instance of the general category I had in mind.

• Meta-probability seems like something that is reducible to expected outcomes and regular probability. I mean, what kind of box the black box is, is nothing more than what you expect it to do conditional on what you might have seen it do. If it gives you three dollars the next three times you play it, you’d then expect the fourth time to also give you three dollars (4/5ths of the time, via Bayes’ Theorem, via Laplace’s Rule of Succession).

Meta-probability may be a nifty shortcut, but it’s reducible to expected outcomes and conditional probability.
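The rule invoked above is simple to state: after s successes in n trials, Laplace’s Rule of Succession estimates the chance of success on the next trial as (s + 1)/(n + 2). A quick check of the comment’s 4/5ths figure:

```python
def rule_of_succession(successes, trials):
    # Laplace's Rule of Succession: (s + 1) / (n + 2).
    # With no data at all it gives 1/2, the uniform-prior mean.
    return (successes + 1) / (trials + 2)

# Three payouts in three plays -> 4/5 chance of a payout on the fourth.
p_fourth = rule_of_succession(3, 3)
```

As the next comment notes, this only applies once the outcome space is already pinned down; it says nothing about what a black box might do.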

• Laplace’s Rule of Succession can only be used once you have identified a set of possible outcomes and made certain assumptions about the underlying probability distribution. That’s not the case at hand.

Applying Bayesian reasoning to such cases requires a universal prior. It could be that humans do a form of approximate Bayesian reasoning with something like a universal prior when reasoning informally, but we know no satisfactory way of formalizing that reasoning in mathematical terms.

• The idea of metaprobability still isn’t particularly satisfying to me as a game-level strategy choice. It might be useful as a description of something my brain already does, and thus give me more information about how my brain relates to or emulates an AI capable of perfect Bayesian inference. But in terms of picking optimal strategies, perfect Bayesian inference has no subroutine called CalcMetaProbability.

My first thought was that your approach elevates your brain’s state above states of the world as symbols in the decision graph, and calls the difference “Meta.” By Luke’s analogy, information about the black box is unstable, but all that means is that the (yes, single) probability value we get when we query the Bayesian network is conditionally dependent on nodes with a high degree of expected future change (including many nodes referring to your brain). If you maintain discipline and keep yourself (and your future selves) as a part of the system, you can just as perfectly calculate your current self’s expected probability without “metaprobability.” If you’re looking to (losslessly or otherwise) optimize your brain to calculate probabilities, then “metaprobability” is a useful concept. But then we’re no longer playing the game, we’re designing minds.

• But then we’re no longer playing the game, we’re designing minds.

I find it helpful to think of “the optimal way to play game X” as “design the mind that is best at playing game X.” Does that not seem helpful to you?

• It is helpful, and was one of the ways that helped me to understand One-boxing on a gut level.

And yet, when the problem space seems harder, when “optimal” becomes uncomputable and wrapped up in the fact that I can’t fully introspect, playing certain games doesn’t feel like designing a mind. Although, this is probably just due to the fact that games have time limits, while mind design is unconstrained. If I had an eternity to play any given game, I would spend a lot of time introspecting, changing my mind into the sort that could play iterations of the game in smaller time chunks. Although there would still always be a part of my brain (that part created in motion) that I can’t change. And I would still use that part to play the black box game.

In regards to metaprobabilities, I’m starting to see the point. I don’t think it alters any theory about how probability “works,” but its intuitive value could be evidence that optimal AIs might be able to more efficiently emulate perfect decision theory with CalcMetaProbability implemented. And it’s certainly useful to many here.

• The probability value we get when we query the Bayesian network is conditionally dependent on nodes with a high degree of expected future change [...].

But the point about meta-probability is that we do not have the nodes. Each meta level corresponds to one nesting of networks in nodes.

If you maintain discipline and keep yourself [...] as a part of the system, you can just as perfectly calculate your current self’s expected probability without “metaprobability.”

Only insofar as you approximate yourself simply, as per above. This discards information.

• But the point about meta-probability is that we do not have the nodes. Each meta level corresponds to one nesting of networks in nodes.

Think of Bayesian graphs as implicitly complete, with the set of nodes being everything to which you have a referent. If you can even say “this proposition” meaningfully, a perfect Bayesian implemented as a brute-force Bayesian network could assign it a node connected to all other nodes, just with trivial conditional probabilities that give the same results as an unconnected node.

A big part of this discussion has been whether some referents (like black boxes) actually do have such trivial conditional probabilities, which end up returning an inference of 50%. It certainly feels like some referents should have no precedent, and yet it also feels like we still don’t say 50%. This is because they actually do have precedent (and conditional probabilities); it’s just that our internal reasonings are not always consciously available.

• Sure, you can always use the total net of all possible propositions. But the set of all propositions is intractable. It may not even be sensibly enumerable.

For nested nets at least you can construct the net of the powerset of the nodes and that will do the job, in theory. In practice even that is horribly inefficient. And even though our brain is massively parallel, it surely doesn’t do that.

• Well, regardless of the value of metaprobability, or its lack of value, in the case of the black box it doesn’t seem to offer any help in finding a decision strategy. (I find it helpful in understanding the problem, but not in formulating an answer.)

How would you go about choosing a strategy for the black box?

• My LessWrongian answer is that I would ask my mind that was created already in motion what the probability is, then refine it with as many further reflections as I can come up with. Embody an AI long enough in this world, and it too will have priors about black boxes, except that reporting that probability in the form of a number is inherent to its source code rather than strange and otherworldly like it is for us.

The point that was made in that article (and in the Metaethics sequence as a whole) is that the only mind you have to solve a problem is the one that you have, and you will inevitably use it to solve problems suboptimally, where “suboptimal,” if taken strictly, describes everything anybody has ever done.

The reflection part of this is important, as it’s the only thing we have control over, and I suppose could involve discussions about metaprobabilities. It doesn’t really do it for me, though, although I’m only just a single point in mind design space. To me, metaprobability seems isomorphic to a collection of reducible considerations, and so doesn’t seem like a useful shortcut or abstraction. My particular strategy for reflection would be something like that in dspeyer’s comment: reasoning about the source of the box, and about possibilities for what could be in the box that I might reasonably expect to be there. Depending on how much time I have, I’d be very systematic about it, listing out possibilities, solving infinite series on expected value, etc.

• Part of the motivation for the black box experiment is to show that the metaprobability approach breaks down in some cases. Maybe I ought to have made that clearer! The approach I would take to the black box does not rely on metaprobability, so let’s set that aside.

So, your mind is already in motion, and you do have priors about black boxes. What do you think you ought to do in this case? I don’t want to waste your time with that… Maybe the thought experiment ought to have specified a time limit. Personally, I don’t think enumerating things the boxes could possibly do would be helpful at all. Isn’t there an easier approach?

• Part of the motivation for the black box experiment is to show that the metaprobability approach breaks down in some cases.

Ah! I didn’t quite pick up on that. I’ll note that infinite regress problems aren’t necessarily defeaters of an approach. Good minds that could fall into that trap implement a “Screw it, I’m going to bed” trigger to keep from wasting cycles even when using an otherwise helpful heuristic.

Maybe the thought experiment ought to have specified a time limit. Personally, I don’t think enumerating things the boxes could possibly do would be helpful at all. Isn’t there an easier approach?

Maybe, but I can’t guarantee you won’t get blown up by a black box with a bomb inside! As a friend, I would be furiously lending you my reasoning to help you make the best decision, worrying very little about what minds better and faster than both of ours would be able to do.

It is, at the end of the day, just the General AI problem: don’t think too hard on brute-force but perfect methods, or else you might skip a heuristic that could have gotten you an answer within the time limit! But when do you know whether the time limit is at that threshold? You could spend cycles on that too, but time is wasting! Time-limit games presume that the participant has already undergone a lot of unintentional design (by evolution, history, past reflections, etc.). This is the “already in motion” part which, frustratingly, cannot ever be optimal unless somebody on the outside designed you for it. Which source code performs best under which game is a formal problem. Being a source code involves taking the discussion we’re having now and applying it the best you can, because that’s what your source code does.

• I can’t guarantee you won’t get blown up

Yes, this is part of what I’m driving at in this post! The kinds of problems that probability and decision theory work well for have a well-defined set of hypotheses, actions, and outcomes. Often the real world isn’t like that. One point of the black box is that the hypothesis and outcome spaces are effectively unbounded. Trying to enumerate everything it could do isn’t really feasible. That’s one reason the uncertainty here is “Knightian” or “radical.”

In fact, in the real world, “and then you get eaten by a black hole incoming near the speed of light” is always a possibility. Life comes with no guarantees at all.

Often in Knightian problems you are just screwed and there’s nothing rational you can do. But in this case, again, I think there’s a straightforward, simple, sensible approach (which so far no one has suggested...)

• Often in Knightian problems you are just screwed and there’s nothing rational you can do.

As you know, this attitude isn’t particularly common ’round these parts, and while I fall mostly in the “decision theory can account for everything” camp, there may still be a point there. “Rational” isn’t really a category so much as a degree. Formally, it’s a function on actions that somehow measures how much an action corresponds to the perfect decision-theoretic action. My impression is that somewhere there’s a Gödelian consideration lurking, which is where the “Omega fines you exorbitantly for using TDT” thought experiment comes into play.

That thought experiment never bothered me much, as it just is what it is: a problem where you are just screwed, and there’s nothing rational you can do to improve your situation. You’ve already rightly programmed yourself to use TDT, and even your decision to stop using TDT would be made using TDT, and unless Omega is making exceptions for that particular choice (in which case you should self-modify to non-TDT), Omega is just a jerk who goes around fining rational people.

In such situations, the words “rational” and “irrational” are less useful descriptors than just observing source code being executed. If you’re formal about it using metric R, then you would be more R, but its correlation to “rational” wouldn’t really be the point.

• But in this case, again, I think there’s a straightforward, simple, sensible approach (which so far no one has suggested...)

So, I don’t think the black box is really one of the situations I’ve described. It seems to me a decision theorist training herself to be more generally rational is in fact improving her odds of winning the black box game. All the approaches outlined so far do seem to also improve her odds. I don’t think a better solution exists, and she will often lose if she lacks time to reflect. But the more rational she is, the more often she will win.

• I wonder how his work in decision theory might have affected his decision to leak the Papers?

Obviously he was a rational thinker. And that seems to have implied thinking outside of the rules and customs. For him, leaking the Papers was just one nontrivial option among many.

• A few terminological headaches in this post. Sorry for the negative tone.

There is talk of a “fixed but unknown probability,” which should always set alarm bells ringing.

More generally, I propose that whenever one assigns a probability to some parameter, that parameter is guaranteed not to be a probability.

I am also disturbed by the mention of Knightian uncertainty, described as “uncertainty that can’t be usefully modeled in terms of probability.” Now there’s a charitable interpretation of that phrase, and I can see that there may be a psychologically relevant subset of probabilities that vaguely fits this description, but if the phrase “can’t be modeled” is to be taken literally, then I’m left wondering if the author has paid enough attention to the mind projection fallacy, or the difference between probability and frequency.

• I throw the box into the corner of the room with a high-pitched scream of terror. Then I run away to try to find thermite.

Edit: then I throw the ashes into a black hole, and trigger a true vacuum collapse just in case.

• This raises the very important point that the overwhelming majority of world-states are bad, bad, bad, and so when presented with a box that could give literally any outcome, running might be a good idea. (Metaphorically, that is. I doubt it would do you much good.)

• I think backing away slowly and quietly is the better play. The box might feast off your screams or sense your fear.

• The box might feast off your screams or sense your fear.

Then again, screams might hurt it. That’s the problem with true radical uncertainty: if you’re sufficiently uncertain that you can’t even conjecture about meta-probabilities, how do you know whether ANY action (or lack of action) might have a net positive or negative outcome?

• You need to take advantage of the fact that probability is a consequence of incomplete information, and think about the models of the world people have that encode their information. “Meta-probability” only exists within a certain model of the problem, and if you totally ignore that, you get some drastically confusing conclusions.

• So, how would you analyze this problem, more specifically? What do you think the optimal strategy is?

• The problem of what to expect from the black box?

I’d think about it like this: suppose that I hand you a box with a slot in it. What do you expect to happen if you put a quarter into the slot?

To answer this we draw on a great deal of human knowledge about boxes and the people who hand them to you. It’s very likely that nothing at all will happen, but I’ve also seen plenty of boxes that emit sound, or gumballs, or temporary tattoos, or sometimes more quarters. But suppose that I have previously handed you a box that sometimes emits more quarters when you put quarters in. Then maybe you raise the probability that this box emits quarters too, et cetera.

Now, within this model you have a probability of some payoff, but only if it’s one of the reward-emitting boxes, and it also has some probability of emitting sound, etc. What you call a “meta-probability” is actually the probability of some sub-model being verified or confirmed. Suppose I put one quarter in and two quarters come out: now you’ve drastically cut down the set of models that can describe the box. This is “updating the meta-probability.”
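The updating step described here can be written out as a toy Bayes calculation over sub-models. The model names, prior weights, and likelihoods below are hypothetical, chosen only to illustrate how observing two quarters come out collapses the set of candidate models:

```python
# Hypothetical sub-models of what the box does with a quarter,
# with made-up prior weights:
priors = {
    "inert":   0.70,  # nothing ever happens
    "gumball": 0.15,  # emits gumballs, never quarters
    "payer":   0.15,  # sometimes pays out two quarters
}
# Likelihood of the observation "two quarters came out" under each model:
likelihood = {"inert": 0.0, "gumball": 0.0, "payer": 0.5}

# Bayes' rule: P(model | obs) is proportional to P(obs | model) * P(model)
unnorm = {m: priors[m] * likelihood[m] for m in priors}
total = sum(unnorm.values())
posterior = {m: w / total for m, w in unnorm.items()}

print(posterior["payer"])  # 1.0 -- only the payer model survives
```

The “meta-probability curve” is then just the mixture, over surviving models, of each model’s implied payout frequency.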

• It also has eldritch markings, is being used in a decision-theory experiment, and was given in association with ominous wording. These indicate it does something nasty.

• I guess it will raise the probability a little bit, but out of all the eldritch-marked things I’ve ever seen, about 100% have been ornamental. We can’t over-weight small probabilities just because they’re vivid.

• ...

Maybe we’re getting different mental images for “eldritch”. I assumed things that it would get me banned to even vaguely describe, not tentacles and pentagrams.

• To answer this we draw on a great deal of human knowledge about boxes and the people who hand them to you.

Of the comments so far, this comes closest to the answer I have in mind… for whatever that’s worth!

• I like this article/post, but I find myself wanting more at the end. A payoff, or a punch line, or at least a lesson to take away.

• Well, I hope to continue the sequence… I ended this article with a question, or puzzle, or homework problem, though. Any thoughts about it?

• IMO the correct response is to run like hell from the box. In Thingspace, most things are very unfriendly, in much the same way that most of Mindspace contains unfriendly AIs.

• Technically, almost all things in Thingspace are high-energy plasma.

Edit: actually, most of them are probably some kind of exotic (anti-, strange-, dark-, etc.) matter that’ll blow up the planet.

• The high-energy exotic plasma not from this universe does not love or hate you. Your universe is simply a false vacuum with respect to its home universe’s, which it accidentally collapses.

• So… you think I am probably evil, then? :-)

I gave you the box (in the thought experiment). I may not have selected it from Thingspace at random!

In fact, there’s strong evidence in the text of the OP that I didn’t...

• I am pattern-matching from fiction on “black box with evil-looking inscriptions on it”. Those do not tend to end well for anyone. Also, what do you mean by strong evidence? That the box is less harmful than a given random object from Thingspace? I can /barely sort of/ see “not a random object from Thingspace”; I cannot see “EV(U(spoopy creppy black box)) > EV(U(object from Thingspace))”.

• EBWOP: On further reflection I find that, since most of Thingspace instantaneously destroys the universe,

EV(U(spoopy creppy black box)) >>> EV(U(object from Thingspace)).

However, what I was trying to get at was that

EV(U(spoopy creppy black box)) <= EV(U(representative object from class: chance-based deal boxes with “normal” outcomes)) <= EV(U(representative object from class: chance-based deal boxes with Thingspace-like outcomes)) <= EV(U(representative object from class: chance-based deal boxes with terrifyingly creatively imaginable outcomes))

• The evidence that I didn’t select it at random was my saying, “I find this one particularly interesting.”

I also claimed that “I’m probably not that evil.” Of course, I might be lying about that! Still, that’s a fact that ought to go into your Bayesian evaluation, no?

• “Interesting” tends to mean “whatever it would do, it does that more” in the context of possibly pseudo-Faustian bargains and signals of probable deceit. From what I know, I do not start with reason to trust you, and the evidence found in the OP suggests that I should update to “much higher” the probability that you are concealing information which, if I updated on it, would lead me not to use the black box.

• Oh, goodness, interesting, you do think I’m evil!

I’m not sure whether to be flattered or upset or what. It’s kinda cool, anyway!

• I think that avatar-of-you-in-this-presented-scenario does not remotely have avatar-of-me-in-this-scenario’s best interests at heart, yes.

• I hope you continue the sequence as well. :V