A Possible Decision Theory for Many Worlds Living

Hey LessWrong! I may have gone in over my head as I am not well-versed in decision theory literature, but I tentatively believe I have a new decision theory for decision-making in a MWI universe. Let me know what you think!

--------------------------------------------

Originally posted at: https://www.evanward.org/a-decision-theory-for-many-worlds-living/

----------------------------------------------

Here, I describe a decision theory that I believe applies to Many-Worlds living, combining principles of quantum mechanical randomness, evolutionary theory, and choice-worthiness. Until someone comes up with a better term for it, I will refer to it as Random Evolutionary Choice-worthy Many-worlds Decision Theory, or RECMDT.

Background

If the Many-Worlds Interpretation (MWI) of quantum mechanics is true, does that have any ethical implications? Should we behave any differently in order to maximize ethical outcomes? This is an extremely important question that, as far as I am aware, has not been satisfactorily answered. If MWI is true and we can affect the distribution of worlds through our actions, then our actions have super-exponentially more impact on ethically relevant phenomena. I take ethically relevant phenomena to be certain fundamental physics operations responsible for the suffering and well-being associated with the minds of conscious creatures.

My Proposal

We ought to make decisions probabilistically, based on both sources of entropy that correspond with the splitting of worlds (e.g. particle decay) and the comparative choice-worthiness of different courses of action (CoAs). By choice-worthiness, I mean a combination of the subjective degree of normative uncertainty and the expected utility of a CoA. I will go into determining choice-worthiness in another post.

If one CoA is twice as choice-worthy as another, then I argue that we should commit to doing that CoA with 2:1 odds, i.e. about two-thirds of the time, with the outcome determined by radioactive particle decay.
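To make this concrete, here is a minimal Python sketch of the 2:1 case. `decay_bit()` is a hypothetical function standing in for one truly random bit derived from particle decay; the entropy source and the general encoding scheme are described later in this post.

```python
def decay_bit() -> int:
    """Hypothetical: one truly random bit derived from radioactive decay
    (see the HotBits-style scheme later in this post). A pseudo-random
    generator would NOT do, since it does not track the splitting of worlds."""
    raise NotImplementedError  # hardware-dependent

def choose_2_to_1(more_worthy, less_worthy):
    """Commit to the more choice-worthy CoA with 2:1 odds (probability 2/3)."""
    while True:
        draw = (decay_bit() << 1) | decay_bit()  # uniform over {0, 1, 2, 3}
        if draw < 3:                             # discard 3 and redraw
            return more_worthy if draw < 2 else less_worthy  # 2/3 vs. 1/3
```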

Why?

Under a single unfolding of history, the traditional view is that we should choose whichever available CoA has the highest choice-worthiness. When presented with a binary decision, the thought is that we should choose the most choice-worthy option given the sum of evidence every single time. However, the fact that a decision is subjectively choice-worthy does not mean it is guaranteed to actually be the right decision; it could actually move us towards worse possible worlds. If we think we are living in a single unfolding of history but are actually living under MWI, then a significant subset of the 3↑↑↑3+ (but finite number of) existing worlds end up converging on similar futures, which are by no means destined to be good.

However, if we are living in a reality of constantly splitting worlds, I assert that it is in everyone's best interest to increase the variance of outcomes in order to more quickly move towards either a utopia or extinction. This essentially increases evolutionary selection pressure that child worlds experience so that they either more quickly become devoid of conscious life or more quickly converge on worlds that are utopian.

As a rough analogy, imagine having a planet covered with trillions of identical, simple microbes. You want them to evolve towards intelligent life that experiences much more well-being. You could leave these trillions of microbes alone and allow them to slowly accumulate gene edits so that some of their descendants drift towards more intelligent/evolved creatures. However, if you had the option, why not just increase the rate of gene edits, by, say, UV exposure? This would surely push up the timeline for intelligence and well-being and allow a greater magnitude of well-being to take place. Each world under MWI is like a microbe, and we might as well increase the variance, and thus the evolutionary selection pressure, in order to help utopias happen as soon and as abundantly as possible.

What this Theory Isn't

A key component of this decision heuristic is not maximizing chaos and treating different CoAs equally, but choosing CoAs relative to their choice-worthiness. For example, in a utopian world with, somehow, 99% of the proper CoAs figured out, a less choice-worthy course of action must be taken in only 1 out of 100 child worlds. In other words, once we are confident in a particular CoA, we can take that action the majority of the time. After all, the goal isn't for one world to end up hyper-utopian, but to maximize utility over all worlds.

If we wanted just a single world to end up hyper-utopian, then we would want to act in as many different ways as possible based on the results of true sources of entropy: for any decision at all, flip a (quantum) coin and go off its results like Two-Face. Again, though, the goal is to maximize utility over all worlds, so we only want to explore paths in proportion to the odds that we think a particular path is optimal.

Is it Incrementally Useful?

A key property of most useful decision theories is that they are useful insofar as they are followed. As long as MWI is true, each time RECMDT is deliberately adhered to, it should increase the variance of child worlds. Depending on the likelihood of worlds becoming utopian relative to the probability of them being full of suffering, following this rule even once likely ensures that many future utopias will exist.

Crucial Considerations

While RECMDT should increase the variance of, and selection pressure on, the child worlds of any world that implements it, we do not know enough about the likelihood and magnitude of suffering at an astronomical level to guarantee that the worlds that remain full of life will overwhelmingly tend to be net-positive in subjective well-being. It is possible that worlds with net suffering are very stable and do not tend to approach extinction. The merit of RECMDT may largely rest on the relative energy-efficiency of suffering as opposed to well-being. If suffering is very energy-inefficient compared to well-being, then that is good evidence in favor of this theory. I will write more about the implications of the energy-efficiency of suffering soon.

Is RECMDT Safer if Applied Only with Particular Mindsets?

One way to hedge against astronomically bad outcomes may be to employ RECMDT only when one fully understands and is committed to ensuring that survivability remains dependent on well-being. This works because following this decision theory essentially increases the variance of child worlds, like using birdshot instead of a slug. If one employs this heuristic only while holding a firm belief in, and commitment to, a strong heuristic for reducing the probability of net-suffering worlds, then it seems that your counterparts in child worlds will also have this belief and be prepared to act on it. You can also employ RECMDT only while you believe in your ability to take massive action on behalf of your belief that survivability should remain dependent on well-being. Whenever you feel unable to carry out this value, you should perhaps not act to increase the variance of child worlds, because you will not be prepared to deal with the worst-case scenarios in those child worlds.

Evidence against applying RECMDT only when one holds certain values strongly, however, comes from the Nth-order effects of our actions. For decisions with extremely localized effects, where one's beliefs dominate the ultimate outcome, the plausible value of RECMDT over not applying it is rather small.

For decisions with many Nth-order effects, such as deciding which job to take (which, for example, has many unpredictable effects on the economy), it seems that one cannot control the majority of the effects of one's actions after the initial decision is made. The ultimate effects likely rest on features of our universe (e.g. the nature of human market economies in our local group of many-worlds) that one's particular beliefs have little influence over. In other words, for many decisions, one can affect the world once, but one cannot control the Nth-order effects by acting a second time. Thus, while certain mindsets are useful to hold dearly regardless of whether one employs RECMDT, it does not seem generally necessary to refrain from employing RECMDT just because one is not holding any particular mindset.

Converting Radioactive Decay to Random Bit Strings

In order to implement this decision theory, agents require access to a true source of entropy; pseudo-random number generators will NOT work. There are a variety of ways to implement this, such as having an array of Geiger counters surrounding a radioactive isotope and looking at which groups of sensors get triggered first in order to yield a decision. However, I suspect one of the cheapest and most reliably random sensors would be built to implement the following algorithm from HotBits:

Since the time of any given decay is random, then the interval between two consecutive decays is also random. What we do, then, is measure a pair of these intervals, and emit a zero or one bit based on the relative length of the two intervals. If we measure the same interval for the two decays, we discard the measurement and try again, to avoid the risk of inducing bias due to the resolution of our clock.

John Walker, HotBits
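Here is one possible reading of that scheme as a Python sketch. It assumes a hypothetical `detection_times` iterator that yields a timestamp for each decay event the Geiger counter registers; a production implementation would also need to guard against slow drift in the decay rate.

```python
def hotbits_bits(detection_times):
    """Generate random bits from decay timestamps per the quoted scheme:
    compare the lengths of two consecutive inter-decay intervals, emit 1 if
    the first is longer, 0 if shorter, and discard ties (which reflect clock
    resolution, not physics). `detection_times` is a hypothetical
    hardware-backed iterator of decay event timestamps."""
    t0 = next(detection_times)
    while True:
        t1 = next(detection_times)
        t2 = next(detection_times)
        if (t1 - t0) != (t2 - t1):           # discard equal intervals
            yield 1 if (t1 - t0) > (t2 - t1) else 0
        t0 = t2                              # start the next pair fresh
```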

Converting Random Bit Strings to Choices

We have a means above to generate truly random bit strings that should differ between child worlds. The next question is: how do we convert these bit strings into choices regarding which CoA we will execute? This depends on the number of CoAs we were considering and the specific ratios that we arrived at for comparative choice-worthiness. We simply need to determine the total of the individual odds of each CoA (the sum of the ratio parts), and acquire a bit string long enough that the number of values it can represent as a binary number is at least this total. From there, we can use a simple preconceived encoding scheme to have the base-2 number encoded in the bit string select a particular course of action.

For example, in a scenario where one CoA is 4x as choice-worthy as another, we need a random number that represents the digits 0 to 4 equally. Drawing the number 4 can mean we must do the less choice-worthy CoA, and drawing 0-3 can mean we do the more choice-worthy CoA. We need at least 3 random bits in order to do this. Since 2^3 is 8 and there is no way to divide the states 5, 6, and 7 equally among the states 0, 1, 2, 3, and 4, we cannot use a bit string that encodes a number over 4, and must acquire another one until we draw a number of 4 or less. Once we have a bit string encoding a number within our total, we can use its value to select our course of action.

This selection method prevents us from having to make any rounding errors, and it shouldn't take many bits to implement: any given bit string of the proper length always has over a 50% chance of working out. Other encoding schemes introduce rounding errors, which only compound the uncertainty of our choice-worthiness calculations.
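Putting the pieces together, here is a sketch that generalizes the 4:1 example to any integer odds. It assumes a `bit_source` iterator of truly random bits, such as the hypothetical `hotbits_bits` generator sketched above.

```python
def select_coa(coas, odds, bit_source):
    """Select a course of action with probability proportional to its odds,
    using the rejection scheme described above.

    coas       -- courses of action, e.g. ["more worthy", "less worthy"]
    odds       -- matching integer ratio parts, e.g. [4, 1] for 4:1
    bit_source -- iterator of truly random bits (e.g. hotbits_bits(...))
    """
    total = sum(odds)                    # number of equally likely outcomes
    nbits = (total - 1).bit_length()     # fewest bits with 2**nbits >= total
    while True:
        value = 0
        for _ in range(nbits):
            value = (value << 1) | next(bit_source)
        if value < total:                # accept; otherwise redraw
            break
    for coa, weight in zip(coas, odds):  # walk the cumulative odds
        if value < weight:
            return coa
        value -= weight

# 4:1 example from the text: total = 5, nbits = 3; values 0-3 select the
# more choice-worthy CoA, 4 selects the other, and 5-7 are redrawn, so
# each 3-bit draw succeeds with probability 5/8 (always over 50%).
```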

What Does Application Look Like?

I think everyone with a solid ability to calibrate choice-worthiness should have access to truly random bits for choosing courses of action.

Importantly, the time at which these random bits were produced is relevant. A one-year-old random bit string captured from radioactivity is just as random as one captured 5 seconds ago, but employing the latter is key for ensuring that the maximum number of recent sister universes make different decisions.

Thus, people need access to recently created bit strings. These could come from a portable, personal Geiger counter, but they could also come from a centralized Geiger counter in, say, the middle of the United States. The location does not matter as much as the recency of bit production. Importantly, however, bit strings should never be reused: whatever made you decide to reuse them is non-random information, so reuse is less random than drawing new bit strings.

Can We Really Affect the Distribution of Other Worlds through Our Actions?

One may object that since everything, including our brains, is quantum mechanical, can we really affect the distribution of child worlds through our intentions and decisions? This raises the classic problem of free will and our place in a deterministic universe. I think the simplest question to ask is: do our choices have an effect on ethically relevant phenomena? If the answer is no, then why should we care about decision theory in general? I think it's useful to think of the answer as yes.

What if Many Worlds Isn't True?

If MWI isn't true, then RECMDT optimizes for worlds that will not exist, at a potential cost to our own. This may seem incredibly dangerous and costly. However, as long as people make accurate choice-worthiness comparisons between different CoAs, I will argue that adhering to RECMDT is not that risky. After all, choice-worthiness is distinct from expected utility.

It would be a waste to have people, facing a binary choice of actions with one having 9x more expected utility than the other, choose the action with less expected utility even 10% of the time. However, where we are morally uncertain, it seems best, even in a single unfolding of history, to cycle through actions based on our moral uncertainty via relative choice-worthiness.

By always acting to maximize choice-worthiness, we risk not capturing any value at all through our actions. I agree that we should maximize expected utility in one-shot and iterative scenarios alike, and be risk-neutral, assuming we have adequately defined our utility function. But given the fundamental uncertainty at play in a normative uncertainty assessment, I think it is risk-neutral to probabilistically decide to implement different CoAs relative to their comparative choice-worthiness. Importantly, this is only the ideal method if the CoAs are mutually exclusive; if they are not, one might as well optimize for both moral frameworks.

Hence, while I think RECMDT is correct, I also think that even if MWI is proven false, a decision theory exists which combines randomness and relative choice-worthiness. Perhaps we can call this Random Choice-worthy Decision Theory, or RCDT.

---------------------------------------------------------

Thanks for reading. Let me know what you think of this!