Big Advance in Infinite Ethics

Summary

It is possible that our universe is infinite in both time and space. We might therefore reasonably consider the following question: given some sequences $u = (u_0, u_1, u_2, \dots)$ and $u' = (u'_0, u'_1, u'_2, \dots)$ (where each $u_t$ represents the welfare of persons living at time $t$), how can we tell if $u$ is morally preferable to $u'$?

It has been demon­strated that there is no “rea­son­able” eth­i­cal al­gorithm which can com­pare any two such se­quences. There­fore, we want to look for sub­sets of se­quences which can be com­pared, and (per­haps retro-jus­tified) ar­gu­ments for why these sub­sets are the only ones which prac­ti­cally mat­ter.

Adam Jon­s­son has pub­lished a preprint of what seems to me to be the first le­gi­t­i­mate such eth­i­cal sys­tem. He con­sid­ers the fol­low­ing: sup­pose at any time we are choos­ing be­tween a finite set of op­tions. We have an in­finite num­ber of times in which we make a choice (giv­ing us an in­finite se­quence), but at each time step we have only finitely many choices. (For­mally, he con­sid­ers Markov De­ci­sion Pro­cesses.) He has shown that an eth­i­cal al­gorithm he calls “limit-dis­counted util­i­tar­i­anism” (LDU) can com­pare any two such se­quences, and more­over the out­come of LDU agrees with our eth­i­cal in­tu­itions.

This is the first time, to my knowledge, that we have some justification for thinking that a certain algorithm is all we will “practically” need when comparing infinite utility streams.

Limit-dis­counted Utili­tar­i­anism (LDU)

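Given $u = (u_0, u_1, u_2, \dots)$ and $u' = (u'_0, u'_1, u'_2, \dots)$, it seems reasonable to say $u \succeq u'$ (that is, $u$ is at least as good as $u'$) if

$$\sum_{t=0}^{\infty} (u_t - u'_t) \geq 0$$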

Of course, the problem is that this series may not converge, and then it’s unclear which sequence is preferable. A classic example is the choice between $(0, 1, 0, 1, \dots)$ and $(1, 0, 1, 0, \dots)$. (See the example below.)

LDU han­dles this by us­ing Abel sum­ma­tion. Here is a rough ex­pla­na­tion of how that works.

Intuitively, we might consider adding a discount factor $0 < \delta < 1$ like this:

$$\sum_{t=0}^{\infty} \delta^t (u_t - u'_t)$$

This mod­ified se­ries may con­verge even though the origi­nal one doesn’t. Of course, this con­ver­gence is at the cost of us car­ing more about peo­ple who are born ear­lier, which might not en­dear us to our chil­dren.

Therefore, we can take the limit case:

$$\lim_{\delta \to 1^-} \sum_{t=0}^{\infty} \delta^t (u_t - u'_t)$$

LDU uses this limiting value of the discounted sum to compare the two sequences.
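To make the idea concrete, here is a minimal Python sketch of the discounted sum of differences, $\sum_t \delta^t (u_t - u'_t)$, evaluated for discount factors $\delta$ approaching 1. This is my own illustration, not code from Jonsson's paper: the names `discounted_difference` and `ldu_estimates`, the chosen values of $\delta$, and the truncation horizon are all assumptions, and truncating an infinite sum only gives a numerical hint about the limit, not a proof.

```python
from typing import Callable, List

# A welfare stream maps a time step t = 0, 1, 2, ... to the welfare at that time.
Stream = Callable[[int], float]


def discounted_difference(u: Stream, v: Stream, delta: float, horizon: int) -> float:
    """Truncated sum of delta**t * (u(t) - v(t)) for t = 0 .. horizon - 1."""
    total = 0.0
    weight = 1.0  # equals delta**t at the start of each iteration
    for t in range(horizon):
        total += weight * (u(t) - v(t))
        weight *= delta
    return total


def ldu_estimates(u: Stream, v: Stream,
                  deltas=(0.9, 0.99, 0.999), horizon: int = 20_000) -> List[float]:
    """Discounted sums for discount factors approaching 1 (a hint, not a proof)."""
    return [discounted_difference(u, v, d, horizon) for d in deltas]


# Finite sanity check: u gives welfare 1 for three periods, v gives nothing.
u = lambda t: 1.0 if t < 3 else 0.0
v = lambda t: 0.0
print(ldu_estimates(u, v))  # values climb toward the undiscounted total of 3
```

In the finite case the discounted sums simply approach the ordinary sum as $\delta$ goes to 1, which is the sense in which this approach extends finite intuitions.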

LDU has a num­ber of de­sir­able prop­er­ties, which are sum­ma­rized on page 7 of this pa­per by Jon­s­son and Voorn­eveld. I won’t go into them much here other than to say that LDU gen­er­ally ex­tends our in­tu­itions about what should hap­pen in the finite case to the in­finite one.

Example

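Suppose we want to compare $u = (1, 0, 1, 0, \dots)$ and $u' = (0, 1, 0, 1, \dots)$. Let’s take the standard series:

$$\sum_{t=0}^{\infty} (u_t - u'_t) = 1 - 1 + 1 - 1 + \cdots = \sum_{t=0}^{\infty} (-1)^t$$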

This is Grandi’s se­ries, which fa­mously does not con­verge un­der the usual defi­ni­tions of con­ver­gence.

LDU, though, will insert a discount term $0 < \delta < 1$ to get:

$$\sum_{t=0}^{\infty} \delta^t (-1)^t = \sum_{t=0}^{\infty} (-\delta)^t$$

It is clear that this is simply a geometric series, and we can find its value using the standard formula for geometric series:

$$\sum_{t=0}^{\infty} (-\delta)^t = \frac{1}{1 + \delta}$$

Taking the limit:

$$\lim_{\delta \to 1^-} \frac{1}{1 + \delta} = \frac{1}{2}$$

Therefore, the Abel sum of this series is one half, and, since $\frac{1}{2} > 0$, we have determined that $(1, 0, 1, 0, \dots)$ is better than (morally preferable to) $(0, 1, 0, 1, \dots)$.

This seems kind of intuitive: as you add more and more terms, the partial sums of the series oscillate between one and zero, so in some sense the limit of the series is one half.
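As a quick numerical check of this example, the sketch below compares a truncated version of the discounted series $\sum_t (-\delta)^t$ against the geometric-series value $1/(1+\delta)$ for a few discount factors close to 1. The function name `abel_partial` and the truncation horizon are my own choices for illustration.

```python
from itertools import accumulate

# Undiscounted partial sums of 1 - 1 + 1 - 1 + ... alternate between 1 and 0.
grandi_terms = [(-1) ** t for t in range(8)]
partial_sums = list(accumulate(grandi_terms))
print(partial_sums)


def abel_partial(delta: float, horizon: int) -> float:
    """Truncated discounted Grandi series: sum of (-delta)**t for t < horizon."""
    return sum((-delta) ** t for t in range(horizon))


# As delta approaches 1 from below, both columns should approach one half.
for delta in (0.9, 0.99, 0.999):
    truncated = abel_partial(delta, 50_000)
    closed_form = 1.0 / (1.0 + delta)
    print(f"delta={delta}: truncated={truncated:.6f}, closed form={closed_form:.6f}")
```

The truncated sums track the closed form closely because the discarded tail shrinks like $\delta^{N}$ for horizon $N$.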

Markov De­ci­sion Pro­cesses (MDP)

Wikipedia describes Markov Decision Processes as follows:

At each time step, the process is in some state $s$, and the decision maker may choose any action $a$ that is available in state $s$. The process responds at the next time step by randomly moving into a new state $s'$, and giving the decision maker a corresponding reward $R_a(s, s')$.
The probability that the process moves into its new state $s'$ is influenced by the chosen action. Specifically, it is given by the state transition function $P_a(s, s')$. Thus, the next state $s'$ depends on the current state $s$ and the decision maker’s action $a$.

At each time step the decision-maker chooses between a finite number of options, which causes the universe to (probabilistically) move into one of a finite number of states, giving the decision-maker a (finite) payoff. By repeating this process an infinite number of times, we can construct a sequence $u = (u_0, u_1, u_2, \dots)$ where $u_t$ is the payoff at time $t$.

The set of all sequences generated by a decision-maker who follows a single, time-independent (i.e. stationary) policy is what Jonsson considers. Crucially, he shows that LDU is able to compare any two streams generated by a stationary Markov decision process. [1]
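For readers who have not seen an MDP before, here is a toy Python sketch of how a stationary policy generates a payoff stream. Everything in it is hypothetical and chosen only for illustration: the two states, the two actions, the transition probabilities, and the rewards are made up and are not taken from Jonsson's paper. It also shows only one random realization of the stream; how the formal result treats that randomness is not covered here.

```python
import random

# Hypothetical toy MDP. TRANSITIONS[state][action] lists (next_state, probability).
TRANSITIONS = {
    "calm": {"wait": [("calm", 0.9), ("storm", 0.1)],
             "act": [("calm", 0.5), ("storm", 0.5)]},
    "storm": {"wait": [("calm", 0.2), ("storm", 0.8)],
              "act": [("calm", 0.7), ("storm", 0.3)]},
}
STATE_VALUE = {"calm": 1.0, "storm": 0.0}   # made-up payoff for landing in a state
ACTION_COST = {"wait": 0.0, "act": 0.25}    # made-up cost of taking an action

# A stationary policy: the action depends only on the current state, never on time.
POLICY = {"calm": "wait", "storm": "act"}


def reward(state: str, action: str, next_state: str) -> float:
    """Payoff for one transition; in this toy model it ignores the starting state."""
    return STATE_VALUE[next_state] - ACTION_COST[action]


def payoff_stream(policy, start: str, steps: int, seed: int = 0):
    """Simulate the first `steps` payoffs produced by following `policy`."""
    rng = random.Random(seed)
    state, payoffs = start, []
    for _ in range(steps):
        action = policy[state]
        next_states, probs = zip(*TRANSITIONS[state][action])
        next_state = rng.choices(next_states, weights=probs)[0]
        payoffs.append(reward(state, action, next_state))
        state = next_state
    return payoffs


print(payoff_stream(POLICY, "calm", 10))
```

Because there are finitely many states and actions, every payoff in the stream comes from a fixed finite set, which is the structural restriction that separates these streams from arbitrary infinite sequences.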

Why This Matters

My im­me­di­ate ob­jec­tion upon read­ing this pa­per was “of course if you limit us to only finitely many choices then the prob­lem is sol­u­ble – the en­tire prob­lem only oc­curs be­cause we want to ex­am­ine in­finite things!”

After hav­ing thought about it more though, I think this is an im­por­tant step for­ward, and MDPs rep­re­sent an im­por­tantly large class of de­ci­sion pro­cesses.

Even though the universe may be infinite in time and space, in any time interval there are plausibly only finitely many states I could be in, e.g. perhaps because there are only finitely many neurons in my brain.

(Some­one who knows more about physics than I might be able to com­ment on a stronger ar­gu­ment: if lo­cal­ity holds, then per­haps it is a law of na­ture that only finitely many things can af­fect us within a finite time win­dow?)

Se­quences gen­er­ated by MDPs are there­fore plau­si­bly the only set of se­quences a de­ci­sion-maker may need to prac­ti­cally con­sider.

Out­stand­ing Issues

My biggest outstanding concern with modeling our decisions with an MDP is that the payoffs have to remain constant. It seems likely that, as we learn more, we will discover that certain states are more or less valuable than we had previously thought. E.g. we may learn that insects are more conscious than previously expected, and therefore insect suffering affects our payoffs more strongly than we had originally thought. It seems like maybe one could have a “meta-MDP” which somehow models this, but I’m not familiar enough with the area to say for sure.

A more the­o­ret­i­cal ques­tion is: what se­quences can be gen­er­ated via MDPs? My hope is that one day some­one will show LDU (or a similarly in­tu­itive al­gorithm) can com­pare any two com­putable se­quences, but I don’t think that this is that proof.

Lastly, we have the standard problems of infinitarian fanaticism and paralysis. E.g. even if our current best model of the universe predicted that the MDP model was exactly correct, there would still be some positive probability that it was wrong, and then our “meta-decision procedure” is unclear.

Conclusion

Overall, I don’t think that this completely resolves the problem of comparing infinite utility streams, but it’s a large step forward. Previous algorithms like the overtaking criterion had fairly “obvious” incomparable streams, with no real justification for why those streams would not be encountered by a decision-maker. LDU is not complete, but we at least have some reason to think that it may be all we “practically” need.

I would like to thank Adam Jonsson for discussing this with me. I have done my best to represent LDU, but any errors in the above are mine. Notably, the justification for why MDPs are all we need to consider is entirely mine, and I’m not sure what Adam thinks about it.

1. This is not ex­plic­itly stated in Jon­s­son’s pa­per, but it fol­lows from the proof of the­o­rem 1. Jon­s­son con­firmed this in email dis­cus­sions with me.