# Big Advance in Infinite Ethics

### Summary

It is possible that our universe is infinite in both time and space. We might therefore reasonably consider the following question: given two sequences $X = (x_1, x_2, \dots)$ and $Y = (y_1, y_2, \dots)$ (where each $x_t$ represents the welfare of the persons living at time $t$), how can we tell whether $X$ is morally preferable to $Y$?

It has been demonstrated that there is no “reasonable” ethical algorithm which can compare any two such sequences. Therefore, we want to look for subsets of sequences which can be compared, and (perhaps retro-justified) arguments for why these subsets are the only ones which practically matter.

Adam Jonsson has published a preprint of what seems to me to be the first legitimate such ethical system. He considers the following: suppose at any time we are choosing between a finite set of options. We have an infinite number of times at which we make a choice (giving us an infinite sequence), but at each time step we have only finitely many choices. (Formally, he considers Markov Decision Processes.) He has shown that an ethical algorithm he calls “limit-discounted utilitarianism” (LDU) can compare any two such sequences, and moreover the outcome of LDU agrees with our ethical intuitions.

This is the first time (to my knowledge) that we have some justification for thinking that a certain algorithm is all we will “practically” need when comparing infinite utility streams.

### Limit-discounted Utilitarianism (LDU)

Given $X$ and $Y$, it seems reasonable to say $X \succ Y$ if

$$\sum_{t=1}^{\infty} (x_t - y_t) > 0$$

Of course, the problem is that this series may not converge, and then it’s unclear which sequence is preferable. A classic example is the choice between $(1, 0, 1, 0, \dots)$ and $(0, 1, 0, 1, \dots)$. (See the example below.)

LDU handles this by using Abel summation. Here is a rough explanation of how that works.

Intuitively, we might consider adding a discount factor $0 < \delta < 1$ like this:

$$\sum_{t=1}^{\infty} \delta^{t-1} (x_t - y_t)$$

This modified series may converge even though the original one doesn’t. Of course, this convergence comes at the cost of caring more about people who are born earlier, which might not endear us to our children.

Therefore, we can take the limit case:

$$\lim_{\delta \to 1^-} \sum_{t=1}^{\infty} \delta^{t-1} (x_t - y_t)$$

This modified sum is what’s used for LDU.

LDU has a number of desirable properties, which are summarized on page 7 of this paper by Jonsson and Voorneveld. I won’t go into them much here, other than to say that LDU generally extends our intuitions about what should happen in the finite case to the infinite one.

#### Example

Suppose we want to compare $X = (1, 0, 1, 0, \dots)$ and $Y = (0, 1, 0, 1, \dots)$. Let’s take the standard series:

$$\sum_{t=1}^{\infty} (x_t - y_t) = 1 - 1 + 1 - 1 + \dots$$

This is Grandi’s series, which famously does not converge under the usual definitions of convergence.

LDU, though, will put in a discount term to get:

$$\sum_{t=1}^{\infty} \delta^{t-1} (x_t - y_t) = 1 - \delta + \delta^2 - \delta^3 + \dots$$

This is simply a geometric series with common ratio $-\delta$, and we can find its value using the standard formula for geometric series:

$$1 - \delta + \delta^2 - \delta^3 + \dots = \frac{1}{1 + \delta}$$

Taking the limit:

$$\lim_{\delta \to 1^-} \frac{1}{1 + \delta} = \frac{1}{2}$$

Therefore, the Abel sum of this series is one half, and, since $\frac{1}{2} > 0$, we have determined that $X$ is better than (morally preferable to) $Y$.

This seems kind of intuitive: as you add more and more terms, the partial sums of the series oscillate between zero and one, so in some sense the limit of the series is one half.
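As a quick numerical check (my own sketch, not from Jonsson’s paper), we can watch the discounted series approach the Abel sum of one half as the discount factor approaches one:

```python
# Approximate the Abel sum of Grandi's series 1 - 1 + 1 - 1 + ... by
# evaluating the discounted series for discount factors approaching 1.

def discounted_sum(terms, delta):
    """Sum terms[t] * delta**t over a (long) finite truncation."""
    return sum(a * delta**t for t, a in enumerate(terms))

# A long truncation of Grandi's series: 1, -1, 1, -1, ...
grandi = [(-1) ** t for t in range(100_000)]

for delta in (0.9, 0.99, 0.999):
    print(delta, discounted_sum(grandi, delta))
# Each value matches 1 / (1 + delta), so the results approach 1/2
# as delta -> 1.
```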

### Markov Decision Processes (MDP)

Markov Decision Processes, according to Wikipedia, are:

> At each time step, the process is in some state $s$, and the decision maker may choose any action $a$ that is available in state $s$. The process responds at the next time step by randomly moving into a new state $s'$, and giving the decision maker a corresponding reward $R_a(s, s')$.
>
> The probability that the process moves into its new state $s'$ is influenced by the chosen action. Specifically, it is given by the state transition function $P_a(s, s')$. Thus, the next state $s'$ depends on the current state $s$ and the decision maker’s action $a$.

At each time step the decision-maker chooses between a finite number of options, which causes the universe to (probabilistically) move into one of a finite number of states, giving the decision-maker a (finite) payoff. By repeating this process an infinite number of times, we can construct a sequence $(x_1, x_2, \dots)$ where $x_t$ is the payoff at time $t$.

The set of all sequences generated by a decision-maker who follows a single, time-independent (i.e. stationary) policy is what is considered by Jonsson. Crucially, he shows that LDU is able to compare any two streams generated by a stationary Markov decision process. [1]
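To make the setup concrete, here is a toy sketch of my own (not Jonsson’s construction): a deterministic two-state MDP in which the choice of starting state induces one of the two alternating payoff streams from the example above, compared with a discounted sum at $\delta$ near 1 as a numerical stand-in for the LDU limit:

```python
# A toy, deterministic two-state MDP (illustrative only). Starting in
# state "a" versus state "b" yields the payoff streams (1,0,1,0,...)
# and (0,1,0,1,...) respectively.

def payoff_stream(start_state, n_steps):
    payoffs = {"a": 1, "b": 0}
    transition = {"a": "b", "b": "a"}  # states alternate deterministically
    state, stream = start_state, []
    for _ in range(n_steps):
        stream.append(payoffs[state])
        state = transition[state]
    return stream

def ldu_sign(x, y, delta=0.9999):
    """Sign of the discounted sum of differences, with delta near 1 as
    a numerical stand-in for the delta -> 1 limit LDU actually takes."""
    s = sum((xi - yi) * delta**t for t, (xi, yi) in enumerate(zip(x, y)))
    return (s > 0) - (s < 0)

x = payoff_stream("a", 200_000)
y = payoff_stream("b", 200_000)
print(ldu_sign(x, y))  # 1: the stream with welfare up front is preferred
```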

### Why This Matters

My immediate objection upon reading this paper was “of course if you limit us to only finitely many choices then the problem is soluble – the entire problem only occurs because we want to examine infinite things!”

After having thought about it more, though, I think this is an important step forward, and MDPs represent an importantly large class of decision processes.

Even though the universe may be infinite in time and space, in any time interval there are plausibly only finitely many states I could be in, e.g. perhaps because there are only finitely many neurons in my brain.

(Someone who knows more about physics than I do might be able to comment on a stronger argument: if locality holds, then perhaps it is a law of nature that only finitely many things can affect us within a finite time window?)

Sequences generated by MDPs are therefore plausibly the only set of sequences a decision-maker may need to practically consider.

### Outstanding Issues

My biggest outstanding concern with modeling our decisions with an MDP is that the payoffs have to remain constant. It seems likely that, as we learn more, we will discover that certain states are more or less valuable than we had previously thought. E.g. we may learn that insects are more conscious than previously expected, and therefore insect suffering affects our payoffs more than we had originally thought. It seems like maybe one could have a “meta-MDP” which somehow models this, but I’m not familiar enough with the area to say for sure.

A more theoretical question is: what sequences can be generated via MDPs? My hope is that one day someone will show that LDU (or a similarly intuitive algorithm) can compare any two computable sequences, but I don’t think that this is that proof.

Lastly, we have the standard problems of infinitarian fanaticism and paralysis. E.g. even if our current best model of the universe predicted that the MDP model was exactly correct, there would still be some positive probability that it was wrong, and then our “meta-decision procedure” is unclear.

### Conclusion

Overall, I don’t think that this completely solves the questions with comparing infinite utility streams, but it’s a large step forward. Previous algorithms like the overtaking criterion had fairly “obvious” incomparable streams, with no real justification for why those streams would not be encountered by a decision-maker. LDU is not complete, but we at least have some reason to think that it may be all we “practically” need.

I would like to thank Adam Jonsson for discussing this with me. I have done my best to represent LDU accurately, but any errors in the above are mine. Notably, the justification for why MDPs are all we need to consider is entirely mine, and I’m not sure what Adam thinks about it.

1. This is not explicitly stated in Jonsson’s paper, but it follows from the proof of Theorem 1. Jonsson confirmed this in email discussions with me.

• A problem with this approach is that the ordering of the things in the sequence matters (e.g. (1,0,1,0,1...) reorders to (1,0,0,1,0,0,1...)). This method works here, where the ordering is by moments of time, but not for, say, summing the utility of infinitely many agents, where there is no clear ordering.

I have a method of comparison that doesn’t depend on the ordering: https://agentfoundations.org/item?id=1455

• Thanks! Your idea is interesting – I put a comment on that post.

Something you are probably aware of is that accepting “anonymity” (allowing the sequence to be reordered arbitrarily) requires us to reject seemingly intuitive principles like Pareto (if you can make someone better off and no one worse off, then you should).

Personally, I would rather keep Pareto than anonymity, but I think it’s cool to explore what anonymous orderings can do.
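To illustrate the tension numerically (my own sketch, not from the linked post): LDU itself is order-sensitive, i.e. it rejects anonymity. Comparing (1,0,1,0,…) against its rearrangement (1,0,0,1,0,0,…), the discounted difference stays positive as the discount factor approaches one, so LDU strictly prefers one ordering of what anonymity would treat as the same collection of welfare levels:

```python
# Compare x = (1,0,1,0,...) with its rearrangement y = (1,0,0,1,0,0,...)
# via the discounted sum of differences.

def discounted_diff(delta, n=300_000):
    total = 0.0
    for t in range(n):
        x_t = 1 if t % 2 == 0 else 0   # 1, 0, 1, 0, ...
        y_t = 1 if t % 3 == 0 else 0   # 1, 0, 0, 1, 0, 0, ...
        total += (x_t - y_t) * delta**t
    return total

for delta in (0.9, 0.99, 0.999):
    print(delta, discounted_diff(delta))
# All positive (analytically, 1/(1 - delta**2) - 1/(1 - delta**3) > 0),
# so the original ordering is strictly preferred to the rearrangement.
```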

• I have not looked through the math in detail, but I appreciate the non-technical discussion at the end, and I like summaries of contributions to an important problem, so I’ve moved it to the frontpage.

• I believe that the solution to this problem involves surreal numbers. Here’s an extract from an email that I sent to Amanda Askell. I’m planning on writing up a full post on this soonish, but I’m also looking for jobs at the moment, so there is a bit of a conflict there. I know this needs to be formalised more, though.

“Thanks for feedback on using surreal numbers.

Eddy Chen and Daniel Rubio seem to be using an approach quite similar to mine. In particular, they made two key insights:

• If non-standard infinities are going to be used, then surreal numbers are a much more natural non-standard number class to use than the hyperreals

• Sequences can also have a surreal number attached as a length

However, that presentation is not quite a complete theory. One of the biggest issues is that they argued it is invalid to rearrange sequences, even though the spatial order should not make a difference. In particular, they wanted to say that it was invalid for 1, -1/2, 1/3, -1/4… to be rearranged into 1, -1/2, 1/3, 1/5, -1/4, 1/7, 1/9, 1/11, 1/13, -1/6…, as the original sequence had the same number of positive and negative terms, but the latter sequence has more positive terms up to any particular point.

An informal description of my approach to resolving this works as follows:

• Instead of simply attaching a single length to a sequence, we need lengths attached to all sub-sequences. If we do this, then we can take a countably infinite sequence 1,1,1… with length X (where X is surreal) and another sequence 2,2,2… with length Y and splice them together into a sequence 1,2,1,2… with total utility X+2Y. We could splice them to be 1,1,2,1,1,2… or 1,2,2,1,2,2… instead, but this won’t change the total utility as long as we keep in mind that there are X ones and Y twos.

• Similarly for the 1, -1/2, 1/3, -1/4… sequence: if there are X positive terms and Y negative terms, then this will remain the same after it is rearranged.

• We define homogeneous sequences as sequences where the odd-numbered places have the same “length” as the even-numbered places, the three subsequences of every third element have the same length, and so on for every fourth element, etc. (It’s actually a bit more complex than this.)

• After we have defined homogeneous sequences, we can say 1,2,1,2… (length X, homogeneous) is different from 1,1,2,1,1,2… (length X, homogeneous), as Eddy Chen and Daniel Rubio wanted to do, but using a more formal account.

As per Eddy Chen and Daniel Rubio’s model, this will behave as expected with regard to standard changes – adding elements, deleting elements, increasing single values, decreasing single values, increasing all values, decreasing all values, multiplying all values, etc. At the same time, rearrangements preserve utility.”

• Thanks! Someone (maybe it was you?) pointed me to Chen and Rubio’s stuff before, and it sounds interesting.

I don’t fully understand the informal write-up you have above, but I’m looking forward to seeing the final thing!

• > My hope is that one day someone will show LDU (or a similarly intuitive algorithm) can compare any two computable sequences, but I don’t think that this is that proof.

I’m pretty sure you can’t use a computable algorithm to do this for general computable sequences while maintaining weak Pareto efficiency, due to a diagonalization argument. Let $U$ be the algorithm you use to choose between two computable sequences, which returns 0 if the first sequence is better and 1 otherwise. Let $X$ be the infinite sequence whose value is always 0.5. Consider the sequence $Y$ where every $y_t$ has value $1 - U(X, Y)$. That is, if $U$ chooses $X$, then $Y$ is an infinite sequence of 1s, and if $U$ chooses $Y$, then $Y$ is an infinite sequence of 0s. Either way, $U$ violates weak Pareto efficiency, since it tells you to choose a sequence whose value at every timestep is less than the other sequence’s.

• Thanks! But is that correct? I notice that your argument seems to work for finite sequences as well (or even single rational numbers), but clearly we can order the rational numbers.

• I think the issue here is whether you’re comparing functions (which allow self-reference in a way that can break ordering) or numbers (which don’t); for ordering arbitrary computable sequences, you need to have a way to avoid the sort of diagonalization that makes your choices affect the sequences you have to sort.

• FYI, I was still confused about this, so I posted on math.SE. Someone responded that the above proof is incorrect, but they gave their own proof that there is no computable ordering over computable sequences which respects Pareto.

• Warning: I haven’t read the paper, so take this with a grain of salt.

Here’s how it would go wrong, if I understand it right: for exponentially discounted MDPs there’s something called an effective horizon. That means everything after that time is essentially ignored.

You pick a tiny $\epsilon > 0$. Say (without loss of generality) that all utilities are $\leq 1$. Then there is a time $T$ with $\delta^T / (1 - \delta) < \epsilon$. So the discounted cumulative utility from anything after $T$ is bounded by $\sum_{t=T}^{\infty} \delta^t = \delta^T / (1 - \delta) < \epsilon$ (which follows from the limit of the geometric series). That’s an arbitrarily small constant.

We can now easily construct pairs of sequences for which LDU gives counterintuitive conclusions. E.g. a sequence $Y$ which is maximally better than $X$ for every $t > T$ until the end of time, but ever so slightly worse (by $\epsilon$) for $t \leq T$.

So anything that happens after $T$ is essentially ignored: we’ve made the problem finite.
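This effective-horizon point can be checked numerically; the sketch below is my own, and it only applies to a fixed discount factor (the reply below notes that LDU instead takes the limit $\delta \to 1$):

```python
# With utilities bounded by 1 and a fixed delta, everything after time T
# contributes at most delta**T / (1 - delta) of discounted utility.

delta, eps = 0.99, 1e-6

# Find the first T whose tail bound drops below eps.
T = 0
while delta**T / (1 - delta) >= eps:
    T += 1
print(T)  # effective horizon for this delta and eps

# The counterintuitive pair: x beats y by 2*eps up to time T, while y
# beats x by the maximal margin 1 forever after; the fixed-delta sum
# still favors x, because the entire tail is worth less than eps.
horizon = 10_000
diff = sum((2 * eps if t <= T else -1.0) * delta**t for t in range(horizon))
print(diff > 0)  # True: x preferred despite being worse forever after T
```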

Exponential discounting in MDPs is standard practice. I’m surprised that this is presented as a big advance in infinite ethics, as people have certainly thought about this in economics, machine learning, and ethics before.

Btw, your meta-MDP probably falls into the category of Bayes-Adaptive MDPs (BAMDP) or Bayes-Adaptive partially observable MDPs (BAPOMDP) with learned rewards.

• Thanks for the response. EDIT: Adam pointed out to me that LDU does not suffer from dictatorship of the present, as I originally stated below and as you argued above. What you are saying is true for a fixed discount factor, but in this case we take the limit as $\delta \to 1$.

The property you describe is known as “dictatorship of the present”, and you can read more about it here. In order to get rid of this “dictatorship” you end up having to do things like reject stationarity, which is plausibly just as counterintuitive.
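A quick numerical sketch (mine, not Adam’s) of why taking the limit avoids dictatorship of the present: let stream $Y$ be worse by 1 at the first time step but better by 0.01 at every later step; for $\delta$ close enough to 1 the discounted difference flips sign, so in the limit LDU prefers the persistent future gain:

```python
# Differences x_t - y_t: +1 at t = 0 (X wins now), then -0.01 forever
# after (Y wins at every later time).

def discounted_diff(d0, d_later, delta, horizon=100_000):
    return d0 + sum(d_later * delta**t for t in range(1, horizon))

for delta in (0.9, 0.999):
    print(delta, discounted_diff(1.0, -0.01, delta))
# At delta = 0.9 the sum is positive (the present dominates), but at
# delta = 0.999 it is negative: as delta -> 1 the tail term
# -0.01 * delta / (1 - delta) grows without bound, so LDU prefers Y.
```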

> I’m surprised that this is presented as a big advance in infinite ethics as people have certainly thought about this in economics, machine learning and ethics before.

Could you elaborate? The reason that I thought this was important was:

> Previous algorithms like the overtaking criterion had fairly “obvious” incomparable streams, with no real justification for why those streams would not be encountered by a decision-maker. LDU is not complete, but we at least have some reason to think that it may be all we “practically” need.

Are there other algorithms which you think are all we will “practically” need?

• Admittedly, I do not have much of an idea about infinite ethics, but it appeared to me that the problem was to a large extent about how to deal with an infinite number of agents on which you can define no measure/order so that you can discount utilities.

Right now, I don’t see how this approach helps with that?

• The canonical problem in infinite ethics is to create a preference relation over infinite utility streams which is in some sense “reasonable”. That’s what this approach does.

For more background, you can see this background article or my overview of the field.