The Preference Utilitarian’s Time Inconsistency Problem

In May of 2007, DanielLC asked at Felicifia, an “online utilitarianism community”:

If preference utilitarianism is about making peoples’ preferences and the universe coincide, wouldn’t it be much easier to change peoples’ preferences than the universe?

Indeed, if we were to program a super-intelligent AI to use the utility function U(w) = sum of w’s utilities according to people (i.e., morally relevant agents) who exist in world-history w, the AI might end up killing everyone who is alive now and creating a bunch of new people whose preferences are more easily satisfied, or just use its superintelligence to persuade us to be more satisfied with the universe as it is.
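To make the formulations below easier to compare, here is one way to write this one down in symbols. The notation is mine, not DanielLC’s or anything standard: P(w) is the set of morally relevant agents who exist at some point in world-history w, and u_i(w) is agent i’s utility for w according to i’s own preferences.

```latex
% A sketch of the naive formulation, in made-up notation:
%   P(w)   -- the set of morally relevant agents who exist at some point in world-history w
%   u_i(w) -- agent i's utility for world-history w, according to i's own preferences
\[
  U(w) = \sum_{i \in P(w)} u_i(w)
\]
```

Nothing in this sum cares about who exists now, which is exactly what lets the AI swap the current population for a more easily satisfied one.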

Well, that can’t be what we want. Is there an alternative formulation of preference utilitarianism that doesn’t exhibit this problem? Perhaps. Suppose we instead program the AI to use U’(w) = sum of w’s utilities according to people who exist at the time of decision. This solves DanielLC’s problem, but introduces a new one: time inconsistency.
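In the same made-up notation, the revised utility function is indexed by the decision time t, with P(t) standing for the set of people who exist at time t:

```latex
% The revised formulation: only agents alive at the decision time t count.
%   P(t) -- the set of people who exist at time t (my notation)
\[
  U'_t(w) = \sum_{i \in P(t)} u_i(w)
\]
```

The subscript t is what creates the trouble below: the function being maximized changes as t changes.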

The new AI’s utility function depends on who exists at the time of decision, and as that time changes and people are born and die, its utility function also changes. If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T0, where T0 is a constant representing the time of self-modification.
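In the same notation, the self-modified utility function simply freezes the population at the constant T0:

```latex
% The self-modified formulation: the set of agents is frozen at the modification time T_0.
\[
  U''(w) = \sum_{i \in P(T_0)} u_i(w)
\]
```

At T0 itself, U’ and U’’ are the same function, so the modification costs nothing by the AI’s current lights; the difference is that U’’ never drifts afterward as people are born and die, which is why the current utility function endorses locking it in.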

The AI is now reflectively consistent, but is this the right outcome? Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time? If you’re a utilitarian in the first place, this is probably not the kind of utilitarianism you’d want to subscribe to.

So, what is the solution to this problem? Robin Hanson’s approach to moral philosophy may work. It tries to take into account everyone’s preferences: those who lived in the past, those who will live in the future, and those who have the potential to exist but don’t. But I don’t think he has worked out (or written down) the solution in detail. For example, is the utilitarian AI supposed to sum over every logically possible utility function and weight them equally? If not, what weighting scheme should it use?
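One way to state the question this raises, as a guess at the shape such a scheme would have to take rather than anything Robin has written down: pick a class of utility functions and a weighting over them, then sum.

```latex
% A hypothetical weighted-sum scheme; \mathcal{U} and \alpha_u are placeholders, not Hanson's.
%   \mathcal{U} -- some class of utility functions (e.g., every logically possible one, or
%                  those of past, present, future, and merely potential people)
%   \alpha_u    -- a nonnegative weight assigned to utility function u
\[
  U^{*}(w) = \sum_{u \in \mathcal{U}} \alpha_u \, u(w)
\]
```

The open problem is exactly the one in the paragraph above: what should the class of utility functions be, and how should the weights be chosen?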

Perhaps someone can follow up on Robin’s idea and see where this approach leads us? Or does anyone have other ideas for solving this time inconsistency problem?