Synthesising divergent preferences: an example in population ethics

This post is an example of how one could go about synthesising contradictory human preferences and meta-preferences, using a somewhat simplified version of the method of this post.

The aim is to illustrate how I imagine the process going. I'll take the example of population ethics, since it's an area with divergent intuitions and arguments.

Over-complicated values?

One key point of this synthesis is that I am very reluctant to throw a preference away completely, preferring to keep it around in a very weakened form. The reasoning behind this decision is: 1) a weak preference will make little practical difference for most decisions, and 2) most AI catastrophes involve over-simple utility functions, so simplicity itself is something to be wary of, and there is no clear boundary between "good simplicity" and "bad simplicity".

So we might end up losing most of human value if we go too simple, and be a little bit inefficient if we don't go simple enough.

Basic pieces

Human H has not thought hard about population ethics at all. In terms of theory, they have a vague preference for total utilitarianism, and they have some egalitarian preferences. They also have a lot of "good/bad" examples drawn from some knowledge of history and some fictional experiences (i.e. books, movies, and TV shows).

Note that it is wrong to use fiction as evidence of how the world works. However, using fiction as a source of values is not intrinsically wrong.

At the meta-level, they have one relevant preference: they have a preference for simple models.

Weights of the preferences

H has these preferences with different strengths, denoted by (un-normalised) weights on these preferences: $w_t$ on total utilitarianism, $w_e$ on egalitarianism, $w_x$ on their examples, and $w_s$ on model simplicity.

Future arguments

The AI checks how the human would respond if presented with the mere addition argument, the repugnant conclusion, and the very repugnant conclusion.

They accept the logic of the mere addition argument, but feel emotionally against the two repugnant conclusions; most of the examples they can bring to mind are against them.


There are multiple ways of doing the synthesis; here is one I believe is acceptable. The model simplicity weight is $w_s$; however, the mere addition argument only works under total-utilitarian preferences, not under egalitarian preferences. The ratio of total to egalitarian weights is $w_t : w_e$, so the weight given to total utilitarian-style arguments by the mere addition argument is $w_s \frac{w_t}{w_t + w_e}$.

(Notice here that egalitarianism has no problem with the repugnant conclusion at all. However, it doesn't accept the mere addition argument, so this provides no extra weight.)

If we tried to do a best fit of utility theory to all of H's mental examples, we'd end up with a utility that is mainly some form of prioritarianism with some extra complications; write this as $u_p + \epsilon u_c$, for a utility function $u_c$ whose changes in magnitude are small compared with $u_p$'s.

Let $u_t$ and $u_e$ be the total and egalitarian population ethics utilities, and let $w_t$, $w_e$, $w_x$, and $w_s$ be the weights on total utilitarianism, egalitarianism, the examples, and model simplicity. Putting this together, we could get an overall utility function:

  • $u = w_x(u_p + \epsilon u_c) + \left(w_t + w_s\frac{w_t}{w_t + w_e}\right)u_t + w_e u_e$.

The $w_t$ on $u_t$ is its initial valuation; the $w_s\frac{w_t}{w_t + w_e}$ is the extra component it picked up from the mere addition argument.
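The bookkeeping above can be sketched in a few lines of code. This is a minimal sketch under assumed, illustrative numeric weights (the specific values are not fixed by the discussion above, and the function names are hypothetical); the structure is what matters: the simplicity weight is credited to the total-utilitarian component in proportion to the total:egalitarian ratio, then everything is combined in a weighted sum.

```python
# Minimal sketch of the weight bookkeeping; all numbers are
# illustrative assumptions, and the function names are hypothetical.

def synthesise_weights(w_total, w_egal, w_examples, w_simplicity):
    """Effective weights on the three component utilities.

    The mere addition argument only works under total-utilitarian
    preferences, so the simplicity weight flows to total
    utilitarianism in proportion to the total:egalitarian ratio.
    """
    extra_total = w_simplicity * w_total / (w_total + w_egal)
    return {
        "examples": w_examples,          # weight on the prioritarian best-fit
        "total": w_total + extra_total,  # initial valuation + extra component
        "egalitarian": w_egal,
    }

def overall_utility(weights, u_prioritarian, u_total, u_egal):
    """Weighted sum of the three component utilities."""
    return (weights["examples"] * u_prioritarian
            + weights["total"] * u_total
            + weights["egalitarian"] * u_egal)

# With equal total and egalitarian weights, half the simplicity
# weight flows to the total-utilitarian component.
w = synthesise_weights(w_total=1.0, w_egal=1.0, w_examples=2.0, w_simplicity=1.0)
# w["total"] == 1.5
```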

In practice

So what does maximising this synthesised utility look like in practice? Well, because the weights of the three main utilities are comparable, they will push towards worlds that align with all three: high total utility and high equality. Mild increases in inequality are acceptable in exchange for large increases in total utility or in the utility of the worst-off part of the population, and the other tradeoffs are similar.

What is mainly ruled out are worlds where one factor is maximised strongly but the others are ruthlessly minimised.
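As an illustration, here is a small sketch (with invented worlds, scores, and weights, not anything fixed by the discussion above) of why comparable weights favour balanced worlds over single-factor extremes:

```python
# Hypothetical worlds scored on the three component utilities;
# all scores and weights are invented for illustration.
worlds = {
    "balanced":       {"examples": 0.8, "total": 0.8, "egalitarian": 0.8},
    "total_maxed":    {"examples": 0.1, "total": 1.0, "egalitarian": 0.0},
    "equality_maxed": {"examples": 0.2, "total": 0.1, "egalitarian": 1.0},
}

# Comparable (same order of magnitude) weights on each component.
weights = {"examples": 1.0, "total": 1.5, "egalitarian": 1.0}

def score(components):
    """Weighted sum of the component utilities."""
    return sum(weights[k] * components[k] for k in weights)

best = max(worlds, key=lambda name: score(worlds[name]))
# The balanced world wins: maxing out one factor while the others
# collapse loses too much on the remaining terms of the sum.
```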


The above synthesis raises a number of issues (which is why I was keen to write it down).

Utility normalisation

First of all, there's the question of normalising the various utility functions. A slight preference for a utility function can swamp all other preferences, if the magnitude of that utility's changes across different choices is huge. I tend to assume a min-max normalisation for any utility $u$, so that

$\mathbb{E}(u \mid \pi_u) = 1$ and $\mathbb{E}(u \mid \pi_{-u}) = 0$,

with $\pi_u$ the optimal policy for $u$, and $\pi_{-u}$ the optimal policy for $-u$ (hence the worst policy for $u$). This min-max normalisation doesn't have particularly nice properties, but then again, neither do any other normalisations.
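A minimal sketch of this normalisation, assuming a finite set of policies with known expected utilities (the policy names and numbers below are invented for illustration):

```python
def min_max_normalise(expected_utilities):
    """Rescale so the u-optimal policy scores 1 and the u-worst scores 0."""
    lo = min(expected_utilities.values())
    hi = max(expected_utilities.values())
    return {policy: (v - lo) / (hi - lo)
            for policy, v in expected_utilities.items()}

# E(u | policy) for three hypothetical policies (invented numbers).
u = {"pi_a": 10.0, "pi_b": 4.0, "pi_c": -2.0}
norm = min_max_normalise(u)
# norm["pi_a"] == 1.0 (optimal policy), norm["pi_c"] == 0.0 (worst policy)
```

After this rescaling, no single utility can swamp the others merely because its raw magnitudes happen to be large.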

Factual errors and consequences

I said that fictional evidence and historical examples are fine for constructing preferences. But what if part of this evidence is based on factually wrong suppositions? For example, maybe one of the strong examples in favour of egalitarianism is H imagining themselves in very poor situations. But maybe people who are genuinely poor don't suffer as much as H imagines they would. Or, conversely, maybe people do suffer from lack of equality, more than H might think.

Things like this seem as if they are just a matter of dividing factual beliefs from preferences, but the two are not so easy to disentangle. A religious fundamentalist might have a picture of heaven in which everyone would actually be miserable. To convince them of this fact, it is not sufficient to point it out; the process of causing them to believe it may break many other aspects of their preferences and world-view as well. More importantly, learning some true facts will likely cause people to change the strength (the weight) with which they hold certain preferences.

This does not seem insoluble, but it is a challenge to be aware of.

Order of operations and underdefined preferences

Many people construct explicit preferences by comparing their consequences with mental examples, then rejecting or accepting the preferences based on this fit. This process is very vulnerable to which examples spring to mind at the time. I showed a process that extrapolated current preferences by imagining H encountering the mere addition argument and the very repugnant conclusion. But I chose those examples because they are salient to me and to a lot of philosophers in that area, so the choice is somewhat arbitrary. The point at which H's preferences are taken as fixed (before or after which hypothetical arguments) will be important for the final result.

Similarly, I used the total and egalitarian utilities separately to consider H's reactions to the mere addition argument. I could instead have used their weighted combination to do so, or even the full synthesised utility. For those utilities, the mere addition argument doesn't go through at all, so it would provide no extra weight to total utilitarianism. Why did I do it that way? Because I judged that the mere addition argument sounds persuasive, even to people who should actually reject it based on some synthesis of their current preferences.

So there remain a lot of details to fill in and choices to make.


And, of course, H might have higher-order preferences about utility normalisation, factual errors, orders of operation, and so on. These provide an extra layer of possible complications to add on.