Harsanyi’s Social Aggregation Theorem and what it means for CEV

A Friendly AI would have to be able to aggregate each person’s preferences into one utility function. The most straightforward and obvious way to do this is to agree on some way to normalize each individual’s utility function, and then add them up. But many people don’t like this, usually for reasons involving utility monsters. If you are one of these people, then you had better learn to like it, because according to Harsanyi’s Social Aggregation Theorem, any alternative can result in the supposedly Friendly AI making a choice that is bad for every member of the population. More formally,

Axiom 1: Every person, and the FAI, are VNM-rational agents.

Axiom 2: Given any two choices A and B such that every person prefers A over B, the FAI prefers A over B.

Axiom 3: There exist two choices A and B such that every person prefers A over B.

(Edit: Note that I’m assuming a fixed population with fixed preferences. This still seems reasonable, because we wouldn’t want the FAI to be dynamically inconsistent, so it would have to draw its values from a fixed population, such as the people alive now. Alternatively, even if you want the FAI to aggregate the preferences of a changing population, the theorem still applies, but this comes with its own problems, such as giving people (possibly including the FAI) incentives to create, destroy, and modify other people to make the aggregated utility function more favorable to them.)

Give each person a unique integer label from $\{1, \dots, n\}$, where $n$ is the number of people. For each person $k$, let $u_k$ be some function that, interpreted as a utility function, accurately describes $k$’s preferences (such a function exists by the VNM utility theorem). Note that I want $u_k$ to be some particular function, distinct from, for instance, $2u_k$, even though $u_k$ and $2u_k$ represent the same utility function. This is so it makes sense to add them.
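To see why a particular representative matters, here is a small Python sketch (the outcomes, numbers, and the helper names `doubled`, `prefers`, and `winner` are illustrative assumptions, not from the post). Doubling one person’s utility function leaves that person’s preferences unchanged, but it can change which option has the larger unweighted sum.

```python
from typing import Dict, List, Sequence

Utility = Dict[str, float]  # outcome -> utility (hypothetical toy encoding)

def doubled(u: Utility) -> Utility:
    """Scale a utility function by 2; this represents the same VNM preferences."""
    return {o: 2.0 * v for o, v in u.items()}

def prefers(u: Utility, a: str, b: str) -> bool:
    """True if the agent with utility function u strictly prefers a over b."""
    return u[a] > u[b]

def winner(utilities: List[Utility], options: Sequence[str] = ("A", "B")) -> str:
    """Option with the largest unweighted sum of the given utility functions."""
    return max(options, key=lambda o: sum(u[o] for u in utilities))

person1 = {"A": 1.0, "B": 0.0}
person2 = {"A": 0.0, "B": 1.5}

# Person 1 ranks A over B under either representation...
print(prefers(person1, "A", "B"), prefers(doubled(person1), "A", "B"))  # True True
# ...but the sum flips: 1.0 vs 1.5 picks B, while 2.0 vs 1.5 picks A.
print(winner([person1, person2]), winner([doubled(person1), person2]))  # B A
```

So "adding up utility functions" is only well defined once each person’s function has been pinned down, which is what choosing a normalization does.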

Theorem: The FAI maximizes the expected value of $\sum_{k=1}^{n} c_k u_k$, for some set of scalars $\{c_k\}_{k=1}^{n}$.
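Concretely, the theorem says the FAI ranks lotteries by a weighted sum of the individual expected utilities. A minimal Python sketch, assuming lotteries over a finite set of outcomes are encoded as outcome-to-probability dictionaries. The theorem only asserts that some scalars $c_k$ exist, so the weights below are arbitrary placeholders, and the names `expected_utility` and `social_value` are hypothetical.

```python
from typing import Dict, List

Utility = Dict[str, float]   # outcome -> u_k(outcome)
Lottery = Dict[str, float]   # outcome -> probability (sums to 1)

def expected_utility(u: Utility, lottery: Lottery) -> float:
    """Expected value of u under the lottery."""
    return sum(p * u[o] for o, p in lottery.items())

def social_value(us: List[Utility], cs: List[float], lottery: Lottery) -> float:
    """Expected value of sum_k c_k u_k; by linearity this equals sum_k c_k E[u_k]."""
    return sum(c * expected_utility(u, lottery) for c, u in zip(cs, us))

us = [{"x": 1.0, "y": 0.0}, {"x": 0.0, "y": 3.0}]
cs = [2.0, 0.5]                    # placeholder weights, not derived from anything
lottery = {"x": 0.25, "y": 0.75}
print(social_value(us, cs, lottery))  # 1.625
```

Because expectation is linear, it does not matter whether the weighted sum is formed before or after taking expectations; that linearity is what makes the aggregate a VNM utility function in its own right.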

Actually, I changed the axioms a little bit. Harsanyi originally used “Given any two choices A and B such that every person is indifferent between A and B, the FAI is indifferent between A and B” in place of my axioms 2 and 3 (also he didn’t call it an FAI, of course). For the proof (from Harsanyi’s axioms), see section III of Harsanyi (1955), or section 2 of Hammond (1992). Hammond claims that his proof is simpler, but he uses jargon that scared me, and I found Harsanyi’s proof to be fairly straightforward.

Harsanyi’s axioms seem fairly reasonable to me, but I can imagine someone objecting, “But if no one else cares, what’s wrong with the FAI having a preference anyway? It’s not like that would harm us.” I will concede that there is no harm in allowing the FAI to have a weak preference one way or another, but if the FAI has a strict preference between two choices that everyone is indifferent between, and if axiom 3 is true, then axiom 2 is violated.

Proof that my axioms imply Harsanyi’s: Let A and B be any two choices such that every person is indifferent between A and B. By axiom 3, there exist choices C and D such that every person prefers C over D. Now consider the lotteries $pC + (1-p)A$ and $pD + (1-p)B$, for $p \in (0, 1]$. Every person prefers the first lottery to the second, since for each $k$ the difference in expected utility is $p\,(u_k(C) - u_k(D)) + (1-p)\,(u_k(A) - u_k(B)) = p\,(u_k(C) - u_k(D)) > 0$; so by axiom 2, the FAI prefers the first lottery. This remains true for arbitrarily small $p$, and the FAI’s expected utility for each lottery is continuous in $p$, so the FAI must not prefer the second lottery for $p = 0$; that is, the FAI must not prefer B over A. We can “sweeten the pot” in favor of B the same way, so by the same reasoning, the FAI must not prefer A over B.
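The sweetening step can be checked numerically. Below is a Python sketch with made-up utilities: two people who are each indifferent between A and B and each prefer C over D, plus a hypothetical FAI that strictly prefers B over A (exactly what the proof rules out). The lotteries are $pC + (1-p)A$ and $pD + (1-p)B$. For small $p$ every person prefers the first lottery, yet this FAI prefers the second, so it violates axiom 2.

```python
from typing import Dict

Utility = Dict[str, float]

def mix(u: Utility, p: float, first: str, second: str) -> float:
    """Expected utility under u of the lottery p*first + (1 - p)*second."""
    return p * u[first] + (1 - p) * u[second]

people = [
    {"A": 3.0, "B": 3.0, "C": 5.0, "D": 1.0},   # indifferent between A and B, prefers C to D
    {"A": -2.0, "B": -2.0, "C": 0.5, "D": 0.0},
]
# Hypothetical FAI that strictly prefers B over A, which the proof shows is impossible.
fai = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 0.0}

def prefers_first(u: Utility, p: float) -> bool:
    """Does u strictly prefer pC + (1-p)A over pD + (1-p)B?"""
    return mix(u, p, "C", "A") > mix(u, p, "D", "B")

for p in (0.5, 0.1, 0.01):
    unanimous = all(prefers_first(u, p) for u in people)
    print(p, unanimous, prefers_first(fai, p))
# At p = 0.1 and p = 0.01 everyone prefers the first lottery, but this FAI does not.
```

For this particular FAI the first lottery wins only when $2p > 1 - p$, i.e. $p > 1/3$; any strict preference for B over A produces some such threshold below which axiom 2 fails.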

So why should you accept my axioms?

Axiom 1: The VNM utility axioms are widely agreed to be necessary for any rational agent.

Axiom 2: There’s something a little ridiculous about claiming that every member of a group prefers A to B, but that the group in aggregate does not prefer A to B.

Axiom 3: This axiom is just to establish that it is even possible to aggregate the utility functions in a way that violates axiom 2. So essentially, the theorem is “If it is possible for anything to go horribly wrong, and the FAI does not maximize a linear combination of the people’s utility functions, then something can go horribly wrong.” Also, axiom 3 will almost always be true, because it is true whenever the utility functions, together with the constant function, are linearly independent, and almost all finite sets of functions are linearly independent. There are terrorists who hate your freedom, but even they care at least a little bit about something other than the opposite of what you care about.
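One way to test a sufficient condition for axiom 3 is a rank check. The sketch below assumes choices are lotteries over finitely many pure outcomes and stores person $k$’s utilities as row $k$ of a NumPy matrix. It appends an all-ones row because the constant function has to be included: $u$ and $1 - u$ are linearly independent, yet they describe exactly opposed preferences. If the stacked matrix has full row rank, solving for a zero-sum direction $d$ with $u_k \cdot d = 1$ for every $k$ yields two lotteries that everyone ranks the same way. The condition is sufficient, not necessary, and the function names are hypothetical.

```python
import numpy as np

def _stack(U: np.ndarray) -> np.ndarray:
    """Append an all-ones row (the constant function) to the utility matrix."""
    return np.vstack([U, np.ones((1, U.shape[1]))])

def axiom3_sufficient(U: np.ndarray) -> bool:
    """U[k, j] is person k's utility for pure outcome j.
    True if the rows of U plus the constant row are linearly independent."""
    return bool(np.linalg.matrix_rank(_stack(U)) == U.shape[0] + 1)

def unanimous_lotteries(U: np.ndarray):
    """Return probability vectors (A, B) such that every person strictly prefers A over B.
    Assumes axiom3_sufficient(U) is True, so the linear system below is solvable."""
    n = U.shape[0]
    target = np.append(np.ones(n), 0.0)   # u_k . d = 1 for all k, and sum(d) = 0
    d, *_ = np.linalg.lstsq(_stack(U), target, rcond=None)
    pos, neg = np.clip(d, 0.0, None), np.clip(-d, 0.0, None)
    mass = pos.sum()                      # equals neg.sum() because sum(d) = 0
    return pos / mass, neg / mass

U = np.array([[0.0, 1.0, 2.0],
              [2.0, 0.0, 1.0]])
opposed = np.array([[0.0, 1.0, 2.0],
                    [2.0, 1.0, 0.0]])     # rows sum to a constant: exactly opposed
print(axiom3_sufficient(U), axiom3_sufficient(opposed))  # True False
A, B = unanimous_lotteries(U)
print(U @ A - U @ B)                      # every entry is positive
```

Each person’s expected-utility gap is $u_k \cdot d / \text{mass} = 1/\text{mass} > 0$, which is why the construction works whenever the system has a solution.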

At this point, you might be protesting, “But what about equality? That’s definitely a good thing, right? I want something in the FAI’s utility function that accounts for equality.” Equality is a good thing, but only because we are risk averse, and risk aversion is already accounted for in the individual utility functions. People often talk about equality being valuable even after accounting for risk aversion, but as Harsanyi’s theorem shows, if you do add an extra term in the FAI’s utility function to account for equality, then you risk designing an FAI that makes a choice that humanity unanimously disagrees with. Is this extra equality term so important to you that you would be willing to accept that?
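A toy example of that risk: the Python sketch below compares plain summation with a hypothetical aggregator that subtracts a penalty proportional to the gap between the best-off and worst-off person. The penalty form and its weight of 2 are assumptions chosen for illustration. With two people and two sure outcomes, the penalized aggregator picks B even though both people prefer A, which is a direct violation of axiom 2.

```python
from typing import Callable, Dict, Tuple

Profile = Tuple[float, ...]  # (person 1's utility, person 2's utility, ...)

def utilitarian(us: Profile) -> float:
    """Unweighted sum of utilities."""
    return sum(us)

def with_equality_term(us: Profile, weight: float = 2.0) -> float:
    """Hypothetical aggregator: total utility minus a penalty on the spread."""
    return sum(us) - weight * (max(us) - min(us))

options: Dict[str, Profile] = {"A": (10.0, 2.0), "B": (3.0, 1.0)}

def pick(score: Callable[[Profile], float]) -> str:
    """Option with the highest aggregate score."""
    return max(options, key=lambda name: score(options[name]))

def everyone_prefers(x: Profile, y: Profile) -> bool:
    """True if every person gets strictly more utility under x than under y."""
    return all(a > b for a, b in zip(x, y))

print(everyone_prefers(options["A"], options["B"]))  # True
print(pick(utilitarian), pick(with_equality_term))   # A B  (scores 12 vs 4, then -4 vs 0)
```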

Remember that VNM utility has a precise decision-theoretic meaning. Twice as much utility does not correspond to your intuitions about what “twice as much goodness” means. Your intuitions about the best way to distribute goodness to people will not necessarily be good ways to distribute utility. The axioms I used were extremely rudimentary, whereas the intuition that generated “there should be a term for equality or something” is untrustworthy. If they come into conflict, you can’t keep all of them. I don’t see any way to justify giving up axioms 1 or 2, and axiom 3 will likely remain true whether you want it to or not, so you should probably give up whatever else you wanted to add to the FAI’s utility function.
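As an illustration of that decision-theoretic meaning, assume (purely hypothetically) a person whose VNM utility for wealth $w$ is $\sqrt{w}$. Utility 10 (at wealth 100) is twice utility 5 (at wealth 25), and what that ratio means is an indifference between lotteries: the person is exactly indifferent between 25 for sure and a 50/50 gamble between 0 and 100. It says nothing about 100 being “twice as good” as 25 in an intuitive sense; in wealth terms it is four times as much.

```python
import math

def u(wealth: float) -> float:
    """Hypothetical VNM utility function: square root of wealth."""
    return math.sqrt(wealth)

def expected_u(lottery):
    """lottery: list of (probability, wealth) pairs."""
    return sum(p * u(w) for p, w in lottery)

sure_25 = [(1.0, 25.0)]
coin_flip = [(0.5, 0.0), (0.5, 100.0)]
print(u(100.0) / u(25.0))                          # 2.0
print(expected_u(sure_25), expected_u(coin_flip))  # 5.0 5.0
```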


Harsanyi, John C. “Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility.” The Journal of Political Economy (1955): 309–321.

Hammond, Peter J. “Harsanyi’s utilitarian theorem: A simpler proof and some ethical connotations.” In R. Selten (ed.), Rational Interaction: Essays in Honor of John Harsanyi (1992).