Harsanyi’s Social Aggregation Theorem and what it means for CEV

A Friendly AI would have to be able to aggregate each person’s preferences into one utility function. The most straightforward and obvious way to do this is to agree on some way to normalize each individual’s utility function, and then add them up. But many people don’t like this, usually for reasons involving utility monsters. If you are one of these people, then you better learn to like it, because according to Harsanyi’s Social Aggregation Theorem, any alternative can result in the supposedly Friendly AI making a choice that is bad for every member of the population. More formally,

Axiom 1: Every person, and the FAI, are VNM-rational agents.

Axiom 2: Given any two choices A and B such that every person prefers A over B, then the FAI prefers A over B.

Axiom 3: There exist two choices A and B such that every person prefers A over B.

(Edit: Note that I’m assuming a fixed population with fixed preferences. This still seems reasonable, because we wouldn’t want the FAI to be dynamically inconsistent, so it would have to draw its values from a fixed population, such as the people alive now. Alternatively, even if you want the FAI to aggregate the preferences of a changing population, the theorem still applies, but this comes with its own problems, such as giving people (possibly including the FAI) incentives to create, destroy, and modify other people to make the aggregated utility function more favorable to them.)

Give each person a unique integer label from $1$ to $n$, where $n$ is the number of people. For each person $k$, let $u_{k}$ be some function that, interpreted as a utility function, accurately describes $k$’s preferences (there exists such a function by the VNM utility theorem). Note that I want $u_{k}$ to be some particular function, distinct from, for instance, $2u_{k}-7$, even though $u_{k}$ and $2u_{k}-7$ represent the same utility function. This is so it makes sense to add them.

Theorem: The FAI maximizes the expected value of ${\displaystyle \sum_{k=1}^{n}c_{k}u_{k}}$, for some set of scalars $\left\{ c_{k}\right\}_{k=1}^{n}$.
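To make the conclusion concrete, here is a minimal sketch of what such an aggregator looks like in code. The utility values, weights, and outcome set are invented for illustration; nothing here is from Harsanyi.

```python
# Minimal sketch of the theorem's conclusion: an aggregator that ranks
# options by a weighted sum of individual utility functions.
# All utilities, weights, and outcomes are invented for illustration.

def aggregate(option, utilities, weights):
    """Weighted sum: sum_k c_k * u_k(option)."""
    return sum(c * u(option) for c, u in zip(weights, utilities))

def choose(options, utilities, weights):
    """Pick the option with the highest aggregated utility."""
    return max(options, key=lambda o: aggregate(o, utilities, weights))

# Two people with already-normalized utility functions over three outcomes.
u1 = {"A": 1.0, "B": 0.4, "C": 0.0}.get
u2 = {"A": 0.0, "B": 0.9, "C": 1.0}.get

print(choose(["A", "B", "C"], [u1, u2], weights=[1.0, 1.0]))  # -> B
```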

Actually, I changed the axioms a little bit. Harsanyi originally used “Given any two choices A and B such that every person is indifferent between A and B, the FAI is indifferent between A and B” in place of my axioms 2 and 3 (also he didn’t call it an FAI, of course). For the proof (from Harsanyi’s axioms), see section III of Harsanyi (1955), or section 2 of Hammond (1992). Hammond claims that his proof is simpler, but he uses jargon that scared me, and I found Harsanyi’s proof to be fairly straightforward.

Harsanyi’s axioms seem fairly reasonable to me, but I can imagine someone objecting, “But if no one else cares, what’s wrong with the FAI having a preference anyway? It’s not like that would harm us.” I will concede that there is no harm in allowing the FAI to have a weak preference one way or another, but if the FAI has a strong preference, so that its own preference is the only thing reflected in the utility function, and if axiom 3 is true, then axiom 2 is violated.

Proof that my axioms imply Harsanyi’s: Let A and B be any two choices such that every person is indifferent between A and B. By axiom 3, there exist choices C and D such that every person prefers C over D. Now consider the lotteries $pC+\left(1-p\right)A$ and $pD+\left(1-p\right)B$, for $p>0$. Notice that every person prefers the first lottery to the second, so by axiom 2, the FAI prefers the first lottery. This remains true for arbitrarily small $p>0$, so by continuity, the FAI must not prefer the second lottery for $p=0$; that is, the FAI must not prefer B over A. We can “sweeten the pot” in favor of B the same way, so by the same reasoning, the FAI must not prefer A over B.
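The continuity step is easy to check numerically. Below is a small sketch with invented utility numbers: everyone is indifferent between A and B, everyone prefers C to D, and the sweetened lottery stays preferred no matter how small $p$ gets.

```python
# Numeric illustration of the "sweeten the pot" step, with invented utilities.
# Everyone is indifferent between A and B, and prefers C to D.
people = [
    {"A": 0.5, "B": 0.5, "C": 1.0, "D": 0.0},
    {"A": 0.2, "B": 0.2, "C": 0.7, "D": 0.3},
]

def lottery(u, p, x, y):
    """Expected utility of the lottery p*x + (1-p)*y."""
    return p * u[x] + (1 - p) * u[y]

for p in [0.1, 0.01, 0.0001]:
    # Everyone prefers pC + (1-p)A to pD + (1-p)B for every p > 0,
    # so by axiom 2 the FAI does too; continuity then forbids it
    # from preferring B over A at p = 0.
    assert all(lottery(u, p, "C", "A") > lottery(u, p, "D", "B") for u in people)
```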

So why should you accept my axioms?

Axiom 1: The VNM utility axioms are widely agreed to be necessary for any rational agent.

Axiom 2: There’s something a little ridiculous about claiming that every member of a group prefers A to B, but that the group in aggregate does not prefer A to B.

Axiom 3: This axiom is just to establish that it is even possible to aggregate the utility functions in a way that violates axiom 2. So essentially, the theorem is “If it is possible for anything to go horribly wrong, and the FAI does not maximize a linear combination of the people’s utility functions, then something will go horribly wrong.” Also, axiom 3 will almost always be true, because it is true when the utility functions are linearly independent, and almost all finite sets of functions are linearly independent. There are terrorists who hate your freedom, but even they care at least a little bit about something other than the opposite of what you care about.
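The linear-independence claim is straightforward to test numerically if we represent each utility function as a vector of utilities over a finite outcome set. A quick sketch (the numbers are invented):

```python
import numpy as np

# Each row is one person's utility function over four outcomes (invented numbers).
P = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [3.0, 1.0, 0.0, 2.0],
    [1.0, 4.0, 1.0, 5.0],
])

# The utility functions are linearly independent iff the matrix rank equals
# the number of people. A randomly perturbed set of rows has full rank with
# probability 1, which is the sense in which axiom 3 "almost always" holds.
print(np.linalg.matrix_rank(P) == len(P))  # True
```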

At this point, you might be protesting, “But what about equality? That’s definitely a good thing, right? I want something in the FAI’s utility function that accounts for equality.” Equality is a good thing, but only because we are risk averse, and risk aversion is already accounted for in the individual utility functions. People often talk about equality being valuable even after accounting for risk aversion, but as Harsanyi’s theorem shows, if you do add an extra term in the FAI’s utility function to account for equality, then you risk designing an FAI that makes a choice that humanity unanimously disagrees with. Is this extra equality term so important to you that you would be willing to accept that?

Remember that VNM utility has a precise decision-theoretic meaning. Twice as much utility does not correspond to your intuitions about what “twice as much goodness” means. Your intuitions about the best way to distribute goodness to people will not necessarily be good ways to distribute utility. The axioms I used were extremely rudimentary, whereas the intuition that generated “there should be a term for equality or something” is untrustworthy. If they come into conflict, you can’t keep all of them. I don’t see any way to justify giving up axioms 1 or 2, and axiom 3 will likely remain true whether you want it to or not, so you should probably give up whatever else you wanted to add to the FAI’s utility function.

Citations:

Harsanyi, John C. “Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility.” The Journal of Political Economy (1955): 309-321.

Hammond, Peter J. “Harsanyi’s utilitarian theorem: A simpler proof and some ethical connotations.” In R. Selten (ed.), Rational Interaction: Essays in Honor of John Harsanyi. 1992.

• So when you’re talking about decision theory and your intuitions come into conflict with the math, listen to the math.

I think you’re overselling your case a little here. The cool thing about theorems is that their conclusions follow from their premises. If you then try to apply the theorem to the real world and someone dislikes the conclusion, the appropriate response isn’t “well it’s math, so you can’t do that,” it’s “tell me which of my premises you dislike.”

An additional issue here is premises which are not explicitly stated. For example, there’s an implicit premise in your post of there being some fixed collection of agents with some fixed collection of preferences that you want to aggregate. Not pointing out this premise explicitly leaves your implied social policy potentially vulnerable to various attacks involving creating agents, destroying agents, or modifying agents, as I’ve pointed out in other comments.

• I suggest the VNM Expected Utility Theorem and this theorem should be used as a test on potential FAI researchers. Is their reaction to these theorems “of course, the FAI has to be designed that way” or “that’s a cool piece of math, now let’s see if we can’t break it somehow”? Maybe you don’t need everyone on the research team to instinctively have the latter reaction, but I think you definitely want to make sure at least some do. (I wonder what von Neumann’s reaction was to his own theorem...)

• I think you’re overselling your case a little here. The cool thing about theorems is that their conclusions follow from their premises. If you then try to apply the theorem to the real world and someone dislikes the conclusion, the appropriate response isn’t “well it’s math, so you can’t do that,” it’s “tell me which of my premises you dislike.”

That’s a good point. I agree, and I’ve edited my post to reflect that.

An additional issue here is premises which are not explicitly stated. For example, there’s an implicit premise in your post of there being some fixed collection of agents with some fixed collection of preferences that you want to aggregate. Not pointing out this premise explicitly leaves your implied social policy potentially vulnerable to various attacks involving creating agents, destroying agents, or modifying agents, as I’ve pointed out in other comments.

I thought I was being explicit about that when I was writing it, but looking at my post again, I now see that I was not. I’ve edited it to try to clarify that.

Thanks for pointing those out.

• Axiom 1: Every person, and the FAI, are VNM-rational agents.

[...]

So why should you accept my axioms?

Axiom 1: The VNM utility axioms are widely agreed to be necessary for any rational agent.

Though of course, humans are not VNM-rational.

• Only a VNM-rational agent can have preferences in a coherent way, so if we’re talking about aggregating people’s preferences, I don’t see any way to do it other than modeling people as having underlying VNM-rational preferences that fail to perfectly determine their decisions.

• Non-VNM agents satisfying only axiom 1 have coherent preferences… they just don’t mix well with probabilities.

• Presumably there would first be an extrapolation phase resulting in rational preferences.

• But many people don’t like this, usually for reasons involving utility monsters. If you are one of these people, then you better learn to like it, because according to Harsanyi’s Social Aggregation Theorem, any alternative can result in the supposedly Friendly AI making a choice that is bad for every member of the population. More formally,

That a bad result can happen in a given strategy is not a conclusive argument against preferring that strategy. Will it happen? What’s the likelihood that it happens? What’s the cost if it does happen?

The two alternatives discussed each have their own failure mode, while your “better learn to like it” admonition seems to imply that one side is compelled by the failure mode of their preferred strategy to give it up for the alternative strategy.

Why is this new failure mode supposed to be decisive in the choice between the two alternatives?

• That a bad result can happen in a given strategy is not a conclusive argument against preferring that strategy.

It’s possible that the AI would just happen never to confront a situation where it would choose differently than everyone else would, but not reliably. If you had an AI that violated axiom 2, it would be tempting to modify it to include the special case “If X is the best option in expectation for every morally relevant agent, then do X.” It seems hard to argue that such a modification would not be an improvement. And yet only throwing in that special case would make it no longer VNM-rational. Worse than a VNM-irrational agent is pretty bad.

Why is this new failure mode supposed to be decisive in the choice between the two alternatives?

Because maximizing a weighted sum of utility functions does not have any comparably convincing failure modes. None that I’ve heard of anyway, and I’d be pretty shocked if you came up with a failure mode that did compete.

• Because maximizing a weighted sum of utility functions does not have any comparably convincing failure modes.

You don’t think utility monster is a comparably convincing failure mode?

I think we just don’t have data one way or the other.

• Utility monster isn’t a failure mode. It just messes with our intuitions because no one could imagine being a utility monster.

Edit: At the time I made this comment, the Wikipedia article on utility monsters incorrectly stated that a utility monster meant an agent that gets increasing marginal utility with respect to resources. Now that I know that a utility monster means an agent that gets much more utility from resources than other agents do, my response is that you can multiply the utility monster’s utility function by a small coefficient, so that it no longer acts as a utility monster.

• There’s something a little ridiculous about claiming that every member of a group prefers A to B, but that the group in aggregate does not prefer A to B.

That would look a bit like Simpson’s paradox, actually.

• The situation analogous to Simpson’s paradox can only occur if for some reason we care about some people’s opinion more than others in some situations (this is analogous to the situation in Simpson’s paradox where we have more data points in some parts of the table than others. It is a necessary condition for the paradox to occur.)

For example: Suppose Alice (female) values a cure for prostate cancer at 10 utils, and a cure for breast cancer at 15 utils. Bob (male) values a cure for prostate cancer at 100 utils, and a cure for breast cancer at 150 utils. Suppose that because prostate cancer largely affects men and breast cancer largely affects women, we value Alice’s opinion twice as much about breast cancer and Bob’s opinion twice as much about prostate cancer. Then in the aggregate, curing prostate cancer is 210 utils and curing breast cancer 180 utils, a preference reversal compared to either of Alice or Bob.
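The arithmetic of that example, transcribed directly into code:

```python
# The preference reversal in the example above, computed directly.
alice = {"prostate": 10, "breast": 15}
bob   = {"prostate": 100, "breast": 150}

# Issue-dependent weights: Alice counts double on breast cancer,
# Bob counts double on prostate cancer.
prostate = 1 * alice["prostate"] + 2 * bob["prostate"]  # 210
breast   = 2 * alice["breast"]   + 1 * bob["breast"]    # 180

# Both Alice and Bob individually prefer the breast cancer cure,
# yet the aggregate prefers the prostate cancer cure.
print(prostate, breast)  # 210 180
```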

• This is essentially just an example of Harsanyi’s Theorem in action. And I think it makes a compelling demonstration of why you should not program an AI in that fashion.

• can only occur if for some reason we care about some people’s opinion more than others in some situations

Isn’t that the description of a utility maximizer (or optimizer) taking into account the preferences of a utility monster?

• To get the effect we need an optimiser that cares about some people’s opinion more about some things but then for some other things cares about someone else’s opinion. If we just have a utility monster who the optimiser always values more than others, we can’t get the effect. The important thing is that it sometimes cares about one person and sometimes cares about someone else.

• I don’t see how it’s like Simpson’s paradox, actually. You want to go to Good Hospital instead of Bad Hospital even if more patients who go to Good Hospital die, because they get almost all the hard cases. Aggregating only hides the information needed to make a properly informed choice. Here, aggregating doesn’t hide any information.

But there are a bunch of other ways things like that can happen.

This very morning I did a nonlinear curve fit on a bunch of repeats of an experiment. One of the parameters that came out had values in the range −1 to +1. I combined the data sets directly and that parameter for the combined set came out around 5.

In a way, this analogy may be even more directly applicable than Simpson’s paradox. Even if A and B are complete specifications (unlike that parameter, which was one of several), the interpersonal reactions to other people can do some very nonlinear things to interpretations of A and B.

• What if we also add a requirement that the FAI doesn’t make anyone worse off in expected utility compared to no FAI? That seems reasonable, but conflicts with the other axioms. For example, suppose there are two agents: A gets 1 util if 90% of the universe is converted into paperclips, 0 utils otherwise, and B gets 1 util if 90% of the universe is converted into staples, 0 utils otherwise. Without an FAI, they’ll probably end up fighting each other for control of the universe, and let’s say each has a 30% chance of success. An FAI that doesn’t make one of them worse off has to prefer a 50/50 lottery of the universe turning into either paperclips or staples to a certain outcome of either, but that violates VNM rationality.

And things get really confusing when we also consider issues of logical uncertainty and dynamical consistency.
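A quick sketch of why that last step fails for any weighted sum (the weights scanned below are arbitrary invented values): the lottery’s aggregated expected utility is just the average of the two certain outcomes’, so it can never strictly beat both.

```python
# No weighted sum of the two utility functions can strictly prefer the
# 50/50 lottery to both certain outcomes: the lottery's value is the
# average of theirs. Weights are scanned over arbitrary invented values.
import itertools

uA = {"paperclips": 1.0, "staples": 0.0}
uB = {"paperclips": 0.0, "staples": 1.0}

for cA, cB in itertools.product([0.1, 1.0, 3.0], repeat=2):
    clips   = cA * uA["paperclips"] + cB * uB["paperclips"]
    staples = cA * uA["staples"]    + cB * uB["staples"]
    lottery = 0.5 * clips + 0.5 * staples
    assert not (lottery > clips and lottery > staples)
```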

• What if we also add a requirement that the FAI doesn’t make anyone worse off in expected utility compared to no FAI?

Sounds obviously unreasonable to me. E.g. a situation where a person derives a large part of their utility from having kidnapped and enslaved somebody else: the kidnapper would be made worse off if their slave was freed, but the slave wouldn’t become worse off if their slavery merely continued, so...

• The way I said that may have been too much of a distraction from the real problem, which I’ll restate as: considerations of fairness, which may arise from bargaining or just due to fairness being a terminal value for some people, can imply that the most preferred outcome lies on a flat part of the Pareto frontier of feasible expected utilities, in which case such preferences are not VNM-rational and the result described in the OP can’t be directly applied.

• What if we also add a requirement that the FAI doesn’t make anyone worse off in expected utility compared to no FAI?

I don’t think that seems reasonable at all, especially when some agents want to engage in massively negative-sum games with others (like those you describe), or have massively discrete utility functions that prevent them from compromising with others (like those you describe). I’m okay with some agents being worse off with the FAI, if that’s the kind of agents they are.

Luckily, I think people, given time to reflect and grow and learn, are not like that, which is probably what made the idea seem reasonable to you.

• I’m okay with some agents being worse off with the FAI, if that’s the kind of agents they are.

Do you see CEV as about altruism, instead of cooperation/bargaining/politics? It seems to me the latter is more relevant, since if it’s just about altruism, you could use CEV instead of CEV. So, if you don’t want anyone to have an incentive to shut down an FAI project, you need to make sure they are not made worse off by an FAI. Of course you could limit this to people who actually have the power to shut you down, but my point is that it’s not entirely up to you which agents the FAI can make worse off.

Luckily, I think people, given time to reflect and grow and learn, are not like that

Right, this could be another way to solve the problem: show that of the people you do have to make sure are not made worse off, their actual values (given the right definition of “actual values”) are such that a VNM-rational FAI would be sufficient to not make them worse off. But even if you can do that, it might still be interesting and productive to look into why VNM-rationality doesn’t seem to be “closed under bargaining”.

Also, suppose I personally (according to my sense of altruism) do not want to make anyone worse off by my actions. Depending on their actual utility functions, it seems that my preferences may not be VNM-rational. So maybe it’s not safe to assume that the inputs to this process are VNM-rational either?

• Even if it’s about bargaining rather than about altruism, it’s still okay to have someone worse off under the FAI just so long as they would not be able to predict ahead of time that they would get the short end of the stick. It’s possible to have everyone benefit in expectation by creating an AI that is willing to make some people (whose identity humans cannot predict ahead of time) worse off if it brings sufficient gain to the others.

• I agree with this, which is why I said “worse off in expected utility” at the beginning of the thread. But I think you need “would not be able to predict ahead of time” in a fairly strong sense, namely that they would not be able to predict it even if they knew all the details of how the FAI worked. Otherwise they’d want to adopt the conditional strategy “learn more about the FAI design, and try to shut it down if I learn that I will get the short end of the stick”. It seems like the easiest way to accomplish this is to design the FAI to explicitly not make certain people worse off, rather than depend on that happening as a likely side effect of other design choices.

• I expect that with actual people, in practice, the FAI would leave no one worse off. But I wouldn’t want to hardwire that into the FAI because then its behavior would be too status quo-dependent.

• What do you think about Eliezer’s proposed solution of making the FAI’s utility function depend on a coinflip outcome?

• It seems like too much of a hack, but maybe it’s not? Can you think of a general procedure for aggregating preferences that would lead to such an outcome (and also leads to sensible outcomes in other circumstances)?

• It seems like too much of a hack, but maybe it’s not? Can you think of a general procedure for aggregating preferences that would lead to such an outcome (and also leads to sensible outcomes in other circumstances)?

• Looking over my old emails, it seems that my email on Jan 21, 2011 proposed a solution to this problem. Namely, if the agents can agree on a point on the Pareto frontier given their current state of knowledge (e.g. the point where agent A and agent B each have 50% probability of winning), then they can agree on a procedure (possibly involving coinflips) whose result is guaranteed to be a Bayesian-rational merged agent, and the procedure yields the specified expected utilities to all agents given their current state of knowledge. Though you didn’t reply to that email, so I guess you found it unsatisfactory in some way...

• I must not have been paying attention to the decision theory mailing list at that time. Thinking it over now, I think technically it works, but doesn’t seem very satisfying, because the individual agents jointly have non-VNM preferences, and are having to do all the work to pick out a specific mixed strategy/outcome. They’re then using a coin-flip + VNM AI just to reach that specific outcome, without the VNM AI actually embodying their joint preferences.

To put it another way, if your preferences can only be implemented by picking a VNM AI based on a coin flip, then your preferences are not VNM-rational. The fact that any point on the Pareto frontier can be reached by a coin-flip + VNM AI seems more like a distraction to trying to figure out how to get an AI to correctly embody such preferences.

• What do you mean when you say the agents “jointly have non-VNM preferences”? Is there a definition of joint preferences?

• Have you looked at some of the more recent papers in this literature (which generally have a lot more negative results than positive ones)? For example Preference aggregation under uncertainty: Savage vs. Pareto? I haven’t paid too much attention to this literature myself yet, because the social aggregation results seem pretty sensitive to details of the assumed individual decision theory, which is still pretty unsettled. (Oh, I mentioned another paper here.)

• Subjective uncertainty doesn’t seem particularly relevant to Friendly AI, since the FAI could come up with a more accurate probability estimate than everyone else, and axiom 2 could refer to what everyone would want if they knew the probabilities as well as the FAI did. Do you have any examples of undesirable effects of the Pareto property that do not involve subjective uncertainty, or do you think subjective uncertainty is more important than I think it is?

• do you think subjective uncertainty is more important than I think it is?

I’m not sure. It probably depends on what “priors” really are and/or whether people have common priors. I have a couple of posts that explain these problems a bit more. But it does seem quite possible that the more recent results in the Bayesian aggregation literature aren’t really relevant to FAI.

• Accurate probability estimate is a bit of an oxymoron for anything other than a certain class of problems where you have objective probability as a property of a non-linear system that has certain symmetries (e.g. a die that bounces enough times).

• I’d be curious to see someone reply to this on behalf of parliamentary models, whether applied to preference aggregation or to moral uncertainty between different consequentialist theories. Do the choices of a parliament reduce to maximizing a weighted sum of utilities? If not, which axiom out of 1-3 do parliamentary models violate, and why are they viable despite violating that axiom?

• Can you be more specific about what you mean by a parliamentary model? (If I had to guess, though, axiom 1.)

• This and models similar to it.

• Interesting. A parliamentary model applied to moral uncertainty definitely fails axiom 1 if any of the moral theories you’re aggregating isn’t VNM-rational. It probably still fails axiom 1 even if all of the individual moral theories are VNM-rational, because the entire parliament is probably not VNM-rational. That’s okay from Bostrom’s point of view because VNM-rationality could be one of the things you’re uncertain about.

• What if it is not, in fact, one of the things you’re uncertain about?

• Then I am not sure, because that blog post hasn’t specified the model precisely enough for me to do any math, but my guess would be that the parliament fails to be VNM-rational. Depending on how the bargaining mechanism is set up, it might even fail to have coherent preferences in the sense that it might not always make the same choice when presented with the same pair of outcomes…

• An advantage of parliamentary models is that you don’t have to know the utility functions of the individual agents, but can just use them as black boxes that output decisions. This is useful for handling moral uncertainty when you don’t know how to encode all the ethical theories you’re uncertain about as utility functions over the same ontology.

Do the choices of a parliament reduce to maximizing a weighted sum of utilities?

Let’s say the parliament makes a Pareto optimal choice, in which case that choice is also made by maximizing some weighted sum of utilities (putting aside the coin flip issue). But the parliament doesn’t reduce to maximizing that weighted sum of utilities, because the computation being done is likely very different. Saying that every method of making Pareto optimal choices reduces to maximizing a weighted sum of utilities would be like saying that every computation that outputs an integer greater than 1 reduces to multiplying a set of prime numbers.

• Axiom 2 reminds me of Simpson’s paradox. I’m not sure how applicable it is, but I wouldn’t be all that surprised to find an explanation that makes a violation of this axiom seem perfectly reasonable. I don’t suppose you have a set of more obvious axioms you could work with?

• See my reply to 615C68A6.

• There is no relation to Simpson’s paradox. In Simpson’s paradox, each of the data points comes from the same one-dimensional x-axis, so as you keep increasing x, you can run through all the data points in one group, go out the other side, and then get to another group of data points. In preference aggregation, there is no analogous meaningful way to run through one agent considering each possible state of the universe, keep going, and get to another agent considering each possible state of the universe.

• Good point. More relevantly, Simpson’s paradox relies on different groups containing different values of the independent variable. If each group contains each independent variable in equal measure, Simpson’s paradox cannot occur. The analogue of this in decision theory would be the probability distribution over outcomes. So if each agent has different beliefs about what A and B are, then it makes sense that everyone could prefer A over B but the FAI prefers B, but that’s because the FAI has better information, and knows that at least some people would prefer B if they had better information about what the options consisted of. If everyone would prefer A over B given the FAI’s beliefs, then that reason goes away, and the FAI should choose A. This latter situation is the one modeled in the post, and the former does not seem particularly relevant, since there’s no point in asking which option someone prefers given bad information if you could also apply their utility function to a better-informed estimate of the probabilities involved.

• I don’t see how I could agree with this conclusion:

But many people don’t like this, usually for reasons involving utility monsters. If you are one of these people, then you better learn to like it, because according to Harsanyi’s Social Aggregation Theorem, any alternative can result in the supposedly Friendly AI making a choice that is bad for every member of the population.

If both ways are wrong, then you haven’t tried hard enough yet.

Well explained though.

• The Social Aggregation Theorem doesn’t just show that some particular way of aggregating utility functions other than by linear combination is bad; it shows that every way of aggregating utility functions other than by linear combination is bad.

• Great post! I wish Harsanyi’s papers were better known amongst philosophers.

• Thanks for posting this! This is a fairly satisfying answer to my question from before.

Can you clarify which people you want to apply this theorem to? I don’t think the relevant people should be the set of all humans alive at the time that the FAI decides what to do, because this population is not fixed over time and doesn’t have fixed utility functions over time. I can think of situations where I would want the FAI to make a decision that all humans alive at a fixed time would disagree with (for example, suppose most humans die and the only ones left happen to be amoral savages), and I also have no idea how to deal with changing populations with changing utility functions in general.

So it seems the FAI should be aggregating the preferences of a fixed set of people for all time. But this also seems problematic.

• Can you clarify which people you want to apply this theorem to?

I’m not entirely sure. My default answer to that is “all people alive at the time that the singularity occurs”, although you pointed out a possible drawback to that (it incentivizes people to create more people with values similar to their own) in our previous discussion. This is really an instrumental question: What set of people should I suggest get to have their utility functions aggregated into the CEV so as to best maximize my utility? One possible answer is to aggregate the utilities of everyone who worked on or supported the FAI project, but I suspect that due to the influence of far thinking, that would actually be a terrible way to motivate people to work on FAI, and it should actually be much broader than that.

So it seems the FAI should be aggregating the preferences of a fixed set of people for all time. But this also seems problematic.

I don’t think it would be terribly problematic. “People in the future should get exactly what we currently would want them to get if we were perfectly wise and knew their values and circumstances” seems like a pretty good rule. It is, after all, what we want.

• My default answer to that is “all people alive at the time that the singularity occurs”, although you pointed out a possible drawback to that (it incentivizes people to create more people with values similar to their own) in our previous discussion.

And also incentivizes people to kill people with values dissimilar to their own!

I don’t think it would be terribly problematic. “People in the future should get exactly what we currently would want them to get if we were perfectly wise and knew their values and circumstances” seems like a pretty good rule. It is, after all, what we want.

Fair enough. Hmm.

• And also incentivizes people to kill people with values dissimilar to their own!

That’s a pretty good nail in the coffin. Maybe all people alive at the time of your comment. Or at any point in some interval containing that time, possibly including up to the time the singularity occurs. Although again, these are crude guesses, not final suggestions. This might be a good question to think more about.

• That’s a pretty good nail in the coffin.

It’s not as bad as it sounds. Both arguments are also arguments against democracy, but I don’t think they’re knockdown arguments against democracy (although the general point that democracy can be gamed by brainwashing enough people is good to keep in mind, and I think is a point that Moldbug, for example, is quite preoccupied with). For example, killing people doesn’t appear to be a viable strategy for gaining control of the United States at the moment. Although the killing-people strategy in the FAI case might look more like “the US decides to nuke Russia immediately before the singularity occurs.”

• For example, killing people doesn’t appear to be a viable strategy for gaining control of the United States at the moment.

Perhaps not, but it might help maintain control of the USG insofar as popularity increases the chances of reelection and killing (certain) people increases popularity.

• Dumb solution: an FAI could have a sense of justice which downweights the utility function of people who are killing and/or procreating to game their representation in the AI’s utility function, or something like that, to disincentivize it. (It’s dumb because I don’t know how to operationalize justice; maybe enough people would not cheat and want to punish the cheaters that the FAI would figure that out.)

Also, given what we mostly believe about moral progress, I think defining morality in terms of the CEV of all people who ever lived is probably okay… they’d probably learn to dislike slavery in the AI’s simulation of them.

• A Friendly AI would have to be able to aggregate each person’s preferences into one utility function. The most straightforward and obvious way to do this is to agree on some way to normalize each individual’s utility function, and then add them up. But many people don’t like this, usually for reasons involving utility monsters.

I should think most of those who don’t like it do so because their values would be better represented by other approaches. A lot of those involved in the issue think they deserve more than a one-in-seven-billionth share of the future—and so pursue approaches that will help to deliver them that. This probably includes most of those with the skills to create such a future, and most of those with the resources to help fund them.

• They could just insist on a normalization scheme that is blatantly biased in favor of their utility function. In a theoretical sense, this doesn’t cause a problem, since there is no objective way to define an unbiased normalization anyway. (of course, if everyone insisted on biasing the normalization in their favor, there would be a problem)

• I think most of those involved realise that such projects tend to be team efforts—and therefore some compromises over values will be necessary. Anyway, I think this is the main difficulty for utilitarians: most people are not remotely like utilitarians—and so don’t buy into their bizarre ideas about what the future should be like.

• Being fair is not, in general, a VNM-rational thing to do.

Suppose you have an indivisible slice of pie, and six people who want to eat it. The fair outcome would be to roll a die to determine who gets the pie. But this is a probabilistic mixture of six deterministic outcomes which are equally bad from a fairness point of view.

Preferring a lottery to any of its outcomes is not VNM-rational (pretty sure it violates independence, but in any case it’s not maximizing expected utility).

We can make this stronger by supposing some people like pie more than others (but all of them still like pie). Now the lottery is strictly worse than giving the pie to the one who likes pie the most.

Although the result is still interesting, I think most preference aggregators violate Axiom 1, rather than Axiom 2, and this is not inherently horrible.

• I’m pretty sure it’s possible to reach the same conclusion by removing the requirement that the aggregation be VNM-rational and strengthening axiom 2 to say that the aggregation must be Pareto-optimal with respect to all prior probability distributions over choices the aggregation might face. That is, “given any prior probability distribution over pairs of gambles the aggregation might have to choose between, there is no other possible aggregation that would be better for every agent in the population in expectation.” It’s possible we could even reach the same conclusion just by using some such prior distribution with certain properties, instead of all such distributions.

• I don’t understand what your strengthened axiom means. Could you give an example of how, say, the take-the-min-of-all-expected-utilities aggregation fails to satisfy it?

(Or if it doesn’t, I suppose it would be a counterexample, but I’m not insisting on that)

• Let’s say there are 3 possible outcomes: A, B, and C, and 2 agents: x and y. The utility functions are x(A)=0, x(B)=1, x(C)=4, y(A)=4, y(B)=1, y(C)=0.

One possible prior probability distribution over pairs of gambles is that there is a 50% chance that the aggregation will be asked to choose between A and B, and a 50% chance that the aggregation will be asked to choose between B and C (in this simplified case, all the anticipated “gambles” are actually certain outcomes). Your maximin aggregation would choose B in each case, so both agents anticipate an expected utility of 1. But the aggregation that maximizes the sum of the utility functions would choose A in the first case and C in the second, and each agent would anticipate an expected utility of 2. Since both agents could agree that this aggregation is better than maximin, maximin is not Pareto optimal with respect to that probability distribution.

Upvoted for suggesting a good example. I had suspected my explanation might be confusing, and I should have thought to include an example.
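That example, computed directly (a short sketch; tie-breaking in `max` never matters here because no options tie):

```python
# The maximin-vs-sum example above, computed directly.
x = {"A": 0, "B": 1, "C": 4}
y = {"A": 4, "B": 1, "C": 0}

# Two equally likely decision problems: choose from (A, B) or from (B, C).
problems = [("A", "B"), ("B", "C")]

def expected_utilities(rule):
    """Each agent's expected utility when the rule picks for both problems."""
    picks = [rule(pair) for pair in problems]
    return (sum(x[c] for c in picks) / 2, sum(y[c] for c in picks) / 2)

maximin  = lambda pair: max(pair, key=lambda o: min(x[o], y[o]))
sum_rule = lambda pair: max(pair, key=lambda o: x[o] + y[o])

print(expected_utilities(maximin))   # (1.0, 1.0)
print(expected_utilities(sum_rule))  # (2.0, 2.0) -- better for both agents
```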

• Thank you, I understand it now.

• I wonder how hard it would be to self-modify, prior to the imposition of the sort of regime discussed here, to be a counterfactual utility monster (along the lines of “I prefer X if Z and prefer not-X if not-Z”) who very very much wants to be (and thus becomes?) an actual utility monster iff being a utility monster is rewarded. If this turns out to be easy, then it seems like the odds of this already having happened in secret before the imposition of the utility-monster-rewarding regime would need to be taken into account by those contemplating the imposition.

It would be ironic if the regime was launched, and in the course of surveying preferences at its outset they discovered the counterfactual utility monster’s “moral booby-trap” and became its hostages. Story idea! Someone launches a simple preference aggregation regime and they discover a moral booby-trap and are horrified at what is likely to happen when the survey ends and the regime gets down to business… then they discover a second counterfactual utility monster booby trap lurking in someone’s head that was designed with the naive booby traps in mind and so thwarts it. The second monster also manages to have room in their function to grant “utility monster empathy sops” to the launchers of the regime, and they are overjoyed that someone managed to save them from their own initial hubris, even though they would have been horrified if they had only discovered the non-naive monster with no naive monster to serve as a contrast object. Utility for everyone but the naive monster: happy ending!

• Linearly combining utility functions does not force you to reward utility monsters. It just forces you to either be willing to sacrifice large amounts of others’ utility for extremely large amounts of utility monster utility, or be unwilling to sacrifice small amounts of others’ utility for somewhat large amounts of utility monster utility in the same ratio. The normalization scheme could require the range of all normalized utility functions to fit within certain bounds.
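One simple way to implement that bounded-range suggestion is range normalization: rescale every utility function so that its values over the outcome set span exactly [0, 1]. This is only one scheme among many, and the numbers below are invented; a minimal sketch:

```python
# Range normalization: rescale each utility function so its values over the
# outcome set span [0, 1], capping how much any one agent (including a
# would-be utility monster) can swing the aggregate sum. Numbers invented.

def normalize(u, outcomes):
    lo = min(u[o] for o in outcomes)
    hi = max(u[o] for o in outcomes)
    return {o: (u[o] - lo) / (hi - lo) for o in outcomes}

outcomes = ["A", "B", "C"]
monster = {"A": 0.0, "B": 500.0, "C": 1_000_000.0}  # wildly scaled preferences
normal  = {"A": 1.0, "B": 0.5, "C": 0.0}

print(normalize(monster, outcomes))  # {'A': 0.0, 'B': 0.0005, 'C': 1.0}
print(normalize(normal, outcomes))   # unchanged: already spans [0, 1]
```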

• Does the theorem say anything about the sign of the c_k? Will they always all be positive? Will they always all be non-negative?

• Under Harsanyi’s original axioms, you cannot say anything about the signs of the coefficients. My axioms are slightly stronger, but I think still not quite enough. However, if you make the even stronger (but still reasonable, I think) assumption that the agents’ utility functions are linearly independent, then you can prove that all of the coefficients are non-negative. This is because the linear independence allows you to create situations where each agent prefers A to B by arbitrarily specifiable relative amounts. As in, for all agents k, we can create choices A and B such that every person prefers A to B, but the margin by which every agent other than k prefers A to B is arbitrarily small compared to the margin by which k prefers A to B, so since the FAI prefers A to B, c_k must be non-negative.

• Thanks for writing this up!

• It is worth mentioning that Rawls’s later Veil of Ignorance forces him to satisfy Harsanyi’s axioms, and that Rawls’s conclusions are a math error.

• Harsanyi’s axioms seem self-evidently desirable on their own. I didn’t claim that they were a consequence of the Veil of Ignorance.

• Edit: conclusion here. I misinterpreted axiom 2 as weaker than it is; I now agree that the axioms imply the result (though I interpret the result somewhat differently).

I don’t think you can make the broad analogy between what you’re doing and what Harsanyi did that you’re trying to make.

Harsanyi’s postulate D is doing most of the work. Let’s replace it with postulate D’: if at least two individuals prefer situation X to situation Y, and none of the other individuals prefer Y to X, then X is preferred to Y from a social standpoint.

D’ is weaker; the weighted sum of utilities satisfies it. But is it possible for another social welfare function to satisfy it? We’ll need our new method to satisfy postulates A, B, and C.

Consider three individuals: Alice, Bob, and Charlie. There are four possible outcomes: W, X, Y, and Z. Alice’s utilities are (0,0,1,1). Bob’s utilities are (0,1,0,1). Charlie’s utilities are (0,1,1,1). We notice that the social welfare function U=(0,1,1,1) satisfies D’ but not D, and satisfies A, B, and C. If we construct a linear combination of Alice’s, Bob’s, and Charlie’s utility functions, say by an equal weighting, we get V=(0,2,2,3), which satisfies D (and D’). Note the difference is that U does not respect Bob’s preference for Z over Y, when Alice and Charlie are indifferent, or Alice’s preference for Z over X, when Bob and Charlie are indifferent, whereas V does respect those preferences; see the verification sketch below.
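A brute-force check of that example over sure outcomes (a sketch; it does not consider gambles, matching the example’s scope):

```python
import itertools

# Utility vectors over the outcomes (W, X, Y, Z), transcribed from above.
people = [(0, 0, 1, 1),   # Alice
          (0, 1, 0, 1),   # Bob
          (0, 1, 1, 1)]   # Charlie
U = (0, 1, 1, 1)
V = (0, 2, 2, 3)

def satisfies(welfare, need):
    """Check postulate D (need=1 supporter) or D' (need=2 supporters):
    whenever enough people prefer outcome i to j and nobody is opposed,
    the welfare function must also prefer i to j."""
    for i, j in itertools.permutations(range(4), 2):
        pro = sum(u[i] > u[j] for u in people)
        con = sum(u[i] < u[j] for u in people)
        if pro >= need and con == 0 and not welfare[i] > welfare[j]:
            return False
    return True

print(satisfies(U, need=2), satisfies(U, need=1))  # True False: D' but not D
print(satisfies(V, need=2), satisfies(V, need=1))  # True True:  both
```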

I haven’t done any exploration yet on whether we can construct social welfare functions that satisfy D’ and seem reasonable in uncertain situations, but that example should be enough to demonstrate that a slight weakening of D destroys the result for certain situations.

I should also note that Harsanyi’s E is narrowly written, which makes sense given the strong D. If you weaken D to D’, you could smuggle the full strength of D back in by strengthening E to some E*, but if you leave it as covering the narrow situation that it currently does, or correspondingly weaken it to some E’, or leave it out entirely, then there’s nothing to worry about. (U trivially satisfies E because there’s only one disagreement.)

Your Axiom 2 is much, much weaker than my D’; if D’ is enough to remove the justification for a linear weighting, then I don’t believe that your Axiom 2 is enough to justify the linear weighting. To be clearer: yes, linear combinations satisfy weaker versions of the axioms, but the power of Harsanyi is the claim that only linear combinations satisfy the axioms. When you weaken the axioms, you allow other functions that also do the job. (Note that T=(0,0,0,1) satisfies Axiom 2, but not D’, at least for certainty.)

Now that I’ve started thinking about probability, note that Axiom 1 only constrains probabilistic behavior for each agent separately. You need postulates like he introduces in section III to make them agree on gambles, and I don’t think weak postulates there will get very far, but I’ll have to spend more time thinking about that.

(Hopefully that’s the last of my edits, for now at least.)

• You were looking at Harsanyi’s explanation of a previous, similar theorem by Fleming, in section II of his paper. He proves the theorem I explained in the post in section III.

My axiom 2 was meant to include decisions involving uncertainty, like Harsanyi’s postulates but unlike Fleming’s postulates. Sorry if I did not make that clear.

• Your axioms being meant to include something doesn’t mean that they include something! Your axioms do not imply Harsanyi’s, and so your proof is fatally flawed.

Right now, Axiom 1 only means that each agent needs to individually have a scoring system for outcomes which satisfies the VNM axioms (basically, you can map outcomes to reals, and those reals encode probabilistic preferences). Axiom 2 is really weak, and Axiom 3 is really weak. My social utility of T satisfies Axioms 1, 2 and 3 for Alice, Bob, and Charlie:

1. Alice, Bob, and Charlie each have utility functions, and T is a utility function. Agents make probabilistic gambles accordingly.

2. If all of Alice, Bob, and Charlie prefer an outcome to another outcome, then so does T. (The only example of this is the preference for Z over W.)

3. There is an example where Alice, Bob, and Charlie all share a preference: Z over W.

Note that every agent is indifferent between W and W, and that every agent prefers Z to W. We compare the gambles pZ+(1-p)W and pW+(1-p)W, and note that T satisfies the property that the first gamble is preferred to the second gamble for arbitrarily small p (as the utility of the first gamble is p, and the utility of the second gamble is 0), and that T is indifferent between them for p=0.

T, of course, is not a linear combination of Alice, Bob, and Charlie’s utility functions.

Is T a counterexample to your theorem? If not, why not?

• T does not satisfy axiom 2, because Alice, Bob, and Charlie all prefer the gamble .5X+.5Y over W, but T is indifferent between .5X+.5Y and W. As I said, axiom 2 includes decisions under uncertainty. The VNM axioms don’t even make a distinction between known outcomes and gambles, so I assumed that would be understood as the default.

• Ah! I was interpreting “choice” as “outcome,” rather than “probabilistic combination of outcomes,” and with the latter interpretation axiom 2 becomes much stronger.

I still don’t think it’s strong enough, though: it appears to me that U still satisfies axiom 2, despite not being a linear combination of the utilities. As well, if I add Diana, who has utility (0,0,0,1), then T appears to serve as a social welfare function for Alice, Bob, Charlie, and Diana.

I should note that I suspect that’s a general failure mode: take any utility function, and add an agent to the pool who has that utility function. That agent is now a candidate for the social welfare function, as it now satisfies the first two axioms and might satisfy the third. (Alternatively, appoint any agent already in the pool as the social welfare function; the first two axioms will be satisfied, and the third will be unchanged.)

• I should note that I suspect that’s a general failure mode: take any utility function, and add an agent to the pool who has that utility function. That agent is now a candidate for the social welfare function, as it now satisfies the first two axioms and might satisfy the third. (Alternatively, appoint any agent already in the pool as the social welfare function; the first two axioms will be satisfied, and the third will be unchanged.)

That is correct. But in a case like that, the aggregate utility function is a linear combination of the original utility functions where all but one of the coefficients are 0. Being a linear combination of utility functions is not a strong enough requirement to rule out all bad aggregations.

• First, thanks for your patience.

Conclusion: I don’t agree with Harsanyi’s claim that the linear combination of utility functions is unique up to linear transformations. I agree it is unique up to affine transformations, and the discrepancy between my statement and his is explained by his comment “on the understanding that the zero point of the social welfare function is appropriately chosen.” (Why he didn’t explicitly generalize to affine transformations is beyond me.)

I don’t think the claim “the utility function can be expressed as a linear combination of the individual utility functions” is particularly meaningful, because it just means that the aggregated utility function must exist in the space spanned by the individual utility functions. I’d restate it as:

If the aggregator introduces new values not shared by humans, it is willing to trade human values to get them, and thus is not a friendly aggregator.

(Because, as per VNM, all values are comparable.) Also, note that this might not be a necessary condition for friendliness, but it is a necessary condition for axiom 2-ness.

Notes:

I’ve been representing the utilities as vectors, and it seems like moving to linear algebra will make this discussion much cleaner.

Suppose the utility vector for an individual is a row vector. We can combine their preferences into a matrix P=[A;B;C].

In order to make a counterexample, we need a row vector S which 1) is linearly independent of P, that is, rank[P;S] =/= rank[P]. Note that if P has rank equal to the number of outcomes, this is impossible; all utility functions can be expressed as linear combinations. In our particular example, the rank of P is 3, and there are 4 outcomes, so S=null[P]=[-1,0,0,0], and we can confirm that rank[P;S]=4. (Note that for this numerical example, S is equivalent to an affinely transformed C, but I’m not sure if this is general.)

We also need S to 2) satisfy any preferences shared by all members of P. We can see gambles as column vectors, with each element being the probability that a gamble leads to a particular outcome; all values should be positive and sum to one. We can compare gambles by subtracting them; A*x-A*y gives us the amount that A prefers x to y. Following Harsanyi, we’ll make it share indifferences; that is, if A*(x-y)=0, then A is indifferent between x and y, and if P*(x-y) is a zero column vector, then all members of the population are indifferent.

Let z=(x-y), and note that P*z=0 is the null space of P, which we used earlier to identify a candidate S, because we knew incorporating one of the vectors of the null space would increase the rank. We need S*z=0 for it to be indifferent when P is indifferent; this requires that the null space of P have at least two dimensions. (So three independent agents aggregated in four dimensions isn’t enough!)

We also need the sum of z to be zero for it to count as a comparison between gambles, which is equivalent to [1,1,1,1,1]*z=0. If we get lucky, this occurs normally, but we’re not guaranteed two different gambles that all members of the population are indifferent between. If we have a null space of at least three dimensions, then that is guaranteed to happen, because we can toss the ones vector in as another row to ensure that all the vectors returned by null sum to 0.

So, if the null space of P is at least 2-dimensional, we can construct a social welfare function that shares indifferences, and if the null space of P is at least 3-dimensional, those indifferences are guaranteed to exist. But sharing preferences is a bit tougher: we need every case where P*z>0 to result in S*z>0. Since z=x-y, we have the constraint that the sum of z’s elements must add up to 0, which makes things weirder, since it means we need to consider at least two elements at once.

So it’s not clear to me yet that it’s impossible to construct an S which shares preferences and is linearly independent, but I also haven’t generated a constructive method to do so in general.
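The rank and null-space computations above are easy to reproduce; a small numpy sketch of the same example:

```python
import numpy as np

# P stacks Alice's, Bob's, and Charlie's utility row vectors over (W, X, Y, Z).
P = np.array([[0., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 1.]])

# Null space via SVD: the rows of Vt beyond the rank span {z : P z = 0}.
_, s, Vt = np.linalg.svd(P)
rank = int((s > 1e-10).sum())
S = Vt[rank:]

print(rank)                                      # 3
print(S)                                         # ~[[1, 0, 0, 0]] up to sign
print(np.linalg.matrix_rank(np.vstack([P, S])))  # 4: S is independent of P
```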

• I don’t agree with Harsanyi’s claim that the lin­ear com­bi­na­tion of util­ity func­tions is unique up to lin­ear trans­for­ma­tions. I agree it is unique up to af­fine trans­for­ma­tions, and the dis­crep­ancy be­tween my state­ment and his is ex­plained by his com­ment “on the un­der­stand­ing that the zero point of the so­cial welfare func­tion is ap­pro­pri­ately cho­sen.” (Why he didn’t ex­plic­itly gen­er­al­ize to af­fine trans­for­ma­tions is be­yond me.)

I’m not quite sure what you mean. Are you talk­ing about the fact that you can add a con­stant to util­ity func­tion with­out chang­ing any­thing im­por­tant, but that a con­stant is not nec­es­sar­ily a lin­ear com­bi­na­tion of the util­ity func­tions to be ag­gre­gated? For that rea­son, it might be best to im­plic­itly in­clude the con­stant func­tion in any set of util­ity func­tions when talk­ing about whether or not they are lin­early in­de­pen­dent; oth­er­wise you can change the an­swer by adding a con­stant to one of them. Also, where did Harsanyi say that?

I don't think the claim "the utility function can be expressed as a linear combination of the individual utility functions" is particularly meaningful, because it just means that the aggregated utility function must exist in the space spanned by the individual utility functions.

Yes, that's what it means. I don't see how that makes it meaningless.

Agreed that linear algebra is a natural way to approach this. In fact, I was thinking in similar terms. If you replace axiom 3 with the stronger assumption that the utility functions to be aggregated, along with the constant function, are linearly independent (which I think is still reasonable if there are an infinite number of outcomes, or even if there are just at least 2 more outcomes than agents), then it is fairly easy to show that sharing preferences requires the aggregation to be a linear combination of the utility functions and the constant function.

Let K represent the row vector with all 1s (a constant function). Let "pseudogamble" refer to column vectors whose elements add to 1 (Kx = 1). Note that given two pseudogambles x and y, we can find two gambles x' and y' such that for any agent A, A(x-y) has the same sign as A(x'-y') by mixing the pseudogambles with another gamble. For instance, if x, y, and z are outcomes, and A(x) > A(2y-z), then A(.5x+.5z) > A(.5(2y-z)+.5z) = A(y). So the fact that I'll be talking about pseudogambles rather than gambles is not a problem.
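A quick numeric check of that mixing trick, with made-up utilities (the outcomes are basis vectors, so any vector summing to 1 is a pseudogamble):

```python
import numpy as np

A = np.array([5., 1., 2.])    # hypothetical utilities for outcomes x, y, z
x, y, z = np.eye(3)           # outcomes as basis vectors

p = 2*y - z                   # pseudogamble: sums to 1, one negative entry
lhs = 0.5*x + 0.5*z           # a genuine gamble
rhs = 0.5*p + 0.5*z           # equals y exactly, also a genuine gamble

print(np.allclose(rhs, y))                               # True
print(np.sign(A @ (x - p)) == np.sign(A @ (lhs - rhs)))  # True: same preference
```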

Anyway, if the initial utility functions and K are linearly independent, then the aggregate not being a linear combination of the initial utility functions and K would mean that the aggregate, K, and the initial utility functions are all together linearly independent. Given a linearly independent set of row vectors, it is possible to find a column vector whose product with each row vector is independently specifiable. In particular, you can find column vectors x and y such that Kx=Ky=1, Ax>Ay for all initial utility functions A, and Sx<Sy, where S is the aggregate utility function.
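Concretely, stacking the independent rows into a square matrix and solving lets you dial in each product separately; a sketch with made-up vectors:

```python
import numpy as np

K  = np.ones(4)
A1 = np.array([0., 1., 2., 3.])   # hypothetical initial utility functions
A2 = np.array([0., 2., 1., 5.])
S  = np.array([1., 0., 0., 0.])   # a candidate aggregate, independent of the rest
M  = np.vstack([K, A1, A2, S])    # linearly independent rows

# Specify the products directly: Kx = Ky = 1, A x > A y, but S x < S y.
x = np.linalg.solve(M, np.array([1., 1., 1., 0.]))
y = np.linalg.solve(M, np.array([1., 0., 0., 1.]))
print(A1 @ x > A1 @ y, A2 @ x > A2 @ y, S @ x < S @ y)   # True True True
```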

Edit: I just realized that if we use Harsanyi's shared indifference criterion instead of my shared preference criterion, we don't even need the linear independence of the initial utility functions for that argument to work. You can find x and y such that Kx=Ky=1, Ax=Ay for all initial utility functions A, and Sx=/=Sy if S is not a linear combination of the initial utility functions and K, whether or not the initial utility functions are linearly independent of each other. This is because if you ensure that Ax=Ay for a maximal linearly independent subset of the initial utility functions and K, then it follows that Ax=Ay for the others as well.

• Also, where did Harsanyi say that?

Immediately before the statement of Theorem I in section III.

Yes, that's what it means. I don't see how that makes it meaningless.

In my mind, there's a meaningful difference between construction and description: yes, you can describe any waveform as an infinite series of sines and cosines, but if you actually want to build one, you probably want to use a finite series. And this result doesn't exclude any exotic methods of constructing utility functions; you could multiply together the utilities of each individual in the pool and you'd end up with an aggregate utility function that could be expressed as a linear combination of the individual utilities (and the ones vector), with the weights changing every time you add another individual to the pool or add another outcome to be considered.

More relevant to the discussion, though, is the idea that the aggregator should not introduce novel preferences. This is an unobjectionable conclusion, I would say, but it doesn't get us very far: if there are preferences in the pool that we want to exclude, like a utility monster's, setting their weight to 0 is what excludes their preferences, not abandoning linear combinations. And if the system designer has preferences about "fairness" or so on, then so long as one of the agents in the pool has those preferences, the system designer can incorporate those preferences just by increasing their weight in the combination.
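In code, that knob is just the weight vector of the linear combination; a toy sketch (the names and numbers are made up):

```python
import numpy as np

def aggregate(P, c):
    """Harsanyi-style aggregator: a weighted sum of the agents' utility rows."""
    return c @ P

P = np.array([[1., 2., 0.],     # ordinary agent
              [9., 0., 9.],     # 'utility monster'
              [0., 1., 2.]])    # agent whose preferences encode 'fairness'

S = aggregate(P, np.array([1., 0., 2.]))   # zero out the monster, upweight fairness
```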

But in both cases, the aggregator would probably be created through another function, and then so long as it does not introduce novel preferences it can be described as a linear combination. Instead of arguing about weights, we may find it more fruitful to argue about meta-weights, even though there is a many-to-one mapping (for any particular instance) from meta-weights to weights.

Let K represent the row vector with all 1s (a constant function). Let "pseudogamble" refer to column vectors whose elements add to 1 (Kx = 1).

I'd recommend the use of "e" for the ones vector, and if the elements add to 1, it's not clear to me why it's a "pseudogamble" rather than a "gamble," if one uses the terminology that column vectors where only a single element is 1 are "outcomes."

I find preferences much clearer to think about as "tradeoffs": that is, column vectors that add to 0, which are easily created by subtracting two gambles. The scaling is now arbitrary, and the sign of the product of a utility row vector and a tradeoff column vector unambiguously determines the preference.

For instance, if x, y, and z are outcomes

Alphabetical collision!

Given a linearly independent set of row vectors, it is possible to find a column vector whose product with each row vector is independently specifiable. In particular, you can find column vectors x and y such that Kx=Ky=1, Ax>Ay for all initial utility functions A, and Sx<Sy, where S is the aggregate utility function.

Agreed.

• you could multiply together the utilities of each individual in the pool and you'd end up with an aggregate utility function that could be expressed as a linear combination of the individual utilities (and the ones vector), with the weights changing every time you add another individual to the pool or add another outcome to be considered.

Unlikely, unless there are at least as many agents as outcomes.

if the system designer has preferences about "fairness" or so on, then so long as one of the agents in the pool has those preferences, the system designer can incorporate those preferences just by increasing their weight in the combination.

Yes. In fact, I think something like that will be necessary. For example, suppose there is a population of two agents, each of which has a "hedon function" which specifies their agent-centric preferences. One of the agents is an egoist, so his utility function is his hedon function. The other agent is an altruist, so his utility function is the average of his and the egoist's hedon functions. If you add up the two utility functions, you find that the egoist's hedon function gets three times the weight of the altruist's hedon function, which seems unfair. So we would want to give extra weight to the altruist's utility function (you could argue that in this example you should use only the altruist's utility function).
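To spell out the arithmetic, writing $h_{e}$ and $h_{a}$ for the egoist's and altruist's hedon functions:

$$u_{\text{egoist}}+u_{\text{altruist}}=h_{e}+\tfrac{1}{2}\left(h_{e}+h_{a}\right)=\tfrac{3}{2}h_{e}+\tfrac{1}{2}h_{a},$$

so the egoist's hedon function indeed carries three times the weight.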

if the elements add to 1, it's not clear to me why it's a "pseudogamble" rather than a "gamble," if one uses the terminology that column vectors where only a single element is 1 are "outcomes."

It may contain negative elements.

• Unlikely, unless there are at least as many agents as outcomes.

It's unlikely that the weights of existing agents would change under either of those cases, or that the multiplication could be expressed as a weighted sum, or that the multiplication would have axiom 2-ness?

If you add up the two utility functions, you find that the egoist's hedon function gets three times the weight of the altruist's hedon function, which seems unfair.

Indeed. The problem is more general: I would classify the parts as "internal" and "external," rather than agent-centric and other, because that makes it clearer that agents don't have to positively weight each other's utilities. If you have a 'maltruist' whose utility is his internal utility minus the egoist's utility (divided by two to normalize), we might want to balance their weight and the egoist's weight so that the agents' internal utilities are equally represented in the aggregator.
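One way to cash that out (my notation, with $h_{e}$ and $h_{m}$ the internal utilities and $c_{e},c_{m}$ the weights):

$$c_{e}h_{e}+c_{m}\cdot\tfrac{1}{2}\left(h_{m}-h_{e}\right)=\left(c_{e}-\tfrac{c_{m}}{2}\right)h_{e}+\tfrac{c_{m}}{2}h_{m},$$

so equal representation of the internal utilities requires $c_{e}-\tfrac{c_{m}}{2}=\tfrac{c_{m}}{2}$, i.e. $c_{e}=c_{m}$.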

Such meta-weight arguments, though, exist in an entirely different realm from this result, and so this result has little bearing on those arguments (which is what people are interested in when they resist the claim that social welfare functions are linear combinations of individual utility).

It may contain negative elements.

Ah! Of course.

• It's unlikely that the weights of existing agents would change under either of those cases, or that the multiplication could be expressed as a weighted sum, or that the multiplication would have axiom 2-ness?

Unlikely that the multiplication could be expressed as a weighted sum (and hence by extension, also unlikely it would obey axiom 2).

• I agree in general, because we would need the left inverse of the combined linearly independent individual utilities and e, and that won't exist. We do have freedom to affinely transform the individual utilities before taking their element-wise product, though, and that gives us an extra degree of freedom per agent. I suspect we can do it so long as the number of agents is at least half the number of outcomes.

• Oh, I see what you mean. It should be possible to find some affinely transformed product that is also a linear combination if the number of agents is at least half the number of outcomes, but some arbitrary affinely transformed product is only likely to also be a linear combination if the number of agents is at least the number of outcomes.

• Right, I just noticed that. So T is out as a counterexample, and likewise U is just Charlie's utility. Attempting to build another counterexample.