# Polymath-style attack on the Parliamentary Model for moral uncertainty

Thanks to ESRogs, Stefan_Schubert, and the Effective Altruism summit for the discussion that led to this post!

This post is to test out Polymath-style collaboration on LW. The problem we’ve chosen to try is formalizing and analyzing Bostrom and Ord’s “Parliamentary Model” for dealing with moral uncertainty.

I’ll first review the Parliamentary Model, then give some of Polymath’s style suggestions, and finally suggest some directions that the conversation could take.

## The Parliamentary Model

The Parliamentary Model is an under-specified method of dealing with moral uncertainty, proposed in 2009 by Nick Bostrom and Toby Ord. Reposting Nick’s summary from Overcoming Bias:

> Suppose that you have a set of mutually exclusive moral theories, and that you assign each of these some probability. Now imagine that each of these theories gets to send some number of delegates to The Parliament. The number of delegates each theory gets to send is proportional to the probability of the theory. Then the delegates bargain with one another for support on various issues; and the Parliament reaches a decision by the delegates voting. What you should do is act according to the decisions of this imaginary Parliament. (Actually, we use an extra trick here: we imagine that the delegates act as if the Parliament’s decision were a stochastic variable such that the probability of the Parliament taking action A is proportional to the fraction of votes for A. This has the effect of eliminating the artificial 50% threshold that otherwise gives a majority bloc absolute power. Yet – unbeknownst to the delegates – the Parliament always takes whatever action got the most votes: this way we avoid paying the cost of the randomization!)
>
> The idea here is that moral theories get more influence the more probable they are; yet even a relatively weak theory can still get its way on some issues that the theory thinks are extremely important by sacrificing its influence on other issues that other theories deem more important. For example, suppose you assign 10% probability to total utilitarianism and 90% to moral egoism (just to illustrate the principle). Then the Parliament would mostly take actions that maximize egoistic satisfaction; however, it would make some concessions to utilitarianism on issues that utilitarianism thinks are especially important. In this example, the person might donate some portion of their income to existential risks research and otherwise live completely selfishly.
>
> I think there might be wisdom in this model. It avoids the dangerous and unstable extremism that would result from letting one’s current favorite moral theory completely dictate action, while still allowing the aggressive pursuit of some non-commonsensical high-leverage strategies so long as they don’t infringe too much on what other major moral theories deem centrally important.

In a comment, Bostrom continues:

> there are a number of known issues with various voting systems, and this is the reason I say our model is imprecise and under-determined. But we have some quite substantial intuitions and insights into how actual parliaments work, so it is not a complete black box. For example, we can see that, other things equal, views that have more delegates tend to exert greater influence on the outcome, etc. There are some features of actual parliaments that we want to postulate away. The fake randomization step is one postulate. We also think we want to stipulate that the imaginary parliamentarians should not engage in blackmail etc., but we don’t have a full specification of this. Also, we have not defined the rule by which the agenda is set. So it is far from a complete formal model.

It’s an interesting idea, but clearly there are a lot of details to work out. Can we formally specify the kinds of negotiation that delegates can engage in? What about blackmail or prisoners’ dilemmas between delegates? In what ways does this proposed method outperform other ways of dealing with moral uncertainty?
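To make the aggregation step concrete, here is a minimal sketch in Python. This is my own illustration, not Bostrom and Ord’s specification: the helper names, the 100-seat parliament, and the largest-remainder seat allocation are all assumptions of mine.

```python
# A minimal sketch of the parliament's aggregation step: delegates are
# allocated in proportion to credence, each theory's delegates vote for its
# favourite option, and the parliament takes the plurality winner, even
# though delegates reason as if the choice were proportional-stochastic.

def allocate_delegates(credences, seats=100):
    """Allocate seats to theories in proportion to credence (largest remainder)."""
    quotas = {t: c * seats for t, c in credences.items()}
    alloc = {t: int(q) for t, q in quotas.items()}
    leftover = seats - sum(alloc.values())
    for t in sorted(quotas, key=lambda t: quotas[t] - alloc[t], reverse=True)[:leftover]:
        alloc[t] += 1
    return alloc

def parliament_decision(credences, votes_by_theory):
    """votes_by_theory maps each theory to the option its delegates vote for."""
    alloc = allocate_delegates(credences)
    tally = {}
    for theory, option in votes_by_theory.items():
        tally[option] = tally.get(option, 0) + alloc[theory]
    # Delegates *believe* P(option) is proportional to its vote share,
    # but the parliament actually takes the plurality winner.
    return max(tally, key=tally.get)

credences = {"egoism": 0.9, "total utilitarianism": 0.1}
votes = {"egoism": "live selfishly", "total utilitarianism": "donate to x-risk"}
print(parliament_decision(credences, votes))  # egoism's 90 delegates prevail
```

Of course, this leaves out exactly the parts the post asks about: the bargaining among delegates before the vote, and the agenda-setting rule.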

I was discussing this with ESRogs and Stefan_Schubert at the Effective Altruism summit, and we thought it might be fun to throw the question open to LessWrong. In particular, we thought it’d be a good test problem for a Polymath-project-style approach.

## How to Polymath

The Polymath comment style suggestions are not so different from LW’s, but numbers 5 and 6 are particularly important. In essence, they point out that the idea of a Polymath project is to split up the work into minimal chunks among participants, and to get most of the thinking to occur in comment threads. This is as opposed to a process in which one community member goes off for a week, meditates deeply on the problem, and produces a complete solution by themselves. Polymath rules 5 and 6 are instructive:

5. If you are planning to think about some aspect of the problem offline for an extended length of time, let the rest of us know. A polymath project is supposed to be more than the sum of its individual contributors; the insights that you have are supposed to be shared amongst all of us, not kept in isolation until you have resolved all the difficulties by yourself. It will undoubtedly be the case, especially in the later stages of a polymath project, that the best way to achieve progress is for one of the participants to do some deep thought or extensive computation away from the blog, but to keep in the spirit of the polymath project, it would be good if you could let us know that you are doing this, and to update us on whatever progress you make (or fail to make). It may well be that another participant may have a suggestion that could save you some effort.

6. An ideal polymath research comment should represent a “quantum of progress”. On the one hand, it should contain a non-trivial new insight (which can include negative insights, such as pointing out that a particular approach to the problem has some specific difficulty), but on the other hand it should not be a complex piece of mathematics that the other participants will have trouble absorbing. (This principle underlies many of the preceding guidelines.) Basically, once your thought processes reach a point where one could efficiently hand the baton on to another participant, that would be a good time to describe what you’ve just realised on the blog.

It seems to us as well that an important part of the Polymath style is to have fun together and to use the principle of charity liberally, so as to create a space in which people can safely be wrong, point out flaws, and build up a better picture together.

## Our test project

If you’re still reading, then I hope you’re interested in giving this a try. The overall goal is to clarify and formalize the Parliamentary Model, and to analyze its strengths and weaknesses relative to other ways of dealing with moral uncertainty. Here are the three most promising questions we came up with:

1. What properties would be desirable for the model to have (e.g. Pareto efficiency)?

2. What should the exact mechanism for negotiation among delegates be?

3. Are there other models that are provably dominated by some nice formalization of the Parliamentary Model?

The original OB post had a couple of comments that I thought were worth reproducing here, in case they spark discussion, so I’ve posted them.

Finally, if you have meta-level comments on the project as a whole instead of Polymath-style comments that aim to clarify or solve the problem, please reply in the meta-comments thread.

• Consider the following degenerate case: there is only one decision to be made, and your competing theories assess it as follows.

• Theory 1: option A is vastly worse than option B.

• Theory 2: option A is just a tiny bit better than option B.

And suppose you find theory 2 just slightly more probable than theory 1.

Then it seems like any parliamentary model is going to say that theory 2 wins, and you choose option A. That seems like a bad outcome.

Accordingly, I suggest that to arrive at a workable parliamentary model we need to do at least one of the following:

• Disallow degenerate cases of this kind. (Seems wrong; e.g., suppose you have an important decision to make on your deathbed.)

• Bite the bullet and say that in the situation above you really are going to choose A over B. (Seems pretty terrible.)

• Take into account how strongly the delegates feel about the decision, in such a way that you’d choose B in this situation. (Handwavily, it feels as if any way of doing this is going to constrain how much “tactical” voting the delegates can engage in.)

As you might gather, I find the last option the most promising.

• Great example. As an alternative to your three options (or maybe this falls under your first bullet), maybe negotiation should happen behind a veil of ignorance about what decisions will actually need to be made; the delegates would arrive at a decision function for all possible decisions.

Your example does make me nervous, though, on behalf of delegates who don’t have much to negotiate with. Maybe (as badger says) cardinal information does need to come into it.

• Yes, I think we need something like this veil of ignorance approach.

In a paper (preprint) with Ord and MacAskill, we prove that for similar procedures you end up with cyclical preferences across choice situations if you try to decide after you know the choice situation. The parliamentary model isn’t quite within the scope of the proof, but I think more or less the same proof works. I’ll try to sketch it.

Suppose:

• We have equal credence in Theory 1, Theory 2, and Theory 3

• Theory 1 prefers A > B > C

• Theory 2 prefers B > C > A

• Theory 3 prefers C > A > B

Then in a decision between A and B there is no scope for negotiation, so as two of the theories prefer A, the parliament will prefer A. Similarly, in a choice between B and C the parliament will prefer B, and in a choice between C and A the parliament will prefer C, completing the cycle.
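The pairwise outcomes in this sketch can be checked mechanically. This snippet is only an illustration of the sketched argument, not the proof from the paper:

```python
# With equal credences, each pairwise choice goes to whichever option more
# theories prefer; checking all three pairs exhibits the cycle.

prefs = {
    "Theory 1": ["A", "B", "C"],
    "Theory 2": ["B", "C", "A"],
    "Theory 3": ["C", "A", "B"],
}

def majority_winner(x, y):
    """The option preferred by a majority of (equally weighted) theories wins."""
    votes_x = sum(p.index(x) < p.index(y) for p in prefs.values())
    return x if votes_x > len(prefs) / 2 else y

assert majority_winner("A", "B") == "A"
assert majority_winner("B", "C") == "B"
assert majority_winner("C", "A") == "C"  # completes the cycle A > B > C > A
```

This is just the classic Condorcet cycle, transplanted from voters to moral theories.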

• This seems really similar to the problem that Knightian uncertainty attempts to fix.

I think So8res’s solution is essentially your option 3, with the strength of the disagreements being taken into account in the utility function; once you really have everything you care about accounted for, the best choice is the standard one.

• I agree that some cardinal information needs to enter the model to generate compromise. The question is whether we can map all theories onto the same utility scale or whether each agent gets their own scale. If we put everything on the same scale, it looks like we’re doing meta-utilitarianism. If each agent gets their own scale, compromise still makes sense without meta-value judgments.

Two outcomes is too degenerate a case if agents get their own scales, so suppose A, B, and C are the options, theory 1 has ordinal preferences B > C > A, and theory 2 has preferences A > C > B. Depending on how much of a compromise C is for each agent, the outcome could vary between

• choosing C (say if C is 99% as good as the ideal for each agent),

• a 50/50 lottery over A and B (if C is only 1% better than the worst for each), or

• some other lottery (for instance, theory 1 thinks C achieves 90% of B and theory 2 thinks C achieves 40% of A; then a lottery with weight 2/3 on C and 1/3 on A gives them each 60% of the gain between their best and worst).
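The arithmetic in that last bullet can be checked directly. Putting each theory on its own 0-to-1 scale (worst option 0, best option 1), as described above:

```python
# Theory 1 values (A, B, C) = (0, 1, 0.9); theory 2 values (A, B, C) = (1, 0, 0.4),
# each on its own normalised scale.
u1 = {"A": 0.0, "B": 1.0, "C": 0.9}
u2 = {"A": 1.0, "B": 0.0, "C": 0.4}

lottery = {"C": 2 / 3, "A": 1 / 3}  # weight 2/3 on C, 1/3 on A

ev1 = sum(p * u1[o] for o, p in lottery.items())
ev2 = sum(p * u2[o] for o, p in lottery.items())
print(ev1, ev2)  # each theory gets 60% of the gap between its worst and best
```

Both expected values come out at 0.6, as the comment claims.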

• A possible (but I admit, quite ugly) workaround: whenever there are very few decisions to be made, introduce dummy bills that would not actually be carried out. MPs wouldn’t know about their existence. In this case Theory 1 might be able to negotiate its way into getting B.

• My reading of the problem is that a satisfactory Parliamentary Model should:

• Represent moral theories as delegates with preferences over adopted policies.

• Allow delegates to stand up for their theories and bargain over the final outcome, extracting concessions on vital points while letting other policies slide.

• Restrict delegates’ use of dirty tricks or deceit.

Since bargaining in good faith appears to be the core feature, my mind immediately goes to models of bargaining under complete information rather than voting. What are the pros and cons of starting with the Nash bargaining solution as implemented by an alternating-offer game?

The two obvious issues are how to translate delegates’ preferences into utilities and what the disagreement point is. Assuming a utility function is fairly mild if the delegate has preferences over lotteries. Plus, there’s no utility comparison problem even though you need cardinal utilities. The lack of a natural disagreement point is trickier. What intuitions might be lost going this route?

• I think there’s a fairly natural disagreement point here: the outcome with no trade, which is just a randomisation over the top options of the different theories, with probability according to the credence in each theory.

One way to make progress is to analyse what happens here in the two-theory case, perhaps starting with some worked examples.

• Alright, a credence-weighted randomization between ideals and then bargaining on an equal footing from there makes sense. I was imagining the parliament starting from scratch.

Another alternative would be to use a hypothetical disagreement point corresponding to the worst utility for each theory, giving higher-credence theories more bargaining power. Or to bargain from a disagreement point based on a typical person’s life (the outcome can’t be worse for any theory than a policy of being kind to your family, giving to socially-motivated causes, cheating on your taxes a little, telling white lies, and not murdering).

• In the set-up we’re given the description of what happens without any trade; I don’t quite see how we can justify using anything else as a defection point.

• I think the Nash bargaining solution should be pretty good if there are only two members of the parliament, but it’s not clear how to scale up to a larger parliament.

• For the NBS with more than two agents, you just maximize the product of everyone’s gain in utility over the disagreement point. For Kalai–Smorodinsky, you continue to equate the ratios of gains, i.e. picking the point on the Pareto frontier on the line between the disagreement point and the vector of ideal utilities.

Agents could be given more bargaining power by giving them different exponents in the Nash product.
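As a toy illustration of the exponent idea (the exponent, the quarter-circle feasible set, and the disagreement point here are my own assumptions, not from the thread), a weighted Nash product can be maximised by brute-force search over the frontier:

```python
import math

def weighted_nbs(a=3.0, n=100_000):
    """Maximise the weighted Nash product x**a * y over the frontier
    x^2 + y^2 = 1 (x, y >= 0), with disagreement point (0, 0)."""
    step = math.pi / 2 / n
    best_val, best_t = max(
        (math.cos(k * step) ** a * math.sin(k * step), k * step)
        for k in range(1, n)
    )
    return math.cos(best_t), math.sin(best_t)

x, y = weighted_nbs(a=3)
# Analytically the maximiser satisfies x^2 = a / (a + 1), so with a = 3 it is
# (sqrt(3)/2, 1/2): the agent with the larger exponent gets the larger share.
print(round(x, 3), round(y, 3))
```

This matches the intuition behind the proposal: more delegates (a larger exponent) translates into a larger share of the frontier.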

• Giving them different exponents in the Nash product has some appeal, except that it does seem like NBS without modification is correct in the two-delegate case (where the weight assigned to the different theories is captured properly by the fact that the defection point is more closely aligned with the view of the theory with more weight). If we don’t think that’s right in the two-delegate case, we should have some account of why not.

• The issue is when we should tilt outcomes in favor of higher credence theories. Starting from a credence-weighted mixture, I agree theories should have equal bargaining power. Starting from a more neutral disagreement point, like the status quo actions of a typical person, higher credence should entail more power / votes / delegates.

On a quick example, the two approaches come apart. If the total feasible set of utilities is {(x,y) | x^2 + y^2 ≤ 1; x,y ≥ 0}, then the NBS starting from (0.9, 0.1) is about (0.96, 0.29), while the NBS starting from (0,0) with theory 1 having nine delegates (i.e. an exponent of nine in the Nash product) and theory 2 having one delegate is about (0.95, 0.32); so here equal bargaining from the credence-weighted mixture actually favors the higher credence theory slightly.

If the credence-weighted mixture were on the Pareto frontier, both approaches would be equivalent.

• Update: I now believe I was over-simplifying things. For two delegates I think it is correct, but in the parliamentary model that corresponds to giving the theories equal credence. As credences vary, so does the number of delegates. Maximising the Nash product over all delegates is equivalent to maximising a product where the theories have different exponents (exponents in proportion to the number of delegates).

• In order to get a better handle on the problem, I’d like to try walking through the mechanics of how a vote by a moral parliament might work. I don’t claim to be doing anything new here; I just want to describe the parliament in more detail to make sure I understand it, and so that it’s easier to reason about.

Here’s the setup I have in mind:

• let’s suppose we’ve already allocated delegates to moral theories, and we’ve ended up with 100 members of parliament, MP_1 through MP_100

• these MPs will vote on 10 bills B_1 through B_10 that will each either pass or fail by majority vote

• each MP M_m has a utility score U_m,b for each bill B_b passing (and assigns zero utility to the bill failing, so if they’d rather the bill fail, U_m,b is negative)

• the votes will take place on each bill in order from B_1 to B_10, and this order is known to all MPs

• all MPs know each other’s utility scores

Each MP wants to maximize the utility of the results according to their own scores, and they can engage in negotiation before the voting starts to accomplish this.

Does this seem to others like a reasonable description of how the parliamentary vote might work? Any suggestions for improvements to the description?

If others agree that this description is unobjectionable, I’d like to move on to discussing negotiating strategies the MPs might use, the properties these strategies might have, and whether there are restrictions that might be useful to place on negotiating strategies. But I’ll wait to see if others think I’m missing any important considerations first.
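The setup above can be written out directly. This sketch covers only sincere, negotiation-free voting as a baseline; the random utility matrix is a placeholder of mine for the delegates’ actual scores:

```python
import random

random.seed(0)
N_MPS, N_BILLS = 100, 10

# U[m][b]: MP m's utility for bill b passing (0 utility if it fails,
# so a negative entry means the MP wants the bill to fail).
U = [[random.uniform(-1, 1) for _ in range(N_BILLS)] for _ in range(N_MPS)]

def sincere_vote_results(U):
    """Each bill passes iff a majority of MPs assign it positive utility."""
    results = []
    for b in range(N_BILLS):
        ayes = sum(1 for m in range(len(U)) if U[m][b] > 0)
        results.append(ayes > len(U) / 2)
    return results

results = sincere_vote_results(U)
print(results)  # a baseline against which negotiating strategies can be compared
```

Any proposed negotiating strategy could then be scored by how much total utility each MP gains relative to this sincere-voting baseline.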

• This looks reasonable to analyse (although I’d be interested in analysing other forms too).

I’d be tempted to start with a simpler example to get a complete analysis. Perhaps 2 bills and 2 MPs. If that’s easy, move to 3 MPs.

• It seems like votes should be considered simultaneously, to avoid complex alliances of the form “I will vote on B4 in the direction you like if you vote on B3 in the direction I like”, which is only possible in one direction with respect to time. Having such an ordering, and the resulting negotiations, means that some agents have an incentive to bargain for moving the location of a bill. It seems better to be able to make all such Bx-vote-for-By-vote trades. I’m not familiar enough with voting models to know the tradeoffs for a simultaneous system, though.

• An alternative is to say that only one of the votes actually occurs, but which one it is will be chosen randomly.

• A very quick thought about one type of possible negotiating strategy. A delegate might choose a subset of bills, choose another delegate to approach, and offer the usual two-player cake-cutting game: the first delegate divides that subset into two “piles” and allows the second delegate to choose one of them. Then they each would decide how to vote on the bills from their respective “piles” and promise to vote in accordance with each other’s decisions.

However, it is not clear to me how these two choices (of the subset, and of the delegate to approach) should work. Also, whether the second delegate should be allowed to reject the offer to play the cake-cutting game.

edit: A potential flaw. Suppose we have a bill with two possible voting options A_1 and A_2 (e.g. “yes” and “no”) with no possibility of introducing a new intermediate option. If an option A_1 is supported by a small enough minority (0.75), this minority would never be able to achieve A_1 (even though they wouldn’t know that), and the utility difference U_m(A_1) - U_m(A_2) for each m would not matter; only the sign of the difference would.

• A remark that seems sufficiently distinct to deserve its own comment. At the moment we are only thinking about delegates with “fixed personalities”. Should the “personality” of a delegate be “recalculated” [1] after each new agreement/trade [2]? Changes would be temporary, only within the context of a given set of bills; delegates would revert to their original “personalities” after the vote. Maybe this could give results that would be vaguely analogous to smoothing a function? This would allow us to have a kind of “persuasion”.

In the context of my comment above, this could enable taking into account utility differences and not just signs, assuming large differences in utility would usually require large changes (and therefore, usually more than one change) in “personality” to invert the sign. I admit that this is very handwavy.

[1] I do not know what interpolation algorithm should be used.

[2] A second remark. Maybe delegates should trade changes in each other’s “personality” rather than the votes themselves, i.e. instead of promising to vote on bills in accordance with some binding agreement, they would promise to perform a minimal possible non-ad-hoc change [3] to their personalities that would make them vote that way? However, this could create slippery slopes, similar to those mentioned here.

[3] This is probably a hard problem.

• It seems to me that the less personal the MPs are, and the fewer opportunities we allow for anthropomorphic persuasion between them (through appeals such as issue framing, pleading, signaling loyalty to a coalition, ingratiation, defamation, challenges to an MP’s status, or deceit (e.g. unreliable statements by MPs about their private info relevant to probable consequences of acts resulting from the passage of bills)), the more we will encapsulate away the hard problems of moral reasoning within the MPs.

Even persuasive mechanisms more amenable to formalization (like agreements between MPs to reallocate their computational resources, or risk-sharing agreements between MPs based on their expectations that they might lose future influence in the parliament if the agent changes its assignment of probabilities to the MPs’ moral correctness based on its observation of decision consequences) sound to me, in the absence of reasons why they should appear in a theory of how to act given a distribution over self-contained moral theories, like complications that will impede crisp mathematical reasoning, introduced mainly for their similarity to the mechanisms that function in real human parliaments.

Or am I off base, and your scare quotes around “personality” mean that you’re talking about something else? Because what I’m picturing is basically someone building cognitive machinery for emotions, concepts, habits and styles of thinking, et cetera, on top of moral theories.

• Well, I agree that I chose words badly and then didn’t explain the intended meaning, and continued to speak in metaphors (my writing skills are seriously lacking). What I called the “personality” of a delegate was a function that assigns a utility score to any given state of the world (at the beginning they are determined by moral theories). In my first post I thought of these utility functions as constants that stayed that way throughout the negotiation process (it was my impression that ESRogs’s third assumption implicitly says basically the same thing), maybe accepting some binding agreements if they help to increase expected utility (these agreements are not treated as a part of the utility function; they are ad hoc).

On the other hand, what if we drop the assumption that these utility functions stay constant? What if, e.g. when two delegates meet, instead of exchanging binding agreements to vote in a specific way, they would exchange agreements to self-modify in a specific way that would correspond to those agreements? I.e. suppose a delegate M_1 strongly prefers option O_1,1 to an option O_1,2 on an issue B_1 and slightly prefers O_2,1 to O_2,2 on an issue B_2, whereas a delegate M_2 strongly prefers option O_2,2 to an option O_2,1 on an issue B_2 and slightly prefers O_1,2 to O_1,1 on an issue B_1. Now, M_1 could agree to vote (O_1,1; O_2,2) in exchange for a promise that M_2 would vote the same way, and sign a binding agreement. On the other hand, M_1 could agree to self-modify to slightly prefer O_2,2 to O_2,1 in exchange for a promise that M_2 would self-modify to slightly prefer O_1,1 to O_1,2. (Both want to self-modify as little as possible; however, any modification that is not ad hoc would probably affect the utility function at more than one point (?). Self-modifying in this case is restricted (only the utility function is modified), therefore maybe it wouldn’t require heavy machinery (I am not sure); besides, all utility functions ultimately belong to the same person.) These self-modifications are not binding agreements; delegates are allowed to further self-modify their “personalities” (i.e. utility functions) in another exchange.

Now, this idea vaguely reminds me of smoothing over the space of all possible utility functions. Metaphorically, this looks as if delegates were “persuaded” to change their “personalities”, their “opinions about things” (i.e. utility functions), by an “argument” (i.e. an exchange).

I would guess these self-modifying delegates should be used as dummy variables during a finite negotiation process. After the vote, delegates would revert to their original utility functions.

• Remember there’s no such thing as zero utility. You can assign an arbitrarily bad value to failing to resolve, but it seems a bit arbitrary.

• Hmm. What I was intending to do there was capture the idea that a bill failing to pass is the default state, and I’m only interested in the difference between a bill passing and a bill failing. So the utility score of a bill passing is supposed to represent the difference between it getting passed vs nothing happening.

Does that make sense? Am I just using utility terminology in a confusing way?

• Pinning the utility of a failed bill to 0 for all agents gets rid of some free parameters in the model, but it’s not clear to me that it’s the complete way to do so (you still have enough free parameters that you could do more).

What do we get from using the utility-per-bill framework?

1. We enforce that the combined desirability of a bill portfolio can only depend on the sum of the individual desirabilities of the bills.

2. We allow MPs to price gambles between bills.

It’s not clear to me that the second is going to be useful (do they have access to a source of randomness and binding commitments?), and it’s not clear to me that the first is a requirement we actually want to impose. Suppose B1 is something like “cows are people” and B2 is something like “we shouldn’t eat people.” An MP who is against eating humans but for eating cows will flip their opinion on B2 based on the (expected) outcome of B1.

• So then it seems like we should assign values to portfolios (i.e. bitstrings of whether or not bills passed), and if we don’t need probabilistic interpretations then we should deal with ordinal rankings of those bitstrings that allow indifference, which would look like (01 > 11 > 10 = 00). A perhaps inaccessible way to talk about those rankings is sets of permutations of bitstrings (the previous ranking is <(01,11,10,00),(01,11,00,10)>).
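The point that this ranking escapes the per-bill framework can be made precise: 01 > 11 > 10 = 00 provably has no additive per-bill representation, which a small brute-force check (my own illustration) confirms:

```python
# Writing U(b1, b2) = u1*b1 + u2*b2, the ranking 01 > 11 > 10 = 00 requires
#   U(01) > U(11)  =>  u2 > u1 + u2  =>  u1 < 0
#   U(10) = U(00)  =>  u1 = 0
# which is contradictory, so no per-bill utilities can reproduce it.

def additive_ok(u1, u2):
    U = {(b1, b2): u1 * b1 + u2 * b2 for b1 in (0, 1) for b2 in (0, 1)}
    return U[0, 1] > U[1, 1] > U[1, 0] == U[0, 0]

found = any(
    additive_ok(u1 / 10, u2 / 10)
    for u1 in range(-50, 51)
    for u2 in range(-50, 51)
)
print(found)  # False: no per-bill utilities on the grid reproduce this ranking
```

So conditional preferences like the cows/people example genuinely require portfolio-level utilities.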

• That’s a good suggestion about allowing the MPs to assign utilities to portfolios. I went with the per-bill framework because I thought it was simpler, and I was trying to find the simplest formalization I could that would capture the interesting parts of the parliamentary model.

But perhaps the dependence of bills on each other (or, in the real world, of the actions that one’s moral parliament might take on each other) might be a key feature?

It might be interesting to see if we can analyze both models.

• In Ideal Advisor Theories and Personal CEV, my co-author and I describe a particular (but still imprecisely specified) version of the parliamentary approach:

we determine the personal CEV of an agent by simulating multiple versions of them, extrapolated from various starting times and along different developmental paths. Some of these versions are then assigned to a parliament where they vote on various choices and make trades with one another.

We then very briefly argue that this kind of approach can overcome some objections to parliamentary models (and similar theories) made by philosopher David Sobel.

The paper is short and non-technical, but still manages to summarize some concerns that we’ll likely want a formalized parliamentary model to overcome or sidestep.

• We discussed this issue at the two MIRIx Boston workshops. A big problem with parliamentary models, which we were unable to solve, is what we’ve been calling ensemble stability. The issue is this: suppose your AI’s value system is made from a collection of value systems in a voting-like system, is constructing a successor, more powerful AI, and is considering constructing the successor so that it represents only a subset of the original value systems. Each value system which is represented will be in favor; each value system which is not represented will be against. In order to keep that from happening, you either need a voting system which somehow reliably never does that (but nothing we tried worked), or a special case for constructing successors, together with a working, loophole-free definition of that case (which is Hard).

• This seems to be almost equivalent to irreversibly forming a majority voting bloc. The only difference is how they interact with the (fake) randomization: by creating a subagent, it effectively (perfectly) correlates all the future random outputs. (In general, I think this will change the outcomes unless agents’ (cardinal) preferences about different decisions are independent.)

The randomization trick still potentially helps here: it would be in each representative’s interest to agree not to vote for such proposals, prior to knowing which such proposals will come up and in which order they’re voted on. However, depending on what fraction of its potential value an agent expects to be able to achieve through negotiations, I think that some agents would not sign such an agreement if they know they will have the chance to try to lock their opponents out before they might get locked out.

Actually, there seems to be a more general issue with ordering and incompatible combinations of choices; splitting that into a different comment.

• It seems that spec­i­fy­ing the del­e­gates’ in­for­ma­tional situ­a­tion cre­ates a dilemma.

As you write above, we should take the del­e­gates to think that Par­li­a­ment’s de­ci­sion is a stochas­tic vari­able such that the prob­a­bil­ity of the Par­li­a­ment tak­ing ac­tion A is pro­por­tional to the frac­tion of votes for A, to avoid giv­ing the ma­jor­ity bloc ab­solute power.

How­ever, your sug­ges­tion gen­er­ates its own prob­lems (as long as we take the par­li­a­ment to go with the op­tion with the most votes):

Sup­pose an is­sue The Par­li­a­ment votes on in­volves op­tions A1, A2, …, An and an ad­di­tional op­tion X. Sup­pose fur­ther that the great ma­jor­ity of the­o­ries in which the agent has cre­dence agree that it is very im­por­tant to perform one of A1, A2, …, An rather than X. Although all these the­o­ries have a differ­ent favourite op­tion, which of A1, A2, …, An is performed makes lit­tle differ­ence to them.

Now sup­pose that ac­cord­ing to an ad­di­tional hy­poth­e­sis in which the agent has rel­a­tively lit­tle cre­dence, it is best to perform X.

Be­cause the del­e­gates who favour A1, A2, …, An do not know that what mat­ters is get­ting the ma­jor­ity, they see no value in co­or­di­nat­ing them­selves and con­cen­trat­ing their votes on one or a few op­tions to make sure X will not end up get­ting the most votes. Ac­cord­ingly, they will all vote for differ­ent op­tions. X may then end up be­ing the op­tion with most votes if the agent has slightly more cre­dence in the hy­poth­e­sis which favours X than in any other in­di­vi­d­ual the­ory, de­spite the fact that the agent is al­most sure that this op­tion is grossly sub­op­ti­mal.

This is clearly the wrong re­sult.
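The failure mode just described can be made concrete with a small sketch. The credences below are hypothetical, chosen only to illustrate the effect: the X-favouring theory is the single largest bloc, so X wins a plurality vote even though the rest of the parliament considers it grossly suboptimal.

```python
# Hypothetical credences: each of four theories favours a different A_i,
# while a fifth theory (slightly more probable than any one of them) favours X.
votes = {"A1": 0.19, "A2": 0.19, "A3": 0.19, "A4": 0.19, "X": 0.24}

winner = max(votes, key=votes.get)  # plurality winner
against_x = sum(v for k, v in votes.items() if k != "X")

print(winner)                # X wins the plurality vote...
print(round(against_x, 2))   # ...despite 0.76 credence preferring some A_i
```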

• It looks like this prob­lem is as­sum­ing that Par­li­a­ment uses plu­ral­ity vot­ing with more than 2 op­tions. It seems like it shouldn’t be a prob­lem if all votes in­volve only 2 op­tions (an up-or-down vote on a sin­gle bill). If we want the rules to al­low votes be­tween more than 2 op­tions, it seems fix­able by us­ing a differ­ent vot­ing sys­tem such as ap­proval vot­ing.

• Given delegates with a certain type of amnesia (i.e., they do not remember having voted on an issue before, although they might have to remember some binding agreements; I am not sure about that), we could replace a plurality vote with an elimination runoff, where at each step of elimination the delegates think that this is the only vote on that issue (which they believe is affected by randomization) and they are not allowed to introduce new options.

Well, this sys­tem might have its own dis­ad­van­tages, pos­si­bly similar to these (how­ever, at each step ne­go­ti­a­tions are al­lowed), al­though del­e­gates wouldn’t know how to ac­tu­ally game it.

• It seems to me that if we’re go­ing to be for­mal­iz­ing the idea of the rel­a­tive “moral im­por­tance” of var­i­ous courses of ac­tion to differ­ent moral the­o­ries, we’ll end up hav­ing to use some­thing like util­ity func­tions. It’s un­for­tu­nate, then, that de­on­tolog­i­cal rules (which are pretty com­mon) can’t be speci­fied with finite util­ity func­tions be­cause of the time­less­ness is­sue (i.e., a de­on­tol­o­gist who doesn’t lie won’t lie even if do­ing so would pre­vent them from be­ing forced to tell ten lies in the fu­ture).

• Well, the en­tire idea of the par­li­a­men­tary ap­proach is pred­i­cated on the idea that the par­li­a­men­tar­i­ans have some ac­tions that they con­sider “more bad” than other ac­tions.

How’s this for a for­mal­iza­tion: Our par­li­a­ment faces a se­ries of de­ci­sions d[i]. For any given de­ci­sion the par­li­a­ment faces, there are a se­ries of choices d[i][j] that could be made re­gard­ing it. (d[i][0], d[i][1], etc.)

Over any given session of a parliament, the parliament faces every decision d[i] and, for each decision it faces, makes a choice d[i][j] regarding how to address it. A structure containing all the decisions the parliament faces and a choice for each is a "decision record". A parliamentarian's preferences are specified by an ordering of decision records from most preferred to least preferred. The total number of possible decision records is equal to the product of the numbers of choices for each individual decision.
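A minimal encoding of this proposal (names and sizes are mine, purely illustrative): decisions are lists of choices, a decision record is one choice per decision, and a parliamentarian's preferences are just an ordering over the records.

```python
from itertools import product

# Two decisions: the first with 2 choices, the second with 3.
decisions = [["d0_0", "d0_1"], ["d1_0", "d1_1", "d1_2"]]

# A decision record picks exactly one choice per decision.
records = list(product(*decisions))
print(len(records))  # 2 * 3 = 6 possible decision records

# A parliamentarian's preferences: any total ordering over records,
# here an arbitrary example (index 0 = most preferred).
preference_order = sorted(records)
```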

• I like the for­mal­iza­tion, but it seems to miss a key fea­ture of the par­li­a­men­tary model. Per Bostrom,

[...] even a rel­a­tively weak the­ory can still get its way on some is­sues that the the­ory think are ex­tremely im­por­tant by sac­ri­fic­ing its in­fluence on other is­sues that other the­o­ries deem more im­por­tant. For ex­am­ple, sup­pose you as­sign 10% prob­a­bil­ity to to­tal util­i­tar­i­anism and 90% to moral ego­ism (just to illus­trate the prin­ci­ple). Then the Par­li­a­ment would mostly take ac­tions that max­i­mize ego­is­tic satis­fac­tion; how­ever it would make some con­ces­sions to util­i­tar­i­anism on is­sues that util­i­tar­i­anism thinks is es­pe­cially im­por­tant. In this ex­am­ple, the per­son might donate some por­tion of their in­come to ex­is­ten­tial risks re­search and oth­er­wise live com­pletely self­ishly.

If prefer­ences are only defined by an or­der­ing of pos­si­ble out­comes, then you would get some­thing like this:

• To­tal Utili­tar­ian := (Donate 100% of in­come to ex­is­ten­tial risk re­duc­tion and oth­er­wise be­have self­lessly, Donate 100% to x-risk and be­have ego­is­ti­cally, Donate 40% and be­have self­lessly, Donate 40% and be­have ego­is­ti­cally, 0% and self­less, 0% and ego­is­tic)

• Ego­ist := Re­v­erse(To­tal Utili­tar­ian)

Then what par­tic­u­lar rea­son do we have to ex­pect them to end up com­pro­mis­ing at [40% and ego­is­tic], rather than (say) [0% and self­less]? Ob­vi­ously the to­tal util­i­tar­ian would much pre­fer to donate 40% of their in­come to x-risk re­duc­tion and be­have self­ishly in in­ter­per­sonal cir­cum­stances than to do the re­verse (donate noth­ing but take time out to help old ladies across the road, etc.). But any sys­tem for ar­riv­ing at the fairer com­pro­mise just on the ba­sis of those or­di­nal prefer­ences over de­ci­sions could be ma­nipu­lated into de­cid­ing differ­ently just by in­tro­duc­ing [39.9% and ego­is­tic] or [0.1% and self­less] as a bill, or what­ever. The car­di­nal as­pect of the to­tal util­i­tar­ian’s prefer­ence is key to be­ing able to con­sis­tently de­cide what trade­offs that philos­o­phy would be will­ing to make.

(NB: I’m aware that I’m be­ing ter­ribly un­fair to the ob­ject-level moral philoso­phies of ego­ism and to­tal util­i­tar­i­anism, but I hope that can be for­given along with my ter­rible no­ta­tion in ser­vice of the broader point)

Edit: gjm puts it better

• Can’t we use a hi­er­ar­chy of or­di­nal num­bers and a differ­ent or­di­nal sum (e.g. maybe some­thing of Con­way’s) in our util­ity calcu­la­tions?

That is, ly­ing would be in­finitely bad, but ly­ing ten times would be in­finitely worse.

• To avoid the time­less­ness is­sue, the par­li­a­ment could be en­vi­sioned as vot­ing on com­plete courses of ac­tion over the fore­see­able fu­ture, rather than sep­a­rate votes taken on each ac­tion. Then the de­on­tol­o­gists’ util­ity func­tion could re­turn 0 for all un­ac­cept­able courses of ac­tion and 1 for all ac­cept­able courses of ac­tion.
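A sketch of this fix, with a made-up rule ("never lie"): the deontologist's utility function is defined over complete courses of action and returns 1 exactly when no step violates the rule.

```python
# Hypothetical deontological rule: a complete course of action is
# acceptable iff it contains no act of lying.
def deontic_utility(course_of_action):
    return 0 if "lie" in course_of_action else 1

print(deontic_utility(["help", "donate"]))  # 1: acceptable course
print(deontic_utility(["lie", "donate"]))   # 0: unacceptable course
```

Since the vote is over whole courses of action, "lie once now to prevent ten lies later" is just another course that scores 0, so the timelessness problem never arises.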

• Maybe a deontological theory can be formalized as a parliamentary fraction that has only one right option for each decision, always votes for that option, and can't be bargained into changing its vote. This formalization has an unfortunate consequence: if some deontological theory has more than 50% credence, the agent will always act on it. But if no deontological theory has more than a 50% fraction, this formalization can be reasonable.

• One route to­wards analysing this would be to iden­tify a unit of cur­rency which was held in roughly equal value by all del­e­gates (at least at the mar­gin), so that we can analyse how much they value other things in terms of this unit of cur­rency—this could lead to mar­ket prices for things (?).

Per­haps a nat­u­ral choice for a cur­rency unit would be some­thing like ‘unit of to­tal say in the par­li­a­ment’. So for ex­am­ple a 1% chance that things go the way of your the­ory, ap­plied be­fore what­ever else would hap­pen.

I’m not sure if this could even work, just throw­ing it out there.

• The idea of ex­plicit vote-sel­l­ing is prob­a­bly the eas­iest way to have ‘en­force­able con­tracts’ with­out things get­ting par­tic­u­larly sticky. (If you have or­dered votes and no en­force­able con­tracts, then vote or­der be­comes su­per im­por­tant and trad­ing ba­si­cally breaks apart. But if you have or­dered votes and vote sales, then trad­ing is still pos­si­ble be­cause the votes can’t switch.)

But I don't think the prices are going to be that interesting: if the vote's on the edge, then all votes are valuable, but as soon as one vote changes hands the immediate price of all votes drops back to 0. Calculating the value of, say, amassing enough votes to deter any trading on that vote seems like it might add a lot of murkiness without much increased efficiency.

• The vot­ing sys­tem is set up to avoid these edge effects. From the open­ing post:

(Ac­tu­ally, we use an ex­tra trick here: we imag­ine that the del­e­gates act as if the Par­li­a­ment’s de­ci­sion were a stochas­tic vari­able such that the prob­a­bil­ity of the Par­li­a­ment tak­ing ac­tion A is pro­por­tional to the frac­tion of votes for A. This has the effect of elimi­nat­ing the ar­tifi­cial 50% thresh­old that oth­er­wise gives a ma­jor­ity bloc ab­solute power. Yet – un­be­knownst to the del­e­gates – the Par­li­a­ment always takes what­ever ac­tion got the most votes: this way we avoid pay­ing the cost of the ran­dom­iza­tion!)

• Hm, somehow I failed to notice that. It's not clear to me that you want to avoid the edge effects, though; delegates might trade away influence on contentious issues (where we have significant moral uncertainty) to double down on settled issues (where we have insignificant moral uncertainty), if the settled issues are sufficiently important. Eliezer's concern that delegates could threaten to vote 'no' on something important would make others desperately buy their votes away from them, unless you have a nonlinearity which makes the delegates secure that a lone filibuster won't cause trouble.

On sec­ond thought, though, it seems likely to be de­sir­able that del­e­gates /​ the par­li­a­ment would be­have lin­early in the prob­a­bil­ity of var­i­ous moral the­o­ries. The con­cern is mostly that this means we’ll end up do­ing av­er­ag­ing, and noth­ing much more in­ter­est­ing.

• Is there some way to rephrase this with­out both­er­ing with the par­li­a­ment anal­ogy at all? For ex­am­ple, how about just hav­ing each moral the­ory as­sign the available ac­tions a “good­ness num­ber” (ba­si­cally ex­pected util­ity). Nor­mal­ize the good­ness num­bers some­how, then just take the weighted av­er­age across moral the­o­ries to de­cide what to do.

If we nor­mal­ize by di­vid­ing each moral the­ory’s an­swers by its biggest-mag­ni­tude an­swer, (only closed sets of ac­tions al­lowed :) ) I think this re­gen­er­ates the de­scribed be­hav­ior, though I’m not sure. Ob­vi­ously this cuts out “hu­man-ish” be­hav­ior of par­li­a­ment mem­bers, but I think that’s a fea­ture, since they don’t ex­ist.
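A sketch of this aggregation rule (the numbers are illustrative, loosely following Bostrom's 10% total utilitarian / 90% egoist example): divide each theory's goodness numbers by its biggest-magnitude answer, then take the credence-weighted average and pick the best action.

```python
# Each theory assigns "goodness numbers" to actions; normalize each theory
# by its biggest-magnitude answer, then take the credence-weighted average.
def aggregate(theories, credences):
    # theories: {name: {action: goodness}}, credences: {name: probability}
    totals = {}
    for name, scores in theories.items():
        scale = max(abs(v) for v in scores.values()) or 1.0
        for action, v in scores.items():
            totals[action] = totals.get(action, 0.0) + credences[name] * v / scale
    return max(totals, key=totals.get)

theories = {"util": {"donate": 10.0, "selfish": -10.0},
            "egoist": {"donate": -1.0, "selfish": 1.0}}
print(aggregate(theories, {"util": 0.1, "egoist": 0.9}))  # selfish
```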

• There’s a fam­ily of ap­proaches here, but it’s not clear that they recre­ate the same be­havi­our as the par­li­a­ment (at least with­out more ar­gu­ments about the par­li­a­ment). Whether they are more or less de­sir­able is a sep­a­rate ques­tion.

Incidentally, the version that you suggest isn't quite well-defined, since the result can be changed by adding a constant to a theory's utility function. But that can easily be patched over.

I’ve ar­gued that nor­mal­is­ing the var­i­ance of the func­tions is the most nat­u­ral of these ap­proaches (link to a pa­per giv­ing the ar­gu­ments in a so­cial choice con­text; forth­com­ing pa­per with Ord and MacAskill in the moral un­cer­tainty con­text).

• I like the origi­nal­ity of the ge­o­met­ric ap­proach. I don’t think it’s su­per use­ful, but then again you made good use of it in The­o­rem 19, so that shows what I know.

I found the sec­tion on vot­ing to need re­vi­sion for clar­ity. Is the idea that each voter sub­mits a func­tion, the out­comes are nor­mal­ized and summed, and the out­come with the high­est value wins (like in range vot­ing—ex­cept fixed-var­i­ance vot­ing)? Either I missed the ex­pla­na­tion or you need to ex­plain this. Later in The­o­rem 14 you as­sumed that each agent voted with its util­ity func­tion (proved later in Thm 19, good work by the way, but please don’t as­sume it with­out com­ment ear­lier), and we need to re­mem­ber that all the way back in 4.0 you ex­plained why to nor­mal­ize v and u the same.

Over­all I’d like to see you move away from the shaky no­tion of “a pri­ori vot­ing power” in the con­clu­sion, by trans­lat­ing from the case of vot­ing back into the origi­nal case of moral philos­o­phy. I’m pretty sold that var­i­ance nor­mal­iza­tion is bet­ter than range nor­mal­iza­tion though.

• Thanks for the feed­back!

• I think the key benefit of the parliamentary model is that the members will trade votes in order to maximize their expectation.

• My sus­pi­cion is that this just cor­re­sponds to some par­tic­u­lar rule for nor­mal­iz­ing prefer­ences over strate­gies. The “amount of power” given to each fac­tion is capped, so that even if some fac­tion has an ex­treme opinion about one is­sue it can only ex­press it­self by be­ing more and more will­ing to trade other things to get it.

If good­ness num­bers are nor­mal­ized, and some moral the­ory wants to ex­press a large rel­a­tive prefer­ence for one thing over an­other, it can’t just crank up the num­ber on the thing it likes—it must flat­ten the con­trast of things it cares less about in or­der to ex­press a more ex­treme prefer­ence for one thing.

• I pro­pose to work through a sim­ple ex­am­ple to check whether it al­igns with the meth­ods which nor­mal­ise prefer­ences and sum even in a sim­ple case.

Setup:

• Theory I, with credence p, and Theory II with credence 1-p.

• We will face a de­ci­sion ei­ther be­tween A and B (with prob­a­bil­ity 50%), or be­tween C and D (with prob­a­bil­ity 50%).

• The­ory I prefers A to B and prefers C to D, but cares twice as much about the differ­ence be­tween A and B as that be­tween C and D.

• The­ory II prefers B to A and prefers D to C, but cares twice as much about the differ­ence be­tween D and C as that be­tween B and A.

Ques­tions: What will the bar­gain­ing out­come be? What will nor­mal­i­sa­tion pro­ce­dures do?

• Nor­mal­i­sa­tion pro­ce­dures: if they are ‘struc­tural’ (not car­ing about de­tails like the names of the the­o­ries or out­comes), then the two the­o­ries are sym­met­ric, so they must be nor­mal­ised in the same way. WLOG, as fol­lows:

T1(A) = 2, T1(B) = 0, T1(C) = 1, T1(D) = 0

T2(A) = 0, T2(B) = 1, T2(C) = 0, T2(D) = 2

Then let­ting q = (1-p) the ag­gre­gate prefer­ences T are given by:

T(A) = 2p, T(B) = q, T(C) = p, T(D) = 2q

So:

• if p > 2/3, the aggregate chooses A and C

• if 1/3 < p < 2/3, the aggregate chooses A and D

• if p < 1/3, the aggregate chooses B and D

The ad­van­tage of this sim­ple set-up is that I didn’t have to make any as­sump­tions about the nor­mal­i­sa­tion pro­ce­dure be­yond that it is struc­tural. If the bar­gain­ing out­come agrees with this we may need to look at more com­pli­cated cases; if it dis­agrees we have dis­cov­ered some­thing already.
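A quick sketch checking the threshold claims, using the normalised theory utilities given earlier (T1 = (A:2, B:0, C:1, D:0), T2 = (A:0, B:1, C:0, D:2)) and writing q = 1 - p:

```python
# Aggregate utilities: T(A) = 2p vs T(B) = q, and T(C) = p vs T(D) = 2q.
def aggregate_choices(p):
    q = 1 - p
    first = "A" if 2 * p > q else "B"    # A wins iff p > 1/3
    second = "C" if p > 2 * q else "D"   # C wins iff p > 2/3
    return first + second

print(aggregate_choices(0.8))  # AC  (p > 2/3)
print(aggregate_choices(0.5))  # AD  (1/3 < p < 2/3)
print(aggregate_choices(0.2))  # BD  (p < 1/3)
```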

• For the bar­gain­ing out­come, I’ll as­sume we’re look­ing for a Nash Bar­gain­ing Solu­tion (as sug­gested in an­other com­ment thread).

The defection point has expected utility 3p/2 for Theory I and expected utility 3q/2 for Theory II (using the same notation as I did in this comment).

I don’t see im­me­di­ately how to calcu­late the NBS from this.

• Assume p = 2/3.

Then Theory I has expected utility 1, and Theory II has expected utility 1/2.

As­sume (x,y) is the solu­tion point, where x rep­re­sents prob­a­bil­ity of vot­ing for A (over B), and y rep­re­sents prob­a­bil­ity of vot­ing for C (over D). I claim with­out proof that the NBS has x=1 … seems hard for this not to be the case, but would be good to check it care­fully.

Then the utility of Theory I at the point (1, y) is 1 + y/2, and the utility of Theory II is 1 - y. To maximise the product of the gains over the defection point we want to maximise (y/2)(1/2 - y), which has the same maximiser as y/2 - y^2. Taking the derivative, this happens when y = 1/4.
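A numeric check of this calculation (a simple grid search over y, under the same assumptions: p = 2/3 and x = 1):

```python
# Nash product of gains over the defection point: Theory I gains y/2,
# Theory II gains 1/2 - y, for y in [0, 1/2].
def nash_product(y):
    return (y / 2) * (0.5 - y)

best_y = max((i / 1000 for i in range(501)), key=nash_product)
print(best_y)  # 0.25, i.e. y = 1/4 as derived above
```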

• Note that the normalisation procedure leads to being on the fence between C and D at p = 2/3.

If I'm correct in my ad-hoc approach to calculating the NBS when p = 2/3, then this is firmly in the territory which prefers D to C. Therefore the parliamentary model gives different solutions from any normalisation procedure.

• My sus­pi­cion is that this just cor­re­sponds to some par­tic­u­lar rule for nor­mal­iz­ing prefer­ences over strate­gies.

Yes, as­sum­ing that the del­e­gates always take any available Pareto im­prove­ments, it should work out to that [edit: nev­er­mind; I didn’t no­tice that owencb already showed that that is false]. That doesn’t nec­es­sar­ily make the par­li­a­men­tary model use­less, though. Find­ing nice ways to nor­mal­ize prefer­ences is not easy, and if we end up de­riv­ing some such nor­mal­iza­tion rule with de­sir­able prop­er­ties from the par­li­a­men­tary model, I would con­sider that a suc­cess.

• Harsanyi’s the­o­rem will tell us that it will af­ter the fact be equiv­a­lent to some nor­mal­i­sa­tion—but the way you nor­mal­ise prefer­ences may vary with the set of prefer­ences in the par­li­a­ment (and the cre­dences they have). And from a calcu­la­tion el­se­where in this com­ment thread I think it will have to vary with these things.

I don’t know if such a thing is still best thought of as a ‘rule for nor­mal­is­ing prefer­ences’. It still seems in­ter­est­ing to me.

• Yes, that sounds right. Harsanyi’s the­o­rem was what I was think­ing of when I made the claim, and then I got con­fused for a while when I saw your coun­terex­am­ple.

• This actually sounds plausible to me, but I'm not sure how to work it out formally. It might make for a surprising and interesting result.

• I think there’s already been a Stu­art Arm­strong post con­tain­ing the es­sen­tial ideas, but I can’t find it. So ask­ing him might be a good start.

• To me it looks like the main is­sues are in con­figur­ing the “del­e­gates” so that they don’t “ne­go­ti­ate” quite like real agents—for ex­am­ple, there’s no del­e­gate that will threaten to adopt an ex­tremely nega­tive policy in or­der to gain ne­go­ti­at­ing lev­er­age over other del­e­gates.

The part where we talk about these ne­go­ti­a­tions seems to me like the main pres­sure point on the moral the­ory qua moral the­ory—can we point to a form of ne­go­ti­a­tion that is iso­mor­phic to the “right an­swer”, rather than just be­ing an awk­ward tool to get closer to the right an­swer?

• The threats prob­lem seems like a spe­cific case of prob­lems that might arise by putting real in­tel­li­gence in to the agents in the sys­tem. Espe­cially if this moral the­ory was be­ing run on a su­per­in­tel­li­gent AI, it seems like the agents might be able to come up with all sorts of cre­ative un­ex­pected stuff. And I’m doubt­ful that cre­ative un­ex­pected stuff would make the par­li­a­ment’s de­ci­sions more iso­mor­phic to the “right an­swer”.

One way to solve this problem might be to drop any notion of "intelligence" in the delegates and instead specify a deterministic algorithm that any individual delegate follows in deciding which "deals" they accept. Or take the same idea even further and specify a deterministic algorithm for resolving moral uncertainty that is merely inspired by the function of parliaments, in the same sense that the stable marriage problem and algorithms for solving it could have been inspired by the way people decide who to marry.

Eliezer's notion of a "right answer" sounds appealing, but I'm a little skeptical. In computer science, it's possible to prove that a particular algorithm, when run, will always achieve the maximal "score" on a criterion it's attempting to optimize. But in this case, if we could formalize a score we wanted to optimize for, that would be equivalent to solving the problem! That's not to say this is a bad angle of approach, however… it may be useful to take the idea of a parliament and use it to formalize a scoring system that captures our intuitions about how different moral theories trade off, and then maximize this score using whatever method seems to work best. For example (*waves hands*) perhaps we could score the total regret of our parliamentarians and minimize that.

Another ap­proach might be to for­mal­ize a set of crite­ria that a good solu­tion to the prob­lem of moral un­cer­tainty should achieve and then set out to de­sign an al­gorithm that achieves all of these crite­ria. In other words, mak­ing a for­mal prob­lem de­scrip­tion that’s more like that of the sta­ble mar­riage prob­lem and less like that of the as­sign­ment prob­lem.

So one plan of at­tack on the moral un­cer­tainty prob­lem might be:

• Gen­er­ate a bunch of “prob­lem de­scrip­tions” for moral un­cer­tainty that spec­ify a set of crite­ria to satisfy/​op­ti­mize.

• Figure out which “prob­lem de­scrip­tion” best fits our in­tu­itions about how moral un­cer­tainty should be solved.

• Find an al­gorithm that prov­ably solves the prob­lem as speci­fied in its de­scrip­tion.

• I was thinking last night of how vote trading would work in a completely rational parliamentary system. To simplify things a bit, let's assume that each issue is binary, each delegate holds a position on every issue, and that position can be normalized to a 0.0 - 1.0 ranking. (E.g., if I have a 60% belief that I will gain 10 utility from this issue being approved, it may have a normalized score of .6; if it is a 100% belief that I will gain 10 utility it may be a .7; while a 40% chance of -1000 utility may be a .1.) The mapping function doesn't really matter too much, as long as it can map to the 0-1 scale for simplification.

The first point that seems relatively obvious to me is that all rational agents will intentionally mis-state their utility functions as extremes for bargaining purposes. In a trade, you should be able to get a much better exchange by offering to update from 0 to 1 than you would for updating from 0.45 to 1, and as such, I would expect all utility function outputs to be reported to others as either 1 or 0, which simplifies things even further, though internally, each delegate would keep their true utility function values. (As a sanity check, compare this to the current parliamentary models in the real world, where most politicians represent their ideals publicly as either strongly for or strongly against.)

The second interesting point I noticed is that with the voting system as proposed, where every additional vote grants additional probability of the measure being enacted, every vote counts. This means it is always a good trade for me to exchange votes when my expected value of the issue you are changing position on is higher than my expected value of the issue I am changing position on. This leads to a situation where I am better off changing positions on every issue except the one that brings me the most utility, in exchange for votes on the issue that brings me the most utility. Essentially, this means that the only issue that matters to an individual delegate is the issue that potentially brings them the most utility, and the rest of the issues are just fodder for trading.

Given the first point I men­tioned, that all val­ues should be ex­ter­nally rep­re­sented as ei­ther 1 or 0, it seems that any vote trade will be a straight 1 for 1 trade. I haven’t ex­actly worked out the math here, but I’m pretty sure that for an ar­bi­trar­ily large par­li­a­ment with an ar­bi­trar­ily large num­ber of is­sues (to be used for trad­ing), the re­sult of any given vote will be de­ter­mined by the pro­por­tion of del­e­gates hold­ing that is­sue as ei­ther their high­est or low­est util­ity is­sue, with the rest of the del­e­gates trad­ing their votes on that is­sue for votes on an­other is­sue they find to be higher util­ity. (As a sec­ond san­ity check, this also seems to con­form closely to re­al­ity with the way lob­by­ist groups push sin­gle is­sues and poli­ti­ci­ans trade votes to push their pet is­sues through the vote.)

This is prob­a­bly an over­sim­plified case, but I thought I’d throw it for dis­cus­sion to see if it sparked any new ideas.

• The first point that seems rel­a­tively ob­vi­ous to me is that all ra­tio­nal agents will in­ten­tion­ally mis-state their util­ity func­tions as ex­tremes for bar­gain­ing pur­poses.

Be­cause we’re work­ing in an ideal­ised hy­po­thet­i­cal, we could de­cree that they can’t do this (they must all wear their true util­ity func­tions on their sleeves). I don’t see a dis­ad­van­tage to de­mand­ing this.

• If what you say is true about all trades be­ing 1-for-1, that seems more like a bug than a fea­ture; if an agent doesn’t have any votes valuable enough to sway oth­ers, it seems like I’d want them to be able (i.e. prop­erly in­cen­tivized) to offer more votes, so that the sys­tem over­all can re­flect the ag­gre­gate’s val­ues more sen­si­tively. I don’t have a for­mal crite­rion that says why this would be bet­ter, but maybe that points to­wards one.

• Any par­li­a­men­tary model will in­volve vot­ing.

When voting, Arrow's impossibility theorem is going to impose constraints that can't be avoided: http://en.m.wikipedia.org/wiki/Arrow's_impossibility_theorem

In par­tic­u­lar it is im­pos­si­ble to have all of the below

If every voter prefers alternative X over alternative Y, then the group prefers X over Y.

If every voter's preference between X and Y remains unchanged, then the group's preference between X and Y will also remain unchanged (even if voters' preferences between other pairs like X and Z, Y and Z, or Z and W change).

There is no "dictator": no single voter possesses the power to always determine the group's preference.

So it’s worth­while to pick which bul­let to bite first and de­sign with that in mind as a limi­ta­tion rather than just get­ting started and later on re­al­ize you’re boxed into a cor­ner on this point.


• So it’s worth­while to pick which bul­let to bite first and de­sign with that in mind as a limi­ta­tion rather than just get­ting started and later on re­al­ize you’re boxed into a cor­ner on this point.

The eas­iest bul­let to bite is the “or­di­nal prefer­ences” bul­let. If you al­low the group to be in­differ­ent be­tween op­tions, then the im­pos­si­bil­ity dis­ap­pears. (You may end up with a group that uses a sen­si­ble vot­ing rule that is in­differ­ent be­tween all op­tions, but that’s be­cause the group is bal­anced in its op­po­si­tion.)

• This doesn’t work so well if you want to use it as a de­ci­sion rule. You may end up with some rank­ing which leaves you in­differ­ent be­tween the top two op­tions, but then you still need to pick one. I think you need to ex­plain why what­ever pro­cess you use to do that wasn’t con­sid­ered part of the vot­ing sys­tem.

• This doesn’t work so well if you want to use it as a de­ci­sion rule. You may end up with some rank­ing which leaves you in­differ­ent be­tween the top two op­tions, but then you still need to pick one. I think you need to ex­plain why what­ever pro­cess you use to do that wasn’t con­sid­ered part of the vot­ing sys­tem.

It seems to me that de­ci­sion rules that per­mit in­differ­ence are more use­ful than de­ci­sion rules that do not per­mit in­differ­ence, be­cause fun­gi­bil­ity of ac­tions is a use­ful prop­erty. That is, I would view the de­ci­sion rule as ex­press­ing prefer­ences over classes of ac­tions, but not spec­i­fy­ing which of the ac­tions to take within the class be­cause it doesn’t see a differ­ence be­tween them. Con­sid­er­ing Buri­dan’s Ass, it would rather “go eat hay” than “not go eat hay,” but doesn’t have a high-level prefer­ence for the left or right bale of hay, just like it doesn’t have a prefer­ence whether it starts walk­ing with its right hoof or its left hoof.

Some­thing must have a prefer­ence—per­haps the Ass is right-hoofed, and so it leads with its right hoof and goes to the right bale of hay—but treat­ing that de­ci­sion as its own prob­lem of smaller scope seems su­pe­rior to me than spec­i­fy­ing ev­ery pos­si­ble de­tail in the high-level de­ci­sion prob­lem.

• If ev­ery voter’s prefer­ence be­tween X and Y re­mains un­changed, then the group’s prefer­ence be­tween X and Y will also re­main un­changed (even if vot­ers’ prefer­ences be­tween other pairs like X and Z, Y and Z, or Z and W change).

This is the con­di­tion I want to give up on. I’m not even con­vinced that it’s de­sir­able.

• Can MPs have un­known util­ity func­tions? For ex­am­ple, I might have a rel­a­tively low con­fi­dence in all ex­plic­itly for­mu­lated moral the­o­ries, and want to give a num­ber of MPs to Sys­tem 1 - but I don’t know in ad­vance how Sys­tem 1 will vote. Is that prob­lem out­side the scope of the par­li­a­men­tary model (i.e., I can’t nom­i­nate MPs who don’t “know” how they will vote)?

Can MPs have un­de­cid­able prefer­ence or­der­ings (or sub-or­der­ings)? E.g., such an MP might have some moral ax­ioms that provide or­der­ings for some bills but not oth­ers.