Non-Consequentialist Cooperation?

This is a very rough intuition pump for possible alternatives to value learning.

In broad strokes, the goal of (ambitious) value learning is to define and implement a notion of cooperation (or helpfulness) in terms of two activities: (1) figuring out what humans value, and (2) working to optimize that.

I’m going to try to sketch an alternative notion of cooperation/helpfulness. This intuition is based on libertarian or anarcho-capitalist ideas, but in some ways seems closer to what humans do when they try to help.

I was talking to Andrew Forrester about the suffering golem thought experiment. I’m not sure who originated it, but the idea is:

Suffering Golem: A golem suffers terribly in every moment of its existence, but it says it wants to keep living. Do you kill it?

The idea is that if you think it is good to kill it, you’re a hedonic utilitarian: your altruistic motives have to do with maximizing pleasure / minimizing suffering. If you think it should not be killed, then you’re more likely to be a preference utilitarian: your altruistic motives have to do with what the person values for themselves, rather than some other thing you think would be good for them. (I tend to lean toward preference utilitarianism myself, but don’t think the question is obvious.)

Andrew Forrester was against killing it, but justified his answer with something like “It’s none of your business. You could provide a convenient means of suicide if you wanted...” This wasn’t an expression of not caring about the welfare of the golem. Rather, it was a way of saying that you want to preserve the golem’s autonomy.

I realized that although his answer was consistent with preference utilitarianism on the surface, it went beyond it. I think he would likely have a similar response to the following thought experiment:

Confused Golem: A golem hates every moment of its existence, and would prefer to die, but it is unable to admit this fact to itself. It thinks that it loves life and wants to continue living. Perhaps it could eventually realize that it preferred not to exist if it thought about the question for long enough, but that day is a long way off. Do you kill it?

The autonomy-preserving move is to not kill the confused golem. You might talk to the golem about what it wants, but you wouldn’t actively optimize for convincing it that it actually wants to die. (That would subtract from its autonomy.)

Informed Consent?

If I imagine something which is motivated only to help, where “help” is interpreted in the autonomy-centric way, it seems like the idea is an entity which acts only on your informed consent. It will sit and do nothing until it is confident that there is something you want it to do, and that you understand the consequences of it doing so.

Imagine you buy a robot which runs on these rules. The robot sits patiently in your house. Sitting there is not a violation of its directive, because it did not place itself there; whatever the consequences may be, they are a result of your autonomous action. However, it does watch you, which may violate informed consent. It has a high prior probability that you understand it is watching and consent to this, because the packaging had prominent warnings about it. Watching you is necessary for the robot to function, since it must infer your consent. It may shut off if it infers that you do not understand this.

The robot will continue doing nothing until it has gained confidence that you have fully read the instruction booklet, which contains the basic facts about how the robot functions. You may then issue commands to the robot. The instruction booklet recommends that, before you try this, you say “I consent to discuss the meaning of my commands with you, the robot.” This clears the robot to ask clarifying questions about commands and to tell you about the likely consequences of commands when it does not think you understand them. Without this consent, the robot will often fail to do anything, and will offer no explanation for its failure.
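The decision rule sketched so far can be made concrete as a toy loop: act only when confidence in informed consent is high, otherwise either clarify (if cleared to) or do nothing. This is a minimal illustrative sketch; every name, threshold, and the way consent probabilities combine here is a hypothetical assumption, not a proposal for how such inference would actually work.

```python
# Toy sketch of the consent-gated command handling described above.
# All names and numbers are hypothetical illustrations.

from dataclasses import dataclass


@dataclass
class Command:
    text: str
    p_wanted: float      # inferred probability the user wants this done
    p_understood: float  # inferred probability the user understands the consequences


# Assumed confidence required before the robot may act at all.
CONSENT_THRESHOLD = 0.95


def next_action(cmd: Command, discussion_consented: bool) -> str:
    """Decide what to do with a pending command under the consent rules."""
    # Crudely treat "wanted" and "understood" as independent.
    p_informed_consent = cmd.p_wanted * cmd.p_understood
    if p_informed_consent >= CONSENT_THRESHOLD:
        return "execute"
    if discussion_consented:
        # Cleared to ask clarifying questions or explain likely consequences.
        return "ask_clarifying_question"
    # Without consent to discuss, the robot silently declines.
    return "do_nothing"
```

For example, a routine command inferred with high confidence would be executed, while the same borderline command yields a clarifying question or silent inaction depending on whether discussion was consented to.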

Another recommended command is “Let’s discuss how we can work together.” This clears the robot to make general inquiries about how it should behave toward you. Once you issue this command, for the duration of the conversation, the robot will formulate intelligent questions about what you might want, what you like and dislike, where you struggle in your life, and so on. It will make suggestions about how it can help, many invented on the spot. At some point during the conversation it will likely ask whether it should maintain this level of candor going forward, or only discuss its tasks in such an open-ended way upon request.

Gentle Value Extraction

What the robot will absolutely not do during this initial interview is pry into your personal life with questions optimized to extract the maximum useful information about your values and life difficulties. Although that might be the most useful thing it could do during its initial interview with you, it would break your autonomy, because many humans are uncomfortable discussing certain topics, and breaking these norms is not a reasonable consequence to expect from the command you’ve issued. Since humans may not even wish to consent to the robot knowing various personal details (and may accidentally reveal enough information for the robot to figure things out), the robot has to tread lightly in its inferences, too. Even asking directly whether a certain topic is OK may be an unwanted and unexpected act, making it impossible to go there unless the human brings it up on their own initiative.

The robot might not even try to gently move the discussion in the direction of greater openness about private details, because “trying to get the human to open up more” is not an obvious consequence of discussing potential tasks. But this isn’t clear-cut; maybe trying to get people to open up is normal enough in a conversation that this is fine. The instruction booklet could also warn users about it, making it an expected consequence and therefore part of what is consented to.

Explicit Consent vs Inferred Consent

At this point, someone might be thinking: “Why are you talking about the robot inferring that the human consents to certain things as reasonable expectations of giving certain commands? Why give so much leeway? We could just require explicit consent instead.”

Explicit consent is so impractical as to border on meaninglessness. We want the robot to have some autonomy in how it executes commands. If it knows we like cream in our coffee, it makes sense for it to just put the cream in, without asking every time or requiring us to issue a general rule. Cream in the coffee is a reasonable expectation. The way I think about it, an explicit-consent requirement would force us to approve every motor command precisely; the freedom to intelligently carry out complex tasks in response to commands requires a certain amount of freedom to infer consent.

Another way of thinking about the problem is that explicit consent places dictionary-definition English in too special a position. We can convey our meaning in any number of ways. In a sufficiently information-rich context, a glance might be sufficient.

Turning things the other way around, there are also cases where explicit consent doesn’t imply inferred consent. If someone is made to consent under duress, consent should not be inferred.

The biggest argument I see in favor of explicit consent is that it makes for a much lower risk of misunderstanding. Misunderstanding is certainly a serious concern, and one reason why humans often require explicit consent in high-stakes situations. However, in the context of consent-based robotics, there are likely better ways of addressing the concern:

  • Requiring higher confidence in inferred consent. This might be modulated by the inferred importance of the question in a situation, so that explicit consent is required in practice for anything of importance, due to the high confidence it establishes. Measuring “importance” in this way creates its own potential safety concerns, of course.

  • Using highly robust machine-learning techniques, so that spuriously high confidence in inferred consent is very unlikely.
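The first bullet above can be sketched as a threshold that rises with inferred importance, so that important actions effectively demand explicit consent (which is what pushes confidence near 1). This is a toy sketch under assumed numbers; the function names and the linear interpolation are hypothetical choices, not a worked-out proposal.

```python
# Toy sketch: the confidence required for inferred consent scales with
# the inferred importance of the action. All numbers are hypothetical.

def required_confidence(importance: float, base: float = 0.9) -> float:
    """Map inferred importance in [0, 1] to a consent-confidence threshold."""
    assert 0.0 <= importance <= 1.0
    # Interpolate linearly from `base` (trivial actions) toward 1.0
    # (critical actions, which in practice only explicit consent can reach).
    return base + (1.0 - base) * importance


def may_act(inferred_consent_confidence: float, importance: float) -> bool:
    """Act only if inferred consent clears the importance-scaled threshold."""
    return inferred_consent_confidence >= required_confidence(importance)
```

Under this sketch, moderately confident inferred consent suffices for trivial actions but never for maximally important ones, capturing the idea that explicit consent becomes required in practice as stakes rise.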

What Does Informed Consent Mean?

There’s a conceptual problem here which feels very similar to impact measures. An impact measure is supposed to quantify the extent to which an action changes things in general. Informed consent seems to require that we quantify the degree to which a change fits within certain expectations. The notion of “change” seems to be common between the two.

For example, at least according to my intuition, an impact measure should not penalize an AI much for the butterfly-effect changes in weather patterns which any action implies. The future will include hurricanes destroying cities in a broad variety of circumstances, and small actions may create large changes in which hurricanes destroy which cities. If a particular action foreseeably changes the overall impact of the hurricane/city pattern on other important variables in a significant way, however, then an impact measure should penalize it.

Similarly, a human can have informed consent as to the consequences of a robot going to the grocery store and buying bananas without understanding all the consequences for future weather patterns, even though this will involve some large changes to which hurricanes destroy which cities at some point later. On the other hand, if the robot walks to the grocery store in just the right way so as to cause a series of severe hurricanes to tip the right dominoes and cause a severe economic collapse which would otherwise not have happened, then this is a significant unexpected consequence of going to the grocery store which the human would need to consent to separately.

Human Rationality Assumptions

The bad news is that this approach seems likely to run into essentially all the same conceptual difficulties as value learning, if not more. Although the conceptual framework is not as strongly tied to VNM-style utility functions as value learning is, the robot still needs to infer what the human believes and wants: belief for the “informed” part, and want for the “consent” part. This still sounds like it is most naturally formulated in a VNM-ish framework, although there may be other ways.

As such, it doesn’t seem like it helps any with the difficulties of assuming human rationality.

Helping Animals

My friend mentioned that the suffering golem scenario depends a great deal on whether the golem is sentient. Mercy-killing suffering animals is OK, even good, without any consent. More generally, there are lots of acceptable ways of helping animals which break their autonomy in significant ways, such as taking them to the vet over their protests. One might say the same of children.

It isn’t obvious what makes the difference, but one idea might be: where there is no capacity for informed consent, other principles may apply. But what would this imply for humans? There may be consequences of actions which we lack the capacity to understand. Should the robot simply try to optimize for our preferences on those issues, without constraining acceptable consequences by consent?

How should an autonomy-respecting robot interact with children? Absolute respect for human autonomy might make it impossible to help with certain household tasks, like changing diapers. If so, the approach might not result in very capable agents.

Respecting All Humans

So far, I’ve focused on a thought experiment of a robot respecting the autonomy of a single designated user. Ultimately, it seems like an approach to alignment has to deal with all humans. However, getting “consent” from all humans seems impossible. How can a consent-based approach approve any actions, then?

One idea is to only require consent from humans who are impacted by an action. Any action which impacts the whole future would require consent from everyone (?), but low-impact actions could be carried out with consent only from those involved.

It’s not clear to me how to approach this question.

Connections to Other Problems

  • As I mentioned, this seems to connect to impact measures.

  • The agent as described may also be a mild optimizer, because (1) it has to avoid thinking about things when those things are not understood consequences of carrying out commands, and (2) plans are constrained by the human probability distribution over plans, somehow (I’m not sure how it works, but there’s definitely an aspect of “unexpected plans are not allowed” in play here).

  • There is a connection to transparency, in that impacts of actions have to be described/understood (and approved) in order to be allowed.

  • The agent as I’ve described it sounds potentially corrigible, in that resistance to shutdown or modification would have to be an understood and approved consequence of a command.