By default, avoid ambiguous distant situations

“The Ood. [...] They’re born for it. Basic slave race.” Mr. Jefferson, The Impossible Planet.

“And although he may be poor, he shall never be a slave,” from the Battle Cry of Freedom.

I’ve talked about morally distant situations: situations far removed from our own. This distance is in the moral sense, not the physical or temporal sense: ancient China is further from us than Star Wars is (even for modern Chinese). There are possible worlds out there far more alien than anything in most of our fiction.

In these distant situations, our usual web of connotations falls apart, and closely related terms start to mean different things. As shown in this post, in extreme cases, our preferences can become nonsensical.

Note that not all our preferences need become nonsensical, just some of them. Consider the case of a slave race—a willing slave race. In that situation, our connotations about slavery may come apart: willing and slave don’t fit well together. However, we ourselves would still be clear that we didn’t want to be slaves. So though some preferences lose power, others do not.

But let’s return to that willing slave race situation. Eliezer’s Harry says:

Whoever had created house elves in the first place had been unspeakably evil, obviously; but that didn’t mean Hermione was doing the right thing now by denying sentient beings the drudgery they had been shaped to enjoy.

Or consider Douglas Adams’s cow variant, bred for willingly being eaten:

“I just don’t want to eat an animal that’s standing here inviting me to,” said Arthur, “it’s heartless.”

“Better than eating an animal that doesn’t want to be eaten,” said Zaphod. [...]

“May I urge you to consider my liver?” asked the animal, “it must be very rich and tender by now, I’ve been force-feeding myself for months.”

At X, the decision is clear. Should we go to X in the first place?

In those situations, some immediate actions are pretty clear. There is no point in freeing a willing slave race; there is no advantage to eating an animal that doesn’t want to be eaten, rather than one that does.

The longer-term actions are more ambiguous, especially as they conflict with others of our values: for example, should we forcibly change the preferences of the slave race/​edible race so that they don’t have those odd preferences any more? Does it make a difference if there are more manipulative paths that achieve the same results, without directly forcing them? We may not want to allow manipulative paths to count as acceptable in general.

But, laying that aside, there seems to be a prima facie case that we shouldn’t enter those kinds of situations: that non-conscious robots are better than conscious willing slaves; that vat-grown meat is better than conscious willing livestock.

So there seems to be a good rule of thumb: don’t go there. Add an axiom A:

  • A: When the web of connotations of a strong preference falls apart, those situations should get an automatic penalty. Initially at least, they should be treated as bad situations worth avoiding.

Default weights in distant situations

When a web of connotations unravels, the preferences normally end up weaker than they were initially, because some of the connotations of those preferences are lost or even inverted. So, normally, preferences in these distant situations are quite weak.

But here I’m suggesting adding an explicit meta-preference to these situations, and one that the human subject might not have themselves. This doesn’t fit in the formalism of this post. In the language of the forthcoming research agenda, this is a “Global meta-preferences about the outcome of the synthesis process”.

Isn’t this an overriding of the person’s preference? It is, to some extent. But note the “Initially at least” clause in A. If we don’t have other preferences about the distant situation, it should be avoided. But this penalty can be overcome by other considerations.

For example, the previous standard web of connotations for sexuality has fallen apart, while the gender one is unravelling; it’s perfectly possible to have meta-preferences that would have told us to respect our reflection on issues like that, and our reflection might be fine with these new situations. Similarly, some (but not all) future changes to the human condition are things that would worry me initially but that I’d be okay with upon reflection; I myself have strong meta-preferences that these should be acceptable.

But for situations where our other preferences and meta-preferences don’t weigh in, A would downgrade these distant worlds as a default (dis)preference. This adds an explicit level of status quo bias to our preferences, which I feel is justified: better to be prudent than reckless where our preferences and values are concerned. The time for (potential) recklessness is in the implementation of these values, not their definition.
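The default-penalty-plus-override mechanism of axiom A can be sketched in a few lines of Python. This is purely illustrative: the function name, the penalty magnitude, and the idea of summing weights are all my own assumptions, not part of the formalism in the post or the research agenda.

```python
# Illustrative sketch of axiom A (all names and numbers hypothetical):
# distant situations where a strong preference's web of connotations
# has broken down get an automatic default penalty, which other
# preferences and meta-preferences can overcome.

DEFAULT_PENALTY = -1.0  # assumed magnitude of the "initially at least" penalty


def evaluate_situation(base_value, connotation_web_intact, override_bonus=0.0):
    """Score a distant situation.

    base_value: value from the preferences that still apply there.
    connotation_web_intact: False when a strong preference's web of
        connotations falls apart in this situation.
    override_bonus: net weight of other preferences and meta-preferences
        (e.g. reflection approving the change) that can overcome the penalty.
    """
    penalty = 0.0 if connotation_web_intact else DEFAULT_PENALTY
    return base_value + penalty + override_bonus


# A willing-slave-race world: the connotations of "slavery" come apart,
# and no other preference weighs in, so the default penalty applies.
print(evaluate_situation(0.5, connotation_web_intact=False))   # penalized

# A change endorsed on reflection: strong meta-preferences supply an
# override bonus that overcomes the default penalty.
print(evaluate_situation(0.5, connotation_web_intact=False,
                         override_bonus=1.5))                  # not penalized
```

The point of the sketch is only the structure: the penalty is a default, applied in the absence of other considerations, rather than an absolute veto.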