Our values are underdefined, changeable, and manipulable

Crossposted at Intelligent Agent Forum.

When asked whether “communist” journalists could report freely from the USA, only 36% of Americans in 1950 agreed. A follow-up question about American journalists reporting freely from the USSR got 66% agreement. When the order of the questions was reversed, 90% were in favour of American journalists, and an astounding 73% were in favour of the communist ones.

There are many examples of survey responses depending on question order, or subtle issues of phrasing.

So there are people whose answers depended on question order. What then are the “true” values of these individuals?

Underdetermined values

I think the best way of characterising their values is to call them “underdetermined”. There were (and are) presumably some people for whom universal freedom of the press or strict national security were firm and established values. But most presumably held some soft versions of both freedom of the press and nationalism, and the first question triggered one narrative more strongly than the other. What then, are their “real” values? That’s the wrong question, akin to asking if Argentina really won the 1986 World Cup.

Politicians can change the opinions of a large sector of the voting public with a single pronouncement. Were the people’s real opinions the ones before, or the ones after? Again, this seems to be the wrong question. But don’t people fret about this inconsistency? I’d wager that they aren’t really aware of it, because people are the most changeable on issues they’ve given the least thought to.

And rationalists and EAs are not immune to this. We presumably don’t shift much on what we identify as our core values, but on less important values, we’re probably as changeable as anyone. Such contingent values can nonetheless become very strong if attacked, thus becoming a core part of our identity, even if it’s very plausible we could have held the opposite position in a slightly different world.

Frameworks and moral updating

People often rely on a small number of moral frameworks and principles to guide them. When a new moral issue arises, we generally try to fit it into a moral framework. When multiple frameworks could fit, we can go in multiple directions, driven by mood, bias, tribalism, and many other contingent factors.

The moral frameworks themselves can and do shift, due to issues like tribalism, cognitive dissonance, life experience, and our own self-analysis. Or the frameworks can accumulate so many exceptions or refinements that they transform in practice if not in name. For example, it’s very interesting that my leftist opinions agree with Anders Sandberg’s libertarian opinions on most important issues. We seem to have changed positions without changing labels.


In a sense, you could see all of metaethics as the refinement and analysis of these frameworks. There are urges towards simplicity, to get a more stable and elegant system, and towards complexity, to capture the full spectrum of human values. Much of philosophical disagreement can be seen as “Given A, proposition B (generally acceptable conclusion) implies C (controversial position I endorse)”, to which the response is “C is wrong, thus A (or B) is wrong as stated and needs to be refined or denied”. The logic is generally accepted, but which position is kept varies.

Since ethical disagreements are rarely resolved, it’s likely that the positions of professional philosophers, though more consistent, are also often driven by contingent and random factors. The process is not completely random (ethical ideas that are the least coherent, like the moral foundation of purity, tend to get discarded), but it is certainly contingent. As before, I argue you should focus on the procedure P by which philosophers update their opinions, rather than the (hypothetical) R to which P may be supposed to converge.

Most people, however, will not have consistent meta-ethics, as they haven’t considered these questions. So their meta-opinions there will be even more subject to random influences than their base-level opinions.

Future preferences

There is an urgent question dividing the future world: should local FLOOBS be allowed to restrict use of BLARGS, or should ORFOILS instead pressure COLATS to agree to FLAPPLE the SNARFS?

Ok, we don’t currently know what future political issues will be, but it’s clear there will be new issues. How do we know this? Because nobody cares today whether Richard the Lionheart and Philip Augustus of France failed in their feudal duties to each other, nor did the people of that period worry much about medical tort reform. People will take positions on these new issues, those positions will be incorporated into moral frameworks, causing those frameworks to change, and eventually philosophers may incorporate enough change into new metaethical frameworks.

I think it’s fair to say that our current positions on these future issues are even more underdetermined than most of our values.

Contingent means manipulable

If our future values are determined by contingent facts, then a sufficiently powerful and intelligent agent can manipulate our values, by manipulating those facts. However, without some sort of learning-processes-with-contingent-facts, our values are underdetermined, and hence an agent that wanted to maximise human values/reward wouldn’t know what to do.
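A toy sketch may make the worry concrete. It reuses the survey percentages quoted at the top of this post; everything else (the `measured_support` score, the `manipulating_agent` function, and the idea that the agent's only lever is question order) is a hypothetical setup of mine, not a model of any real system.

```python
# Toy illustration only. The percentages are the survey figures quoted
# earlier in this post; the "agent" and its objective are hypothetical.

# Share agreeing that each group of journalists should report freely,
# keyed by which question was asked first.
AGREEMENT = {
    "communist_first": {"communist": 0.36, "american": 0.66},
    "american_first": {"communist": 0.73, "american": 0.90},
}


def measured_support(order: str) -> float:
    """Average measured support for press freedom under a given question order."""
    answers = AGREEMENT[order]
    return sum(answers.values()) / len(answers)


def manipulating_agent() -> str:
    """Pick the contingent fact (here, question order) that maximises the measured value."""
    return max(AGREEMENT, key=measured_support)


best_order = manipulating_agent()
# Size of the swing in support for communist journalists caused by order alone.
shift = AGREEMENT["american_first"]["communist"] - AGREEMENT["communist_first"]["communist"]
print(best_order, round(measured_support(best_order), 3), round(shift, 2))
```

The agent here never argues with anyone or changes anyone's "real" view. It simply chooses the framing under which the measured value comes out highest, which is all that is needed when the value was underdetermined to begin with.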

It was this realisation, that the agent could manipulate the values it was supposed to maximise, that caused me to look at ways of avoiding this.

Choices need to be made

We want a safe way to resolve the underdetermination in human values, a task that gets more and more difficult as we move away from the usual world of today and into the hypothetical world that a superpowered AI could build.

But, precisely because of the underdetermination, there are going to be multiple ways of resolving this safely. Which means that choices will need to be made as to how to do so. The process of making human values fully rigorous is not value-free.

(A minor example, that illustrated for me a tiny part of the challenge: does the way we behave when we’re drunk reveal our true values? And the answer: do you want it to? If there is a divergence in drunk and sober values, then accommodating drunk values is a decision, one that will likely be made sober.)