
Complexity of Value

Tag · Last edit: 23 Dec 2020 18:30 UTC by Ben ten Berge

Complexity of value is the thesis that human values have high Kolmogorov complexity: our preferences, the things we care about, cannot be summed up by a few simple rules or otherwise compressed. Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider unacceptable. Dialing nine out of ten phone digits correctly does not connect you to a person 90% similar to your friend. For example, a future optimized for all of our values except novelty might be full of individuals replaying a single optimal experience for all eternity.
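A toy optimizer makes the novelty example concrete. The sketch is purely illustrative. The three experiences, their quality scores and the novelty weight are hypothetical numbers. "Value" is modeled, as an assumption, as a simple sum of experience quality plus a bonus for each distinct experience. Dropping that one term is enough to flip the optimum from a varied future to one experience on repeat.

```python
from itertools import product

# Hypothetical experiences and quality scores, chosen only for illustration.
EXPERIENCES = {"symphony": 9.0, "hike": 7.0, "conversation": 8.0}
NOVELTY_WEIGHT = 3.0  # hypothetical bonus per distinct experience in a future

def value(future, include_novelty=True):
    """Toy score of a future, given as a sequence of experience names."""
    score = sum(EXPERIENCES[e] for e in future)
    if include_novelty:
        score += NOVELTY_WEIGHT * len(set(future))
    return score

def best_future(length, include_novelty=True):
    """Brute-force search for the highest-scoring future of the given length."""
    return max(product(EXPERIENCES, repeat=length),
               key=lambda f: value(f, include_novelty))

print(best_future(3))                         # one of each experience
print(best_future(3, include_novelty=False))  # ('symphony', 'symphony', 'symphony')
```

Brute force over `itertools.product` is adequate for three experiences and short futures. The sketch shows only that removing one term changes which future is best. It does not show how a real optimizer would search.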

Related: Ethics & Metaethics, Fun Theory, Preference, Wireheading

Many human choices can be compressed by representing them as simple rules. For example, the desire to survive produces innumerable actions and subgoals as we fulfill that desire. But people don't just want to survive. You can compress many human activities into that desire, but you cannot compress all of human existence into it. The human equivalents of a utility function, our terminal values, contain many different elements that are not strictly reducible to one another. William Frankena offered this list of things that many cultures and people seem to value (for their own sake rather than strictly for their external consequences):
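Kolmogorov complexity itself is uncomputable. However, the length of a string after general-purpose compression gives an upper bound on it (up to a constant), which is enough to show what "compressible" means. The sketch uses Python's standard `zlib` module. A long string generated by one short rule shrinks to a small fraction of its size. Random bytes, which no short rule generates, do not shrink at all. The strings are stand-ins chosen for illustration, and nothing here measures actual human values.

```python
import os
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size over original size; lower means more compressible.

    zlib output length is only an upper-bound proxy for Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9)) / len(data)

# 8000 bytes produced by one short rule: repeat an 8-byte pattern.
rule_based = b"survive;" * 1000
# 8000 random bytes: with overwhelming probability, no short rule produces them.
patternless = os.urandom(8000)

print(f"rule-based:  {compressed_ratio(rule_based):.3f}")   # far below 1
print(f"patternless: {compressed_ratio(patternless):.3f}")  # about 1
```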

Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one’s own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.

The “etc.” at the end is the tricky part, because there may be a great many values not included on this list.

Natural selection reifies selection pressures as psychological drives. These drives then continue to execute independently of any consequentialist reasoning in the organism. They also run without the organism explicitly representing, let alone caring about, the original evolutionary context. We therefore have no reason to expect these terminal values to be reducible to any one thing, or to each other.

Taken together with another LessWrong claim, that all values are morally relevant, this would suggest that philosophers are mistaken when they try to find cognitively tractable overarching principles of ethics. However, it is coherent to suppose that not all values are morally relevant, and that the morally relevant ones form a tractable subset.

Complexity of value also tends to be underappreciated in the presence of bad metaethics. The local flavor of metaethics could be characterized as cognitivist, without implying "thick" notions of instrumental rationality. In other words, moral discourse can be about a coherent subject matter without all possible minds and agents necessarily finding truths about that subject matter psychologically compelling. An expected paperclip maximizer doesn't disagree with you about morality any more than you disagree with it about "which action leads to the greatest number of expected paperclips". It is simply constructed to find the latter subject matter psychologically compelling but not the former. Consider the reaction "But it's just paperclips! What a dumb goal! No sufficiently intelligent agent would pick such a dumb goal!" This is a judgment carried out on a local brain that evaluates paperclips as inherently low in the preference ordering. Someone who fails to appreciate this will expect all moral judgments to be automatically reproduced in a sufficiently intelligent agent. After all, such an agent would not lack the intelligence to see that paperclips are so obviously inherently low in the preference ordering. This is a particularly subtle species of anthropomorphism and the mind projection fallacy.
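The point that a paperclip maximizer need not be missing any facts can be sketched as an expected-utility planner with a pluggable utility function. The actions, outcome probabilities and both utility functions below are hypothetical. Both agents share the same world model and the same decision procedure, and either could compute the other's expected utilities. They choose differently only because they rank outcomes differently.

```python
from typing import Callable, Dict, List, Tuple

Outcome = Dict[str, int]

# Hypothetical shared world model: action -> list of (probability, outcome).
WORLD_MODEL: Dict[str, List[Tuple[float, Outcome]]] = {
    "build_factory": [(0.9, {"paperclips": 1000, "flourishing": 0}),
                      (0.1, {"paperclips": 0, "flourishing": 0})],
    "fund_hospital": [(1.0, {"paperclips": 0, "flourishing": 50})],
}

def expected_utility(action: str, utility: Callable[[Outcome], float]) -> float:
    """Probability-weighted utility of an action under the shared world model."""
    return sum(p * utility(outcome) for p, outcome in WORLD_MODEL[action])

def choose(utility: Callable[[Outcome], float]) -> str:
    """Same beliefs, same planner; only the utility function differs."""
    return max(WORLD_MODEL, key=lambda a: expected_utility(a, utility))

def paperclip_utility(o: Outcome) -> float:
    return o["paperclips"]

def human_like_utility(o: Outcome) -> float:
    return o["flourishing"]

print(choose(paperclip_utility))   # build_factory
print(choose(human_like_utility))  # fund_hospital
```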

Because the human brain very often fails to grasp all these difficulties involving our values, we tend to think building an awesome future is much less problematic than it really is. Fragility of value is relevant for building Friendly AI, because an AGI that does not respect human values is likely to create a world that we would consider devoid of value. Such a world need not be full of explicit attempts to be evil; it may simply be a dull, boring loss.

Because values are orthogonal to intelligence, they can vary freely no matter how intelligent and efficient an AGI is [1]. Since human/humane values have high Kolmogorov complexity, a random AGI is highly unlikely to maximize human/humane values. The fragility of value thesis implies that a poorly constructed AGI might, for example, turn us into blobs of perpetual orgasm. Because of this relevance, the complexity and fragility of value are a major theme of Eliezer Yudkowsky's writings.
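A toy counting argument shows why high complexity makes a randomly chosen goal so unlikely to match. Assume, purely for illustration, that specifying human values V takes K bits and that an AGI's goal is a K-bit string drawn uniformly at random. Both assumptions are simplifications. The value K = 1000 is an arbitrary example, not an estimate.

```latex
P(\text{goal} = V) = 2^{-K}, \qquad
K = 1000 \;\Rightarrow\; P = 2^{-1000} \approx 9.3 \times 10^{-302}
```

Under these assumptions, each additional bit needed to specify the target halves the probability of hitting it by chance.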

Wrongly designing the future because we wrongly encoded human values is a serious and difficult-to-assess type of existential risk. "Touch too hard in the wrong dimension, and the physical representation of those values will shatter—and not come back, for there will be nothing left to want to bring it back. And the referent of those values—a worthwhile universe—would no longer have any physical reason to come into being. Let go of the steering wheel, and the Future crashes." [2]

Complexity of Value and AI

Complexity of value poses a problem for AI alignment. If you can’t easily compress what humans want into a simple function that can be fed into a computer, it isn’t easy to make a powerful AI that does things humans want and doesn’t do things humans don’t want. Value Learning attempts to address this problem.
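One common formalization of value learning is assumed here; this page does not describe it. It treats a person's pairwise choices as noisy evidence about a hidden reward function, and fits that function with a Bradley–Terry (logistic) model. The sketch learns linear weights over three hypothetical features (pleasure, novelty, friendship) from three made-up comparisons, using plain gradient ascent. It is a toy: it presupposes that the right features are already known, which is exactly what complexity of value makes hard.

```python
import math
from typing import List, Tuple

Features = Tuple[float, ...]

def learn_weights(comparisons: List[Tuple[Features, Features]], dim: int,
                  lr: float = 0.5, steps: int = 2000) -> List[float]:
    """Fit weights w for a linear reward r(x) = w . x from (preferred, rejected) pairs.

    Bradley-Terry model: P(preferred chosen) = sigmoid(r(preferred) - r(rejected)).
    Runs plain gradient ascent on the mean log-likelihood.
    """
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for better, worse in comparisons:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient of log sigmoid(margin) w.r.t. w is (1 - sigmoid(margin)) * diff.
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                grad[i] += scale * diff[i]
        w = [wi + lr * gi / len(comparisons) for wi, gi in zip(w, grad)]
    return w

# Hypothetical features: (pleasure, novelty, friendship). Each pair is (preferred, rejected).
comparisons = [
    ((1.0, 1.0, 0.0), (1.0, 0.0, 0.0)),  # same pleasure, more novelty preferred
    ((0.0, 0.0, 1.0), (0.5, 0.0, 0.0)),  # friendship preferred over some pleasure
    ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),  # pleasure preferred over nothing
]

weights = learn_weights(comparisons, dim=3)
print([round(x, 2) for x in weights])
```

On this small, consistent dataset, every learned weight ends up positive and every preferred option scores higher than the option it beat. Real choices are noisy, inconsistent and shaped by biases, which is where the hard part of value learning begins.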

See also

The Hidden Complexity of Wishes

Eliezer Yudkowsky, 24 Nov 2007 0:12 UTC
112 points, 135 comments, 7 min read

Value is Fragile

Eliezer Yudkowsky, 29 Jan 2009 8:46 UTC
106 points, 96 comments, 6 min read

Complexity of Value ≠ Complexity of Outcome

Wei_Dai, 30 Jan 2010 2:50 UTC
57 points, 232 comments, 3 min read

Alan Carter on the Complexity of Value

Ghatanathoah, 10 May 2012 7:23 UTC
44 points, 41 comments, 7 min read

Disentangling arguments for the importance of AI safety

Richard_Ngo, 21 Jan 2019 12:41 UTC
123 points, 23 comments, 8 min read

Have you felt exiert yet?

Stuart_Armstrong, 5 Jan 2018 17:03 UTC
28 points, 7 comments, 1 min read

31 Laws of Fun

Eliezer Yudkowsky, 26 Jan 2009 10:13 UTC
63 points, 36 comments, 8 min read

The two-layer model of human values, and problems with synthesizing preferences

Kaj_Sotala, 24 Jan 2020 15:17 UTC
68 points, 15 comments, 9 min read

The Gift We Give To Tomorrow

Eliezer Yudkowsky, 17 Jul 2008 6:07 UTC
80 points, 99 comments, 8 min read

High Challenge

Eliezer Yudkowsky, 19 Dec 2008 0:51 UTC
46 points, 74 comments, 4 min read

Our values are underdefined, changeable, and manipulable

Stuart_Armstrong, 2 Nov 2017 11:09 UTC
22 points, 6 comments, 3 min read

Reversible changes: consider a bucket of water

Stuart_Armstrong, 26 Aug 2019 22:55 UTC
25 points, 18 comments, 2 min read

Would I think for ten thousand years?

Stuart_Armstrong, 11 Feb 2019 19:37 UTC
25 points, 13 comments, 1 min read

Beyond algorithmic equivalence: self-modelling

Stuart_Armstrong, 28 Feb 2018 16:55 UTC
10 points, 3 comments, 1 min read

Bias in rationality is much worse than noise

Stuart_Armstrong, 31 Oct 2017 11:57 UTC
11 points, 0 comments, 2 min read

Review of ‘But exactly how complex and fragile?’

TurnTrout, 6 Jan 2021 18:39 UTC
51 points, 0 comments, 8 min read

2012 Robin Hanson comment on “Intelligence Explosion: Evidence and Import”

Rob Bensinger, 2 Apr 2021 16:26 UTC
28 points, 4 comments, 3 min read

Two Neglected Problems in Human-AI Safety

Wei_Dai, 16 Dec 2018 22:13 UTC
73 points, 23 comments, 2 min read

The E-Coli Test for AI Alignment

johnswentworth, 16 Dec 2018 8:10 UTC
66 points, 24 comments, 1 min read

Babies and Bunnies: A Caution About Evo-Psych

Alicorn, 22 Feb 2010 1:53 UTC
80 points, 844 comments, 2 min read

Terminal Values and Instrumental Values

Eliezer Yudkowsky, 15 Nov 2007 7:56 UTC
79 points, 42 comments, 10 min read

Three AI Safety Related Ideas

Wei_Dai, 13 Dec 2018 21:32 UTC
62 points, 38 comments, 2 min read

Why we need a *theory* of human values

Stuart_Armstrong, 5 Dec 2018 16:00 UTC
64 points, 15 comments, 4 min read

But exactly how complex and fragile?

KatjaGrace, 3 Nov 2019 18:20 UTC
65 points, 32 comments, 3 min read, 2 nominations, 1 review
(meteuphoric.com)

The genie knows, but doesn’t care

Rob Bensinger, 6 Sep 2013 6:42 UTC
88 points, 519 comments, 8 min read

Hacking the CEV for Fun and Profit

Wei_Dai, 3 Jun 2010 20:30 UTC
75 points, 207 comments, 1 min read

Siren worlds and the perils of over-optimised search

Stuart_Armstrong, 7 Apr 2014 11:00 UTC
69 points, 417 comments, 7 min read

Boredom vs. Scope Insensitivity

Wei_Dai, 24 Sep 2009 11:45 UTC
48 points, 40 comments, 3 min read

Fake Utility Functions

Eliezer Yudkowsky, 6 Dec 2007 16:55 UTC
49 points, 62 comments, 4 min read

What AI Safety Researchers Have Written About the Nature of Human Values

avturchin, 16 Jan 2019 13:59 UTC
40 points, 3 comments, 15 min read

Leaky Generalizations

Eliezer Yudkowsky, 22 Nov 2007 21:16 UTC
35 points, 29 comments, 3 min read

Sympathetic Minds

Eliezer Yudkowsky, 19 Jan 2009 9:31 UTC
47 points, 27 comments, 5 min read

In Praise of Boredom

Eliezer Yudkowsky, 18 Jan 2009 9:03 UTC
32 points, 104 comments, 6 min read

Values Weren’t Complex, Once.

Davidmanheim, 25 Nov 2018 9:17 UTC
32 points, 13 comments, 2 min read

Post Your Utility Function

taw, 4 Jun 2009 5:05 UTC
32 points, 280 comments, 1 min read

Can’t Unbirth a Child

Eliezer Yudkowsky, 28 Dec 2008 17:00 UTC
35 points, 96 comments, 3 min read

ISO: Name of Problem

johnswentworth, 24 Jul 2018 17:15 UTC
28 points, 15 comments, 1 min read

Superintelligence 20: The value-loading problem

KatjaGrace, 27 Jan 2015 2:00 UTC
8 points, 21 comments, 6 min read

Anthropomorphic Optimism

Eliezer Yudkowsky, 4 Aug 2008 20:17 UTC
56 points, 58 comments, 5 min read

Fading Novelty

lifelonglearner, 25 Jul 2018 21:36 UTC
21 points, 2 comments, 6 min read

Can there be an indescribable hellworld?

Stuart_Armstrong, 29 Jan 2019 15:00 UTC
33 points, 19 comments, 2 min read

What’s wrong with simplicity of value?

Wei_Dai, 27 Jul 2011 3:09 UTC
29 points, 40 comments, 1 min read