[link] Desiderata for a model of human values

http://kajsotala.fi/2015/11/desiderata-for-a-model-of-human-values/

Soares (2015) defines the value learning problem as

By what methods could an intelligent machine be constructed to reliably learn what to value and to act as its operators intended?

There have been a few attempts to formalize this question. Dewey (2011) started from the notion of building an AI that maximized a given utility function, and moved on to suggest that a value learner should instead maintain uncertainty over utility functions, taking “the action with the highest expected value, calculated by a weighted average over the agent’s pool of possible utility functions.” This is a reasonable starting point, but a very general one: in particular, it gives us no criteria by which we or the AI could judge the correctness of a utility function that it is considering.
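
To make Dewey's weighted-average rule a bit more concrete, here is a minimal sketch of what it might look like in code. This is not from Dewey's paper: the candidate utility functions, the credences assigned to them, and the toy action-outcome model are all illustrative placeholders.

```python
from typing import Callable, Dict

# Hypothetical candidate utility functions: each maps an outcome to a real value.
def u_paperclips(outcome: str) -> float:
    return 1.0 if outcome == "more_paperclips" else 0.0

def u_human_welfare(outcome: str) -> float:
    return 1.0 if outcome == "humans_flourish" else 0.0

# The agent's current credence in each candidate utility function (made up).
credences: Dict[Callable[[str], float], float] = {
    u_paperclips: 0.2,
    u_human_welfare: 0.8,
}

# Illustrative model of which outcome each action leads to (deterministic for simplicity).
outcome_of_action = {
    "build_factory": "more_paperclips",
    "fund_hospitals": "humans_flourish",
}

def expected_value(action: str) -> float:
    """Dewey-style value of an action: the utility of its outcome averaged
    over the pool of candidate utility functions, weighted by credence."""
    outcome = outcome_of_action[action]
    return sum(p * u(outcome) for u, p in credences.items())

# Pick the action with the highest credence-weighted expected value.
best_action = max(outcome_of_action, key=expected_value)
print(best_action, expected_value(best_action))  # -> fund_hospitals 0.8
```

Note that nothing in this sketch says where the candidate utility functions come from, or how the credences over them should be updated; that is exactly the gap the rest of this post is concerned with.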

To improve on Dewey’s definition, we would need to get a clearer idea of just what we mean by human values. In this post, I don’t yet want to offer any preliminary definition: rather, I’d like to ask what properties we’d like a definition of human values to have. Once we have a set of such criteria, we can use them as a guideline to evaluate various offered definitions.