[link] Desiderata for a model of human values
http://kajsotala.fi/2015/11/desiderata-for-a-model-of-human-values/
Soares (2015) defines the value learning problem as follows: "By what methods could an intelligent machine be constructed to reliably learn what to value and to act as its operators intended?"
There have been a few attempts to formalize this question. Dewey (2011) started from the notion of building an AI that maximized a given utility function, and then moved on to suggest that a value learner should exhibit uncertainty over utility functions and then take “the action with the highest expected value, calculated by a weighted average over the agent’s pool of possible utility functions.” This is a reasonable starting point, but a very general one: in particular, it gives us no criteria by which we or the AI could judge the correctness of a utility function which it is considering.
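Dewey's proposal can be sketched in a few lines: hold a pool of candidate utility functions with probability weights, and choose the action whose weighted average utility is highest. The pool, weights, and action space below are hypothetical illustrations, not anything from Dewey's paper; note that nothing in the sketch tells us whether either candidate utility function is *correct*, which is exactly the gap identified above.

```python
# Sketch of a Dewey-style value learner: the agent is uncertain over a
# pool of candidate utility functions, each carrying a probability
# weight, and acts to maximize the weighted average of their values.
# All utility functions, weights, and actions here are illustrative.

def expected_value(action, weighted_utilities):
    """Weighted average of utility over the agent's pool of candidates."""
    return sum(w * u(action) for w, u in weighted_utilities)

def best_action(actions, weighted_utilities):
    """Action with the highest expected value under the current pool."""
    return max(actions, key=lambda a: expected_value(a, weighted_utilities))

# Hypothetical pool: two candidate utility functions over actions 0..10.
pool = [
    (0.7, lambda a: -abs(a - 3)),  # candidate 1: prefers action 3
    (0.3, lambda a: -abs(a - 8)),  # candidate 2: prefers action 8
]

print(best_action(range(11), pool))  # prints 3
```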
To improve on Dewey’s definition, we would need to get a clearer idea of just what we mean by human values. In this post, I don’t yet want to offer any preliminary definition: rather, I’d like to ask what properties we’d like a definition of human values to have. Once we have a set of such criteria, we can use them as a guideline to evaluate various offered definitions.
Illocutionary act
Kaj states a preference for an intensional definition.
We can select from the genus of existing definitions of human values, or share differentia in this thread.
Shared differentia will be either nominal or real; or, if you are better versed in cognitive psychology than philosophy, they will be exemplars or prototypes.
Sharers should provide context for their contributions, to avoid brainstorming ideas that are merely homonyms.
Contributions to this thread that approximate recursive definitions will be most useful to Soares and the MIRI team, who prefer mathematical and logical specifications.
(Intensional, nominal) properties of a definition of human values
Source: Main article
See also: limitations (of definition)