How are human values categorically different from things like music preference? Descriptions of art also seem to rely on lots of fairly arbitrary objects that are difficult to simplify.
I’m also not sure what qualifies as unacceptably wrong. There’s obviously some utility in having very crude models of human preferences. How would a slightly less crude model suddenly result in something that was “unacceptably” wrong?
There’s a lot more room for variation in the space of possible states of the universe than in the space of possible pieces of music. Also, people can usually give you a straightforward answer if you ask them how much they like a certain piece of music, but when asked which of two hypothetical scenarios better satisfies their values, their answers tend to be erratic, and they will give sets of answers that are not consistent with any coherent value system, making quite a mess to untangle. Revealed preferences won’t help, because people don’t act like optimizers, and often do things that they know conflict with what they want.
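To make the incoherence point concrete, here is a minimal sketch (the data and function names are invented for illustration, not anything proposed in this thread): if someone’s stated pairwise answers contain a cycle, no utility function can rationalize them, no matter how it is fit.

```python
# Hypothetical illustration: stated pairwise preferences over scenarios.
# A coherent value system (a single utility function) cannot produce a
# preference cycle, so finding one shows the answers fit no consistent ranking.

def find_preference_cycle(preferences):
    """preferences: set of (x, y) pairs meaning 'x is preferred to y'.
    Returns True if the stated preferences contain a cycle."""
    graph = {}
    for better, worse in preferences:
        graph.setdefault(better, []).append(worse)

    def reachable(start, target, seen=None):
        seen = seen or set()
        for nxt in graph.get(start, []):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(nxt, target, seen):
                    return True
        return False

    # A cycle exists if some scenario is (transitively) preferred to itself.
    return any(reachable(x, x) for x in graph)

# The classic intransitive triple -- no utility function rationalizes it.
stated = {("A", "B"), ("B", "C"), ("C", "A")}
print(find_preference_cycle(stated))  # True
```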
By “unacceptably” wrong, I meant wrong enough that it would be a disaster if they were used as the utility function of a superintelligence. In situations with a much larger margin of error, it is possible to use fairly simple algorithms to usefully approximate human values in some domain of interest.
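As one concrete instance of the kind of “fairly simple algorithm” being gestured at here, a minimal sketch in the spirit of a Bradley-Terry model: fit scalar utilities to noisy pairwise judgments in one narrow domain. The item names and data below are made up for illustration.

```python
# Sketch: learn per-item utility scores from pairwise human choices
# by gradient ascent on the Bradley-Terry log-likelihood.
import math

def fit_utilities(choices, items, steps=2000, lr=0.05):
    """choices: list of (winner, loser) pairs from human judgments.
    Returns a dict mapping each item to a learned utility score."""
    u = {item: 0.0 for item in items}
    for _ in range(steps):
        for winner, loser in choices:
            # Model: P(winner beats loser) = sigmoid(u[winner] - u[loser])
            p = 1.0 / (1.0 + math.exp(u[loser] - u[winner]))
            grad = 1.0 - p  # gradient of the log-likelihood
            u[winner] += lr * grad
            u[loser] -= lr * grad
    return u

# Hypothetical music-domain judgments: a beats b, a beats c, b beats c.
data = [("song_a", "song_b"), ("song_a", "song_c"), ("song_b", "song_c")]
scores = fit_utilities(data, {"song_a", "song_b", "song_c"})
print(sorted(scores, key=scores.get, reverse=True))
# ['song_a', 'song_b', 'song_c']
```

Crude models like this are clearly useful at recommender-system stakes; the dispute is whether any refinement of them is safe at superintelligence stakes.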
I’m not sure that the number of possible states of the universe is relevant. I would imagine that the vast majority of the variation in that state space would be characterized by human indifference. The set of all possible combinations of sound frequencies is probably comparably enormous, but that doesn’t seem to have precluded Pandora’s commercial success.
I have to categorically disagree with the statement that people don’t have access to their values or that their answers about what they value will tend to be erratic. I would wager that an overwhelming percentage of people would rather find 5 dollars than stub their toe. I would furthermore expect that answer to be far more consistent and more stable than asking people to name their favorite music genre or favorite song. This reflects something very real about human values. I can create ethical or value situations that are difficult for humans to resolve, but I can do that in almost any domain where humans express preferences. We’re not going to get it perfectly right. But with some practice, we can probably make it to acceptably wrong.
It’s plausible that morally relevant values are a subset of values.