There’s a lot more room for variation in the space of possible states of the universe than in the space of possible pieces of music. Also, people can usually give you a straightforward answer if you ask them how much they like a certain piece of music, but when asked which of two hypothetical scenarios better satisfies their values, their answers tend to be erratic, and the sets of answers they give are often not consistent with any coherent value system, making quite a mess to untangle. Revealed preferences won’t help either, because people don’t act like optimizers, and often do things that they know conflict with what they want.
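The "no coherent value system" point can be made precise: if someone's stated pairwise preferences contain a cycle (A over B, B over C, C over A), then no assignment of real-valued utilities can satisfy all of them, so the inconsistency is provable rather than merely messy. A minimal sketch (the preference labels here are hypothetical, just for illustration):

```python
def has_preference_cycle(preferences):
    """Detect a cycle in stated pairwise preferences, given as (a, b)
    pairs meaning "a is preferred to b". A cycle rules out any
    coherent ranking, and hence any utility function."""
    graph = {}
    for a, b in preferences:
        graph.setdefault(a, []).append(b)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: found a cycle
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

# Three individually plausible answers that form an incoherent whole:
stated = [("leisure", "money"), ("money", "health"), ("health", "leisure")]
print(has_preference_cycle(stated))  # True: no utility function fits
```

Untangling the mess then means deciding which stated preference to discard, which is exactly the judgment call a value-learning algorithm would have to make.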
By “unacceptably” wrong, I meant wrong enough that it would be a disaster if they were used as the utility function of a superintelligence. In situations with a much larger margin of error, it is possible to use fairly simple algorithms to usefully approximate human values in some domain of interest.
I’m not sure that the number of possible states of the universe is relevant. I would imagine that the vast majority of the variation in that state space would be characterized by human indifference. The set of all possible combinations of sound frequencies is probably comparably enormous, but that doesn’t seem to have precluded Pandora’s commercial success.
I have to categorically disagree with the claims that people don’t have access to their values and that their answers about what they value will tend to be erratic. I would wager that an overwhelming percentage of people would rather find 5 dollars than stub their toe. I would furthermore expect that answer to be far more consistent and stable than people’s answers when asked to name their favorite music genre or favorite song. This reflects something very real about human values. I can construct ethical or value dilemmas that are difficult for humans to resolve, but I can do that in almost any domain where humans express preferences. We’re not going to get it perfectly right. But with some practice, we can probably make it to acceptably wrong.