I don’t think we can understand things like “what we truly want” just by using the appropriate machine-learning tools.
To start off, Occam’s razor is insufficient to infer the preferences of irrational agents. Since humans are generally irrational, this implies that, even given all human behavioral data, machine-learning tools and analysis will be insufficient to infer our values.
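The underdetermination here can be made concrete with a toy model (a hypothetical sketch, not drawn from any particular paper): the same observed behavior is consistent with more than one decomposition into "what the agent values" and "how rational the agent is", so behavior alone, however much of it we collect, cannot pin down the values.

```python
# Toy illustration: one observed choice, two incompatible explanations.
# An agent chooses between options 0 and 1; we observe it always picks 0.
observed_choice = 0

# Decomposition A: a fully rational agent whose reward prefers option 0.
reward_a = [1.0, 0.0]                      # reward for options 0 and 1
def rational_planner(reward):
    return reward.index(max(reward))       # rational: picks the best option

# Decomposition B: an anti-rational agent whose reward prefers option 1.
reward_b = [0.0, 1.0]
def antirational_planner(reward):
    return reward.index(min(reward))       # irrational: picks the worst option

# Both decompositions reproduce the observed data exactly, yet they
# attribute opposite values to the agent.
assert rational_planner(reward_a) == observed_choice
assert antirational_planner(reward_b) == observed_choice
```

Both hypotheses fit the data perfectly, and simplicity does not break the tie in favor of the "true" one; some assumption about the agent's rationality has to be supplied from outside the data.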
If we’re trying to learn about our own values, though, with machine-learning to help us, then we might be able to reduce the above issue by making object-level assumptions about our values and the ways in which we are irrational. Unfortunately, it’s hard to tell from the inside whether the things we think we care about are things we actually care about, or merely artifacts of our irrationality. Furthermore, when we push these kinds of questions onto advanced machine-learning systems (near AGI level), the answer will probably be algorithm-dependent.
Practical Example:
Consider whether we should let an incomprehensibly massive number of people get dust specks in their eyes or subject one person to torture. Most people’s intuitions prompt them to save the one person from torture. This can be explained in two different ways:
#1. People being irrationally scope-insensitive about how bad it is to inconvenience such an incomprehensibly massive number of people, even if each inconvenience is very minor (this is the explanation given by most rationalists)
or
#2. Torture being infinitely bad (this is the answer most people feel is intuitive)
How would (or should!) machine-learning decide between the two explanations above? There are many different ways you could do it, and they give different answers. If your machine-learning algorithm learns scope-insensitivity in general and adjusts for it, it will tend to conclude #1. If your machine-learning algorithm isolates this problem specifically and directly queries people for their opinion on it upon reflection, it might conclude #2. To get one of these answers, we have to make a normative assumption (without machine-learning!) about how we want our machine-learning algorithms to learn.
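To make the divergence concrete, here is a minimal sketch (all numbers and procedure names are illustrative assumptions, not anyone's actual algorithm) of two value-learning procedures reaching opposite verdicts on the same case:

```python
# Hypothetical sketch: two value-learning procedures, one normative
# modeling choice apart, reach opposite conclusions on torture vs. specks.

N = 10**17                 # number of dust-speck victims (illustrative)
SPECK_COST = 1e-6          # assumed badness of one dust speck
TORTURE_COST = 1e9         # assumed badness of torture (large but finite)

def algorithm_a():
    # Treats people's reluctance to multiply as scope insensitivity and
    # "corrects" it by aggregating harms linearly across all N victims.
    total_speck_harm = N * SPECK_COST
    return "torture" if TORTURE_COST < total_speck_harm else "specks"

def algorithm_b():
    # Takes people's reflective verdict on this specific case at face
    # value: torture is treated as lexically worse than any number of specks.
    return "specks"

print(algorithm_a())   # -> "torture": endorses explanation #1
print(algorithm_b())   # -> "specks": endorses explanation #2
```

Nothing inside either algorithm is a bug; they disagree because the designer decided, before any learning happened, whether scope insensitivity is a bias to correct or a judgment to respect. That decision is exactly the normative assumption the text describes.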