(2) The semantic tricks merely shift the lump under the rug; they don’t get rid of it. Standard worries about relativism re-emerge, e.g. an agent can know a priori that their own fundamental values are right, given how the meaning of the word ‘right’ is determined. This kind of infallibility, even if it is only about ‘fundamental’ values, seems implausible.
EY bites this bullet in the abstract, but notes that it does not apply to humans. An AI with a simple utility function and full ability to analyze its own source code can be quite sure that maximizing that function is the meaning of “that-AI-right” in the sense EY is talking about.
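To make the claim concrete, here is a minimal toy sketch of my own (nothing in it comes from EY’s writing; the function and names are made up): an agent whose entire value system is one short, inspectable function, so that “that-AI-right” picks out nothing more than maximizing it.

```python
# A minimal toy sketch (mine, not EY's): an agent whose entire value system is
# a one-line, fully inspectable utility function. For such an agent, the
# predicate "that-AI-right" has nothing hidden in it: it just picks out the
# action that maximizes that visible function.

def utility(state: float) -> float:
    """The agent's whole value system: simple and fully transparent."""
    return -abs(state - 42.0)  # toy preference: "be close to 42"

def that_ai_right(actions, current_state: float) -> float:
    """What 'that-AI-right' denotes here: the utility-maximizing action.
    Because `utility` is short and the agent can read it off its own source,
    it can be certain this exhausts the predicate."""
    return max(actions, key=lambda a: utility(current_state + a))

print(that_ai_right(actions=[-1.0, 0.0, 1.0], current_state=40.0))  # -> 1.0
```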
But there is no analogue to that situation in human psychology, given how much we now know about self-deception, our conscious and unconscious mental machinery, and the way our values grow more complex the more we reflect on them. We can, it’s true, say that “the correct extrapolation of my fundamental values is what’s right for me to do”, but that doesn’t tell us whether any particular value X is a member of that set. The actual work of extrapolating human values (through moral arguments and other methods) still has to be done.
So practical objections to this sort of bullet-biting don’t apply to this metaethics; are there any important theoretical objections?
EDIT: Changed “right” to “that-AI-right”. Important clarification.
An AI with a simple utility function and full ability to analyze its own source code can be quite sure that maximizing that function is the meaning of “that-AI-right” in the sense EY is talking about.
I don’t think that’s right, or EY’s position (I’d like evidence on that). Who’s to say that maximization is precisely what’s right? It might be a very good heuristic, but on reflection the AI might decide to self-improve in a way that changes this subgoal (one part of the overall decision problem that includes all the other decision-making machinery), by finding considerations that distinguish a maximizing attitude toward utility from the right attitude toward utility. It would of course use its current utility-maximizing algorithm to come to that decision. But the conclusion might be that too much maximization is bad for the environment, or something along those lines. The AI would then stop maximizing for the reason that maximizing is not the most maximizing thing to do, in the same way a person might refuse to kill for the reason that the action leads to a death, even though avoid-causing-death is not the whole of morality and doesn’t apply universally.
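To illustrate what “use its current utility-maximizing algorithm to come to that decision” could look like, here is a rough toy sketch (all names, numbers, and the capping rule are invented for illustration, not anyone’s actual proposal): the agent scores a candidate replacement policy with its current utility function and adopts it only if that comes out at least as well.

```python
# A rough toy sketch of that point (all names and numbers are invented for
# illustration): the agent evaluates a candidate replacement policy using its
# CURRENT utility-maximizing machinery, and adopts it only if the swap scores
# at least as well by its current lights.

def current_utility(outcome: float) -> float:
    return outcome  # toy values: more is better

def current_policy(options):
    """Straight maximization: pick the highest-utility option."""
    return max(options, key=current_utility)

def tempered_policy(options):
    """Hypothetical successor: still maximizes, but caps outcomes
    (say, because 'too much maximization is bad for the environment')."""
    capped = [o for o in options if o <= 10.0]
    return max(capped or options, key=current_utility)

def should_self_modify(candidate_policy, scenarios) -> bool:
    """The self-modification decision itself is made by the current criterion:
    total current utility of the successor's choices vs. the current policy's."""
    keep = sum(current_utility(current_policy(s)) for s in scenarios)
    swap = sum(current_utility(candidate_policy(s)) for s in scenarios)
    return swap >= keep

scenarios = [[1.0, 5.0, 20.0], [2.0, 3.0]]
print(should_self_modify(tempered_policy, scenarios))  # False: the cap loses utility here
```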
Agreed that on EY’s view (and my own), human “fundamental values” (1) have not yet been fully articulated/extrapolated, and that we can’t say with confidence whether X is in that set.
But AFAICT, EY rejects the idea (which you seem to be claiming here that he endorses?) that an AI with a simple utility function can be sure that maximizing that function is the right thing to do. It might believe that maximizing that function is the right thing to do, but it would be wrong. (2)
AFAICT this is precisely what RichardChappell considers implausible: the idea that unlike the AI, humans can correctly believe that maximizing their utility function is the right thing to do.
==
(1) Supposing there exist any such things, of which I am not convinced.
(2) Necessarily wrong, in fact, since on EY’s view as I understand it there’s one and only one right set of values, and humans currently implement it, and the set of values humans implement is irreducibly complex and therefore cannot be captured by a simple utility function. Therefore, an AI maximizing a simple utility function is necessarily not doing the right thing on EY’s view.
Sorry, I meant to use the two-place version; it wouldn’t be what’s right. What I meant is that the completely analogous concept of “that-AI-right” would consist simply of that utility function.
To the extent that you are still talking about EY’s views, I still don’t think that’s correct… I think he would reject the idea that “that-AI-right” is analogous to right, or that “right” is a 2-place predicate.
That said, given that this question has come up elsethread and I’m apparently in the minority, and given that I don’t understand what all this talk of right adds to the discussion in the first place, it becomes increasingly likely that I’ve just misunderstood something.
In any case, I suspect we all agree that the AI’s decisions are motivated by its simple utility function in a manner analogous to how human decisions are motivated by our (far more complex) utility function. What disagreement exists, if any, involves the talk of “right” that I’m happy to discard altogether.