I can’t find the original post about the buck stopping after a bit of Googling. I’d like to keep looking into this!
The post I’m referring to is here, but I should note that EY used the phrase in a different context, and my view on terminal values does not reflect his view. My critique of the idea that all human values are complex is that it presupposes too narrow an interpretation of “values”. Let’s talk about “goals” instead, defined as follows:
Imagine you could shape yourself and the world any way you like, unconstrained by the limits of what is considered feasible and what is not. What would you do? Which changes would you make? The result describes your ideal world; it describes everything that is at all important to you. However, it does not yet describe how important these things are in relation to one another. So imagine that you had the same super-powers, but this time they are limited: you cannot make every change you had in mind, so you need to prioritize some changes over others. Which changes would be most important to you? The outcome of this thought experiment approximates your goals. (This question is of course a very difficult one, and what someone says after thinking about it for five minutes might be quite different from what she would choose if she had heard all the ethical arguments in the world and thought about the matter for a very long time. If you care about making decisions for good/informed reasons, you might want to refrain from committing too strongly to specific answers and instead give weight to what a better-informed version of yourself would say after longer reflection.)
I took the definition from this blogpost I wrote a while back. The comment section there contains a long discussion on a similar issue where I elaborate on my view of terminal values.
Anyway, the way my definition of “goals” seems to differ from the interpretation of “values” in the phrase “human values are complex” is that “goals” allow for self-modification. If I could, I would self-modify into a utilitarian super-robot, regardless of whether it would still be conscious. According to “human values are complex”, I’d be making a mistake in doing so. What sort of mistake would I be making?
The situation is as follows: Unlike some conceivable goal-architectures we might choose for artificial intelligence, humans do not have a clearly defined goal. When you ask people on the street what their goals in life are, they usually can’t tell you, and if they do tell you something, they’ll likely revise it as soon as you press them with an extreme thought experiment. Many humans are not agenty. Learning about rationality and thinking about personal goals can turn people into agents. How does this transition happen? The “human values are complex” theory seems to imply that we introspect, find out that we care about / have intuitions concerning 5+ different axes of value, and end up accepting all of them as our goals. This is probably how quite a few people do it, but they’re victims of a gigantic typical-mind fallacy if they think that’s the only way to do it. Here’s what happened to me personally (and incidentally, to about 20+ agents I know personally, and to all the hedonistic utilitarians who are familiar with LessWrong content and still keep their hedonistic utilitarian goals):
I started out with many things I like (friendship, love, self-actualization, non-repetitiveness, etc.) plus some moral intuitions (anti-harm, fairness). I then got interested in ethics and in figuring out the best ethical theory. I soon turned into a moral anti-realist, but still wanted to find a theory that incorporates my most fundamental intuitions. I realized that I don’t care intrinsically about “fairness” and became a utilitarian in terms of my other-regarding/moral values. I then had to decide to what extent I should invest in utilitarianism/altruism, and how much in values that are more about me specifically. I chose altruism, because I have a strong, OCD-like tendency to do things either fully or not at all, and I figured that saving for retirement, eating healthily, etc. is just as bothersome as trying to be altruistic; and since I don’t strongly self-identify with a 100-year-old version of me anyway, I might as well try to make sure that all future sentience will be suffering-free. I still care a lot about my long-term happiness and survival, but much less so than if I had the goal of living forever, and as I said, I would instantly press the “self-modify into utilitarian robot” button if there were one. I’d be curious to hear whether I am being “irrational” somewhere, whether there was a step involved that was clearly mistaken. I cannot imagine how that would be the case, and the matter seems obvious to me. So every time I read the link “human values are complex”, it seems like an intellectually dishonest discussion stopper to me.