A way to choose what subset of humanity gets included in CEV
I thought the point of defining CEV as what we would choose if we knew better was (partly) that you wouldn’t have to subset. We wouldn’t be superstitious, vengeful, and so on if we knew better.
Also, can you expand on what you mean by “Rawlesian Reflective Equilibrium”? Are you referring (however indirectly) to the “veil of ignorance” concept?
Why not? How does adding factual knowledge get rid of people’s desire to hurt someone else out of revenge?
We wouldn’t be superstitious, … if we knew better.
People who currently believe in superstitious belief system X would lose the factual falsehoods that X entailed. But most superstitious belief systems have evaluative aspects too, for example, the widespread religious belief that all nonbelievers “ought” to go to hell. I am a nonbeliever. I am also not Chinese, not Indian, not a follower of Sharia Law or Islam, not a member of the Chinese Communist Party, not a member of the Catholic Church, not a Mormon, not a “Good Christian”, and I didn’t intend to donate all my money and resources to saving lives in the third world before finding out about the singularity. There are lots of humans alive on this planet whose volitions could spring a very nasty surprise on people like us.
Why not? How does adding factual knowledge get rid of people’s desire to hurt someone else out of revenge?
Learning about the game-theoretic roots of a desire seems to generally weaken its force, and makes it apparent that one has a choice about whether or not to retain it. I don’t know what fraction of people would choose in such a state not to be vengeful, though. (Related: ‘hot’ and ‘cold’ motivational states. CEV seems to naturally privilege cold states, which should tend to reduce vengefulness, though I’m not completely sure this is the right thing to do rather than something like a negotiation between hot and cold subselves.)
What it’s like to be hurt is also factual knowledge, and seems like it might be extremely motivating towards empathy generally.
People who currently believe in superstitious belief system X would lose the factual falsehoods that X entailed. But most superstitious belief systems have evaluative aspects too, for example, the widespread religious belief that all nonbelievers “ought” to go to hell.
Why do you think it likely that people would retain that evaluative judgment upon losing the closely coupled beliefs? Far more plausibly, they could retain the general desire to punish violations of conservative social norms, but see above.
I find it interesting that there seems to be a lot of variation in people’s views regarding how much coherence there’d be in an extrapolation… You say that choosing a right group of humans is important while I’m under the impression that there is no such problem; basically everyone should be in the game, and making higher level considerations about which humans to include is merely an additional source of error. Nevertheless, if there’ll be really as much coherence as I think, and I think there’d be hella lot, picking some subset of humanity would pretty much produce a CEV that is very akin to CEVs of other possible human groups.
I think that even being an Islamic radical fundamentalist is a petty factor in overall coherence. If I’m correct, Vladimir Nesov has said several times that people can be wrong about their values, and I pretty much agree. Of course, there is an obvious caveat that it’s rather shaky to guess what other people’s real values might be. Saying “You’re wrong about your professed value X, your real value is along the lines of Y because you cannot possibly diverge that much from the psychological unity of mankind” also risks seeming like claiming excessive moral authority. Still, I think it is a potentially valid argument, depending on the exact nature of X and Y.
Nevertheless, if there’ll be really as much coherence as I think, and I think there’d be hella lot, picking some subset of humanity would pretty much produce a CEV that is very akin to CEVs of other possible human groups.
And what would you do if Omega told you that the CEV of just {liberal westerners in your age group} is wildly different from the CEV of humanity? What do you think the right thing to do would be then?
I’d ask Omega, “Which construal of volition are you using?”
There’s light in us somewhere, a better world inside us somewhere, the question is how to let it out. It’s probably more closely akin to the part of us that says “Wouldn’t everyone getting their wishes really turn out to be awful?” than the part of us that thinks up cool wishes. And it may even be that Islamic fundamentalists just don’t have any note of grace in them at all, that there is no better future written in them anywhere, that every reasonable construal of them ends up with an atheist who still wants others to burn in hell; and if so, the test I cited in the other comment, about filtering portions of the extrapolated volition that wouldn’t respect the volition of another who unconditionally respected theirs, seems like it ought to filter that.
the test I cited in the other comment, about filtering portions of the extrapolated volition that wouldn’t respect the volition of another who unconditionally respected theirs, seems like it ought to filter that.
I agree that certain limiting factors, tests, etc. could be useful. I haven’t thought hard enough about this particular proposal to say whether it is really of use. My first thought is that if you have thought about it carefully, then it is probably relatively good, just based on your track record.
Eliezer has already talked about this and argued that the right thing would be to run the CEV on the whole of humanity, basing himself partly on an argument that if some particular group (not us) got control of the programming of the AI, we would prefer that they run it on the whole of humanity rather than running it on themselves.
The lives of most evildoers are of course largely incredibly prosaic, and I find it hard to believe their values in their most prosaic doings are that dissimilar from everyone else around the world doing prosaic things.
I think that thinking in terms of good and evil betrays a closet-realist approach to the problem. In reality, there are different people, with different cultures and biologically determined drives. These cultural and biological factors determine (approximately) a set of traditions, worldviews, ethical principles and moral rules, which can undergo a process of reflective equilibrium to determine a set of consistent preferences over the physical world.
We don’t know how the reflective equilibrium thing will go, but we know that it could depend upon the set of traditions, ethical principles and moral rules that go into it.
If someone is an illiterate devout pentecostal Christian who lives in a village in Angola, the eventual output of the preference formation process applied to them might be very different than if it were applied to the typical LW reader.
They’re not evil. They just might have a very different “should function” than me.
I think part of the point of what you call “moral anti-realism” is that it frees up words like “evil” so that they can refer to people who have particular kinds of “should function”, since there’s nothing cosmic that the word could be busy referring to instead.
If I had to offer a demonology, I guess I might loosely divide evil minds into: 1) those capable of serious moral reflection but avoiding it, e.g. because they’re busy wallowing in negative other-directed emotion, 2) those engaging in serious moral reflection but making cognitive mistakes in doing so, 3) those whose moral reflection genuinely outputs behavior that strongly conflicts with (the extension of) one’s own values. I think 1 comes closest to what’s traditionally meant by “evil”, with 2 being more “misguided” and 3 being more “Lovecraftian”. As I understand it, CEV is problematic if most people are “Lovecraftian” but less so if they’re merely “evil” or “misguided”, and I think you may in general be too quick to assume Lovecraftianity. (ETA: one main reason why I think this is that I don’t see many people actually retaining values associated with wrong belief systems when they abandon those belief systems; do you know of many atheists who think atheists or even Christians should burn in hell?)
“One main reason why I think this is that I don’t see many people actually retaining values associated with wrong belief systems when they abandon those belief systems; do you know of many atheists who think atheists or even Christians should burn in hell?”
One main reason why you don’t see that happening is that the set of beliefs that you consider “right beliefs” is politically influenced, i.e. human beliefs come in certain patterns which are not connected in themselves, but are connected by the custom that people who hold one of the beliefs usually hold the others.
For example, I knew a woman (an agnostic) who favored animal rights, and some group on this basis sent her literature asking for her help with pro-abortion activities, namely because this is a typical pattern: People favoring animal rights are more likely to be pro-abortion. But she responded, “Just because I’m against torturing animals doesn’t mean I’m in favor of killing babies,” evidently quite a logical response, but not in accordance with the usual pattern.
In other words, your own values are partly determined by political patterns, and if they weren’t (which they wouldn’t be under CEV) you might well see people retaining values you dislike when they extrapolate.
As I understand it, CEV is problematic if most people are “Lovecraftian” but less so if they’re merely “evil” or “misguided”, and I think you may in general be too quick to assume Lovecraftianity.
Most people may or may not be “Lovecraftian”, but why take that risk?
There are gains from cooperating with as many others as possible. Maybe these and other factors outweigh the risk or maybe they don’t; the lower the probability and extent of Lovecraftianity, the more likely it is that they do.
Anyway, I’m not making any claims about what to do, I’m just saying people probably aren’t as Lovecraftian as Roko thinks, which I conclude both from introspection and from the statistics of what moral change we actually see in humans.
There are gains from cooperating with as many others as possible. Maybe these and other factors outweigh the risk or maybe they don’t; the lower the probability and extent of Lovecraftianity, the more likely it is that they do.
I agree that “probability and extent of Lovecraftianity” would be an important consideration if it were a matter of cooperation, and of deciding how many others to cooperate with, but Eliezer’s motivation in giving everyone equal weighting in CEV is altruism rather than cooperation. If it were cooperation, then the weights would be adjusted to account for contribution or bargaining power, instead of being equal.
Anyway, I’m not making any claims about what to do, I’m just saying people probably aren’t as Lovecraftian as Roko thinks, which I conclude both from introspection and from the statistics of what moral change we actually see in humans.
To reiterate, “how Lovecraftian” isn’t really the issue. Just by positing the possibility that most humans might turn out to be Lovecraftian, you’re operating in a meta-ethical framework at odds with Eliezer’s, and in which it doesn’t make sense to give everyone equal weight in CEV (or at least you’ll need a whole other set of arguments to justify that).
That aside, the statistics you mention might also be skewed by an anthropic selection effect.
If someone is an illiterate devout pentecostal Christian who lives in a village in Angola, the eventual output of the preference formation process applied to them might be very different than if it were applied to the typical LW reader.
Consider the distinction between whether the output of a preference-aggregation algorithm will be very different for the Angolan Christian, and whether it should be very different. Some preference-aggregation algorithms may just be confused into giving diverging results because of inconsequential distinctions, which would be bad news for everyone, even the “enlightened” westerners.
(To be precise, the relevant factual statement is about whether any two same-culture people get preferences visibly closer to each other than any two culturally distant people. It’s like the relatively small genetic relevance of skin color, where within-race variation is greater than between-race variation.)
I think we agree about this actually—several people’s picture of someone with alien values was an Islamic fundamentalist, and they were the “evildoers” I have in mind...
And what would you do if Omega told you that the CEV of just {liberal westerners in your age group} is wildly different from the CEV of humanity? What do you think the right thing to do would be then?
The right thing for me to do is to run CEV on myself, almost by definition. The CEV oracle that I am using to work out my CEV can dereference the dependencies to other CEVs better than I can.
No, not obviously; I can’t say I’ve ever seen anyone else claim to completely condition their concern for other people on the possession of similar reflective preferences.
(Or is your point that they probably wouldn’t stay people for very long, if given the means to act on their reflective preferences? That wouldn’t make it OK to kill them before then, and it would probably constitute undesirable True PD defection to do so afterwards.)
Well, my above reply was a bit tongue-in-cheek. My concern for other things in general is just as complex as my morality and it contains many meta elements such as “I’m willing to modify my preference X in order to conform to your preference Y because I currently care about your utility to a certain extent”. On the simplest level, I care for things on a sliding scale that ranges from myself to rocks or Clippy AIs with no functional analogues for human psychology (pain, etc.). Somebody with a literally wildly differing reflective preference would not be a person and, as you say, would preferably be dealt with in a True PD manner rather than through ordinary, altruism-contaminated human-human interactions.
Somebody with a literally wildly differing reflective preference would not be a person
This is a very nonstandard usage; personhood is almost universally defined in terms of consciousness and cognitive capacities, and even plausibly relevant desire-like properties like boredom don’t have much to do with reflective preference/volition.
How does adding factual knowledge get rid of people’s desire to hurt someone else out of revenge?
“If we knew better” is an ambiguous phrase; I probably should have used Eliezer’s original: “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. That carries a lot of baggage, at least for me.
I don’t experience (significant) desires for revenge, so I can only extrapolate from fictional evidence. Say the “someone” in question killed a loved one, and I wanted to hurt them for that. Suppose further that they were no longer able to kill anyone else. Given the time and the means to think about it clearly, I could see that hurting them would not improve the state of the world for me, or for anyone else, and would only impose further unnecessary suffering.
The (possibly flawed) assumption of CEV, as I understood it, is that if I could reason flawlessly, non-pathologically about all of my desires and preferences, I would no longer cleave to the self-undermining ones, and what remains would be compatible with the non-self-undermining desires and preferences of the rest of humanity.
Caveat: I have read the original CEV document but not quite as carefully as maybe I should have, mainly because it carried a “Warning: obsolete” label and I was expecting to come across more recent insights here.
The rest of Rawls’ Theory of Justice is good too. I’m trying to figure out for myself (before I finally break down and ask) how CEV compares to the veil of ignorance.
Useful and interesting list, thanks.
I wasn’t thinking of evildoers. I was thinking of people who are just different, and have their own culture, traditions and way of life.
Alternately: They’re evil. They have a very different ‘should function’ to me.
If truly, really wildly different? Obviously, I’d just disassemble them to useful matter via nanobots.
http://plato.stanford.edu/entries/reflective-equilibrium/
I am only part way through but I really recommend that link. So far it’s really helped me think about this.