It must be that I am expecting the CEV will NOT reflect MY values. In particular, I am suggesting that the CEV will be too conservative, in the sense of over-valuing humanity as it currently is and therefore undervaluing humanity as it eventually would be with further evolution and further self-modification.
CEV is supposed to value the same thing that humanity values, not value humanity itself. Since you and other humans value future slightly-nonhuman entities living worthwhile lives, CEV would assign value to them by extension.
Is there any kind of existence proof for a non-trivial CEV?
That’s kind of a tricky question. Humans don’t actually have utility functions, which is why the “coherent extrapolated” part is important. We don’t really know of a way to extract an underlying utility function from non-utility-maximizing agents, so I guess you could say that the answer is no. On the other hand, humans are often capable of noticing when it is pointed out to them that their choices contradict each other, and, even if they don’t actually change their behavior, can at least endorse some more consistent strategy, so it seems reasonable that a human, given enough intelligence, working memory, time to think, and something to point out inconsistencies, could come up with a consistent utility function that fits human preferences about as well as a utility function can. As far as I understand, that’s basically what CEV is.
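To make the "choices contradict each other" point concrete, here is a toy sketch of my own (not part of any CEV proposal): if a hypothetical agent's pairwise choices contain a cycle (A preferred to B, B to C, C to A), then no utility function can reproduce them exactly, and a simple search over the preference graph exposes the inconsistency. All names and data here are invented for illustration.

```python
def find_cycle(prefs):
    """Return True if the 'preferred-to' relation contains a cycle,
    i.e. no utility function can fit these pairwise choices exactly."""
    graph = {}
    for better, worse in prefs:
        graph.setdefault(better, []).append(worse)

    def visit(node, path):
        # Revisiting a node already on the current path means a cycle.
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, []))

    return any(visit(start, set()) for start in graph)

inconsistent = [("A", "B"), ("B", "C"), ("C", "A")]
consistent = [("A", "B"), ("B", "C"), ("A", "C")]

print(find_cycle(inconsistent))  # True: no utility function fits these choices
print(find_cycle(consistent))    # False: e.g. u(A)=3, u(B)=2, u(C)=1 works
```

Pointing out such a cycle is the easy part; the hard part, which CEV gestures at, is deciding which of the conflicting choices to revise.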
CEV is likely to not have much in it about protecting us, even from ourselves.
Do you want to die? No? Then humanity’s CEV would assign negative utility to you dying, so an AI maximizing it would protect you from dying.
I am not saying I know it will; what I am saying is that I don’t know why everybody else has already decided they can safely predict that even a human 100X or 1000X as smart as they are won’t crush them the way we crush a bullfrog when his stream is in the way of our road project or shopping mall.
If some attempt to extract a CEV has a result that is horrible for us, that means that our method for computing the CEV was incorrect, not that CEV would be horrible to us. In the “what would a smarter version of me decide?” formulation, that smarter version of you is supposed to have the same values you do. That might be poorly defined since humans don’t have coherent values, but CEV is defined as that which it would be awesome from our perspective for a strong AI to maximize, and using the utility function that a smarter version of ourselves would come up with is a proposed method for determining it.
Criticisms of the form “an AI maximizing our CEV would do bad thing X” involve a misunderstanding of the CEV concept. Criticisms of the form “no one has unambiguously specified a method of computing our CEV that would be sure to work, or even gotten close to doing so” I agree with.
My thought on CEV not actually including much individual protection went something like this: I don’t want to die. I don’t want to live in a walled garden, taken care of as though I were a favored pet. Apply intelligence to that, and my FAI does what for me? Mostly lets me be, since it is smart enough to realize that a policy of protecting my life winds up turning me into a favored pet. This is the distinction between asking someone what they want (you might get stories of candy and leisure) and looking at them when they are happiest (you might see them doing meaningful and difficult work and living in a healthy manner). Apply high intelligence, and you are unlikely to promote candy and leisure. Ultimately, I think humanity careening along on its very own planet as the peak species, creating intelligence in the universe where previously there was none, is very possibly as good as it can get for humanity, and I think it plausible an FAI would be smart enough to realize that, so we might be surprised how little it seemed to interfere. I also think it is pretty hard, working part time, to predict what something 1000X smarter than I am will conclude about human values, so I hardly imagine what I am saying is powerfully convincing to anybody who doesn’t lean that way. I’m just explaining how an FAI could wind up doing almost nothing, i.e., how CEV could wind up being trivially empty in a way.
The other aspect of CEV being empty: I was not thinking of our own internal contradictions, although that is a good point. I was thinking of disagreement across humanity. Certainly we have seen broad ranges of valuations on human life and equality, and broadly different ideas about what respect should look like and what punishment should look like. These indicate to me that a human CEV, as opposed to a French CEV or even a Paris CEV, might well be quite sparse when designed to keep only what is reasonably common to all humanity and all potential humanity. If morality turns out to be more culturally determined than genetically, we could still have a CEV, but we would have to stop claiming it was human and admit it was just us, and when we said FAI we meant friendly to us but unfriendly to you. The baby-eaters might turn out to be the Indonesians or the Inuit in this case.
I know how hard it is to reach consensus in a group of humans exceeding about 20; I’m just wondering how much a more rigorous process applied across billions is going to come up with.
You can just average across each individual.
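As a purely illustrative sketch of what "average across each individual" could mean (the people, outcomes, and numbers below are all invented, and this ignores the hard problem of making utilities interpersonally comparable): rank outcomes by the mean of each individual's utility for them.

```python
# Hypothetical utilities of three individuals over two outcomes.
individuals = [
    {"outcome_x": 1.0, "outcome_y": 0.2},
    {"outcome_x": 0.3, "outcome_y": 0.9},
    {"outcome_x": 0.8, "outcome_y": 0.4},
]

def averaged_utility(outcome):
    """Mean utility of an outcome across all individuals."""
    return sum(u[outcome] for u in individuals) / len(individuals)

best = max(["outcome_x", "outcome_y"], key=averaged_utility)
print(best)  # outcome_x: mean 0.7 beats outcome_y's mean 0.5
```

Note that even this toy version builds in a contestable choice: averaging presupposes that one person's "1.0" and another's "1.0" are comparable quantities, which is exactly the kind of disagreement the thread is about.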
Yes, “humanity” should be interpreted as referring to the current population.