If it were just a matter of fuzzies I would agree with you, but I’m worried about the resulting output being unfriendly to subsets of the world that get left out. Maybe we think the algorithm would identify and extrapolate only the most altruistic desires of the selected individuals—but if that’s the case it is correspondingly unlikely that choosing such a narrow subset would make the programming easier.
Edit:
This argument over the ideal ratio of enfranchisement to efficiency is an ancient one in political philosophy. I’m willing to accept that it might be impractical to attain full representation—maybe uncontacted tribes get left out. Rule by the CEV of Nobel prize winners is likely preferable to death but is still suboptimal in the same way that living in a Hobbesian monarchy is worse than living in Rousseau’s ideal state.
In the universal CEV, there is indeed the benefit that no group of humans or individual human (although future humans, e.g. uploads, are a different matter) is without a voice. On the other hand, this only guarantees that the output is not unfriendly to any group or person if the output is very sensitive to the values of single people and small groups. In that case, as I said, it seems that the programmers would be more likely to struggle to create a dynamic that actually outputs anything, and if it does output anything it is relatively likely to be disappointing from an aesthetic perspective. That is to say, I don’t see the inclusion of everyone in the CEV as providing much guarantee that the output will be friendly to everyone, unless the dynamic is so sensitive to individuals whose values run counter to coherence that it outputs nothing or almost nothing at all.
It seems then that in either case—universal CEV or selective CEV—the benevolence of the output depends on whether, knowing more, thinking faster, and growing up closer together, the humans in question will actually extrapolate to values that are benevolent towards others.
Yudkowsky states that failure to fall into a niceness attractor is a significant possibility, and I am inclined to agree. It seems to me that to maximise the chances of the CEV output being located in a niceness attractor, we should start from a strong position (humans with nicer-than-average character and great intellect), so that we are not relying too heavily on the programmers having created a totally ideal volition-extrapolating dynamic with perfect implementation of “growing up together” etc.