Why would CEV operate on humans that do exist, and not on humans that could exist?
To do the latter, you would need a definition of “human” that can not just distinguish existing humans from existing non-humans, but also pick out all human minds from the space of all possible minds. I don’t see how to specify this definition. (Is this problem not obvious to everyone else?)
For example, we might specify a prototypical human mind, and say that “human” is any mind which is less than a certain distance from the prototypical mind in design space. But then the CEV of this “humankind” is entirely dependent on the prototype that we pick. If the FAI designers are allowed to just pick any prototype they want, they can make the CEV of “humanity” come out however they wish, so they might as well have the FAI use the CEV of themselves. If they pick the prototype by taking the average of all existing humans, then that allows the same attack described in my post.
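To make the dependence on the prototype concrete, here is a toy sketch. Everything in it is an invented assumption: minds as 2-D feature vectors, a fixed radius, and the particular prototypes. It only illustrates that the prototype choice decides who counts as "human", not anything from the CEV spec.

```python
import numpy as np

# Toy "prototype" definition of human: a mind is human iff it lies
# within a fixed radius of a chosen prototype in design space.
# Feature vectors, prototypes, and radius are all made up.

def is_human(mind, prototype, radius):
    """True iff `mind` is within `radius` of `prototype`."""
    return bool(np.linalg.norm(np.asarray(mind) - np.asarray(prototype)) < radius)

minds = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]

# Same radius, different prototype, different "humanity":
included_a = [is_human(m, [0.0, 0.0], 2.0) for m in minds]  # [True, True, False]
included_b = [is_human(m, [3.0, 3.0], 2.0) for m in minds]  # [False, False, True]
```

Whoever picks the prototype picks the extension of "human", which is the worry above.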
The problem is indeed there, but if the goal is to find out the human coherent extrapolated volition, then a definition of human is necessary.
If we have no way of picking out human minds from the space of all possible minds, then we don’t really know what we’re optimizing for. We can’t rule out the possibility that a human mind will come into existence that will not be (perfectly) happy with the way things turn out.* This may well be an inherent problem in CEV. If the FAI prevents such humans from coming into existence, then it has in effect enforced its own definition of “human” on humanity.
But let’s try to salvage it. What if you were to use existing humans as a training set for an AI to learn what a human is and is not (assuming you can indeed carve reality/mind-space at the joints, which I am unsure about)? Then you could use this learned definition to pick out the possible human minds from mind-space and calculate their coherent extrapolated volition.
This would be resistant to identity-space stuffing like what you describe, but not resistant to systematic wiping out of certain genes/portions of identity-space before CEV-application.
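A minimal sketch of what such a training-set membership test could look like, under invented assumptions (minds as feature vectors, a hypothetical closeness threshold `eps`):

```python
import numpy as np

# Hypothetical training-set definition of "human": a candidate mind
# counts as human iff it lies within `eps` of some observed (existing)
# human. Features and eps are illustrative, not a real proposal.

def learn_membership(training_minds, eps):
    training = np.asarray(training_minds, dtype=float)
    def is_human(candidate):
        dists = np.linalg.norm(training - np.asarray(candidate, dtype=float), axis=1)
        return bool(dists.min() <= eps)
    return is_human

is_human = learn_membership([[0, 0], [1, 0], [0, 1]], eps=0.5)
```

Note how it exhibits exactly the asymmetry described above: duplicating an existing training point changes nothing, since near-copies fall in an already-covered neighborhood, but deleting every training point from some region removes that region from "humanity" entirely.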
But the wiping out of genes and the introduction of new ones is the very definition of evolution. We would then need to differentiate the intentional wiping out of genes by certain humans from the natural wiping out of genes by reality/evolution, a rabbit-hole I can’t see the way out of, and possibly a category error. If we can’t do that, we have to accept the gene pool at the time of CEV-activation as the verdict of evolution about what a human is, which leaves a window open to gaming by genocide.
Perhaps taking the time the idea of CEV was introduced as the starting point would prevent the possibility of manipulation, or perhaps trying to infer whether there was any intentional gaming of CEV would also work. Actually, that would deal with both genocide and stuffing without any additional steps. But this assumes rewinding time and global knowledge of all human thoughts and memories as capabilities. Great fun :)
*Come to think of it, what guarantees that the result of CEV will not be something that some of us simply do not want? If such clusters exist, will the FAI create separate worlds for each one?
EDIT: Do you think there would be a noticeable difference between 1900AD!CEV and 2000AD!CEV?
If the FAI designers are allowed to just pick any prototype they want, they can make the CEV of “humanity” come out however they wish, so they might as well have the FAI use the CEV of themselves. If they pick the prototype by taking the average of all existing humans, then that allows the same attack described in my post.
Who ever said that CEV is about taking the average utility of all existing humans? The method of aggregating personal utilities should be determined by the extrapolation, on the basis of human cognitive architecture, and not by programmer fiat.
So how about taking all humans that do exist, determining the boundary humans, and using the entire section of identity-space delineated by them? That is still vulnerable to Dr. Evil killing everyone, but not to the trillion near-copy strategy. No?
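As a toy version of that proposal, again assuming minds can be represented as feature vectors, and using an axis-aligned bounding box as a crude stand-in for the true region delineated by boundary humans:

```python
import numpy as np

# "Boundary humans" sketch: take the region of identity-space spanned
# by existing minds, approximated here by a bounding box. The features
# and the population are invented for illustration only.

def spanned_region(minds):
    m = np.asarray(minds, dtype=float)
    return m.min(axis=0), m.max(axis=0)

def in_region(candidate, region):
    lo, hi = region
    c = np.asarray(candidate, dtype=float)
    return bool(np.all(lo <= c) and np.all(c <= hi))

population = [[0, 0], [4, 0], [0, 4], [4, 4], [2, 2]]
region = spanned_region(population)

# Near-copy stuffing leaves the region unchanged...
stuffed = population + [[2.0, 2.0]] * 10**4
assert spanned_region(stuffed)[1].tolist() == region[1].tolist()

# ...but killing the boundary individuals shrinks it.
survivors = [[2, 2]]
shrunk = spanned_region(survivors)
```

This matches the claim: interior copies can’t move the boundary, but Dr. Evil removing the extremal individuals still can.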