A way to choose what subset of humanity gets included in CEV that doesn’t include too many superstitious/demented/vengeful/religious nutjobs and land those who implement it in infinite perfect hell.
What you’re looking for is a way to construe the extrapolated volition that washes out superstition and dementation.
To the extent that vengefulness turns out to be a simple direct value that survives under many reasonable construals, it seems to me that one simple and morally elegant solution would be to filter, not the people, but the spread of their volitions, by the test, “Would your volition take into account the volition of a human who would unconditionally take into account yours?” This filters out extrapolations that end up perfectly selfish and those which end up with frozen values irrespective of what other people think—something of a hack, but it might be that many genuine reflective equilibria are just like that, and only a values-based decision can rule them out. The “unconditional” qualifier is meant to rule out TDT-like considerations, or they could just be ruled out by fiat, i.e., we want to test for cooperation in the Prisoner’s Dilemma, not in the True Prisoner’s Dilemma.
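To pin down the structure of that test, here is a toy sketch in Python; the `Volition` class, the `considers` predicate, and `unconditional_includer` are hypothetical stand-ins invented for illustration, not anything specified by the proposal itself:

```python
# Purely illustrative sketch of the proposed filter over the spread of volitions,
# not over the people being extrapolated. Every name here (Volition, considers,
# unconditional_includer) is a hypothetical stand-in for whatever an actual
# extrapolation process would produce.

class Volition:
    def __init__(self, name, considers):
        self.name = name
        self.considers = considers  # predicate: would this volition weigh `other` at all?

    def would_take_into_account(self, other):
        return self.considers(other)

def unconditional_includer():
    # A human whose volition takes everyone's volition into account unconditionally,
    # i.e. with no TDT-style "only if you would include me" reasoning.
    return Volition("unconditional includer", considers=lambda other: True)

def passes_filter(volition):
    # The test from the comment: would this extrapolated volition take into account
    # the volition of a human who would unconditionally take into account its own?
    return volition.would_take_into_account(unconditional_includer())

# Perfectly selfish or frozen-value extrapolations fail the test; reciprocating ones pass.
spread = [
    Volition("perfectly selfish", considers=lambda other: False),
    Volition("reciprocating", considers=lambda other: True),
]
print([v.name for v in spread if passes_filter(v)])  # ['reciprocating']
```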
An AI that can solve philosophy problems that are beyond the ability of the designers to even conceive
It’s possible that having a complete mind design on hand would mean that there were no philosophy problems left, since the resources that human minds have to solve philosophy problems are finite, and knowing the exact method to use to solve a philosophy problem usually makes solving it pretty straightforward (the limiting factor on philosophy problems is never computing power). The reason I pick on this particular cited problem is that, as stated, it involves an inherent asymmetry between the problems you want the AI to solve and your own understanding of how to meta-approach those problems, which is indeed a difficult and dangerous sort of state.
All of the above working first time, without testing the entire superintelligence (though you can test small subcomponents).
All approaches to superintelligence, without exception, have this problem. It is not quite as automatically lethal as it sounds (though it is certainly automatically lethal to all other parties’ proposals for building superintelligence). You can build in test cases and warning criteria beforehand to your heart’s content. You can detect incoherence and fail safely instead of doing something incoherent. You could, though it carries its own set of dangers, build human checking into the system at various stages and with various degrees of information exposure. But it is the fundamental problem of superintelligence, not a problem of CEV.
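As a minimal sketch of the “detect incoherence and fail safely” pattern: `run_initial_dynamic` and the string-valued checks below are invented for illustration, and an actual implementation would look nothing like this, but the shape of pre-registered checks plus a fail-safe abort is the point:

```python
# Minimal sketch of "fail safely instead of doing something incoherent":
# checks are written and registered before the run, and any tripped check
# aborts instead of acting. All names here are invented for illustration.

def run_initial_dynamic(extrapolate, checks):
    """extrapolate: callable producing a candidate output.
    checks: predicates pre-registered before the run ever starts."""
    candidate = extrapolate()
    tripped = [check.__name__ for check in checks if not check(candidate)]
    if tripped:
        # Fail safely: report which warning criteria fired, do not act.
        return {"status": "fail-safe abort", "tripped": tripped}
    return {"status": "ok", "output": candidate}

# Toy usage with stand-in checks over a string-valued candidate.
def nonempty(candidate):
    return bool(candidate)

def no_incoherence_flag(candidate):
    return "INCOHERENT" not in candidate

result = run_initial_dynamic(lambda: "INCOHERENT candidate plan",
                             checks=[nonempty, no_incoherence_flag])
print(result)  # {'status': 'fail-safe abort', 'tripped': ['no_incoherence_flag']}
```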
And, to make it worse, if major political powers are involved, you have to solve the political problem of getting them to agree on how to skew the CEV towards a geopolitical-power-weighted set of volitions to extrapolate.
I will not lend my skills to any such thing.
What you’re looking for is a way to construe the extrapolated volition that washes out superstition and dementation.
You could do that. But if you want a clean shirt out of the washing machine, you don’t add in a diaper with poo on it and then look for a really good laundry detergent to “wash it out”.
My feeling with the CEV of humanity is that if it is highly insensitive to the set of people you extrapolate, then you lose nothing by extrapolating fewer people. On the other hand, if including more people does change the answer in a direction that you regard as bad, then you gain by excluding people with values dissimilar from yours.
Furthermore, excluding people from the CEV process doesn’t mean disenfranchising them—it just means enfranchising them according to what your values count as enfranchisement.
Most people in the world don’t hold our values(1). Read, e.g., Haidt on culturally determined values. Human values are universal in form but local in content. Our “should” function is parochial.
(1) Note: this doesn’t mean that their values will be different after extrapolation; f(x) can equal f(y) for x != y. But it does mean that they might be, which is enough to give you an incentive not to include them.
if you want a clean shirt out of the washing machine, you don’t add in a diaper with poo on it and then look for a really good laundry detergent to “wash it out”.
I want to claim that a Friendly initial dynamic should be more analogous to a biosphere-with-a-textile-industry-in-it machine than to a washing machine. How do we get clean shirts at all, in a world with dirty diapers?
But then, it’s a strained analogy; it’s not like we’ve ever had a problem of garments claiming control over the biosphere and over other garments’ cleanliness before.
Is that just a bargaining position, or do you truly consider that no human values surviving is preferable to allowing an “unfair” weighing of volitions?
It seems that in many scenarios, the powers that be will want in. The only scenarios where they won’t are ones where the singularity happens before they take it seriously.
I am not sure how much they will screw things up if/when they do.
But it is the fundamental problem of superintelligence, not a problem of CEV.
Upload-based routes don’t suffer from this as badly, because there is inherently a continuum between “one upload running at real-time speed” and “10^20 intelligence-enhanced uploads running at 10^6 times normal speed”.
“Would your volition take into account the volition of a human who would unconditionally take into account yours?”
Doesn’t this still give them the freedom to weight that volition as small as they like?