A way to choose what subset of humanity gets included in CEV that doesn’t include too many superstitious/demented/vengeful/religious nutjobs and land those who implement it in infinite perfect hell.
What you’re looking for is a way to construe the extrapolated volition that washes out superstition and dementation.
To the extent that vengefulness turns out to be a simple direct value that survives under many reasonable construals, it seems to me that one simple and morally elegant solution would be to filter, not the people, but the spread of their volitions, by the test, “Would your volition take into account the volition of a human who would unconditionally take into account yours?” This filters out extrapolations that end up perfectly selfish and those which end up with frozen values irrespective of what other people think—something of a hack, but it might be that many genuine reflective equilibria are just like that, and only a values-based decision can rule them out. The “unconditional” qualifier is meant to rule out TDT-like considerations, or they could just be ruled out by fiat, i.e., we want to test for cooperation in the Prisoner’s Dilemma, not in the True Prisoner’s Dilemma.
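To pin down the structure of that test, here is a toy sketch in Python; the `Volition` class, the `considers` predicate, and `unconditional_includer` are hypothetical stand-ins invented for illustration, not anything specified by the proposal itself:

```python
# Purely illustrative sketch of the proposed filter over the spread of volitions,
# not over the people being extrapolated. Every name here (Volition, considers,
# unconditional_includer) is a hypothetical stand-in for whatever an actual
# extrapolation process would produce.

class Volition:
    def __init__(self, name, considers):
        self.name = name
        self.considers = considers  # predicate: would this volition weigh `other` at all?

    def would_take_into_account(self, other):
        return self.considers(other)

def unconditional_includer():
    # A human whose volition takes everyone's volition into account unconditionally,
    # i.e. with no TDT-style "only if you would include me" reasoning.
    return Volition("unconditional includer", considers=lambda other: True)

def passes_filter(volition):
    # The test from the comment: would this extrapolated volition take into account
    # the volition of a human who would unconditionally take into account its own?
    return volition.would_take_into_account(unconditional_includer())

# Perfectly selfish or frozen-value extrapolations fail the test; reciprocating ones pass.
spread = [
    Volition("perfectly selfish", considers=lambda other: False),
    Volition("reciprocating", considers=lambda other: True),
]
print([v.name for v in spread if passes_filter(v)])  # ['reciprocating']
```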
An AI that can solve philosophy problems that are beyond the ability of the designers to even conceive
It’s possible that having a complete mind design on hand would mean that there were no philosophy problems left, since the resources that human minds have to solve philosophy problems are finite, and knowing the exact method to use to solve a philosophy problem usually makes solving it pretty straightforward (the limiting factor on philosophy problems is never computing power). The reason I pick on this particular cited problem is that, as stated, it involves an inherent asymmetry between the problems you want the AI to solve and your own understanding of how to meta-approach those problems, which is indeed a difficult and dangerous sort of state.
All of the above working first time, without testing the entire superintelligence (though you can test small subcomponents).
All approaches to superintelligence, without exception, have this problem. It is not quite as automatically lethal as it sounds (though it is certainly automatically lethal to all other parties’ proposals for building superintelligence). You can build in test cases and warning criteria beforehand to your heart’s content. You can detect incoherence and fail safely instead of doing something incoherent. You could, though it carries its own set of dangers, build human checking into the system at various stages and with various degrees of information exposure. But it is the fundamental problem of superintelligence, not a problem of CEV.
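As a minimal sketch of the “detect incoherence and fail safely” pattern: `run_initial_dynamic` and the string-valued checks below are invented for illustration, and an actual implementation would look nothing like this, but the shape of pre-registered checks plus a fail-safe abort is the point:

```python
# Minimal sketch of "fail safely instead of doing something incoherent":
# checks are written and registered before the run, and any tripped check
# aborts instead of acting. All names here are invented for illustration.

def run_initial_dynamic(extrapolate, checks):
    """extrapolate: callable producing a candidate output.
    checks: predicates pre-registered before the run ever starts."""
    candidate = extrapolate()
    tripped = [check.__name__ for check in checks if not check(candidate)]
    if tripped:
        # Fail safely: report which warning criteria fired, do not act.
        return {"status": "fail-safe abort", "tripped": tripped}
    return {"status": "ok", "output": candidate}

# Toy usage with stand-in checks over a string-valued candidate.
def nonempty(candidate):
    return bool(candidate)

def no_incoherence_flag(candidate):
    return "INCOHERENT" not in candidate

result = run_initial_dynamic(lambda: "INCOHERENT candidate plan",
                             checks=[nonempty, no_incoherence_flag])
print(result)  # {'status': 'fail-safe abort', 'tripped': ['no_incoherence_flag']}
```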
And, to make it worse, if major political powers are involved, you have to solve the political problem of getting them to agree on how to skew the CEV towards a geopolitical-power-weighted set of volitions to extrapolate.
I will not lend my skills to any such thing.
What you’re looking for is a way to construe the extrapolated volition that washes out superstition and dementation.
You could do that. But if you want a clean shirt out of the washing machine, you don’t add in a diaper with poo on it and then look for a really good laundry detergent to “wash it out”.
My feeling with the CEV of humanity is that if it is highly insensitive to the set of people you extrapolate, then you lose nothing by extrapolating fewer people. On the other hand, if including more people does change the answer in a direction that you regard as bad, then you gain by excluding people with values dissimilar from yours.
Furthermore, excluding people from the CEV process doesn’t mean disenfranchising them—it just means enfranchising them according to what your values count as enfranchisement.
Most people in the world don’t hold our values(1). Read, e.g., Haidt on culturally determined values. Human values are universal in form but local in content. Our “should” function is parochial.
(1) Note: this doesn’t mean that their values will be different after extrapolation; f(x) can equal f(y) for x != y. But it does mean that they might be, which is enough to give you an incentive not to include them.
if you want a clean shirt out of the washing machine, you don’t add in a diaper with poo on it and then look for a really good laundry detergent to “wash it out”.
I want to claim that a Friendly initial dynamic should be more analogous to a biosphere-with-a-textile-industry-in-it machine than to a washing machine. How do we get clean shirts at all, in a world with dirty diapers?
But then, it’s a strained analogy; it’s not like we’ve ever had a problem of garments claiming control over the biosphere and over other garments’ cleanliness before.
Is that just a bargaining position, or do you truly consider that no human values surviving is preferable to allowing an “unfair” weighing of volitions?
It seems that in many scenarios, the powers that be will want in. The only scenarios where they won’t are ones where the singularity happens before they take it seriously.
I am not sure how much they will screw things up if/when they do.
But it is the fundamental problem of superintelligence, not a problem of CEV.
Upload-based routes don’t suffer from this as badly, because there is inherently a continuum between “one upload running at real-time speed” and “10^20 intelligence-enhanced uploads running at 10^6 times normal speed”.
“Would your volition take into account the volition of a human who would unconditionally take into account yours?”
Doesn’t this still give them the freedom to weight that volition as small as they like?