This has been discussed at the FHI and SIAI. If the AI wireheads but is motivated to continue wireheading, then it has reason to destroy humanity and colonize the galaxy to eliminate potential threats. See my short paper on this (in part). Wireheading which prevents further actions (and takes place before the creation of surrogate AIs to protect the wireheading system) can just be thought of as the AI destroying itself.
Some have also hoped that unexpectedly rapidly self-improving AI might be like this, but I would tend to suspect that developers would just tweak parameters until they got a non-suicidal AI. An AI intentionally designed to try to destroy itself, but constrained from doing so (perhaps rewarded with the chance to destroy itself for good behavior) might be a bit easier to constrain than a survival machine, but still horribly dangerous, with many failure modes left untouched.
You’re proposing that the AI spontaneously adopts maximization of bliss*time instead of maximization of bliss. If the AI is prone to this sort of goal-switching, then not even the FAI appears safe (the FAI, for example, could opt to put humanity into suspended storage until it colonizes the galaxy and eliminates the threats, even if its chances to do so appear small, given the disutility of letting humans multiply before a potential battle with alien AI). It is a generic counterargument to any sort of non-dangerous AI that the AI would suddenly, and on its own, adopt some goals that we, the survival machines, have.
We humans have self-preservation so deeply ingrained in us that it is hard for us to see that time does not have any inherent value of its own.
No, I’m discussing a variety of different behaviors people call “wireheading” that might emerge from different AI architectures, in the alternative.
Why do you propose to call it ‘destroying itself’ and ‘suicidal’, though?
What is left of your argument if we ban a priori special treatment of the t coordinate by the AI (why should it care about the length of the bliss in time rather than the volume of the bliss in space?), and the use of loaded concepts to which our own intelligence has a strong aversion, like ‘destroying itself’?
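(A rough formalization of the contrast being drawn here, with symbols introduced purely for illustration: write b(t) for the intensity of the bliss signal at time t and b(x) for its density at location x. Then the candidate objectives are
\[ U_{\text{bliss}} = \max_t b(t), \qquad U_{\text{bliss}\times\text{time}} = \int_0^T b(t)\,dt, \qquad U_{\text{bliss}\times\text{space}} = \int_V b(x)\,dx. \]
The first is satisfied by a single moment of perfect wireheading, the second is what gives the AI a reason to keep itself running, and the third is what would give it a reason to expand; nothing in the wireheading setup by itself picks out the t coordinate as the special one.)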
Also, btw, for the FAI there’s the problem that it may want to wirehead you.
Of the ways an AI could go bad, wireheading everyone is a fairly mild one.
It is easy to go too far: a perfect wireheaded bliss is an end state, and there’s no way but downhill when you are on top of a hill. An end state as in: no further updates of any note; the clock ticking, perhaps, and that’s it.
just tweak parameters until they got a non-suicidal AI

(This might be difficult to the point of impossibility with architectures that substantially write, rewrite, and refactor their own code. If so, it might be necessary for humans to solve the grounding problem themselves rather than leave it to an AI, in which case we might have substantially more time until uFAI.)