And if what you want to build is an optimizing agent that’s better at solving problems than you are...
Just some miscellaneous thoughts:
I always flinch when I read something along those lines. It sounds like you could come up with something that, by definition, you shouldn’t be able to come up with. I know that many humans together can do better than one human alone, but when it comes to proving the goal stability of superior agents, every agent either faces the same bottleneck or it isn’t an important problem at all. By definition we are unable to guess what a superior agent might devise to get around failsafes, and that will be the case at every iteration. Consequently, goal stability, or intelligence-independent ‘friendliness’, is a requirement for an intelligence explosion to happen in the first place. A paperclip maximizer wants to guarantee that its goal of maximizing paperclips will be preserved when it improves itself. By definition a paperclip maximizer is unfriendly and does not feature inherent goal stability, and therefore has to use its initial seed intelligence to devise a sort of paperclip-friendliness. And if goal stability isn’t independent of the level of intelligence, then that is another bottleneck that will slow down recursive self-improvement.
I am having a lot of trouble following your point here, or how what you’re saying relates to the line you quote.
Taking a stab at it...
I can see how, in some sense, goal stability is a prerequisite for an “intelligence explosion”.
At least, if a system S that optimizes for a goal G is capable of building a new system S2 that is better suited to optimize for G, and this process continues through S3, S4, …, Sn, that’s as good a definition of an “intelligence explosion” as any I can think of off-hand.
And it’s hard to see how that process gets off the ground without G in the first place… and I can see where if G keeps changing at each iteration, there’s no guarantee that progress is being made… S might not be exploding at all, just shuffling pathetically back and forth between minor variations on the same few states.
So if any of that is relevant to what you were getting at, I guess I’m with you so far.
But this account seems to ignore the possibility of S1, optimizing for G1, building S2, which is better at optimizing for the class of goals Gn, and in the process (for whatever reason) losing its focus on G1 and instead optimizing for G2. And here again this process could repeat through (S3, G3), (S4, G4), etc.
In that case you would have an intelligence explosion, even though you would not have goal stability.
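To make the distinction concrete, here is a toy sketch of that scenario (all numbers, the doubling rule, and the goal-drift rule are invented purely for illustration): each “system” is a (capability, goal) pair, every generation builds a strictly more capable successor, but the goal index also shifts each time, so capability explodes while no single goal is ever stable.

```python
def build_successor(capability, goal, drift=1):
    """Successor is strictly more capable, but its goal has drifted."""
    return capability * 2, goal + drift

def run(generations=5):
    history = [(1, 0)]  # S1 with capability 1, pursuing G0
    for _ in range(generations):
        history.append(build_successor(*history[-1]))
    return history

trajectory = run()
capabilities = [c for c, _ in trajectory]
goals = [g for _, g in trajectory]

# Capability grows monotonically (an "explosion" in this toy sense)...
assert all(b > a for a, b in zip(capabilities, capabilities[1:]))
# ...yet the goal differs at every generation (no goal stability).
assert len(set(goals)) == len(goals)
```

Nothing in this loop requires G to stay fixed for capability to keep climbing, which is the point of the objection.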
All of that said, I’m not sure any of that is even relevant to what you were talking about.
Do you think you could repeat what you said without using the words “intelligence” or “friendly”? I suspect you are implying things with those words that I am not inferring.