private_messaging comments on Tool for maximizing paperclips vs a paperclip maximizer

private_messaging 12 May 2012 12:21 UTC
8 points
0
But to someone from muzzle-loader times, an AK-47 would look rather daemon-like: it auto-reloads and fires… And to someone with a self-driving battle tank squad that runs itself using an AI from Starcraft or something, the motion-sensing turret is just another land mine.
- David_Gerard 12 May 2012 13:39 UTC
  5 points
  0
  True. It’s a gradient, not entirely discrete. C was once a high-level language, now it’s a portable assembler. Tools get ridiculously more sophisticated as we ride up Moore’s Law, while still being creatures of computer science that instantiate discrete mathematics.
  
  As I said over in that other thread, a necessary (though not sufficient, I think) difference between “daemon” and “independent agent” will be the optimisation of thinking up new optimisations. I would expect that compiler writers are all over this stuff already and that there’s a considerable body of work on the subject.
  
  And then there’s deciding if a lossy optimisation will do the job, which is where, as a sysadmin, I would not be entirely comfortable with my tools doing this unsupervised. (Loose analogy: I know I can’t tell a 320kbps MP3 from a 24/96 FLAC, but it took ten years of A/B testing on humans for MP3 encoders not to suck.)
  - private_messaging 12 May 2012 14:06 UTC
    2 points
    0
    Hmm, in my view it is more of a goal distinction than an abilities distinction.
    
    The model popular here is that of an ‘expected utility maximizer’, where the ‘utility function’ is defined on the real world. Then the agent does want to build the most accurate model of the real world, to be able to maximize that function the best, and the agent tries to avoid corruption of the function, etc. It also wants its outputs to affect the world, and if put in a box, it will try to craft outputs that do things in the real world even if you only wanted to look at them.
    
    This is all very ontologically basic to humans. We easily philosophize about such stuff.
    
    Meanwhile, we don’t know how to do that. We don’t know how to reduce that real-world ‘utility’ to elementary operations performed on the sensory input (neither directly nor on a meta level). The current solution involves one part that creates and updates a mathematically defined problem, and another part that finds mathematical solutions to it; the solutions are then shown if it is a tool, or applied to the real world if it isn’t. The wisdom of applying those solutions to the real world is an entirely separate issue. The point is that the latter works like a tool if boxed, not like a caged animal (or a caged human).
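
The two-part design described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Problem`, `formulate`, `solve`, and `run`, and the toy objective (pick the integer closest to the mean observation), are all hypothetical and not from the comment. The point it tries to show is that the solver only ever sees a mathematical problem, and whether the solution is merely shown or applied is decided outside it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Problem:
    """A purely mathematical problem: maximize `objective` over `candidates`."""
    candidates: List[int]
    objective: Callable[[int], float]


def formulate(observations: List[int]) -> Problem:
    # Hypothetical formulator: turns observations into a well-defined problem.
    # Toy objective: prefer the candidate closest to the mean observation.
    target = sum(observations) / len(observations)
    return Problem(candidates=list(range(0, 11)),
                   objective=lambda x: -abs(x - target))


def solve(problem: Problem) -> int:
    # The solver has no notion of the real world; it only optimizes the
    # objective it was handed.
    return max(problem.candidates, key=problem.objective)


def run(observations: List[int],
        actuator: Optional[Callable[[int], None]] = None) -> int:
    solution = solve(formulate(observations))
    if actuator is not None:
        # "Agent" use: the solution is applied to the world.
        actuator(solution)
    # "Tool" use: the solution is simply returned for a human to look at.
    return solution
```

Called as `run([2, 4, 6])`, the sketch behaves like a tool; passing an `actuator` callback is the only thing that turns the same solver into something that acts.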
    
    edit: another problem, I think, is that many of the ‘difficulty of friendliness’ arguments are just special cases of the general ‘difficulty of world intentionality’.
    - CarlShulman 13 May 2012 10:50 UTC
      3 points
      0
      
      The model popular here is that of ‘expected utility maximizer’, and the ‘utility function’ is defined on the real world.
      
      I think this is a bit of a misperception stemming from the use of the “paperclip maximizer” example to illustrate things about instrumental reasoning. Certainly folk like Eliezer or Wei Dai or Stuart Armstrong or Paul Christiano have often talked about how a paperclip maximizer is much of the way to FAI (in having a world-model robust enough to use consequentialism). Note that people also like to use the AIXI framework as a model, and use it to talk about how AIXI is set up not as a paperclip maximizer but a wireheader (pornography and birth control rather than sex and offspring), with its utility function defined over sensory inputs rather than a model of the external world.
      
      For another example, when talking about the idea of creating an AI with some external reward that can be administered by humans but not as easily hacked/wireheaded by the AI itself, people use the example of an AI designed to seek factors of certain specified numbers, or a proof or disproof of the Riemann hypothesis according to some internal proof-checking mechanism, etc., recognizing the role of wireheading and the difficulty of specifying goals externally rather than using simple percepts and the like.
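
As a rough illustration of the factor-seeking example, here is a minimal sketch of a goal that is scored by an internal checking mechanism rather than by anything in the external world. The function names `is_valid_factorization` and `reward` are hypothetical, and restricting answers to nontrivial integer factors is an assumption made for the sketch.

```python
from typing import List


def is_valid_factorization(n: int, factors: List[int]) -> bool:
    """Internal checker: do the proposed nontrivial factors multiply to n?"""
    # Reject empty answers and the trivial factors 1 and n (assumption:
    # only nontrivial factorizations count as success).
    if not factors or any(f <= 1 or f >= n for f in factors):
        return False
    product = 1
    for f in factors:
        product *= f
    return product == n


def reward(n: int, factors: List[int]) -> int:
    # The reward depends only on the checker's verdict, so it is defined
    # without reference to any model of the outside world.
    return 1 if is_valid_factorization(n, factors) else 0
```

For instance, `reward(15, [3, 5])` is 1, while `reward(15, [1, 15])` and `reward(15, [2, 7])` are both 0.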