One question I’m not sure about—and remember, the comment above is just a sketch—is whether it can be formally shown that there is always a ’sploit.
(If so, then what you would need for security is to make such a ’sploit infeasible for practical purposes. The question in security is always “what’s the threat model?”)
For purposes of ’sploits on mere human minds, I think it’s enough to note that in security terms the human mind is somewhere around Windows 98, and that general intelligence is a fairly late addition that only occasionally affects what the human does.
There isn’t always an exploit, for certain classes of exploits.
For instance, when we compile a statically checked language like Java and run it inside a VM, we can guarantee that the program won’t take over the VM it’s executing in. That rules out whole varieties of exploit: we can bound its CPU time and memory use, and we can inspect and filter all of its communication with other programs and data. This is essentially a formal proof of certain properties of the program’s behavior.
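As a toy sketch of the limit-and-filter part (the names and the policy below are made up for illustration; wall-clock time stands in for a real CPU budget, and a real memory cap would need a separate JVM started with something like -Xmx):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Toy sketch: run an untrusted computation under a hard time limit and let
// nothing out except a single return value that has passed a filter.
public class SandboxSketch {

    static String runFiltered(Callable<String> untrusted,
                              long timeoutMillis,
                              Predicate<String> outputPolicy) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(untrusted);
            // Wall-clock limit standing in for the CPU-time budget.
            String output = result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            // The untrusted code never talks to anything directly; we decide
            // whether its one output is allowed to escape.
            return outputPolicy.test(output) ? output : "<blocked by policy>";
        } catch (TimeoutException e) {
            return "<killed: time budget exceeded>";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        String out = runFiltered(
                () -> "hello from inside the box",
                1000,
                s -> s.length() < 100); // toy policy: only short messages escape
        System.out.println(out);
    }
}
```

(The stronger guarantees in Java come from bytecode verification and the VM’s memory safety; this only shows the part where we cap the resources and filter the output.)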
The question is whether we can prove enough interesting properties about a given new idea. That depends mostly on the design of the AI mind executing (or merely inspecting) the new ideas.
As I’ve noted, my original comment isn’t arguing what I thought I was arguing—I thought I was arguing that there’s always some sort of ’sploit, in the sense of giving the mind a bad meme that takes it over, but I was actually arguing that it can’t know there isn’t. Which is also interesting (if my logic holds), but not nearly as strong.
I am very interested in the idea of whether there would always be a virulent poison meme ’sploit (even if building it would require infeasible time), but I suspect that requires a different line of argument.
I’m not aware of anything resembling a formalism of “mind” or “meme” clear enough to answer either your original question or this one. I suspect we don’t have anywhere near the understanding of minds in general needed to answer it, but my intuition is that this is exactly the sort of question we should be trying to answer.