I suspect any agent can be taken down by sufficiently bad input. Human brains are of course horribly exploitable, and predatory memes are quite well evolved to eat people’s lives.
But I suspect that even a rational superintelligence (“perfectly spherical rationalist of uniform density”) will be susceptible to something, via a process something like this:
A mind is an operating system for ideas. All ideas run as root, as there are no other levels where real thinking can be done with them. There are no secure sandboxes.
New ideas come in all the time; some small, some big, some benign, some virulent.
A mind needs to process incoming ideas for security considerations.
Any finite idea can be completely examined in finite time to determine whether it is safe to run.
But “finite time” does not necessarily imply feasible time.
Thus, there will likely be a theoretical effective attack on any rational agent that has to operate in real time.
Thus, a superintelligent agent could catch a bad case of an evolved predatory meme.
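To make the step from “finite time” to “feasible time” concrete, here is a toy sketch in Java under an assumption invented purely for illustration (all names are mine, not anyone’s actual proposal): an idea is modelled as a black-box predicate over n-bit inputs, and “completely examining” it means running it on every input. The check always halts, but it costs 2^n evaluations, which stops being real-time long before n reaches realistic sizes.

```java
import java.util.function.LongPredicate;

public class BruteForceSafetyCheck {

    /**
     * Exhaustively checks an "idea" (modelled as a predicate over n-bit
     * inputs that returns true when its behaviour on that input is safe).
     * Always terminates, because there are only 2^n inputs, but the cost is
     * exponential in n, so "finite" is not the same as "feasible".
     */
    static boolean completelyExamined(LongPredicate safeOnInput, int nBits) {
        long states = 1L << nBits;           // 2^n possible inputs
        for (long input = 0; input < states; input++) {
            if (!safeOnInput.test(input)) {
                return false;                // found a concrete unsafe case
            }
        }
        return true;                         // safe, by exhaustion
    }

    public static void main(String[] args) {
        // A toy "idea" that is unsafe on exactly one rare input.
        LongPredicate toyIdea = input -> input != 123_456_789L;

        // About a million cases at 20 bits: feasible.
        System.out.println(completelyExamined(toyIdea, 20));
        // At 60 bits the same loop would need ~10^18 iterations:
        // still finite, nowhere near real time.
    }
}
```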
I do not know that the analogy with current computer science holds; I just suspect it does. But I’d just like you to picture our personal weakly godlike superintelligence catching superintelligent Scientology.
(And I still hear humans who think they’re smart tell me that other people are susceptible but they don’t think they would be. I’d like to see reasoning to this effect that takes into account the above, however.)
Edit: I’ve just realised that what I’ve argued above is not that a given rational agent will necessarily have a susceptibility—but that it cannot know that it doesn’t have one. (I still think humans claiming that they know themselves not to be susceptible are fools, but need to think more on whether they necessarily have a susceptibility at all.)
A mind is an operating system for ideas. All ideas run as root, as there are no other levels where real thinking can be done with them. There are no secure sandboxes.
There’s no reason for this to be true for an AI. However, I also don’t see why this assumption is necessary for the rest of your argument, which is basically that an agent can’t know in advance all the future ramifications of accepting any possible new idea or belief. (It can know this for some of them; the challenge is presumably to build an AI good enough to select enough new ideas it can formally prove things about to stay useful, while rejecting few useful ideas as resistant to analysis.)
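As a minimal sketch of that selection trade-off (every name here is hypothetical; none of this is a real verifier API): gate each incoming idea on whether a prover can certify it safe within a fixed budget, and reject everything else, including useful ideas that merely resist analysis.

```java
import java.time.Duration;
import java.util.Optional;

public class IdeaGate {

    /** Outcome of an attempted proof: proven safe, proven unsafe, or budget ran out. */
    enum Verdict { SAFE, UNSAFE, UNKNOWN }

    /** Hypothetical stand-in for whatever formal analysis the agent can do. */
    interface Verifier {
        Verdict tryToProveSafe(String idea, Duration budget);
    }

    /** Accept an idea only if it is proven safe within the budget. */
    static Optional<String> screen(Verifier verifier, String idea, Duration budget) {
        return verifier.tryToProveSafe(idea, budget) == Verdict.SAFE
                ? Optional.of(idea)      // safe to let "run as root"
                : Optional.empty();      // rejected, even if it might have been useful
    }
}
```

Tightening the budget makes the gate safer but more conservative, which is exactly the trade-off in question: protection against exotic ideas is bought by turning away useful ones that are merely hard to analyse.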
One question I’m not sure about—and remember, the comment above is just a sketch—is whether it can be formally shown that there is always a ’sploit.
(If so, then what you would need for security is to make such a ’sploit infeasible for practical purposes. The question in security is always “what’s the threat model?”)
For purposes of ’sploits on mere human minds, I think it’s enough to note that, in security terms, the human mind is somewhere around Windows 98, and that general intelligence is a fairly late addition that occasionally affects what the human does.
There isn’t always an exploit: for certain classes of exploit, we can rule them out entirely.
For instance, when we compile a program in a statically checked language like Java, we can guarantee that it won’t take over the VM it’s executing in. Therefore, it won’t have exploits of some varieties: we can limit its CPU time and memory use, and we can inspect and filter all its communications with any other programs or data. This is essentially a formal proof of certain properties of the program’s behavior.
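For flavour, here is a minimal sketch of the weaker, runtime half of that confinement using the standard java.util.concurrent machinery; this is only a resource-and-output filter, not the bytecode-verification guarantee itself, and not a hard security boundary on its own.

```java
import java.util.concurrent.*;

public class ConfinedRun {

    public static void main(String[] args) throws Exception {
        ExecutorService sandbox = Executors.newSingleThreadExecutor();

        // The "untrusted" computation: it can only return a value, so all of
        // its influence on us flows through 'result' below.
        Callable<String> untrusted = () -> "42";

        Future<String> future = sandbox.submit(untrusted);
        try {
            String result = future.get(1, TimeUnit.SECONDS);  // time limit
            if (result.length() < 1024) {                     // inspect/filter the output
                System.out.println("Accepted: " + result);
            }
        } catch (TimeoutException e) {
            future.cancel(true);                              // give up on it
            System.out.println("Rejected: took too long");
        } finally {
            sandbox.shutdownNow();
        }
    }
}
```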
The question is, can we prove enough interesting properties about a given new idea? This depends mostly on the design of the AI mind executing (or looking at) the new ideas.
As I’ve noted, my original comment isn’t arguing what I thought I was arguing—I thought I was arguing that there’s always some sort of ’sploit, in the sense of giving the mind a bad meme that takes it over, but I was actually arguing that it can’t know there isn’t. Which is also interesting (if my logic holds), but not nearly as strong.
I am very interested in the idea of whether there would always be a virulent poison meme ’sploit (even if building it would require infeasible time), but I suspect that requires a different line of argument.
I’m not aware of anything resembling a clear enough formalism of what people mean by mind or meme to answer either your original question or this one. I suspect we don’t have anywhere near the understanding of minds in general to hope to answer the question, but my intuition is that it is the sort of question that we should be trying to answer.