Why should an AI have to self-modify in order to be super-intelligent?
One argument for self-modifying FAI is that “developing an FAI is an extremely difficult problem, and so we will need to make our AI self-modifying so that it can do some of the hard work for us”. But doesn’t making the FAI self-modifying make the problem much more difficult, since now we have to figure out how to make goals stable under self-modification, which is also a very difficult problem?
The increased difficulty could be offset by the ability for the AI to undergo a “self-modifying foom”, which results in a titanic amount of intelligence increase from relatively modest beginnings. But would it be possible for an AI to have a “knowledge-about-problem-solving foom” instead, where the AI increases its intelligence not by modifying itself, but by increasing the amount of knowledge it has about how to solve problems?
Here are some differences that come to mind between the two kinds of fooms:
A self-modification could change the AI’s behavior in an arbitrary manner. Obtaining knowledge about problem-solving could only change the AI’s behavior via metacognition.
A bad self-modification could easily destroy the AI’s safety (unless we figure out how to fix this problem!). Obtaining knowledge about problem-solving would only destroy the AI’s safety if the knowledge is substantially misleading. (An AI might somehow come to believe that it should only read pro-Green books, and then fail to take into account the fact that beliefs naively derived from reading pro-Green books will be biased towards Green.)
Any “method of being intelligent” can be turned into a self-modification. Not every method of being intelligent can effectively be turned into a piece of knowledge about problem-solving, because there’s only a limited set of beliefs that the AI could act upon. (A non-self-modifying AI may be programmed to think about pizza upon believing the statement “I should think about pizza”, but it is less likely to be programmed to adjust all its beliefs to be pro-Blue, without evidence, upon believing the statement “I should adjust all my beliefs to be pro-Blue, without evidence”.)
Certainly self-modification has its advantages, but so does a pure knowledge-about-problem-solving (KAPS) approach, so I’m confused about why seemingly everyone in the FAI community believes self-modification is necessary for a strong AI.
But would it be possible for an AI to have a “knowledge-about-problem-solving foom” instead, where the AI increases its intelligence not by modifying itself, but by increasing the amount of knowledge it has about how to solve problems?
My immediate reaction is, ‘Possibly—wait, how is that different? I imagine the AI would write subroutines or separate programs that it thinks will do a better job than its old processes. Where do we draw the line between that and self-modification or -replacement?’
If we just try to create protected code that it can’t change, the AI can remove or subvert those protections (or get us to change them!) if and when it acquires enough effectiveness.
The distinction I have in mind is that a self-modifying AI can come up with a new thinking algorithm to use and decide to trust it, whereas a non-self-modifying AI could come up with a new algorithm or whatever, but would be unable to trust the algorithm without sufficient justification.
Likewise, if an AI’s decision-making algorithm is immutably hard-coded as “think about the alternatives and select the one that’s rated the highest”, then the AI would not be able to simply “write a new AI … and then just hand off all its tasks to it”; in order to do that, it would somehow have to make it so that the highest-rated alternative is always the one that the new AI would pick. (Of course, this is no benefit unless the rating system is also immutably hard-coded.)
I guess my idea in a nutshell is that instead of starting with a flexible system and trying to figure out how to make it safe, we should start with a safe system and try to figure out how to make it flexible. My main ground for believing this, I think, is that it’s probably going to be much easier to understand a safe but inflexible system than a flexible but unsafe one, so if we take this approach, the development process will be easier to understand and will therefore go better.
Likewise, if an AI’s decision-making algorithm is immutably hard-coded as “think about the alternatives and select the one that’s rated the highest”, then the AI would not be able to simply “write a new AI … and then just hand off all its tasks to it”; in order to do that, it would somehow have to make it so that the highest-rated alternative is always the one that the new AI would pick.
You basically say that the AI should be unable to learn to trust a process that was effective in the past to also be effective in the future. I think that would restrict intelligence a lot.
Yeah, that’s a good point. What I want to say is, “oh, a non-self-modifying AI would still be able to hand off control to a sub-AI, but it will automatically check to make sure the sub-AI is behaving correctly; it won’t be able to turn off those checks”. But my idea here is definitely starting to feel more like a pipe dream.
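As a rough sketch of what "checks that can’t be turned off" might look like, assuming a toy task (sorting) where checking an answer is much easier than producing it. The function name `checked_delegate` and the task are hypothetical; nothing here is claimed to scale to a sub-AI smarter than its checker.

```python
from collections import Counter
from typing import Callable


def checked_delegate(
    sub_ai: Callable[[list[int]], list[int]], task: list[int]
) -> list[int]:
    """Run the sub-AI, then always verify its output.

    The check is part of the only code path; there is no flag to skip it.
    """
    result = sub_ai(task)
    is_sorted = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(result) == Counter(task)
    if not (is_sorted and same_items):
        raise RuntimeError("sub-AI output failed verification")
    return result


print(checked_delegate(sorted, [3, 1, 2]))
```

This only helps for tasks where a correct answer can be recognized cheaply; for open-ended behavior the checker may have no such test available.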
Hmm, there might still be something to glean from attempting to steelman this or from working in different related directions.
Edit: maybe something with an AI not being able to tolerate things it can’t make certain proofs about? The problem is that it’d have to be able to make those proofs about humans if they are included in its environment, and if they are not, it might make a UFAI there. (Intuition pump: a system that consists of a program it can prove everything about, and humans that the program asks questions to.) Yeah, this doesn’t seem very useful.
You can’t really tell whether something that is smarter than yourself is behaving correctly. In the end, a non-self-modifying AI checking whether a self-modifying sub-AI is behaving correctly isn’t much different, from a safety perspective, from a human checking whether the self-modifying AI is behaving correctly.
Why should an AI have to self-modify in order to be super-intelligent?
I’m not sure where the phrase “have to” is coming from. I don’t think we expect to build a self-modifying intelligence that becomes a superintelligence because that seems like the best way to do it, but because it’s the easiest way to do it, and thus the one likely to be taken first.
In broad terms, the Strong AI project is expected to look like “humans build dumb computers, humans and dumb computers build smart computers, smart computers build really smart computers.” Once you have smart computers that can build really smart computers, it looks like they will (in the sense that at least one institution with smart computers will let them, and then we have a really smart computer on our hands), and it seems likely that the modifications will occur at a level that humans are not able to manage effectively (so it really will be just smart computers making the really smart computers).
But doesn’t making the FAI self-modifying make the problem much more difficult, since now we have to figure out how to make goals stable under self-modification, which is also a very difficult problem?
Yes. This is why MIRI is interested in goal stability under self-modification.
I’m not sure where the phrase “have to” is coming from. I don’t think we expect to build a self-modifying intelligence that becomes a superintelligence because that seems like the best way to do it, but because it’s the easiest way to do it, and thus the one likely to be taken first.
Yeah, I guess my real question isn’t why we think an AI would have to self-modify; my real question is why we think that would be the easiest way to do things.
An AI is just code: if the AI has the ability to write code, it has the ability to self-modify.
If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify. This second ability is what I’m proposing to get rid of. See my other comment.
If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify.
Unpack the word “itself.”
(This is basically the same response as drethelin’s, except it highlights the difficulty in drawing clear delineations between different kinds of impacts the AI can have on the world. Even if version A doesn’t alter itself, it still alters the world, and it may do so in a way that brings about version B (either indirectly or directly), and so it would help if it knew how to design B.)
Well, I’m imagining the AI as being composed of a couple of distinct parts—a decision subroutine (give it a set of options and it picks one), a thinking subroutine (give it a question and it tries to determine the answer), and a belief database. So when I say “the AI can’t modify itself”, what I mean more specifically is “none of the options given to the decision subroutine will be something that involves changing the AI’s code, or changing beliefs in unapproved ways”.
So perhaps “the AI could write some code” (meaning that the thinking algorithm creates a piece of code inside the belief database), but “the AI can’t replace parts of itself with that code” (meaning that the decision algorithm can’t make a decision to alter any of the AI’s subroutines or beliefs).
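A small Python sketch of the three-part layout described above, under heavy assumptions: the class and function names (`Option`, `BeliefDatabase`, `think`, `decide`) and the flags on `Option` are all hypothetical, and real options would not arrive neatly labeled as self-modifying. It is meant only to show the shape of the idea: generated code is stored as data, and the decision subroutine never gets to pick an option that alters the AI.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Option:
    description: str
    rating: float
    modifies_code: bool = False           # would change the AI's subroutines
    unapproved_belief_edit: bool = False  # would change beliefs without approval


@dataclass
class BeliefDatabase:
    beliefs: dict[str, str] = field(default_factory=dict)

    def store(self, key: str, value: str) -> None:
        self.beliefs[key] = value


def think(question: str, db: BeliefDatabase) -> str:
    """Thinking subroutine: may write code, but only as data in the database."""
    answer = f"def improved_solver(): ...  # draft for: {question}"
    db.store(question, answer)
    return answer


def decide(options: list[Option]) -> Option:
    """Decision subroutine: self-modifying options are filtered out first."""
    allowed = [
        o for o in options
        if not (o.modifies_code or o.unapproved_belief_edit)
    ]
    if not allowed:
        raise ValueError("no permissible options")
    return max(allowed, key=lambda o: o.rating)


db = BeliefDatabase()
think("how to sort faster", db)
chosen = decide([
    Option("use the stored draft as advice", 0.7),
    Option("replace thinking subroutine with the draft", 0.9, modifies_code=True),
])
print(chosen.description)
```

The higher-rated option is skipped because it is flagged as changing the AI’s code; the hard part the sketch leaves out is deciding reliably which options deserve that flag.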
Now, certainly an out-of-the-box AI would, in theory, be able to, say, find a computer and upload some new code onto it, and that would amount to self-modification. I’m assuming we’re going to first make safe AI and then let it out of the box, rather than the other way around.
Immutably hard-coding something in is a lot easier to say than to do.
Or it can write a new AI that’s an improved version of itself and then just hand off all its tasks to it.
You’d have to actively stop it from doing so. An AI is just code: if the AI has the ability to write code, it has the ability to self-modify.
If an AI can’t modify its own code it can just write a new AI that can.