Why should an AI have to self-modify in order to be super-intelligent?
I’m not sure where the phrase “have to” is coming from. I don’t think the expectation that we will build a self-modifying intelligence that becomes a superintelligence is because that seems like the best way to do it but because it’s the easiest way to do it, and thus the one likely to be taken first.
In broad terms, the Strong AI project is expected to look like “humans build dumb computers, humans and dumb computers build smart computers, smart computers build really smart computers.” Once you have smart computers that can build really smart computers, it looks like they will (in the sense that at least one institution with smart computers will let them, and then we have a really smart computer on our hands), and it seems likely that the modifications will occur at a level that humans are not able to manage effectively (so it really will be just smart computers making the really smart computers).
But doesn’t making the FAI self-modifying make the problem much more difficult, since how we have to figure out how to make goals stable under self-modification, which is also a very difficult problem?
Yes. This is why MIRI is interested in goal stability under self-modification.
I’m not sure where the phrase “have to” is coming from. I don’t think the expectation that we will build a self-modifying intelligence that becomes a superintelligence is because that seems like the best way to do it but because it’s the easiest way to do it, and thus the one likely to be taken first.
Yeah, I guess my real question isn’t why we think an AI would have to self-modify; my real question is why we think that would be the easiest way to do things.
An AI is just code: If the AI has the ability to write code it has the ability to self modify.
If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify. This second ability is what I’m proposing to get rid of. See my other comment.
If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify.
Unpack the word “itself.”
(This is basically the same response as drethelin’s, except it highlights the difficulty in drawing clear delineations between different kinds of impacts the AI can have on the word. Even if version A doesn’t alter itself, it still alters the world, and it may do so in a way that bring around version B (either indirectly or directly), and so it would help if it knew how to design B.)
Well, I’m imagining the AI as being composed of a couple of distinct parts—a decision subroutine (give it a set of options and it picks one), a thinking subroutine (give it a question and it tries to determine the answer), and a belief database. So when I say “the AI can’t modify itself”, what I mean more specifically is “none of the options given to the decision subroutine will be something that involves changing the AI’s code, or changing beliefs in unapproved ways”.
So perhaps “the AI could write some code” (meaning that the thinking algorithm creates a piece of code inside the belief database), but “the AI can’t replace parts of itself with that code” (meaning that the decision algorithm can’t make a decision to alter any of the AI’s subroutines or beliefs).
Now, certainly an out-of-the-box AI would, in theory, be able to, say, find a computer and upload some new code onto it, and that would amount to self-modification. I’m assuming we’re going to first make safe AI and then let it out of the box, rather than the other way around.
I’m not sure where the phrase “have to” is coming from. I don’t think the expectation that we will build a self-modifying intelligence that becomes a superintelligence is because that seems like the best way to do it but because it’s the easiest way to do it, and thus the one likely to be taken first.
In broad terms, the Strong AI project is expected to look like “humans build dumb computers, humans and dumb computers build smart computers, smart computers build really smart computers.” Once you have smart computers that can build really smart computers, it looks like they will (in the sense that at least one institution with smart computers will let them, and then we have a really smart computer on our hands), and it seems likely that the modifications will occur at a level that humans are not able to manage effectively (so it really will be just smart computers making the really smart computers).
Yes. This is why MIRI is interested in goal stability under self-modification.
Yeah, I guess my real question isn’t why we think an AI would have to self-modify; my real question is why we think that would be the easiest way to do things.
you’d have to actively stop it from doing so. An AI is just code: If the AI has the ability to write code it has the ability to self modify.
If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify. This second ability is what I’m proposing to get rid of. See my other comment.
If an AI can’t modify its own code it can just write a new AI that can.
Unpack the word “itself.”
(This is basically the same response as drethelin’s, except it highlights the difficulty in drawing clear delineations between different kinds of impacts the AI can have on the word. Even if version A doesn’t alter itself, it still alters the world, and it may do so in a way that bring around version B (either indirectly or directly), and so it would help if it knew how to design B.)
Well, I’m imagining the AI as being composed of a couple of distinct parts—a decision subroutine (give it a set of options and it picks one), a thinking subroutine (give it a question and it tries to determine the answer), and a belief database. So when I say “the AI can’t modify itself”, what I mean more specifically is “none of the options given to the decision subroutine will be something that involves changing the AI’s code, or changing beliefs in unapproved ways”.
So perhaps “the AI could write some code” (meaning that the thinking algorithm creates a piece of code inside the belief database), but “the AI can’t replace parts of itself with that code” (meaning that the decision algorithm can’t make a decision to alter any of the AI’s subroutines or beliefs).
Now, certainly an out-of-the-box AI would, in theory, be able to, say, find a computer and upload some new code onto it, and that would amount to self-modification. I’m assuming we’re going to first make safe AI and then let it out of the box, rather than the other way around.