Well, I’m imagining the AI as being composed of a couple of distinct parts—a decision subroutine (give it a set of options and it picks one), a thinking subroutine (give it a question and it tries to determine the answer), and a belief database. So when I say “the AI can’t modify itself”, what I mean more specifically is “none of the options given to the decision subroutine will be something that involves changing the AI’s code, or changing beliefs in unapproved ways”.
So perhaps “the AI could write some code” (meaning that the thinking algorithm creates a piece of code inside the belief database), but “the AI can’t replace parts of itself with that code” (meaning that the decision algorithm can’t make a decision to alter any of the AI’s subroutines or beliefs).
Now, certainly an out-of-the-box AI would, in theory, be able to, say, find a computer and upload some new code onto it, and that would amount to self-modification. I’m assuming we’re going to first make safe AI and then let it out of the box, rather than the other way around.
Well, I’m imagining the AI as being composed of a couple of distinct parts—a decision subroutine (give it a set of options and it picks one), a thinking subroutine (give it a question and it tries to determine the answer), and a belief database. So when I say “the AI can’t modify itself”, what I mean more specifically is “none of the options given to the decision subroutine will be something that involves changing the AI’s code, or changing beliefs in unapproved ways”.
So perhaps “the AI could write some code” (meaning that the thinking algorithm creates a piece of code inside the belief database), but “the AI can’t replace parts of itself with that code” (meaning that the decision algorithm can’t make a decision to alter any of the AI’s subroutines or beliefs).
Now, certainly an out-of-the-box AI would, in theory, be able to, say, find a computer and upload some new code onto it, and that would amount to self-modification. I’m assuming we’re going to first make safe AI and then let it out of the box, rather than the other way around.