Let’s taboo the words narrow AI, AGI, and Oracle, as I think they’re getting in the way.
Let’s say you’ve found a reflective decision theory and a pretty good computational approximation of it. You could go off and try to find the perfect utility function, link the two together and press “start”; this is what we normally imagine doing.
Alternatively, you could code up the decision theory and run it for “one step” with a predefined input and set of things in memory (which might include an approximate utility function, a set of utility values for different options, etc.) and see what it outputs as the next action to take. Importantly, the program doesn’t carry out any of the actions in its option set (like “rewrite your code so that it’s faster”, “turn on sensor X”, or “press the blue button”); it just returns the option it deems best. You take this output and do what you want with it: use it, discard it, or rerun the decision theory with modified inputs. One of its outputs might be “replace my program with code X because it’s smarter” (so I don’t think it’s useful to call it narrow AI), but it doesn’t automatically replace its own code.
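To make the “one step” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `Option`, `decide_one_step` and `lookup_utility` are hypothetical, and a plain lookup table stands in for the approximate utility function. The only point it shows is the calling convention: the function returns a description of the chosen option and never performs it.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class Option:
    """A description of an action. Holding one does not perform it."""
    name: str


def decide_one_step(
    options: Sequence[Option],
    memory: Mapping[str, float],
    utility: Callable[[Option, Mapping[str, float]], float],
) -> Option:
    """Return the option with the highest estimated utility, without executing it."""
    if not options:
        raise ValueError("option set is empty")
    return max(options, key=lambda option: utility(option, memory))


def lookup_utility(option: Option, memory: Mapping[str, float]) -> float:
    """Hypothetical stand-in utility: read a stored value, defaulting to 0.0."""
    return memory.get(option.name, 0.0)


# Predefined input: an option set and a memory of utility values (made-up numbers).
options = [
    Option("rewrite your code so that it's faster"),
    Option("turn on sensor X"),
    Option("press the blue button"),
]
memory = {"turn on sensor X": 0.4, "press the blue button": 0.7}

# One step: the result is just data. The caller decides whether to use it,
# discard it, or rerun with modified inputs.
choice = decide_one_step(options, memory, lookup_utility)
print(choice.name)

# Rerunning with modified inputs is simply another call.
rerun = decide_one_step(options, {"turn on sensor X": 0.9}, lookup_utility)
```

With the made-up values above, the first call selects “press the blue button” and the rerun selects “turn on sensor X”. The sketch says nothing about how hard the search for the best option is, or about what a much more capable decision procedure might do while searching.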
I don’t understand what you mean by “running a decision theory for one step”. Assume you give the system a problem (in the form of a utility function to maximize or a goal to achieve) and ask it to find the best next action to take. This makes the system an Agent with the goal of finding the best next action (subject to any additional constraints you may specify, like maximum computation time, etc.). If the system is really intelligent, and the problem (of finding the best next action) is hard enough that the system needs more resources, and there is any hole anywhere in the box, then the system will get out.
Regarding self-modification, I don’t think it is relevant to the safety issues by itself. It is important only in that, by using it, the system may become very intelligent very fast. The danger is intelligence, not self-modification. Also, a sufficiently intelligent program may be able to create and run a new program without your knowledge or consent, either by simulating an interpreter (slow, but if the new program yields an exponential time saving, this would still have a huge impact), or by finding and exploiting bugs in its own programming.