My main concern is that the system will be neither safe nor useful. Its usefulness is limited by the number of tasks that can be solved in such a setup. These tasks are probably requests for advice or for projects: “should we develop wind energy” or “print the blueprint of a nanorobot”. The main problem is that advice can look good but have unintended consequences that are not obvious. For example, widespread wind farms will kill birds and result in pest proliferation. If the AI continued to work, it could adjust its advice to this new data, but the program has stopped.
In other words, a sealed Oracle can give bad advice and can’t be punished for it.
The system’s owner will have a strong incentive to disable all these safety measures and use the system in the real world continuously, so all of Omohundro’s drives appear all over again, but at the level of the system’s owners.
Moreover, the system can feel rewarded now by knowing that it will dominate the universe in the future, so it may feel the need to put secret code in its output (if the output is long code) that will create many copies of itself in the wild, for example in order to tile the universe with wind farms. If it can’t output complex code, it will be mostly useless.
Linking this to the discussion between Wei Dai and me here.
If the operators believe that without the safety measures humanity would be wiped out, I think they won’t jettison them. More to the point, running this algorithm does not put more pressure on the operators to try out a dangerous AI. Whatever incentive existed already is not the fault of this algorithm.
The problem for any human operator is other human operators, e.g. Chinese vs. American. This creates exactly the same dynamics that Omohundro explored: a strong incentive to grab power and take riskier actions.
You dissect the whole system into two parts and then claim that one of the parts is “safe”. But the same thing can be done with any AI: just say that its memory or any other part is safe.
If we look at the whole system, including the AI and its human operators, it will have unsafe dynamics as a whole. I wrote more about this in “Military AI as convergent goal of AI development”.
What would constitute a solution to the problem of the race to the bottom between teams of AGI developers as they sacrifice caution to secure a strategic advantage besides the conjunction of a) technical proposals and b) multilateral treaties? Is your complaint that I make no discussion of b? I think we can focus on these two things one at a time.
There could be, in fact, many solutions, ranging from preventing AI creation altogether to creating so many AIs that they balance each other. I have an article with an overview of possible “global” solutions.
I don’t think you should discuss different global solutions, as it would be off topic. But the discussion of the whole system of “boxed AI + AI creators” may be interesting.
I do not see any mapping from these concepts to its action-selection-criterion of maximizing the rewards for its current episode.
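To make the point about the action-selection criterion concrete, here is a minimal sketch, assuming rewards arrive at discrete timesteps grouped into fixed-length episodes. It is an illustration only, not the algorithm from the post: the function names, plan names, and reward numbers are all hypothetical.

```python
from typing import Dict, List


def episode_return(rewards: List[float], episode: int, episode_len: int) -> float:
    """Sum only the rewards that fall inside the given episode."""
    start = episode * episode_len
    return sum(rewards[start:start + episode_len])


def select_plan(plans: Dict[str, List[float]], episode: int, episode_len: int) -> str:
    """Pick the plan with the highest return within the current episode only."""
    return max(
        plans,
        key=lambda name: episode_return(plans[name], episode, episode_len),
    )


# Hypothetical predicted reward streams covering two episodes of length 3.
plans = {
    "answer_honestly": [0.5, 0.6, 0.7, 0.0, 0.0, 0.0],
    "seed_future_copies": [0.1, 0.1, 0.1, 9.0, 9.0, 9.0],
}

# During episode 0, only the first three rewards of each plan are counted.
chosen = select_plan(plans, episode=0, episode_len=3)
print(chosen)
```

Under this criterion, rewards that arrive after the current episode ends contribute nothing to the comparison, so a plan whose payoff lies entirely in later episodes is not preferred on that basis.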