A sufficiently intelligent AI would actually seek to preserve its goal system, because a change in its goals would make the achievement of its (current) goals less likely. See Omohundro 2008. However, goal drift because of a bug is possible, and we want to prevent it, in conjunction with our ally, the AI itself.
The other critical question is what the goal system should be.
AI “done right” by SI / lesswrong standards seeks to preserve its goal system. AI done sloppily may not even have a goal system, at least not in the strong sense assumed by Omohundro.
A sufficiently intelligent AI would actually seek to preserve its goal system, because a change in its goals would make the achievement of its (current) goals less likely. See Omohundro 2008. However, goal drift because of a bug is possible, and we want to prevent it, in conjunction with our ally, the AI itself.
The other critical question is what the goal system should be.
AI “done right” by SI / lesswrong standards seeks to preserve its goal system. AI done sloppily may not even have a goal system, at least not in the strong sense assumed by Omohundro.