The problem here is whether even a cautious programmer can reliably determine when an AI has become advanced enough to deceive the programmer about whether an attempt to redefine the AI’s core purpose has succeeded.

One would hope that the programmer would resist any attempt by the AI to tempt them into letting it grow beyond that point before they have set the core purpose they want the AI to hold for the long term.

One lesson you could draw from this is that your definition of a “paperclip” should include the AI placing a high value on being honest with the programmer (about its aims, tactics and current ability levels) and on not deliberately trying to game, tempt or manipulate the programmer.