I think there is a genuine problem here… the AI imposes no obstacle to “trusted programmers” changing its utility function. But apart from the human difficulties (the programmers could be corrupted by power, make mistakes etc.) what stops the AI manipulating the programmers into changing its utility function e.g. changing a hard to satisfy v into some w which is very easy to satisfy, and gives it a very high score?
I think there is a genuine problem here… the AI imposes no obstacle to “trusted programmers” changing its utility function. But apart from the human difficulties (the programmers could be corrupted by power, make mistakes etc.) what stops the AI manipulating the programmers into changing its utility function e.g. changing a hard to satisfy v into some w which is very easy to satisfy, and gives it a very high score?