Can I remind you what we are talking about; not about a single stop button, but about a “utility function” that is constantly modified whenever new information comes in. That’s the kind of weakness that will lead to systematic money pumping. The situation is more analogous to me being able to constantly change whether a woman is pregnant and back again, and selling her children’s toys and buying them back each time. I can do that through the information presented to the AI. And the AI, no matter how smart, will be useless at resisting it until the moment it either 1) stops being a utility maximiser or 2) fixes its utility function.
It’s not the fact that the utility function is changing that is the problem, so self-improving AI is fine. It’s the fact that it’s systematically changing in response to predictable inputs.
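To make the money-pumping worry concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than anything from the original post: the function names, the two prices, and the assumption that the agent trades at exactly its current valuation. The adversary only controls which information the agent sees.

```python
def valuation(signal: bool) -> float:
    """Hypothetical agent: the price it puts on the toys depends on the latest signal."""
    return 10.0 if signal else 2.0


def run_money_pump(cycles: int) -> float:
    """Return the agent's net cash after the adversary flips the signal back and forth."""
    agent_cash = 0.0
    for _ in range(cycles):
        # Adversary presents information that makes the toys valuable;
        # the agent buys them at its own (high) valuation.
        agent_cash -= valuation(True)
        # Adversary presents information that reverses this;
        # the agent sells them back at its own (low) valuation.
        agent_cash += valuation(False)
    return agent_cash


print(run_money_pump(5))  # -40.0
```

The loss grows linearly with the number of flips, and nothing about the agent's intelligence enters into it: as long as its valuation moves predictably with the input, the same cycle can be repeated.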
Can I remind you what we are talking about; not about a single stop button, but about a “utility function” that is constantly modified whenever new information comes in.
After backtracking, to try and understand what it is that you think we are talking about, I think I can see what is going on here.
When you wrote:
...you were using “utility” as an abbreviation for “utility function”!
That would result in a changing utility function and, in that context, your comments make sense.
However, that represents a simple implementation mistake. You don’t implement indifference by using a constantly-changing utility function. What changes, in order to make the utility of being switched off track the utility of being switched on, is just the utility associated with being switched off.
The utility function just has a component which says: “the expected utility of being stopped is the same as if not stopped”. The utility function always says that, and doesn’t change, regardless of sensory inputs or whether the stop button has been pressed.
What changes is the utility, not the utility function. That is what you wrote, but it was apparently not what you meant, thus the confusion.
Yes, I apologise for the confusion. But what I showed in my post was that implementing “the expected utility of being stopped is the same as if not stopped” has to be done in a cunning way (the whole thing about histories having the same stem) or else extra information will get rid of indifference.
Yes.
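The distinction drawn above between a utility and a utility function can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not the construction from the post: the names `utility_if_running`, `utility_if_stopped` and `incentive_to_resist` are hypothetical, beliefs are modelled as a plain list of (probability, utility) pairs, and the sketch deliberately ignores the point about histories sharing the same stem, so it says nothing about whether extra information can break indifference.

```python
from typing import List, Tuple

# Each belief is a (probability, utility) pair over outcomes of continuing to run.
Beliefs = List[Tuple[float, float]]


def utility_if_running(beliefs: Beliefs) -> float:
    """Expected utility of carrying on, under the current beliefs."""
    return sum(p * u for p, u in beliefs)


def utility_if_stopped(beliefs: Beliefs) -> float:
    """Fixed indifference component: being stopped is worth the same as not being stopped."""
    return utility_if_running(beliefs)


def incentive_to_resist(beliefs: Beliefs) -> float:
    """How much the agent would gain by preventing the stop."""
    return utility_if_running(beliefs) - utility_if_stopped(beliefs)


before = [(0.5, 4.0), (0.5, 8.0)]    # beliefs before new information
after = [(0.25, 4.0), (0.75, 8.0)]   # beliefs after new information

print(utility_if_stopped(before))  # 6.0
print(utility_if_stopped(after))   # 7.0
print(incentive_to_resist(before), incentive_to_resist(after))  # 0.0 0.0
```

The functions themselves are never reassigned. Only the number returned for being stopped moves as the beliefs move, and the incentive to resist stays at zero in both cases.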