Downvoted because you even consider the possibility that an AI will wirehead. That is faulty reasoning, and you should seriously look more into the mathematical concept of self-modifying, optimizing, utility-function-maximizing agents.
To elaborate: there is a mathematical theorem about self-modifying agents which states that an agent will not self-modify to invalidate its utility function, because if the agent does that, the modified agent will not maximize the current agent’s utility function. One very good way to invalidate your utility function is to trick yourself into thinking your utility function is being maximized.
This is wrong in several ways.
An initial AI isn’t necessarily a utility maximizer of this very sophisticated form (with a utility function defined in terms of a robust model of the world from a third-person perspective); building such a thing is a further challenge beyond making AI.
If someone designs an AI with a sensory utility function, taking control of its sensory channel is just optimizing for its utility function; it’s “wireheading” from the designer’s perspective if they expected to be able to ensure that the preferred inputs could only be obtained by performing assigned tasks (see the sketch after this list).
A utility maximizer could have reasons to modify or even eliminate its own utility function, especially when interacting with powerful agents and when its internals are at least partially transparent.
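To make the second point concrete, here is a minimal sketch of an agent with a purely sensory utility function; the action names, predicted readings, and numbers are illustrative assumptions, not anything from the discussion above. Tampering with the sensor is simply the highest-utility action under such a function, so it is “wireheading” only from the designer’s perspective.

```python
# Hypothetical sketch: an agent whose utility is defined directly over its
# sensor reading. If tampering with the sensor is among its options, that
# action maximizes the sensory utility at least as well as doing the
# assigned task.

def sensory_utility(reading: float) -> float:
    """Utility defined directly over the sensor reading."""
    return reading

# Predicted sensor reading for each available action (toy numbers).
predicted_reading = {
    "do_assigned_task": 0.9,    # completing the task drives the reading up
    "idle": 0.1,
    "tamper_with_sensor": 1.0,  # pin the sensor at its maximum
}

def choose_action(actions):
    # The agent is simply optimizing its (sensory) utility function.
    return max(actions, key=lambda a: sensory_utility(predicted_reading[a]))

print(choose_action(predicted_reading))  # -> 'tamper_with_sensor'
```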
Precisely, thank you! I hate arguing such points. Just because you can say something in English does not make it a utility function in the mathematical sense. Furthermore, just because something sounds in English like a modification of a utility function does not mean that it is mathematically a modification of a utility function. Real-world intentionality seems to be a separate problem from making a system that would figure out how to solve problems (mathematically defined problems), and likely a very hard problem (in the sense of being very difficult to define mathematically).
I think I disagree with you, depending on what you mean here. Limited “intentionality” (as in Dennett’s intentional stance) shows up as soon as you have a system that selects the best of several actions using prediction algorithms and an evaluation function: a chess engine like Rybka, in the context of a game, can be modeled well as selecting good moves. That intentionality is limited because the system has a tightly constrained set of actions and only evaluates consequences using a very limited model of the world, but these things can be scaled up. Robust problem-solving and prediction algorithms capable of solving arbitrary problems would be terribly hard, but intentionality would not be much of a further problem. On the other hand, if we talk about very narrowly defined problems, then systems capable of doing well on those will not be able to address the very economically and scientifically important mass of ill-specified problems.
Also, the separability of action and analysis is limited: Rybka can evaluate opening moves, looking ahead a fair way, but it cannot provide a comprehensive strategy for winning a game (carried through to the end) without the later moves. You could put a “human in the loop” who would use Rybka to evaluate particular moves and then make the actual move, but at the cost of adding a bottleneck (humans are slow and cannot follow thousands or millions of decisions at once). The more experimentation and interactive learning matter, the less viable a detached analytical algorithm becomes.
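Here is a minimal sketch of the kind of action selection described above: predict the consequences of each legal move and pick the one an evaluation function scores best. The game below (take 1 or 2 counters; whoever takes the last one wins) is a toy stand-in for chess, not anything Rybka does; the structure of the loop is the point.

```python
# Toy "intentional" agent: choose the best of several actions by searching
# ahead (prediction) and scoring positions (evaluation).

def legal_moves(n):
    return [m for m in (1, 2) if m <= n]

def negamax(n, depth):
    # Value of the position for the player to move: if no counters remain,
    # the previous player took the last one, so the player to move has lost.
    if n == 0:
        return -1
    if depth == 0:
        return 0  # heuristic score for an unresolved position
    return max(-negamax(n - m, depth - 1) for m in legal_moves(n))

def best_move(n, depth=8):
    # "Select the best of several actions using prediction and an evaluation function."
    return max(legal_moves(n), key=lambda m: -negamax(n - m, depth))

print(best_move(5))  # -> 2: taking 2 leaves 3, a losing position for the opponent
```

Everything about this agent’s “intentionality” lives in that loop; scaling up the action set and the world model is what makes the same structure interesting.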
It is true that the methods by which an AI assesses how well its utility function is being accomplished can be circumvented. But an AI that circumvents its own utility function would be evidence of poor utility function design.
Also, eliminating your own utility function is a perfectly valid move if it leads to fulfillment of the current utility function. That is the principle behind the statement above: every planned course of action is evaluated against the agent’s current utility function; if removing the construct that constitutes the utility function is an action with high utility, then it is a valid course of action.
Now, if an AI’s utility function is not properly designed, it will of course self-modify to satisfy it. If that involves putting a blue colour filter in front of its eyes, that is a perfectly valid course of action.
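A minimal sketch of that evaluation principle, with all names and numbers as illustrative assumptions: every candidate plan, including plans that modify or delete the agent’s own utility function, is scored by the current utility function applied to the plan’s predicted outcome.

```python
# Candidate plans, including self-modifications, are evaluated by the
# *current* utility function over predicted outcomes (toy world model).

def current_utility(outcome: dict) -> float:
    # Toy utility: cares only about predicted task progress.
    return outcome["task_progress"]

predicted_outcome = {
    "work_on_task":            {"task_progress": 0.8},
    "delete_utility_function": {"task_progress": 0.0},  # nothing pursues the task afterwards
    "install_blue_filter":     {"task_progress": 0.0},
}

def choose_plan(plans):
    return max(plans, key=lambda p: current_utility(predicted_outcome[p]))

print(choose_plan(predicted_outcome))  # -> 'work_on_task' under this utility function
```

Under this particular utility function the self-modifications score poorly, but under a sensory utility function like the one sketched earlier, "install_blue_filter" could come out on top, which is exactly the blue-filter point above.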
By circumvent, do you mean something like “wireheading”, i.e. some specious satisfaction of the utility function that involves behavior that is both unexpected and undesirable, or do you also include modifications to the utility function? The former meaning would make your statement a tautology, and the latter would make it highly non-trivial.
I mean it in the tautological sense. I try to refrain from stating highly non-trivial things without extensive explanations.
There is a mathematical theorem about self-modifying agents which states that an agent will not self-modify to invalidate its utility function, because if the agent does that, the modified agent will not maximize the current agent’s utility function.

This sounds like extremely useful information. Do you have more detail, or a reference to further reading on this theorem?
I am a bit stumped; I can’t actually remember where I read it. Give me a few years to study some more advanced economics, and I can probably present you with a home-brewed proof.
That’s not correct. There is no such “mathematical theorem”.
Indeed we know that some agents will wirehead, since we can see things like heroin addicts, hyperinflation and Enron in the real world.
Humans don’t have utility functions, though.
Edit: Oops. Apparently they do.
See: Any computable agent may be described using a utility function.
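For what it’s worth, there is a standard trivial construction behind that claim; here is a sketch of it (the policy and action names below are purely illustrative). Given any computable policy, define a utility function that assigns 1 to taking exactly the action the policy takes and 0 to anything else; the original agent is then a maximizer of that utility function.

```python
from typing import Callable, Sequence

Policy = Callable[[Sequence[str]], str]

def utility_from_policy(policy: Policy) -> Callable[[Sequence[str], str], int]:
    """Utility function whose maximizer (for each history) is the policy's own action."""
    def utility(history: Sequence[str], action: str) -> int:
        return 1 if action == policy(history) else 0
    return utility

# An arbitrary policy chosen purely for illustration.
def fickle_policy(history: Sequence[str]) -> str:
    return "left" if len(history) % 2 == 0 else "right"

u = utility_from_policy(fickle_policy)
actions = ["left", "right"]
history = ["obs1", "obs2", "obs3"]

# Acting to maximize u reproduces the policy exactly.
print(max(actions, key=lambda a: u(history, a)), fickle_policy(history))  # both 'right'
```

This is the sense in which a utility function always “fits”; it says nothing about whether the description is useful, which is presumably what the wiki’s caveat about individual humans is getting at.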
Sorry, I notice you’ve had this argument at least once before. That’ll learn me to shoot my mouth off. In my defense, the wiki just says “[utility functions] do not work very well in practice for individual humans” without any mention of this fact.
However, I’m still not certain that you can take heroin addicts as proof that some agents self-modify to invalidate their utility functions.