I have a line of thinking that makes me less worried about unfriendly AI. The smarter an AI gets, the more it is able to follow its utility function. Where the utility function is simple or the AI is stupid, we have useful things like game opponents.
But as we give smarter AIs interesting ‘real world’ problems, the difference between what we asked for and what we want shows up more explicitly. Developers usually interpret this as the AI being stupid or broken, and patch over either the utility function or the reasoning it led to. These patches don’t add intelligence, since extra intelligence would only make the AI appear more broken.
If there is a big gap between AIs that are smart enough for their non-human utility functions to be annoyingly evident and AIs that are smart enough to improve themselves, then non-friendly AGI research will hit an apparent wall where all avenues are unproductive. (If this line of thought is correct, it hit that wall a long time ago.)
This is not a guarantee. There is always a chance someone will hit on a self-improving system.
Where the utility function is simple or the AI is stupid, we have useful things like game opponents.
Rather, where the utility function is simple AND the program is stupid. Paperclippers are not useful things.
If there is a big gap between AIs that are smart enough for their non-human utility functions to be annoyingly evident and AIs that are smart enough to improve themselves, then non-friendly AGI research will hit an apparent wall where all avenues are unproductive.
Reinforcement-based utility definition plus difficult games with well-defined winning conditions seems to constitute a counterexample to this principle (a way of doing AI that won’t hit the wall you described). This could function even on top of supplemental ad-hoc utility function building, as in chess, where a partially hand-crafted utility function over specific board positions is an important component of chess-playing programs—you’d just need to push the effort to a “meta-AI” that is only interested in the real winning conditions.
Rather, where the utility function is simple AND the program is stupid. Paperclippers are not useful things.
I was thinking of current top chess programs as smart (well above average human players), with simple utility functions.
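To make "simple utility function" concrete: chess engines' hand-crafted evaluations are built from terms like a material count over the position. This is only a toy sketch of that idea, not any actual engine's evaluation function.

```python
# Toy sketch of a hand-crafted chess evaluation: a bare material count.
# Real engines combine many more hand-tuned terms (mobility, king safety,
# pawn structure); this illustrates only the simplest component.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # conventional values

def material_eval(pieces):
    """Score a position from White's perspective.

    `pieces` is a list of piece letters: uppercase for White, lowercase
    for Black (kings omitted, since a material count ignores them).
    """
    score = 0
    for piece in pieces:
        if piece.upper() in PIECE_VALUES:
            value = PIECE_VALUES[piece.upper()]
            score += value if piece.isupper() else -value
    return score

# White has queen + pawn against Black's lone rook: 9 + 1 - 5 = +5 for White.
print(material_eval(["Q", "P", "r"]))  # -> 5
```

The point of the example: a function this simple can, with a strong search on top of it, produce play well above average human strength, which is why "smart program, simple utility function" is a sensible description of chess engines.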
Reinforcement-based utility definition plus difficult games with well-defined winning conditions seems to constitute a counterexample to this principle (a way of doing AI that won’t hit the wall you described).
This is a good example, but it might not completely explain it away.
Can we, by hand or by algorithm, construct a utility function that does what we want, even when we know exactly what we want?
I think you could still have a situation in which a smarter agent does worse because its learned utility function does not match the winning conditions (its learned utility function would constitute a created subgoal of “maximize reward”).
Learning about the world and constructing subgoals would probably be part of any near-human AI. I don’t think we have a way to construct reliable subgoals, even with a rules-defined supergoal and perfect knowledge of the world. (such a process would be a huge boon for FAI)
Likewise, I don’t think we can be certain that the utility functions we create by hand would reliably lead a high-intelligence AI to seek the goal we want, even for well-defined tasks.
A smarter agent might have the advantage of learning the winning conditions faster, but if it is comparatively better at implementing a flawed utility function than at fixing its utility function, then it could be outpaced by stupider versions, and you’re working more in an evolutionary design space.
So I think it would hit the same kind of wall, at least in some games.
I meant the AI to be limited to the formal game universe, which should be easily feasible for non-superintelligent AIs. In this case, smarter agents always have an advantage; maximization of reward is the same as the intended goal.
A smarter agent might have the advantage of learning the winning conditions faster, but if it is comparatively better at implementing a flawed utility function than at fixing its utility function, then it could be outpaced by stupider versions, and you’re working more in an evolutionary design space.
If you give the AI a perfect utility function for a game, it still has to break down subgoals and seek those. You don’t have a good general theory for making sure your generated subgoals actually serve your supergoals, but you’ve tweaked things enough that it’s actually very good at achieving the ‘midlevel’ things.
When you give it something more complex, it improperly breaks down the goal into faulty subgoals that are ineffective or contradictory, and then effectively carries out each of them. This yields a mess.
At this point you could improve some of the low level goal-achievement and do much better at a range of low level tasks, but this wouldn’t buy you much in the complex tasks, and might just send you further off track.
If you understand that the complex subgoals are faulty, you might be able to re-patch it, but this might not help you solve different problems of similar complexity, let alone more complex problems.
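The failure mode described above can be shown with a deliberately tiny toy (all specifics here are invented for illustration): an agent that executes every subgoal competently still misses the supergoal if the decomposition itself is faulty, and improving low-level execution does nothing to fix that.

```python
# Toy illustration of faulty goal decomposition: subgoals are modeled as
# deltas to apply to a position, and "execution" applies each one perfectly.
# With contradictory subgoals, flawless execution still yields a mess.

def execute(subgoals, start=0):
    """Carry out each subgoal perfectly, in order."""
    position = start
    for delta in subgoals:
        position += delta
    return position

SUPERGOAL = 10  # the intended target position

good_decomposition = [5, 5]        # subgoals that actually serve the goal
faulty_decomposition = [5, -5, 5]  # contradictory: two subgoals cancel out

print(execute(good_decomposition))    # -> 10, supergoal reached
print(execute(faulty_decomposition))  # -> 5, competent execution, wrong result
```

Making `execute` "smarter" at applying each delta cannot close the gap in the second case; only repairing the decomposition can, which is the wall the answer above describes.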
What led me to this answer:
Thinking deeply until you get eaten by a sabertooth is not smart.
There may not be a trade-off at play here. For example: at each turn you give the AI indefinite time and memory to learn all it can from the information it has so far, and to plan. (Limited by your patience and budget, but let’s handwave that computational resources are cheap, and every turn the AI comes in well below its resource limit.)
You have a fairly good move optimizer that can achieve a wide range of in game goals, and a reward modeler that tries to learn what it is supposed to do and updates the utility function.
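A toy instantiation of that setup (the game, the names, and the update rule are all invented for illustration): the true winning condition rewards guesses close to a hidden target, the learned utility function is the agent's current estimate of that target, and the "reward modeler" step nudges the estimate toward whatever the true reward signal prefers.

```python
# Sketch of a move optimizer paired with a reward modeler that updates the
# learned utility function from the game's true reward signal.

def true_reward(move, target=7.0):
    # The game's well-defined winning condition: closer to target is better.
    return -abs(move - target)

def optimize_move(estimate):
    # The move optimizer simply plays what the learned utility says is best.
    return estimate

def train(estimate, steps, step_size=1.0, target=7.0):
    for _ in range(steps):
        move = optimize_move(estimate)
        # Reward-modeler step: compare two perturbed candidates against the
        # current estimate under the TRUE reward, keep whichever scores best.
        for candidate in (move + step_size, move - step_size):
            if true_reward(candidate, target) > true_reward(estimate, target):
                estimate = candidate
    return estimate

print(train(0.0, steps=20))  # -> 7.0, the estimate converges to the target
```

Here the learned utility converges to the real winning condition because the true reward is always available to correct it; the disagreement in the thread is about what happens when the agent's concept of the reward criteria, not just its estimate, is flawed.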
I meant the AI to be limited to the formal game universe, which should be easily feasible for non-superintelligent AIs. In this case, smarter agents always have an advantage; maximization of reward is the same as the intended goal.
But how do they know how to maximize reward? I was assuming they have to learn the reward criteria. If they have a flawed concept of those criteria, they will seek non-reward.
If the utility function is one and the same as winning, then the (see Top)
I don’t see a clear argument, and failing that, I can’t place confidence in a clear lawful conclusion (AGI hits a wall). I don’t think this line of inquiry is worthwhile.