If you give the AI a perfect utility function for a game, it still has to break the goal down into subgoals and pursue those. You don’t have a good general theory for ensuring that your generated subgoals actually serve your supergoals, but you’ve tweaked things enough that it’s very good at achieving the ‘midlevel’ things.
When you give it something more complex, it improperly breaks the goal down into faulty subgoals that are ineffective or contradictory, and then carries out each of them effectively. This yields a mess.
At this point you could improve some of the low-level goal-achievement and do much better at a range of low-level tasks, but this wouldn’t buy you much on the complex tasks, and might just send you further off track.
If you understand that the complex subgoals are faulty, you might be able to re-patch it, but this might not help you solve different problems of similar complexity, let alone more complex problems.
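The failure mode above can be made concrete with a toy sketch (everything here is hypothetical, invented purely for illustration): a planner whose low-level optimizer achieves every subgoal it is handed, but whose decomposer produces subgoals that fight each other, so each subgoal succeeds while the supergoal fails.

```python
# Toy illustration of faulty goal decomposition (hypothetical, not any real system).
# The low-level optimizer is competent; the decomposer is the problem.

def decompose(goal):
    """Naive decomposition: no check that the subgoals jointly serve the goal."""
    if goal == "keep room at 20C with window open":
        # Faulty: these two subgoals interact, and the decomposer doesn't notice.
        return ["open window", "run heater at max"]
    return [goal]

def achieve(state, subgoal):
    """A competent low-level optimizer: reliably achieves any single subgoal."""
    if subgoal == "open window":
        state["window"] = "open"
        state["temp"] -= 5
    elif subgoal == "run heater at max":
        state["heater"] = "max"
        state["temp"] += 15
    return state

state = {"temp": 20, "window": "closed", "heater": "off"}
for sg in decompose("keep room at 20C with window open"):
    state = achieve(state, sg)

print(state)  # every subgoal achieved, yet temp has drifted to 30: a mess
```

Note that improving `achieve` (the low-level competence) does nothing here; the error lives entirely in `decompose`, which is the point of the paragraph above.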
What led me to this answer:
Thinking deeply until you get eaten by a sabertooth is not smart.
There may not be a trade-off at play here. For example: at each turn you give the AI indefinite time and memory to learn all it can from the information it has so far, and to plan. (Limited by your patience and budget, but let’s handwave that computational resources are cheap, and every turn the AI comes in well below its resource limit.)
You have a fairly good move optimizer that can achieve a wide range of in-game goals, and a reward modeler that tries to learn what it is supposed to do and updates the utility function accordingly.
I meant the AI to be limited to the formal game universe, which should be easily feasible for non-superintelligent AIs. In this case, smarter agents always have an advantage, and maximization of reward is the same as the intended goal.
But how do they know how to maximize reward? I was assuming they have to learn the reward criteria. If they have a flawed concept of those criteria, they will seek non-reward.
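The flawed-criteria point can be sketched in a few lines (all names here are made up for illustration): a strong move optimizer pointed at a mislearned utility function will reliably pick moves that earn no actual reward.

```python
# Hypothetical sketch: the agent must *learn* the reward criteria, and a strong
# optimizer aimed at a flawed learned model systematically seeks non-reward.

def true_reward(move):
    # The intended criterion: only even moves are actually rewarded.
    return 1.0 if move % 2 == 0 else 0.0

def learned_utility(move):
    # A flawed learned model of the criterion: the agent came to believe
    # that *large* moves are rewarded, rather than even ones.
    return float(move)

def optimize_move(moves, utility):
    """A strong move optimizer: exactly maximizes whatever utility it is given."""
    return max(moves, key=utility)

moves = [0, 1, 2, 3, 4, 5]
chosen = optimize_move(moves, learned_utility)
print(chosen, true_reward(chosen))  # picks 5, which earns no actual reward
```

Making `optimize_move` stronger only makes it better at maximizing `learned_utility`; the gap between that and `true_reward` is untouched, which is why "smarter" doesn't automatically mean "more rewarded" once the criteria are learned rather than given.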
If the utility function is one and the same as winning, then see the answer at the top.
I don’t see a clear argument, and failing that, I can’t place confidence in a clear, lawful conclusion (that AGI hits a wall). I don’t think this line of inquiry is worthwhile.