Should we make a “skill” file for the AI to play Pokemon?
Hmmm… on one hand, this feels like cheating, depending on how much detail we provide. In the extreme, we could give the AI an entire sequence of moves to execute in order to complete the game. That would definitely be cheating. The advice should be more generic. But how generic is generic enough? Is it okay to leave reminders such as “if there is a skill you need to overcome an obstacle, and if getting that skill requires you to do something, maybe prioritize doing that thing”, or is that already too specific?
(Intuitively, perhaps a piece of advice is generic enough if it can be used to solve multiple different games? Unless it is just a union of very specific advice for all the games in the test set, of course.)
On the other hand, in deployment the situation would be that we want the AI to solve the problem and we do whatever is necessary to help it. I mean, if someone told you “make Claude solve Pokemon in 2 days or I will kill you” and didn’t specify any conditions, you would cheat as hard as you could, like uploading complete walkthroughs etc. So perhaps solving a problem that we humans have already solved is not suitable for a realistic challenge.