I’ll mention that beating Pokémon isn’t that big of a challenge in and of itself; what’s important here is that this thing, which wasn’t trained to play Pokémon, can do it.
Depending on how strictly you define an AI beating Pokémon, we have AIs that beat it in less than 2 hours. Or, if you go with the interpretation that “an AI beating Pokémon is any program that beats Pokémon,” we have “AIs” that beat it in less than 2 minutes, or less than 1:30 if you want a stricter definition of “beat the game.”
I’m not sure that TAS counts as “AI” since they’re usually compiled by humans, but the “PokeBotBad” you linked is interesting; I hadn’t heard of it before. It’s an Any% Glitchless speedrun bot that ran until ~2017 and managed a solid 1:48:27 on 2/25/17, a time that stayed ahead of the human world record until 2/12/18. Still, I’d say this is more a programmed “bot” than an AI in the sense we care about.
Anyway, you’re right that the whole reason the Pokémon benchmark exists is that it’s interesting to see how well an LLM that wasn’t trained on the game can play it.
>I’m not sure that TAS counts as “AI” since they’re usually compiled by humans
Agreed, it’s more “this is what the limit looks like.”
>Still, I’d say this is more a programmed “bot” than an AI in the sense we care about.
Is Stockfish 8 not an AI? I feel like the goalposts of what counts as “AI” keep getting shifted. PokeBotBad is an “AI” that searches the Pokémon state space to solve the game.