I’m not sure that TAS counts as “AI” since they’re usually compiled by humans, but the “PokeBotBad” you linked is interesting, hadn’t heard of that before. It’s an Any% Glitchless speedrun bot that ran until ~2017 and which managed a solid 1:48:27 time on 2/25/17, which was better than the human world record until 2/12/18. Still, I’d say this is more a programmed “bot” than an AI in the sense we care about.
Anyway, you’re right that the whole reason the Pokémon benchmark exists is that it’s interesting to see how well an untrained LLM can do playing it.
>I’m not sure that TAS counts as “AI” since they’re usually compiled by humans
Agreed, it’s more “this is what the limit looks like.”
>Still, I’d say this is more a programmed “bot” than an AI in the sense we care about.
Is Stockfish 8 not an AI? I feel like the goalposts for what counts as “AI” keep getting shifted. PokeBotBad is an “AI” that searches the Pokémon state space to solve it.