Pokémon is actually load-bearing for your models? I’m imagining a counterfactual world in which Sonnet 3.7’s initial report involved it beating Pokémon Red, and I don’t think my present-day position would’ve been any different in it.
Even setting aside the tons of walkthrough information present in LLMs’ training sets, and the fact that iterative prompting lets one identify and patch holes in LLMs’ pretrained instinctive game knowledge, Pokémon is simply not a good test of open-ended agency. At the macro scale, the game state can only progress forward, and progressing it requires solving relatively closed-form combat/navigational challenges. This means that if you’re reasonably likely to blunder through each of those isolated challenges, you’re fated to “fail upwards”. The game-state topology doesn’t allow you to regress or get stuck in a dead end: you can’t lose a badge or un-win a boss battle. I.e., there’s basically an implicit “long-horizon agency scaffold” built into the game.
So what this mainly tests is the ability to solve somewhat diverse isolated challenges in sequence, not the ability to autonomously decompose long-term tasks into such isolated challenges so that the sequence of challenges inexorably points at the long-term task’s accomplishment.
Hmm, maybe I’m suffering from having never played Pokémon… who would’ve thought that could be an important hole in my education?