Gemini 2.5 Pro just beat Pokémon Blue. (https://x.com/sundarpichai/status/1918455766542930004)
A few things ended up being key to the successful run:
Map labeling: very detailed labeling of individual map tiles, including identifying tiles that move you to a new location (“warps” such as doorways, ladders, and cave entrances) and identifying puzzle entities
Separate instances of Gemini with different, narrower prompts: the main Gemini playing the game used these to reason about specific tasks (e.g. navigation, boulder puzzles, critique of its current plans); a rough sketch of the tile labels and this delegation pattern follows below
Detailed prompting: heavily iterated, down to details like “if you’re navigating a long distance that crosses water midway through, make sure to use surf”
For these and other reasons it was not a “clean” win in a certain sense (nor a short one: it took over 100,000 thinking actions), but the victory is still a notable accomplishment. The next milestone is LLMs beating Pokémon with less handholding and less difficulty.
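To make that concrete: the scaffold itself isn’t public, so the following is only a rough, hypothetical sketch of the first two ideas (labeled tiles with warp metadata, plus a main agent that hands narrow questions to a separately prompted instance). Every name in it (`Tile`, `NAV_PROMPT`, `call_gemini`) is a made-up stand-in, not the real scaffold’s API.

```python
# Illustrative sketch only -- the real scaffold is not open source.
# `call_gemini` is a hypothetical stand-in for whatever model-API call the harness uses.
from dataclasses import dataclass

@dataclass
class Tile:
    walkable: bool
    label: str            # e.g. "floor", "water", "boulder"
    warp_to: str | None   # where this tile leads (doorway, ladder, ...), if anywhere

# A tiny labeled map fragment: (x, y) -> Tile
tile_map = {
    (4, 2): Tile(walkable=True,  label="floor",   warp_to=None),
    (4, 3): Tile(walkable=True,  label="ladder",  warp_to="Mt. Moon B1F"),
    (5, 2): Tile(walkable=False, label="boulder", warp_to=None),
}

NAV_PROMPT = (
    "You are a navigation assistant for Pokémon Blue. Given labeled tiles and a goal, "
    "return a short list of button presses. If the route crosses water, use Surf."
)

def call_gemini(system_prompt: str, user_message: str) -> str:
    """Hypothetical wrapper around the model API; returns the model's text reply."""
    raise NotImplementedError

def main_agent_step(goal: str) -> str:
    # The main agent delegates the narrow navigation question to a
    # separately prompted instance instead of reasoning about it inline.
    tiles_text = "\n".join(
        f"{pos}: {t.label}" + (f" (warp to {t.warp_to})" if t.warp_to else "")
        for pos, t in tile_map.items()
    )
    return call_gemini(NAV_PROMPT, f"Goal: {goal}\nNearby tiles:\n{tiles_text}")
```

The real run reportedly had several such narrowly prompted helpers (navigation, boulder puzzles, plan critique) and much richer tile labels, but the division of labor has the same shape.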
Were these key things made by the AI, or by the people making the run?
My understanding is they were made by the dev, and added throughout the run, which is kind of cheating.
I think it’s not cheating in a practical sense: real applications of AI typically have a team of devs who notice when the model is tripping up and add special handling to fix that, so this setup is reflective of real-world use of AI.
But I think it’s illustrative of why artificial intelligence most likely won’t lead to artificial general agency and alignment x-risk: the agency will be created by unblocking a bunch of narrow obstacles, and those unblockings will be goal-specific and thus won’t generalize into misalignment.
This seems, frankly, like a bizarre extrapolation. Are you serious? The standard objection is of course that if humans have general agency, future AI systems can have it too, and indeed to a greater degree than humans.
https://www.lesswrong.com/posts/puv8fRDCH9jx5yhbX/johnswentworth-s-shortform?commentId=qaZez2DbBmTd5K2KZ
As far as I can understand it, this doesn’t seem to contain an argument for why you think highly general AIs aren’t possible.
You can make highly general AIs; they will just lack agency. You then plop a human on top of the AI, and the human will provide plenty of agency for basically all legible purposes.
Maybe at some point you’ll want to write a post on why you think high agency couldn’t or wouldn’t be automated.
https://www.lesswrong.com/posts/puv8fRDCH9jx5yhbX/johnswentworth-s-shortform?commentId=jZ2KRPoxEWexBoYSc
“Set up a corporation with a million artificial employees” is pretty legible, but a human amount of agency is catastrophically insufficient for it.
This statement is pretty ambiguous. “Artificial employee” makes me think of some program that is meant to perform tasks in a semi-independent manner. It would be trivial to generate a million different prompts and then have some interface that routes stuff to these prompts in some way. You could also register it as a corporation. It would presumably be slightly less useful than your generic AI chatbot, because the cost and latency would be slightly higher than if you didn’t set up the chatbot in this way. But only slightly.
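For concreteness, here is roughly what I have in mind. It’s a toy, hypothetical sketch (the router, `make_prompt`, and `call_chatbot` are all made up), but it shows why I call this part trivial.

```python
# Hypothetical illustration of "a million prompts behind a router".
# None of this refers to a real product; all names are invented for the sketch.
import hashlib

NUM_EMPLOYEES = 1_000_000

def make_prompt(i: int) -> str:
    # Trivially generate a distinct system prompt per "employee".
    return f"You are employee #{i} of ExampleCorp. Handle requests in your area diligently."

def route(request: str) -> int:
    # Some interface that routes requests to one of the prompts --
    # here, just a stable hash of the request text.
    digest = hashlib.sha256(request.encode()).hexdigest()
    return int(digest, 16) % NUM_EMPLOYEES

def call_chatbot(system_prompt: str, request: str) -> str:
    """Hypothetical stand-in for a generic chatbot API call."""
    raise NotImplementedError

def handle(request: str) -> str:
    employee_id = route(request)
    return call_chatbot(make_prompt(employee_id), request)
```

All the “corporation” wrapper adds over the bare chatbot is a hash lookup and a per-request system prompt, which is where the slight extra cost and latency come from.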
Though one could argue that since AI chatbots lack agency, they don’t count as artificial employees. But then is there anything that counts? Like at some point it just seems like a confused goal to me.
By “artificial employee” I mean “something that can fully replace a human employee, including their agentic capabilities”. And, of course, it should be much more useful than a generic AI chatbot; it should be useful the way owning Walmart (1,200,000 employees) is useful.
Ok, so then since one can’t make artificial general agents, it’s not so confusing that an AI-assisted human can’t solve the task. I guess it’s true though that my description needs to be amended to rule out things constrained by possibility, budget, or alignment.
Correct. See a more complete list of scaffold features here.
I take it the final iteration isn’t published anywhere yet? Wasn’t able to find it.
Seems like the most important part for deciding how to update on that.
Also, it’s worth checking if the final version can actually beat the whole game. If it was modified on the fly, later modifications may have broken earlier performance?
This is kinda-sorta being done at the moment: after Gemini beat the game, the stream has just kept on going. Currently Gemini is lost in Mt. Moon, as is tradition. In fact, having already explored Mt. Moon earlier seems to be hampering it (there are no unexplored areas on the minimap to lure it in the right direction).
I believe the dev is planning to do a fresh run soon-ish once they’ve stabilized their scaffold.
Yeah it’s not open source or published anywhere unfortunately.