Gemini 2.5 Pro just beat Pokémon Blue. (https://x.com/sundarpichai/status/1918455766542930004)
A few things ended up being key to the successful run:
Map labeling: very detailed labeling of individual map tiles, including identifying tiles that move you to a new location (“warps” such as doorways, ladders, and cave entrances) and identifying puzzle entities
Separate instances of Gemini with different, narrower prompts: the main Gemini playing the game used these to reason about specific tasks (e.g. navigation, boulder puzzles, critique of its current plans); a rough sketch of the tile labels and this delegation pattern follows below
Detailed prompting: heavily iterated, down to details like “if you’re navigating a long distance that crosses water midway through, make sure to use surf”
For these and other reasons it was not a “clean” win in a certain sense (nor a short one: it took over 100,000 thinking actions), but the victory is still a notable accomplishment. The next milestone is LLMs beating Pokémon with less handholding and less difficulty.
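To make that concrete: the scaffold itself isn’t public, so the following is only a rough, hypothetical sketch of the first two ideas (labeled tiles with warp metadata, plus a main agent that hands narrow questions to a separately prompted instance). Every name in it (`Tile`, `NAV_PROMPT`, `call_gemini`) is a made-up stand-in, not the real scaffold’s API.

```python
# Illustrative sketch only -- the real scaffold is not open source.
# `call_gemini` is a hypothetical stand-in for whatever model-API call the harness uses.
from dataclasses import dataclass

@dataclass
class Tile:
    walkable: bool
    label: str            # e.g. "floor", "water", "boulder"
    warp_to: str | None   # where this tile leads (doorway, ladder, ...), if anywhere

# A tiny labeled map fragment: (x, y) -> Tile
tile_map = {
    (4, 2): Tile(walkable=True,  label="floor",   warp_to=None),
    (4, 3): Tile(walkable=True,  label="ladder",  warp_to="Mt. Moon B1F"),
    (5, 2): Tile(walkable=False, label="boulder", warp_to=None),
}

NAV_PROMPT = (
    "You are a navigation assistant for Pokémon Blue. Given labeled tiles and a goal, "
    "return a short list of button presses. If the route crosses water, use Surf."
)

def call_gemini(system_prompt: str, user_message: str) -> str:
    """Hypothetical wrapper around the model API; returns the model's text reply."""
    raise NotImplementedError

def main_agent_step(goal: str) -> str:
    # The main agent delegates the narrow navigation question to a
    # separately prompted instance instead of reasoning about it inline.
    tiles_text = "\n".join(
        f"{pos}: {t.label}" + (f" (warp to {t.warp_to})" if t.warp_to else "")
        for pos, t in tile_map.items()
    )
    return call_gemini(NAV_PROMPT, f"Goal: {goal}\nNearby tiles:\n{tiles_text}")
```

The real run reportedly had several such narrowly prompted helpers (navigation, boulder puzzles, plan critique) and much richer tile labels, but the division of labor has the same shape.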
Were these key things made by the AI, or by the people making the run?
My understanding is they were made by the dev, and added throughout the run, which is kind of cheating.
I think it’s not cheating in a practical sense: real applications of AI typically have a team of devs who notice when the model is tripping up and add special handling to fix that, so this setup is reflective of real-world use of AI.
But I think it’s illustrative of why artificial intelligence most likely won’t lead to artificial general agency and alignment x-risk: the agency will be created by unblocking a bunch of narrow obstacles, and those unblockings will be goal-specific and thus won’t generalize into misalignment.
This seems, frankly, like a bizarre extrapolation. Are you serious? The standard objection is of course that if humans have general agency, future AI systems can have it too, and indeed to a greater degree than humans.
https://www.lesswrong.com/posts/puv8fRDCH9jx5yhbX/johnswentworth-s-shortform?commentId=qaZez2DbBmTd5K2KZ
As far as I can understand it, this doesn’t seem to contain an argument for why you think highly general AIs aren’t possible.
You can make highly general AIs; they will just lack agency. You then plop a human on top of the AI, and the human will provide plenty of agency for basically all legible purposes.
Maybe at some point you’ll want to write a post on why you think high agency couldn’t or wouldn’t be automated.
https://www.lesswrong.com/posts/puv8fRDCH9jx5yhbX/johnswentworth-s-shortform?commentId=jZ2KRPoxEWexBoYSc
“Set up a corporation with a million artificial employees” is pretty legible, but a human amount of agency is catastrophically insufficient for it.
This statement is pretty ambiguous. “Artificial employee” makes me think of some program that is meant to perform tasks in a semi-independent manner. It would be trivial to generate a million different prompts and then have some interface that routes stuff to these prompts in some way. You could also register it as a corporation. It would presumably be slightly less useful than your generic AI chatbot, because the cost and latency would be slightly higher than if you didn’t set up the chatbot in this way. But only slightly.
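For concreteness, here is roughly what I have in mind. It’s a toy, hypothetical sketch (the router, `make_prompt`, and `call_chatbot` are all made up), but it shows why I call this part trivial.

```python
# Hypothetical illustration of "a million prompts behind a router".
# None of this refers to a real product; all names are invented for the sketch.
import hashlib

NUM_EMPLOYEES = 1_000_000

def make_prompt(i: int) -> str:
    # Trivially generate a distinct system prompt per "employee".
    return f"You are employee #{i} of ExampleCorp. Handle requests in your area diligently."

def route(request: str) -> int:
    # Some interface that routes requests to one of the prompts --
    # here, just a stable hash of the request text.
    digest = hashlib.sha256(request.encode()).hexdigest()
    return int(digest, 16) % NUM_EMPLOYEES

def call_chatbot(system_prompt: str, request: str) -> str:
    """Hypothetical stand-in for a generic chatbot API call."""
    raise NotImplementedError

def handle(request: str) -> str:
    employee_id = route(request)
    return call_chatbot(make_prompt(employee_id), request)
```

All the “corporation” wrapper adds over the bare chatbot is a hash lookup and a per-request system prompt, which is where the slight extra cost and latency come from.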
Though one could argue that since AI chatbots lack agency, they don’t count as artificial employees. But then is there anything that counts? Like at some point it just seems like a confused goal to me.
By “artificial employee” I mean “something that can fully replace a human employee, including their agentic capabilities”. And, of course, it should be much more useful than a generic AI chatbot; it should be useful the way owning Walmart (1,200,000 employees) is useful.
Ok, so then since one can’t make artificial general agents, it’s not so confusing that an AI-assisted human can’t solve the task. I guess it’s true though that my description needs to be amended to rule out things constrained by possibility, budget, or alignment.
Correct. See a more complete list of scaffold features here.
I take it the final iteration isn’t published anywhere yet? Wasn’t able to find it.
Seems like the most important part for deciding how to update on that.
Also, it’s worth checking if the final version can actually beat the whole game. If it was modified on the fly, later modifications may have broken earlier performance?
This is kinda-sorta being done at the moment: after Gemini beat the game, the stream has just kept on going. Currently Gemini is lost in Mt. Moon, as is tradition. In fact, having already explored Mt. Moon earlier seems to be hampering it (there are no unexplored areas on the minimap to lure it in the right direction).
I believe the dev is planning to do a fresh run soon-ish once they’ve stabilized their scaffold.
Yeah it’s not open source or published anywhere unfortunately.