I’ve never seen it mentioned around here, but since OpenAI’s 2023-02-14 patching of ChatGPT (which seemingly prevents it from directly encountering glitch tokens like ‘ petertodd’), ChatGPT has been using a different tokenizer that has glitch tokens of its own.
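If you want to poke at the two tokenizers yourself, here’s a minimal sketch using OpenAI’s tiktoken library (the length-25 cutoff in the scan is an arbitrary heuristic of mine, not anything OpenAI uses):

```python
# r50k_base (GPT-2/GPT-3 era) treats ' SolidGoldMagikarp' as a single
# token, while cl100k_base (the newer ChatGPT/GPT-4 tokenizer) splits it
# up -- but cl100k_base has unusually long single tokens of its own.
import tiktoken

old_enc = tiktoken.get_encoding("r50k_base")    # GPT-2/GPT-3 tokenizer
new_enc = tiktoken.get_encoding("cl100k_base")  # ChatGPT/GPT-4 tokenizer

print(old_enc.encode(" SolidGoldMagikarp"))  # one token id: [43453]
print(new_enc.encode(" SolidGoldMagikarp"))  # several token ids

# Scan the new vocabulary for suspiciously long single tokens --
# candidates for glitch-token behavior.
for token_id in range(new_enc.n_vocab):
    try:
        raw = new_enc.decode_single_token_bytes(token_id)
    except KeyError:  # a few ids in the range are reserved/unused
        continue
    if len(raw) >= 25:  # arbitrary "suspiciously long" threshold
        print(token_id, raw)
```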
Doesn’t strike me as inevitable at all, just a result of OpenAI following a similar method for creating their tokenizer twice. (In both cases, this led to a few long strings being included as tokens even though they don’t actually appear frequently in large corpora.)
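To make that mechanism concrete, here’s a toy byte-pair-encoding run on entirely contrived data (not OpenAI’s actual training setup): a string heavily overrepresented in the tokenizer-training sample gets merged all the way into a single long token, regardless of how rare it is elsewhere.

```python
# Toy BPE: repeatedly merge the most frequent adjacent pair of symbols.
from collections import Counter

def bpe_merges(words, num_merges):
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(a + b)
        # Apply the merge everywhere it occurs.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == (a, b):
                    merged.append(a + b)
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# 'magikarp' dominates this contrived sample, so BPE keeps merging its
# pieces until the whole string is one token, even though the string
# may be rare in the corpus the language model is later trained on.
sample = ["magikarp"] * 50 + ["cat", "dog", "fish"] * 5
print(bpe_merges(sample, 10))
```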
They presumably had already made the GPT-4 tokenizer long before SolidGoldMagikarp was discovered in the GPT-2/GPT-3 one.