Repeated token sequences: is it possible that those tokens are computational? Detached from their meaning by RL, and now emitted solely to perform some specific sort of computation in the hidden state? That would put them in the top-left quadrant: useful thought, just not a language at all.
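One cheap way to poke at this in an open model: delete the repeated-token runs from the context and see whether the model’s answer distribution moves. If the runs were doing computation, ablating them should shift the answer log-probs; if they’re inert filler, it mostly shouldn’t. A minimal sketch, assuming a Hugging Face causal LM; the model name, prompt, and dot pattern are all placeholders, not from any actual experiment:

```python
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any small open-weights CoT-capable model would do.
MODEL = "Qwen/Qwen2.5-0.5B-Instruct"

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def answer_logprob(context: str, answer: str) -> float:
    """Total log-prob the model assigns to `answer` given `context`."""
    ctx = tok(context, return_tensors="pt").input_ids
    ans = tok(answer, add_special_tokens=False, return_tensors="pt").input_ids
    ids = torch.cat([ctx, ans], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position i predict token i+1, so slice so that each row
    # of `preds` scores the corresponding answer token.
    preds = torch.log_softmax(logits[0, ctx.shape[1] - 1 : -1], dim=-1)
    return preds.gather(1, ans[0].unsqueeze(1)).sum().item()

# Hypothetical CoT with a filler run; strip only runs of 4+ dots so
# ordinary sentence-ending periods survive.
cot = "Q: 17 * 23?\nA: Let me think ................ 17*20 + 17*3 ="
stripped = re.sub(r"\.{4,}", "", cot)

print(answer_logprob(cot, " 391"), answer_logprob(stripped, " 391"))
```

A caveat: deleting tokens also changes positions and tokenization at the splice point, so a careful version would compare against a control ablation (e.g., replacing the run with an equal number of neutral tokens).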
Has anyone replicated this specific quirk in an open-source LLM?
“Spandrel” is very plausible for that too. LLMs have a well-known repetition bias, so it’s easy to see how that kind of behavior could pop up randomly and then get reinforced by accident. So is “use those tokens to navigate into the right frame of mind”: it seems to get at one common issue with LLM thinking.
We had METR evaluate GPT-5, and they found that GPT-5’s CoT contained armies of dots, which Kokotajlo conjectured meant the model was getting distracted. While METR cut some dots and spaces for brevity, nearly every block of dots contained exactly 16 dots. So either the dots didn’t count anything, or the counting was done in the part that METR threw away.
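The 16-dot observation is cheap to check against any transcript dump one does have access to. A minimal sketch, assuming the transcripts are plain text files (the file list is a placeholder): tally the lengths of all dot runs. A single dominant run length would argue against the dots encoding a count.

```python
import re
from collections import Counter

# Placeholder for however the CoT transcripts are actually loaded.
paths = ["cot_001.txt", "cot_002.txt"]
transcripts = [open(p, encoding="utf-8").read() for p in paths]

run_lengths = Counter()
for text in transcripts:
    # Count every run of 2 or more consecutive dots.
    for run in re.finditer(r"\.{2,}", text):
        run_lengths[len(run.group())] += 1

# If the histogram is a spike at 16, the runs are a fixed motif,
# not a tally of anything.
for length, count in sorted(run_lengths.items()):
    print(f"{length:>3} dots: {count} runs")
```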