Neuralese architectures that outperform standard transformers on big tasks turn out to be relatively hard to do, and are at least not trivial to scale up (this mostly comes from diffuse discourse, but one example of this is here, where COCONUT did not outperform standard architectures in benchmarks)
Steganography is so far proving quite hard for models to do (examples are here and here and here)
So I don’t really worry about models trying to change their behavior in ways that negatively affect safety/sandbag tasks via steganography/one-forward pass reasoning to fool CoT monitors.
We shall see in 2026 and 2027 whether this continues to hold for the next 5-10 years or so, or potentially more depending on how slowly AI progress goes.
Edit: I retracted the claim that most capabilities come from CoT, due to the paper linked in the very next tweet, and think that RL on CoTs is basically a capability elicitation, not a generator of new capabilities.
As for AI progress being slow, I think that without theoretical breakthroughs like neuralese AI progress might come to a stop or at building more and more expensive models. Indeed, the two ARC-AGI benchmarks[1]could have demonstrated a pattern where maximal capabilities scale[2]linearly or multilinearlywith ln(cost/task).
If this effect persists deep into the future of transformer LLMs, then most AI companies could run into the limits of the paradigm well before researching the next one and losing any benefits of having a concise CoT.
Unlike GPT-5-mini, maximal capabilities of o4-mini, o3, GPT-5, Claude Sonnet 4.5 in the ARC-AGI-1 benchmark scale more steeply and intersect the frontier at GPT-5(high).
For what it’s worth, I don’t think it matters for now, for a couple of reasons:
Most of the capabilities gained this year have come from inference scaling which uses CoT more heavily than pre-training scaling which improves forward passes,though you could reasonably argue that most RL inference gains are basically just a good version of how scaffolding would work in agents like AutoGPT, and don’t give new capabilities.Neuralese architectures that outperform standard transformers on big tasks turn out to be relatively hard to do, and are at least not trivial to scale up (this mostly comes from diffuse discourse, but one example of this is here, where COCONUT did not outperform standard architectures in benchmarks)
Steganography is so far proving quite hard for models to do (examples are here and here and here)
For all of these reasons, models are very bad at evading CoT monitors, and the forward pass is also very weak computationally at any rate.
So I don’t really worry about models trying to change their behavior in ways that negatively affect safety/sandbag tasks via steganography/one-forward pass reasoning to fool CoT monitors.
We shall see in 2026 and 2027 whether this continues to hold for the next 5-10 years or so, or potentially more depending on how slowly AI progress goes.
Edit: I retracted the claim that most capabilities come from CoT, due to the paper linked in the very next tweet, and think that RL on CoTs is basically a capability elicitation, not a generator of new capabilities.
As for AI progress being slow, I think that without theoretical breakthroughs like neuralese AI progress might come to a stop or at building more and more expensive models. Indeed, the two ARC-AGI benchmarks[1] could have demonstrated a pattern where maximal capabilities scale[2] linearly or multilinearly with ln(cost/task).
If this effect persists deep into the future of transformer LLMs, then most AI companies could run into the limits of the paradigm well before researching the next one and losing any benefits of having a concise CoT.
The second benchmark demonstrates a similar effect in high costs, but there is no straight line in the low cost mode.
Unlike GPT-5-mini, maximal capabilities of o4-mini, o3, GPT-5, Claude Sonnet 4.5 in the ARC-AGI-1 benchmark scale more steeply and intersect the frontier at GPT-5(high).
This would be great news if true!