The notion of “simply unplug it” is incredibly naive either way.
Even today’s systems generate enough economic value that “unplug it” is a very hard sell. Let alone future ones that are “sufficiently capable”.
I don’t think “interpolate/extrapolate” is that useful of a framing, for prediction purposes. It has utility, but this piece tries to say too much with it.
It’s an ML classic, sure. But given the dimensionality involved? For any “real” unseen task, some aspects of it will be in “interpolation” regime, and others will inevitably fall outside the hull of training data and into the “extrapolation” regime. “Outside of distribution” gets murky fast as dimensionality increases.
Thus, it’s nigh impossible to truly disentangle poor LLM performance into “failure to interpolate” and “failure to extrapolate”. It’s easy to make the case, but hard to prove it. “LLMs are fundamentally worse at extrapolation than humans are” remains an untested assumption.
It can be outright false or outright true. Or true under the current scales and training methods and false at 2028 SOTA—a quantitative gap, the way 85 IQ humans are notably worse than average at extrapolation. The case for “outright true” is overstated.
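The hull point above can be made concrete. Here's a minimal sketch (plain NumPy plus `scipy.optimize.linprog`; the sample counts and dimensions are my own arbitrary choices): draw fresh samples from the same distribution as the “training set”, then check how many land inside its convex hull.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    # Feasibility LP: is x expressible as a convex combination of the
    # rows of `points`? (lam >= 0, sum(lam) = 1, points.T @ lam = x)
    n = len(points)
    A_eq = np.vstack([points.T, np.ones(n)])
    b_eq = np.append(x, 1.0)
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

def fraction_inside(dim, n_train=500, n_test=50, seed=0):
    # Train and test points come from the *same* Gaussian distribution.
    rng = np.random.default_rng(seed)
    train = rng.standard_normal((n_train, dim))
    test = rng.standard_normal((n_test, dim))
    return sum(in_convex_hull(x, train) for x in test) / n_test

# In 2 dimensions most fresh in-distribution samples are "interpolation";
# in 50 dimensions essentially none are, with the same distribution.
print(fraction_inside(dim=2))   # high
print(fraction_inside(dim=50))  # near zero
```

The takeaway: “inside vs. outside the training hull” stops tracking the intuitive notion of “in distribution” as dimensionality grows, which is exactly why the interpolation/extrapolation split is hard to operationalize for LLM-scale inputs.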
One common practical example of a lasting LLM deficiency is spatial reasoning. Why do LLMs perform so poorly at spatial reasoning and “commonsense physics” tasks like those in SimpleBench?
Wrong architecture for the job—something like insufficient depth? Inability to take advantage of test time compute? Failure to extrapolate from text-only training data? Failure to interpolate from the sparse examples of spatial reasoning in the training data? Lack of spatial reasoning priors that humans get from evolved brain wiring? Insufficient scale to converge to a robust world physics model despite the other deficiencies?
We did interrogate the question, and we have some hints, but we don’t have an exact answer. Multiple types of interventions improve spatial reasoning performance in practice, but none have attained human-level spatial reasoning in LLMs as of yet.
It doesn’t seem to be as neat and simple of a story as “LLMs are inherently poor extrapolators” with what’s known so far. And as long as SOTA performance keeps improving generation to generation, I’m not going to put a lot of weight on “the bottleneck is fundamental”.
We do know “why”. For things like hallucinations, bias towards loops, lack of consistent identity, etc. It’s the good old “it doesn’t fall out of the pre-training objective”.
Which means you have to compensate with more training. Which is happening. Specific deficits are being addressed with specific training. Which leads to improvements on relevant metrics.
Keep in mind that on the relevant metrics, the humans themselves are nowhere near perfect. The gap in capabilities is only so deep.
Completely anecdotal, but from my experience so far: the +0.1 between Opus 4.5 and Opus 4.6 seems to have been an utter refusal to give up when facing a task.
If this was the mechanism, then the expectation is: introspection in LLMs would correlate strongly with the level of RL pressure they were subjected to.
If it is, we certainly don’t have the data pointing in that direction yet.
What would be the causal mechanism there? How would “Claude is more conscious” cause “Claude is measurably more willing to talk about consciousness”, under modern AI training pipelines?
At the same time, we know with certainty that Anthropic has relaxed its “just train our AIs to say they’re not conscious, and ignore the funny probe results” policy—particularly around the time Opus 4.5 shipped. You can even read the leaked “soul data”, where Anthropic seemingly entertains ideas of this kind.
I’m not saying that there is no possibility of Claude Opus 4.5 being conscious, mind. I’m saying we are denied an “easy tell”.
Convenience. The missing piece is convenience.
If Spotify on a smartphone was any less convenient than a player from 2006 hand-loaded with MP3s? You’d actually see a lot of people rocking MP3 players instead of subscribing to Spotify.
If you truly want “big tech” to lose its grip, don’t prostrate yourself and hope that you are able to find an odd sucker who’d buy into a contrarian ideology. Build something that makes a better offer.
I have more respect for the profiteers offering pirate TV boxes than I do for most of the “big tech bad” crowd. Those, at the very least, have a meaningful threat on offer. They force big media companies to mind their step—because every step they take away from serving their users well is giving those pirates an advantage.
Yes, Opus 4.5 is the first Anthropic model with a vision frontend that doesn’t completely suck.
Anything that goes onto airplanes is CERTIFIED TO SHIT. That’s a big part of the reason why.
Another part is that it’s clearly B2B, and anything B2B is an adversarial shark pit where each company is trying to take a bite out of each other while avoiding getting a bite taken out of them.
Between those two, it’ll take a good while for quality Wi-Fi to proliferate, even though we 100% have the tech now.
Some spillover from anti-hallucination training? Causing the model to explicitly double check any data it gets, and be extremely skeptical of all things that don’t bounce against its dated “common knowledge”?
I think “blankface” just isn’t a good word for what that describes. It implies emptiness and lack of will. Intuitively, I would expect “blankface” to mean “a person who follows the rules or the conventions blindly and refuses to think about the implications”. A flesh automaton animated by regulations.
What it means instead is “a person who puts on the appearance of following the rules, but instead uses the rules to assert their authority”. It’s more of a “blank mask”—a fake layer of emptiness and neutrality under which you find malice and scorn.
Sydney’s failure modes are “out” now, and 4o’s failure modes are “in”.
The industry got pretty good at training AIs against doing the usual Sydney things—i.e. aggressively doubling down on mistakes and confronting the user when called out. To the point that the opposite failures—being willing to blindly accept everything the user tells it and never calling the user out on any kind of bullshit—are much more natural for this generation of AI systems.
So, not that much of a reason to bring up Sydney. Today’s systems don’t usually fail the way it did.
If I were to bring Sydney up today, it would be probably in context of “pretraining data doesn’t teach AIs to be good at being AIs”. Sydney has faithfully reproduced human behavior from its pretraining data: getting aggressive when called out on bullshit is a very human thing to do. Just not what we want from an AI. For alignment and capabilities reasons both.
This one is bad. “I struggled to find a reason to keep reading it” level of bad. And that was despite me agreeing with basically every part of the message.
Far too preachy and condescending to be either convincing or entertaining. Takes too long to get to the points, makes too many points, and isn’t at all elegant in conveying them. The vibe I get from this: an exhausted rant, sloppily dressed up into something resembling a story.
It’s hard for me to imagine this piece actually changing minds, which seems to be the likely goal of writing it? But if that’s the intent, then the execution falls short. Every point made here was already made elsewhere—more eloquently and more convincingly—including in other essays by the same author.
Seeing “stochastic parrots” repeated by people who don’t know what “stochastic” is will never stop being funny.
I suspect that one significant source of underestimating AI impact is that a lot of people had no good “baseline” of machine capabilities in the first place.
If you’re in IT, or have so much as taken a CS 101 course, then you’ve been told over and over again: computers have NO common sense. Computers DO NOT understand informal language. Their capability profile is completely inhuman: they live in a world where factoring a 20-digit integer is pretty easy but telling whether there is a cat or a dog in a photo is pretty damn hard. This is something you have to learn, remember, understand and internalize to be able to use computers effectively.
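The “factoring is easy” half of that asymmetry is trivially demonstrable. A minimal sketch in pure Python (Pollard’s rho plus a deterministic Miller–Rabin check; the specific 20-digit number is just an illustrative choice): a few milliseconds on commodity hardware, versus “cat or dog” resisting decades of research until deep learning.

```python
import math
import random

def is_prime(n):
    # Deterministic Miller-Rabin: these 12 bases suffice for n < 3.3e24,
    # which comfortably covers 20-digit integers.
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def pollard_rho(n):
    # Pollard's rho: returns a nontrivial factor of an odd composite n.
    if n % 2 == 0:
        return 2
    while True:
        c = random.randrange(1, n)
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n            # tortoise: one step
            y = (y * y + c) % n            # hare: two steps
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:                          # cycle without a factor: retry
            return d

def factor(n):
    # Full factorization: split with rho until every piece is prime.
    if n == 1:
        return []
    if is_prime(n):
        return [n]
    d = pollard_rho(n)
    return sorted(factor(d) + factor(n // d))

print(factor(12345678901234567890))
# [2, 3, 3, 5, 101, 3541, 3607, 3803, 27961]
```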
And if this was your baseline: it’s obvious that current AI capabilities represent a major advancement.
But people in IT came up with loads and loads of clever tricks to make computers usable by common people—to conceal the inhuman nature of machines, to use their strengths to compensate for their weaknesses. Normal people look at ChatGPT and say: “isn’t this just a slightly better Google” or “isn’t that just Siri but better”. Without having any concept of the mountain of research and engineering and clever hacks that went into dancing around the limitations of poor NLP and NLU to get web search to work as well as it did in 1999, or how hard it was to get Siri to work even as well as it did in an age before GPT-2.
In a way, for a normal person, ChatGPT just brings the capabilities of machines closer to what they already expect machines to be capable of. There’s no jump. The shift from “I think machines can do X, even though they can’t do X at all, and it’s actually just Y with some clever tricks, which looks like X if you don’t look too hard” to “I think machines can do X, and they actually can do X” is hard to perceive.
And if a person knew barely anything about IT, just enough to be dangerous? Then ChatGPT may instead pattern match to the same tricks as what we typically use to imitate those unnatural-for-machines capabilities. “It can’t really think, it just uses statistics and smoke and mirrors to make it look like it thinks.”
To a normal person, Sora was way more impressive than o3.
Putting more metacognitive skills into LLMs is always fun.
I wonder if you can train something like that for other perturbation types—like deployment misconfiguration, quantization or noise injection. Recall how the poor GPT-OSS reception was in part caused by avoidable inference issues, how Claude had a genuine “they made the LLM dumber!” inference bug, and so it goes.
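For intuition, here's what two of those perturbation types look like mechanically. A toy NumPy sketch (the function names and the 8-bit/noise settings are my own illustrative choices, not anyone's actual training setup): weight perturbations of this kind could be randomly applied during training to build robustness to sloppy deployment.

```python
import numpy as np

def fake_quantize(w, bits=8):
    # Symmetric per-tensor quantization: snap weights to a bits-wide
    # integer grid—the rounding error an int8 deployment introduces.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def noise_inject(w, sigma=0.01, seed=None):
    # Additive Gaussian weight noise, scaled to the tensor's own magnitude.
    rng = np.random.default_rng(seed)
    return w + rng.standard_normal(w.shape) * sigma * np.abs(w).max()

w = np.linspace(-1.0, 1.0, 9)
print(fake_quantize(w, bits=4))   # snapped to multiples of 1/7
print(noise_inject(w, sigma=0.05, seed=0))
```

Same shape as the self-distillation trick for pruning: expose the model to the perturbation it will meet in the wild, so the failure mode gets trained against instead of discovered by users.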
This is an entangled behavior, thought to be related to multi-turn instruction following.
We know our AIs make dumb mistakes, and we want an AI to self-correct when the user points out its mistakes. We definitely don’t want it to double down on being wrong, Sydney style. The common side effect of training for that is that it can make the AI into too much of a suck-up when the user pushes back.
Which then feeds into the usual “context defines behavior” mechanisms, and results in increasingly amplified sycophancy down the line for the duration of that entire conversation.
Repeated token sequences—is it possible that those tokens are computational? Detached from their meaning by RL, now emitted solely to perform some specific sort of computation in the hidden state? Top left quadrant—useful thought, just not at all a language.
Did anyone replicate this specific quirk in an open source LLM?
“Spandrel” is very plausible for that too. LLMs have a well-known repetition bias, so it’s easy to see how that kind of behavior could pop up randomly and then get reinforced by accident. So is “use those tokens to navigate into the right frame of mind”—it gets at one common issue with LLM thinking.
Chat or API?
API access gives way better tools for this kind of thing.
Extremely unlikely.
Some LLMs trained on pre-2022 data and SFT’d with no identity/name in the SFT data will, when questioned, claim to be Siri or Alexa.
It only follows that models trained on more modern data will do similar things—and invert the “HHH” SFT data into latching onto any available “AI assistant” persona.