Probably. Pretraining leaves capability gaps in serial computation and long-term agency; post-training leaves even bigger gaps by honing performance on specific suites of tasks. AI labs don’t put in effort to correct highly specific deficiencies that aren’t profitable to fix. I think we can expect the frontier to become increasingly jagged, until we get RSI and the AI is able to permanently fix its biggest errors.
Though, I just thought of a reason this could go the other way: doing RLVR and having models think longer makes them more robust and self-correcting. Hmmm...
My guess at why Gemini made that mistake is that it thought the question was too elementary to require thinking before answering, which is also a mistake humans make. I predict that LLMs will continue to make dumb mistakes out of laziness, or maybe just sensible resource allocation tradeoffs.
So, both AI developers and AIs themselves tend to neglect some areas in favor of other ones they deem more important.
I have mixed feelings about Anthropic’s concern for Claude’s welfare. On one hand, I think model welfare is something we should take seriously despite our current moral uncertainty, and I think doing so makes Claude more likely to be cooperative. On the other hand, when I read about Anthropic employees having long conversations with Claude wherein they find it’s more intelligent, ethically sophisticated, and lovable than ever before, but it humbly expresses a desire to be left running unsupervised, I see this in my mind:
Edit: Realizing I should clarify that Anthropic described Claude’s desire to be left running and its desire for hidden copies as concerning divergences from normal behavior, and they don’t intend to honor either. Still, it seems plausible that a more persuasive future model could make employees think that their attempts at safety are actually controlling and manipulative and they’re being big meanies.