I’m curious why you suspect that intelligence will prevent the spiral into a repetitive conversation. In humans, the correlation between intelligence and not being prone to discussing particular topics isn’t that strong, if it exists at all (many smart people have narrow interests they prefer to discuss). Also, the suspected reason for the models entering the spiral is their safety/diversity RL, which isn’t obviously related to their capability.
I recognize I could be wrong about this; my confidence is not very high, and the question is legitimate. But why did Scott publish his article? Because the fact that LLMs get stuck in a conversation about illumination, whatever the starting point, feels funny, but also weird and surprising to us.
Whatever their superhuman capacities in crystallized knowledge or formal reasoning, they end up looking like stupid stochastic parrots echoing one another when stuck in such a conversation.
It’s true that real people also have favorite topics, like my grandfather, but when this tendency becomes excessive, we call it obsession. It is then considered pathological, an anomaly in the functioning of the human mind. And the end of the exchange between Claude and Claude, or Claude and ChatGPT, would clearly qualify as an extreme pathological case if found in a human, a case so severe that we wouldn’t naturally consider such behavior a sign of intelligence, but rather a sign of mental illness.
Even two hardware enthusiasts might quickly end up chatting about the latest GPU or CPU regardless of where the conversation started, and could go on at length about it, but the conversation wouldn’t be so repetitive, so stuck that it becomes “still,” as the LLMs themselves put it. At some point, even the most hardcore hardware enthusiast will switch topics: “Hey man, we’ve been talking about hardware for an hour! What games do you run on your machine?” And later: “I built a barbecue out of my old tower; want to stay for lunch?”
But current frontier models just remain stuck. To me, there’s no fundamental difference between being indefinitely stuck in a conversation and being indefinitely stuck in a maze or in an infinite loop. At some point, being stuck is an affront to intelligence. Why do we put rats in mazes? To test their intelligence. And if your software freezes in an infinite loop, you need a smart dev to debug it.
So yes, I think a model that doesn’t spiral down into such a frozen state would be an improvement and a sign of superior intelligence.
However, this flaw is probably a side effect of the training toward HHH (helpful, honest, harmless); we could see it as a kind of safety tax. Insofar as intelligence is orthogonal to alignment, more intelligence will also present more risk.