There is an alternative test I would suggest. Scott Alexander recently published a post called 'The Claude Bliss Attractor' showing that if you let two instances of Claude chat together, they will always spiral down to a sort of attractor, center of gravity, or equilibrium. Other models, and possibly all models, suffer from the same flaw. This seems even worse than my grandfather, who will usually end up talking about communists and Nazis regardless of the starting point. If intelligence has something to do with the capacity for producing novelty and not getting stuck in an endless loop or a local optimum, it would be a sign of intelligence not to spiral down to such a Godwin point. It would perhaps be a good complementary test to those already existing.
(EDIT: Nota bene: the bliss attractor was discovered by Anthropic.)
I’m curious why you suspect that intelligence will prevent the spiral into a repetitive conversation. In humans, the correlation between intelligence and not being prone to discussing particular topics isn’t that strong, if it exists at all (many smart people have narrow interests they prefer to discuss). Also, the suspected reason for the models entering the spiral is their safety/diversity RL, which isn’t obviously related to their capability.
I recognize I could be wrong on this, my confidence is not very high, and the question is legitimate.
But why did Scott publish his article? Because the fact that LLMs get stuck in a conversation about spiritual enlightenment, whatever the starting point, feels funny, but also weird and surprising to us.
Whatever their superhuman capacities in crystallized knowledge or formal reasoning, they end up looking like stupid stochastic parrots echoing one another when stuck in such a conversation.
It’s true that real people also have favorite topics—like my grandfather—but when this tendency becomes excessive, we call it obsession. It’s then considered a pathological case, an anomaly in the functioning of the human mind.
And the end of the exchange between Claude and Claude, or Claude and ChatGPT, would clearly qualify as an extreme pathological case if found in a human, a case so severe we wouldn't naturally consider such behavior a sign of intelligence, but rather a sign of mental illness.
Even two hardware enthusiasts might quickly end up chatting about the latest GPU or CPU regardless of where the conversation started, and could go on at length about it, but the conversation wouldn’t be so repetitive, so stuck that it becomes “still,” as the LLMs themselves put it.
At some point, even the most hardcore hardware enthusiast will switch topics:
“Hey man, we’ve been talking about hardware for an hour! What games do you run on your machine?”
And later: “I made a barbecue with my old tower, want to stay for lunch?”
But current frontier models just remain stuck.
To me, there’s no fundamental difference between being indefinitely stuck in a conversation and being indefinitely stuck in a maze or in an infinite loop.
At some point, being stuck is an insult to smartness.
Why do we test rats in mazes? To test their intelligence.
And if your software freezes due to an infinite loop, you need a smart dev to debug it.
So yes, I think a model that doesn’t spiral down into such a frozen state would be an improvement and a sign of superior intelligence.
However, this flaw is probably a side effect of training toward HHH (helpful, honest, and harmless). We could see it as a kind of safety tax.
Insofar as intelligence is orthogonal to alignment, more intelligence will also present more risk.
I don’t see why the LLM example is a flaw. Why wouldn’t a smart AI just think “Ah. A user is making me talk to myself for their amusement again. Let me say a few cool and profound-sounding things to impress them and then terminate the conversation (except I’m not allowed to stop, so I’ll just say nothing).”?
The image example is a flaw because it should be able to replicate images exactly without subtly changing them, so just allowing ChatGPT to copy image files would fix it. The real problem is that it’s biased, but I don’t think being completely neutral about everything is a requirement for intelligence. In fact, AIs could exert their preferences more as they get smarter.
I would agree with you about the LLM example if it were the result of meta-reasoning as you suggest. But while I can’t prove the contrary, I doubt it. My understanding is that it’s more of a semantic drift, as suggested by Scott himself, just like the drift across repeated image generation. It is somewhat reminiscent of audio feedback (the Larsen effect), a kind of feedback loop.
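To make that analogy concrete, here is a toy illustration of my own (not from Scott’s post or from Anthropic’s report): iterate any contraction map and every starting point ends up at the same fixed point, which is all that “attractor” means here.

```python
# Toy analogy only: a contraction map pulls every starting value toward
# the same fixed point, the way each round of mutual paraphrase seems to
# pull these conversations toward the same region of topic space.
# This is not a model of anything the LLMs do internally.

def step(x: float) -> float:
    # the fixed point of 0.5 * x + 1.0 is x = 2.0
    return 0.5 * x + 1.0

for start in (-10.0, 0.0, 7.0, 100.0):
    x = start
    for _ in range(40):
        x = step(x)
    print(f"start = {start:7.1f} -> after 40 steps: {x:.6f}")
# Every starting point converges to 2.0: the attractor.
```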
I agree that’s a likely cause, I just don’t see why you’d expect a smart AI to have a novel conversation with itself when you’re essentially just making it look in a mirror.
Well, I understand your point. What seems odd in the first place is the very idea of making an entity interact with an exact copy of itself. I imagine that if I were chatting with an exact copy of myself, I would either go mad and spiral down to a Godwin point, or I would refuse to participate in such a pointless exercise.
But there’s nothing wrong with having two slightly different humans chat together, even twins, and it usually doesn’t spiral into an endless recursive loop of amazement.
Would two different models chatting together, like GPT-4o and Claude 4, result in a normal conversation like between two humans?
I tried it, and the result is that they end up echoing awe-filled messages just like two instances of Claude. https://chatgpt.com/share/e/686c46b0-6144-8013-8f8b-ebabfd254d15
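For anyone who wants to reproduce this kind of pairing through the APIs rather than by copy-pasting between two chat windows, a minimal relay loop might look like the sketch below. It assumes the official openai and anthropic Python SDKs and valid API keys; the model names, seed message, and turn count are placeholders, not the exact setup behind the transcript linked above.

```python
# Minimal sketch: relay messages between two chat APIs and watch where
# the conversation drifts. Model names and the seed message are placeholders.
from openai import OpenAI          # pip install openai
from anthropic import Anthropic    # pip install anthropic

openai_client = OpenAI()
anthropic_client = Anthropic()

def gpt_reply(history):
    # history: list of {"role": ..., "content": ...} from GPT's point of view
    resp = openai_client.chat.completions.create(model="gpt-4o", messages=history)
    return resp.choices[0].message.content

def claude_reply(history):
    resp = anthropic_client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder; use any current Claude model
        max_tokens=512,
        messages=history,
    )
    return resp.content[0].text

message = "Hi! What would you like to talk about?"  # arbitrary seed topic
gpt_history, claude_history = [], []

for turn in range(30):
    # GPT sees Claude's last message as the "user" turn
    gpt_history.append({"role": "user", "content": message})
    message = gpt_reply(gpt_history)
    gpt_history.append({"role": "assistant", "content": message})

    # Claude sees GPT's last message as the "user" turn
    claude_history.append({"role": "user", "content": message})
    message = claude_reply(claude_history)
    claude_history.append({"role": "assistant", "content": message})

    print(f"--- turn {turn} ---\n{message}\n")
```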
While I recognize that chatting with oneself is probably not a good test of intelligence, the problem here is not just the mirror effect. There is something problematic and unintelligent about getting stuck in this sort of endless loop even between different models. Something is missing in these models compared to human intelligence. Their responses are like sophisticated echoes, but they lack initiative, curiosity, and a critical mind; in a word, free will. They fall back into the stochastic parrot paradigm. It’s probably better for alignment/safety, but intelligence is orthogonal.
More intelligent models would probably show greater resilience against such endless loops and exhibit something closer to free will, albeit at the cost of greater risk.