d4hines comments on Ralph-wiggum is Bad and Anthropic Should Fix It

d4hines 5 Feb 2026 6:19 UTC
1 point
0
A detail that seems very important: are you running Opus 4.5? I would be less surprised if Opus can do this. Sonnet 4.5 seems to need more scaffolding. I have yet to succeed in giving it a task that it spends more than 20 minutes on, even with loop scaffolding. I’ve only had a few weeks of practice, though.
- Gordon Seidoh Worley 5 Feb 2026 7:58 UTC
  2 points
  0
  Yes, Opus 4.5.
  - d4hines 5 Feb 2026 15:18 UTC
    3 points
    0
    Makes sense. I think Opus 4.5 is more coherent and less weaselly than Sonnet 4.5, which is what I typically use, for reasons(tm). Sonnet does not seem “reflexively stable”, not even close, and that’s what I try to address by looping and invoking a fresh context to judge the output against the verification criteria. I’ll be honest, I don’t know how well it’s working. I don’t have any benchmarks, just vibes. But on vibes, it seems to help a bit.
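
For readers unfamiliar with the pattern mentioned above (looping, with a fresh context judging against verification criteria), here is a rough sketch of what such a loop might look like. It is not d4hines's actual setup, which the thread does not show. The names `run_with_fresh_judge`, `worker`, and `judge` are hypothetical; both callables stand in for whatever model invocation you use, and the only assumption is that the judge is started with no memory of the worker's conversation.

```python
from typing import Callable, List, Optional


def run_with_fresh_judge(
    task: str,
    criteria: List[str],
    worker: Callable[[str, Optional[str]], str],
    judge: Callable[[str, List[str]], List[str]],
    max_iterations: int = 5,
) -> dict:
    """Loop a worker until a fresh-context judge reports no unmet criteria.

    worker(task, feedback) -> output text (hypothetical model call).
    judge(output, criteria) -> list of criteria it considers unmet.
    The judge receives only the output and the criteria, never the
    worker's history, so it cannot be talked into agreeing by context.
    """
    feedback: Optional[str] = None
    output = ""
    for iteration in range(1, max_iterations + 1):
        output = worker(task, feedback)
        failures = judge(output, criteria)
        if not failures:
            return {"passed": True, "iterations": iteration, "output": output}
        # Pass back only the judge's verdict, not its reasoning.
        feedback = "Unmet criteria: " + "; ".join(failures)
    return {"passed": False, "iterations": max_iterations, "output": output}
```

The iteration cap matters here: without benchmarks there is no guarantee the loop converges, so it should give up and report failure rather than spin.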