d4hines comments on Ralph-wiggum is Bad and Anthropic Should Fix It

d4hines 5 Feb 2026 15:18 UTC
3 points
Makes sense. I think Opus 4.5 is more coherent and less weaselly than Sonnet 4.5, which is what I typically use, for reasons(tm). Sonnet does not seem “reflexively stable”, not even close, and that’s what I try to address with the looping: each pass, I invoke a fresh context to judge the output against the verification criteria. I’ll be honest, I don’t know how well it’s working. I don’t have any benchmarks, just vibes. But on vibes, it seems to help a bit.
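
For concreteness, here is a rough sketch of the loop-plus-fresh-judge idea in Python. It is an illustration under assumptions, not the actual setup: `run_loop`, `fake_worker`, and `fake_judge` are hypothetical names, and the two stubs stand in for real model calls. The one point it tries to capture is that the judge sees only the output and the criteria, never the worker's conversation history.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: a worker turns judge feedback into a new attempt,
# and a judge scores an attempt against the criteria with no other context.
Worker = Callable[[str], str]
Judge = Callable[[str, List[str]], Tuple[bool, str]]


def run_loop(worker: Worker, judge: Judge, criteria: List[str],
             max_iters: int = 5) -> Tuple[Optional[str], int]:
    """Retry the worker until a fresh-context judge accepts the output."""
    feedback = ""
    for attempt in range(1, max_iters + 1):
        output = worker(feedback)
        # The judge gets only the output and the criteria, so it cannot
        # inherit the worker's rationalizations from earlier turns.
        passed, feedback = judge(output, criteria)
        if passed:
            return output, attempt
    return None, max_iters


# Stubs standing in for model calls (assumptions, for illustration only).
def fake_worker(feedback: str) -> str:
    return "done with tests" if "tests" in feedback else "done"


def fake_judge(output: str, criteria: List[str]) -> Tuple[bool, str]:
    missing = [c for c in criteria if c not in output]
    return (not missing, "missing: " + ", ".join(missing))


result, attempts = run_loop(fake_worker, fake_judge, ["tests"])
```

In a real version, `judge` would start a new model session each time; the `max_iters` cap is there so an unstable worker cannot loop forever.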