StanislavKrym comments on 2025 Year in Review

StanislavKrym 31 Dec 2025 20:29 UTC
Anthropic then gave us the big one, Claude Opus 4.5, which is for now the clear best model available, and remains my daily driver, both for chat and also in Claude Code.
Claude Opus 4.5 felt like a large practical leap, some like Dean Ball going so far as to call it AGI. I don’t agree but I understand where they are coming from.
Alas, Claude Opus 4.5 will likely force METR to add new tasks to its benchmark, since comments like this and this suggest that the benchmark is no longer as reliable as it once was…
P.S. If you included a video in the original post, why was it lost in cross-posting?