ryan_greenblatt comments on ryan_greenblatt’s Shortform

ryan_greenblatt 6 Aug 2025 15:04 UTC
Seems plausible. I was just looking at benchmarks/evals at the time, and it could be that it's much less capable than it appears.

(Worth noting that I think (many? most? all?) other competitive open-source models are also at least somewhat benchmark-maxed, so this alters the comparison a bit, but it doesn't change the question of the absolute level of capability.)