Lee.aao comments on On GPT-4.5

Lee.aao 5 Mar 2025 14:31 UTC
1 point
TLDR my reaction is I don’t really know how good these models are right now.
I felt exactly the same after the Claude 3.7 post.

But actually, hasn’t LiveBench solved the evals crisis?
It specifically targets the “subjective” and “cheating/hacking” problems.
It also covers a pretty broad set of capabilities.