Cole Wyeth comments on Inverse Scaling in Test-Time Compute

Cole Wyeth 22 Jul 2025 22:45 UTC
3 points
The most interesting part of this for me is that performance drops as reasoning length increases. I'm not sure how to interpret this, though. Is it just very unnatural for reasoning models to spend so many tokens on a simple question?