The most interesting part of this for me is that performance drops as reasoning length increases. I’m not sure how to interpret that, though… is it just very unnatural for reasoning models to spend so many tokens on a simple question?