I’m not sure this constraint (single forward pass as opposed to reasoning-enabled) is very meaningful. If it is still doing the reasoning as it answers the question, does it make a big difference whether that is enclosed by <thinking></thinking> tags? I would be interested to see a version of this where the answer has to be the first token output. That would show how much thinking it can do in one step.
I’m not sure this constraint (single forward pass as opposed to reasoning-enabled) is very meaningful. If it is still doing the reasoning as it answers the question, does it make a big difference whether that is enclosed by <thinking></thinking> tags? I would be interested to see a version of this where the answer has to be the first token output. That would show how much thinking it can do in one step.
The model can not output any tokens before answering, it has to respond immediately. This is what I mean by “no CoT”.