Agreed, but I’d guess this also hits capabilities unless you have some clever diffusion+reasoning approach which might recover guarantees that aren’t wildly worse than normal CoT guarantees. (Unless you directly generate blocks of reasoning in diffusion neuralese or similar.)
That said, I’m surprised it gets 23% on AIME given this, so I think they must have found some reasoning strategy which works well enough in practice. I wonder how it solves these problems.
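To make "clever diffusion+reasoning approach" a bit more concrete, here’s a toy sketch of the shape I have in mind: fill in a reasoning block by iterative unmasking, then fill in the answer block conditioned on the finished reasoning, so the answer at least comes after legible reasoning text exists. Everything below (the `denoiser` stub, the mask sentinel, the block sizes) is made up for illustration, not how any actual released model works:

```python
# Toy sketch of a block-wise "diffusion + reasoning" decode loop.
# `denoiser` is a stand-in for a masked-diffusion LM that scores masked positions;
# here it returns random logits so the loop is runnable end to end.
import numpy as np

VOCAB = 1000
MASK = -1  # sentinel for still-masked positions (not a real token id)

rng = np.random.default_rng(0)


def denoiser(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the model: per-position logits over the vocab."""
    return rng.normal(size=(len(tokens), VOCAB))


def denoise_block(prefix: np.ndarray, block_len: int, steps: int) -> np.ndarray:
    """Fill `block_len` masked tokens over `steps` passes, committing the
    highest-confidence masked positions first (MaskGIT-style unmasking)."""
    block = np.full(block_len, MASK, dtype=np.int64)
    per_step = max(1, block_len // steps)
    for _ in range(steps):
        masked = np.flatnonzero(block == MASK)
        if masked.size == 0:
            break
        logits = denoiser(np.concatenate([prefix, block]))[len(prefix):]
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        confidence = probs[masked].max(axis=-1)
        chosen = masked[np.argsort(-confidence)[:per_step]]
        block[chosen] = probs[chosen].argmax(axis=-1)
    return block


prompt = rng.integers(0, VOCAB, size=16)
# Reasoning block first, then the answer block conditioned on the finished
# reasoning, so the answer is generated strictly after legible reasoning exists.
reasoning = denoise_block(prompt, block_len=64, steps=8)
answer = denoise_block(np.concatenate([prompt, reasoning]), block_len=8, steps=4)
```

The knob you’d scale here is `steps` per block (plus block length), i.e. extra denoising compute rather than extra autoregressive tokens.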
Based on how it appears to solve math problems, I’d guess the guarantees you get from looking at the CoT aren’t wildly worse than what you get from autoregressive models, but probably somewhat more confusing to analyze, and there might be a faster path to a particularly bad sort of neuralese. They show a video of it solving a math problem on the (desktop version of the) website; here is the final reasoning:
Note, the video doesn’t show up for me.
Why is this your intuition?
At the moment they seem to just make it imitate normal-ish CoT, which would presumably improve accuracy because the model has more token-positions/space/capacity to do things like check for self-consistency. You’re still scaling up a compute dimension that the model can use for solving things, and you can still do normal RL things to it from that point.
It’s just maybe worse in this case because the causal link from the reasoning chain to the part of the response containing the answer is weaker (it was bad before, but now it is horrible).
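One way to cash that out is an intervention test: hold the prompt fixed, scramble part of the visible reasoning block, regenerate only the answer block, and count how often the answer actually changes. Rough sketch below; `generate_answer` is a hypothetical stub standing in for "re-decode just the answer block with the real model", and the sizes are placeholders:

```python
# Toy intervention test for "how much does the visible reasoning causally
# matter for the answer?". `generate_answer` is a hypothetical stub; a real
# test would re-decode just the answer block with the actual model.
import numpy as np

VOCAB = 1000
rng = np.random.default_rng(1)


def generate_answer(prompt: np.ndarray, reasoning: np.ndarray) -> np.ndarray:
    """Stub: deterministic 'answer block' that depends only weakly on the
    reasoning tokens, purely so the script runs end to end."""
    seed = int(prompt.sum()) + int(reasoning.sum()) % 7
    return np.random.default_rng(seed).integers(0, VOCAB, size=8)


def causal_reliance(prompt: np.ndarray, reasoning: np.ndarray, n_trials: int = 50) -> float:
    """Fraction of reasoning-block corruptions that change the answer.

    Near 0 would mean the answer is behaviorally ~independent of the reasoning
    text you can actually read, which is the bad case."""
    baseline = generate_answer(prompt, reasoning)
    flips = 0
    for _ in range(n_trials):
        corrupted = reasoning.copy()
        idx = rng.choice(len(corrupted), size=len(corrupted) // 4, replace=False)
        corrupted[idx] = rng.integers(0, VOCAB, size=idx.size)  # scramble 25% of it
        if not np.array_equal(generate_answer(prompt, corrupted), baseline):
            flips += 1
    return flips / n_trials


prompt = rng.integers(0, VOCAB, size=16)
reasoning = rng.integers(0, VOCAB, size=64)
print(f"answer changed on {causal_reliance(prompt, reasoning):.0%} of interventions")
```

This only measures behavioral reliance on the visible reasoning, not faithfulness, but it’s the kind of check I’d want to run before trusting the CoT here.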