Agreed, but I’d guess this also hits capabilities unless you have some clever diffusion+reasoning approach which might recover guarantees that aren’t wildly worse than normal CoT guarantees. (Unless you directly generate blocks of reasoning in diffusion neuralese or similar.)
That said, I’m surprised it gets 23% on AIME given this, so I think they must have found some reasoning strategy which works well enough in practice. I wonder how it solves these problems.
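To make "clever diffusion+reasoning approach" a bit more concrete, here’s a toy sketch of the shape I have in mind: fill in a reasoning block by iterative unmasking, then fill in the answer block conditioned on the finished reasoning, so the answer at least comes after legible reasoning text exists. Everything below (the `denoiser` stub, the mask sentinel, the block sizes) is made up for illustration, not how any actual released model works:

```python
# Toy sketch of a block-wise "diffusion + reasoning" decode loop.
# `denoiser` is a stand-in for a masked-diffusion LM that scores masked positions;
# here it returns random logits so the loop is runnable end to end.
import numpy as np

VOCAB = 1000
MASK = -1  # sentinel for still-masked positions (not a real token id)

rng = np.random.default_rng(0)


def denoiser(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the model: per-position logits over the vocab."""
    return rng.normal(size=(len(tokens), VOCAB))


def denoise_block(prefix: np.ndarray, block_len: int, steps: int) -> np.ndarray:
    """Fill `block_len` masked tokens over `steps` passes, committing the
    highest-confidence masked positions first (MaskGIT-style unmasking)."""
    block = np.full(block_len, MASK, dtype=np.int64)
    per_step = max(1, block_len // steps)
    for _ in range(steps):
        masked = np.flatnonzero(block == MASK)
        if masked.size == 0:
            break
        logits = denoiser(np.concatenate([prefix, block]))[len(prefix):]
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        confidence = probs[masked].max(axis=-1)
        chosen = masked[np.argsort(-confidence)[:per_step]]
        block[chosen] = probs[chosen].argmax(axis=-1)
    return block


prompt = rng.integers(0, VOCAB, size=16)
# Reasoning block first, then the answer block conditioned on the finished
# reasoning, so the answer is generated strictly after legible reasoning exists.
reasoning = denoise_block(prompt, block_len=64, steps=8)
answer = denoise_block(np.concatenate([prompt, reasoning]), block_len=8, steps=4)
```

The knob you’d scale here is `steps` per block (plus block length), i.e. extra denoising compute rather than extra autoregressive tokens.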
Based on how it appears to solve math problems, I’d guess the guarantees you get from looking at the CoT aren’t wildly worse than what you get from autoregressive models, but probably somewhat more confusing to analyze, and there might be a faster path to a particularly bad sort of neuralese. They show a video of it solving a math problem on the (desktop version of the) website; here is the final reasoning:
Note, the video doesn’t show up for me.
Why is this your intuition?
At the moment they seem to just make it imitate normal-ish CoT, which would presumably improve accuracy because the model has more token-positions/space/capacity to do things like check for self-consistency. You’re still scaling up a compute dimension that the model can use for solving things, and you can still do normal RL things to it from that point.
It’s just maybe worse in this case because the causal link from the reasoning chain to the part of the response containing the answer is weaker (it was bad before, but now it is horrible).
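One way to cash that out is an intervention test: hold the prompt fixed, scramble part of the visible reasoning block, regenerate only the answer block, and count how often the answer actually changes. Rough sketch below; `generate_answer` is a hypothetical stub standing in for "re-decode just the answer block with the real model", and the sizes are placeholders:

```python
# Toy intervention test for "how much does the visible reasoning causally
# matter for the answer?". `generate_answer` is a hypothetical stub; a real
# test would re-decode just the answer block with the actual model.
import numpy as np

VOCAB = 1000
rng = np.random.default_rng(1)


def generate_answer(prompt: np.ndarray, reasoning: np.ndarray) -> np.ndarray:
    """Stub: deterministic 'answer block' that depends only weakly on the
    reasoning tokens, purely so the script runs end to end."""
    seed = int(prompt.sum()) + int(reasoning.sum()) % 7
    return np.random.default_rng(seed).integers(0, VOCAB, size=8)


def causal_reliance(prompt: np.ndarray, reasoning: np.ndarray, n_trials: int = 50) -> float:
    """Fraction of reasoning-block corruptions that change the answer.

    Near 0 would mean the answer is behaviorally ~independent of the reasoning
    text you can actually read, which is the bad case."""
    baseline = generate_answer(prompt, reasoning)
    flips = 0
    for _ in range(n_trials):
        corrupted = reasoning.copy()
        idx = rng.choice(len(corrupted), size=len(corrupted) // 4, replace=False)
        corrupted[idx] = rng.integers(0, VOCAB, size=idx.size)  # scramble 25% of it
        if not np.array_equal(generate_answer(prompt, corrupted), baseline):
            flips += 1
    return flips / n_trials


prompt = rng.integers(0, VOCAB, size=16)
reasoning = rng.integers(0, VOCAB, size=64)
print(f"answer changed on {causal_reliance(prompt, reasoning):.0%} of interventions")
```

This only measures behavioral reliance on the visible reasoning, not faithfulness, but it’s the kind of check I’d want to run before trusting the CoT here.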