Vladimir_Nesov comments on Cole Wyeth’s Shortform

Vladimir_Nesov 24 May 2025 14:27 UTC
3 points
0
Since reasoning trace length increases with more steps of RL training (unless intentionally constrained), the underlying scaling of RL training by AI companies will probably be observable in the form of longer reasoning traces. Claude 4 is more obviously a pretrained model update, not necessarily a major RLVR update (compared to Claude 3.7), and coherent long task performance seems like something that would greatly benefit from RLVR if it applies at all (which it plausibly does).

So I don’t particularly expect Claude 4 to be much better on this metric, but some later Claude ~4.2-4.5 update with more RLVR post-training released in a few months might do much better.
- Cole Wyeth 24 May 2025 14:36 UTC
  3 points
  1
  Parent
  We can still check whether it lies on the slower exponential curve projected from before reasoning models were introduced.
  - Vladimir_Nesov 24 May 2025 14:49 UTC
    11 points
    0
    Parent
    Sure, but trends like this only say anything meaningful across multiple years; any one datapoint adds almost no signal, in either direction. This is what makes scaling laws much more predictive, even as they are predicting the wrong things. So far there are no published scaling laws for RLVR; the literature is still developing a non-terrible stable recipe for the first few thousand training steps.