Reinforcement learning is not the same kind of thing as pretraining, because it involves training on your own randomly sampled rollouts; RL is, generally speaking, more self-reinforcing and biased than other neural-net training methods. It's more likely to get stuck in local maxima (it's infamous for this, in fact) and doesn't have quite the same convergence properties as "pretraining on a giant dataset".
The rise and rise of RL as a fraction of compute should therefore make us less likely to think that the convergent representation hypothesis will apply to AGI. (Though it clearly applies to LLMs now.)
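The self-reinforcement point can be made concrete with a toy sketch (my own illustration, not from the thread): REINFORCE without a baseline on a two-armed bandit. Because updates are computed only on the policy's own rollouts, a policy that starts biased toward the worse arm keeps reinforcing it and rarely even samples the better arm — a minimal local-maximum trap.

```python
import math
import random

def reinforce_bandit(init_logits, steps=500, lr=0.1, seed=0):
    """REINFORCE (no baseline) on a 2-armed bandit: arm 0 pays 1.0, arm 1 pays 2.0.

    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    rewards = [1.0, 2.0]
    logits = list(init_logits)

    def softmax(ls):
        m = max(ls)
        exps = [math.exp(l - m) for l in ls]
        z = sum(exps)
        return [e / z for e in exps]

    for _ in range(steps):
        probs = softmax(logits)
        # On-policy rollout: sample an action from the current policy itself.
        a = 0 if rng.random() < probs[0] else 1
        r = rewards[a]
        # Policy-gradient step: grad of log pi(a) w.r.t. logits is one_hot(a) - probs.
        for i in range(2):
            logits[i] += lr * r * ((1.0 if i == a else 0.0) - probs[i])
    return softmax(logits)

# Start strongly biased toward the worse arm 0: its positive reward keeps
# reinforcing it, and arm 1 (twice the reward) is almost never sampled,
# so the policy stays stuck near the initial bias.
stuck = reinforce_bandit([5.0, -5.0])

# From a uniform start, the expected update drifts toward the higher-reward arm.
free = reinforce_bandit([0.0, 0.0])
```

Pretraining on a fixed dataset has no analogue of this trap: the data distribution doesn't depend on the model's current behavior, which is exactly the asymmetry the comment is pointing at.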
Yep, this is probably true for pretraining, but it seems less and less relevant these days. For example, according to the Grok 4 presentation, the model used as much compute in pretraining as in RL. I'd expect this trend to continue.