Yep, this is probably true for pretraining, but it seems less and less relevant these days. For example, according to the Grok 4 presentation, the model used as much compute on RL as on pretraining. I'd expect this trend to continue.