Pretraining (GPT-4.5, Grok 4, but also counterfactual large runs that weren’t done) disappointed people this year. That’s probably not because it wouldn’t work; on the margin, it was just ~30 times more efficient to do post-training instead. This should change yet again, soon, if RL scales even worse.
IMO this should be edited to say Grok 3 instead of Grok 4. Grok 3 was mostly pre-training, and Grok 4 was mostly Grok 3 with more post-training.
You’re saying they’re the same base model? Cite?
Elon changed the planned name of Grok 3.5 to Grok 4 shortly before release:
https://x.com/elonmusk/status/1936333964693885089?s=20
Then used this image during the Grok 4 release announcement:
They don’t confirm it outright, but it’s heavily implied and it was widely understood at the time to be the same pre-train.
Thanks!