Alas, it might have been the consequence not just of scaling up RLVR, but of something else. Nine days ago I remarked that “The time horizon of base LLMs experienced a slowdown or plateau[1] between GPT-4 (5 minutes, Mar ’23) and GPT-4o (9 min, May ’24),” and DeepSeek’s models experienced a similar semi-plateau, implying that the acceleration could be driven by another, undisclosed breakthrough.
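As a rough sanity check on the “plateau” framing, here is a back-of-the-envelope sketch using only the two figures quoted above (the ~7-month doubling time often quoted for the overall METR trend is an outside reference point, not something stated in this post):

```python
from math import log

# Figures quoted above: GPT-4 at ~5 minutes (Mar '23), GPT-4o at ~9 minutes (May '24).
t_gpt4, t_gpt4o = 5.0, 9.0   # time horizons in minutes
months_elapsed = 14          # Mar 2023 -> May 2024

# Implied doubling time under exponential growth: horizon(t) = t_gpt4 * 2**(t / d)
d = months_elapsed * log(2) / log(t_gpt4o / t_gpt4)
print(f"Implied doubling time: {d:.1f} months")  # ~16.5 months, far slower than the
                                                 # ~7-month doubling time often cited
                                                 # for the overall METR trend
```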
Daniel Kokotajlo also thinks that “we should have some credence on new breakthroughs e.g. neuralese,[2] online learning,[3] whatever. Maybe like 8%/yr? Of a breakthrough that would lead to superhuman coders within a year or two, after being appropriately scaled up and tinkered with.”
There is also Gemini Diffusion, previewed around three months ago, which is already known to be OOMs faster[4] and is likely to have interpretability problems. What if Gemini Diffusion is released to the public in about 1-3 months[5] and beats many models of a similar compute class on various benchmarks?
Or consider ideas like Knight Lee’s proposal, which would make the model more interpretable and nudgeable than neuralese while offering a smaller capabilities boost than neuralese. What if Lee-like architectures are used in Agent-2-level systems and neuralese is used in Agent-3+?
For comparison, o1 was previewed on Sep 12, 2024 and released on Dec 5, 2024, while o3 was previewed on Dec 20, 2024 and released on Apr 16, 2025, meaning that a model is likely to be released 3-4 months after its preview. There is also GPT-5-thinking, which Zvi, quoting VictorTaelin, compares with o4. If that’s true, then o4-mini was released 4 months ahead of o4 (o4-mini came out on Apr 16, 2025, while GPT-5 launched in early August 2025). o3-mini was released about a month after o3’s preview, implying that o4 could have been previewed 5 months before its release. If Gemini Diffusion is likewise released 6 months after its preview, then it will be released 3 months from now.
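To make the date arithmetic explicit, here is a small sketch using the preview and release dates cited above; the Gemini Diffusion preview date (assumed here to be May 20, 2025, Google I/O) and the 6-month gap are assumptions for illustration:

```python
from datetime import date, timedelta

# Preview -> release gaps for the two models cited above
o1_gap = (date(2024, 12, 5) - date(2024, 9, 12)).days    # o1: 84 days (~3 months)
o3_gap = (date(2025, 4, 16) - date(2024, 12, 20)).days   # o3: 117 days (~4 months)
print(f"o1: {o1_gap} days, o3: {o3_gap} days")

# Hypothetical: if Gemini Diffusion (preview date assumed to be 2025-05-20) follows
# a 6-month (~183-day) preview-to-release gap, its release lands around mid-November
# 2025, i.e. roughly 3 months after the time of writing.
gd_release = date(2025, 5, 20) + timedelta(days=183)
print(f"Projected Gemini Diffusion release: {gd_release}")  # 2025-11-19
```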
[1] While GPT-4.5 has a time horizon between 30 and 40 minutes, it, unlike GPT-4o, was a MoE and was trained on CoTs.
[3] However, I fail to understand how online learning boosts capabilities.
[4] If diffusion models use OOMs less compute than traditional LLMs, can one make training runs of diffusion models similarly cheaper?