If my suspicions are true, then the bubble will pop once it becomes clear that the METR law[1] has reverted to its original trend of doubling the time horizon every 7 months, with training compute costs doubling alongside it (and do inference compute costs grow even faster?).
However, my take on scaling laws could be invalidated within a few days if it mispredicts things like the METR-measured time horizon of Claude Haiku 4.5 (which I forecast to be ~96 minutes) or the performance of Gemini 3[2] on the ARC-AGI-1 benchmark. (Since o4-mini, o3, and GPT-5 form a nearly straight line, and Claude Sonnet 4.5[3] lands on that line or a bit below it, I don’t expect Gemini 3 to land above the line.)
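For concreteness, here is a minimal sketch of what a fixed doubling period implies for the time horizon; the baseline horizon and elapsed months below are hypothetical placeholders, not METR’s actual measurements:

```python
def extrapolate_horizon(h0_minutes: float, months_elapsed: float,
                        doubling_months: float = 7.0) -> float:
    """Project a time horizon forward assuming it doubles every `doubling_months` months."""
    return h0_minutes * 2 ** (months_elapsed / doubling_months)

# Hypothetical baseline: a model with a 30-minute horizon today would be
# projected to reach ~60 minutes after 7 months and ~120 minutes after
# 14 months under the original 7-month doubling trend.
print(extrapolate_horizon(30.0, 7.0))   # 60.0
print(extrapolate_horizon(30.0, 14.0))  # 120.0
```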
Yes, that is a plausible scenario. But the project could in theory be sponsored directly by the government, or a Chinese project could be sponsored by the CCP. What I suspect is that creating superhuman coders or researchers is infeasible due to problems not just with the economics, but with scaling laws and the quantity of training data, unless someone makes a bold move and applies some new architectures.
My other predictions of progress on benchmarks
[1] However, METR will likely have to create a new set of tasks in order to measure the horizon.
[2] Gemini 3 is rumored to appear on October 22.
[3] This is also true for Claude Haiku 4.5, but I made the conjecture before learning about Haiku’s performance.