IF we have gotten, and will keep getting, strong scaling-law improvements, then:
openai’s plan to continue to acquire way more training compute even into 2029 is either lies or a mistake
we’ll get very interesting times quite soon
offense-defense balances and multi-agent-system dynamics seem like good research directions, if you can research fast and have reason to believe your research will be implemented in a useful way
EDIT: I no longer fully endorse the crossed-out bullet point (the one calling OpenAI’s 2029 compute acquisition lies or a mistake). Details in replies to this comment.
Disagree on pursuit of compute being a mistake in one of those worlds but not the other. Either way you are going to want as much inference as possible during key strategic moments.
This seems even more critically important if you are worried your competitors will have algorithms nearly as good as yours.
[Edit: roon posted the same thought on xitter the next day https://x.com/tszzl/status/1883076766232936730
roon @tszzl
if the frontier models are commoditized, compute concentration matters even more
if you can train better models for fewer flops, compute concentration matters even more
compute is the primary means of production of the future and owning more will always be good
12:57 AM · Jan 25, 2025
roon @tszzl
imo, open source models are a bit of a red herring on the path to acceptable asi futures. free model weights still don’t distribute power to all of humanity, they distribute it to the compute rich
https://x.com/MikePFrank/status/1882999933126721617
Michael P. Frank @MikePFrank
Since R1 came out, people are talking like the massive compute farms deployed by Western labs are a waste, BUT THEY’RE NOT — don’t you see? This just means that once the best of DeepSeek’s clever cocktail of new methods are adopted by GPU-rich orgs, they’ll reach ASI even faster. ]
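roon’s second bullet (that cheaper training makes compute concentration matter more, not less) can be illustrated with a toy power-law scaling model. All the constants below are hypothetical, picked only to show the structural point: a multiplicative algorithmic-efficiency gain raises everyone’s capability, but leaves the compute-rich org’s relative lead exactly intact.

```python
# Toy Chinchilla-style power law: loss falls as (effective compute)^-alpha.
# ALPHA and the FLOP budgets are hypothetical illustration values.

ALPHA = 0.05  # hypothetical scaling exponent

def loss(physical_compute: float, efficiency: float = 1.0) -> float:
    """Loss under a power law in effective compute = efficiency * physical."""
    return (efficiency * physical_compute) ** -ALPHA

rich, poor = 1e26, 1e24  # hypothetical FLOP budgets, 100x apart

for eff in (1.0, 10.0):  # before and after a 10x algorithmic-efficiency gain
    ratio = loss(poor, eff) / loss(rich, eff)
    print(f"efficiency {eff:>4}x: rich loss {loss(rich, eff):.4f}, "
          f"poor loss {loss(poor, eff):.4f}, loss ratio {ratio:.4f}")

# The loss ratio is identical in both rows: efficiency multiplies effective
# compute, so the relative advantage from compute concentration is unchanged,
# while both parties land at higher absolute capability.
```

Because efficiency enters multiplicatively, the ratio of losses depends only on the ratio of physical compute; under this (simplified) model, DeepSeek-style method improvements accelerate GPU-rich orgs at least as much as everyone else.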
Agreed. However, in the fast world the game is extremely likely to end before you get to use 2029 compute.
EDIT: I’d be very interested to hear an argument against this proposition, though.
I don’t know if the plan is to have the compute from Stargate become available in incremental stages, or all at once in 2029.
I expect timelines are shorter than that, but I’m not certain. If I were in OpenAI’s shoes, I’d want to hedge my bets. 2026 seems plausible. So does 2032. My peak expectation is sometime in 2027, but I wouldn’t want to go all-in on that.
I am almost totally positive that the plan is not that.
If planning for 2029 is cheap, then it probably makes sense under a very broad class of timeline expectations.
If it is expensive, then the following applies to the hypothetical presented by the tweet:
The timeline evoked in the tweet seems extremely fast and multipolar. I’d expect planning for 2029 compute scaling to make sense only if the current paradigm gets stuck at roughly AGI-level capabilities (i.e., very good scaffolding around a model similar to, but a bit smarter than, o3). This is because if the paradigm scales past that point, it will do so fast, requiring little compute, as the tweet suggests. And if capabilities arbitrarily better than o4-with-good-scaffolding are compute-cheap to develop, then things almost certainly get very unpredictable before 2029.
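The cheap-vs-expensive hedging logic can be sketched as a toy expected-value calculation. The probabilities and payoffs below are entirely hypothetical, loosely echoing the timeline spread discussed above (2026 plausible, peak around 2027, 2032 also plausible); the point is only the structure: compute that comes online in 2029 pays off only in the worlds where the game hasn’t already ended by then.

```python
# Toy EV sketch of the "plan for 2029 compute?" question.
# P(transformative AI arrives in a given year); all values hypothetical.
timeline = {2026: 0.15, 2027: 0.35, 2028: 0.20, 2029: 0.10,
            2030: 0.05, 2031: 0.05, 2032: 0.10}

def value_of_2029_compute(cost: float, payoff_if_still_relevant: float) -> float:
    """EV of compute that only comes online in 2029: it pays off only in
    worlds where transformative AI hasn't arrived before 2029."""
    p_still_relevant = sum(p for year, p in timeline.items() if year >= 2029)
    return p_still_relevant * payoff_if_still_relevant - cost

# Cheap planning: positive EV even though only ~30% of worlds reach 2029.
print(value_of_2029_compute(cost=1.0, payoff_if_still_relevant=10.0))
# Expensive planning: negative EV under the same timeline distribution.
print(value_of_2029_compute(cost=5.0, payoff_if_still_relevant=10.0))
```

Under these made-up numbers, cheap 2029 planning is worth it as a hedge even on short-peaked timelines, while expensive planning only pencils out if you put substantially more probability mass on the paradigm getting stuck.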