Bringing in a quote from Twitter/x: (Not my viewpoint, just trying to broaden the discussion.)
https://x.com/DrJimFan/status/1882799254957388010
Jim Fan @DrJimFan Whether you like it or not, the future of AI will not be canned genies controlled by a “safety panel”. The future of AI is democratization. Every internet rando will run not just o1, but o8, o9 on their toaster laptop. It’s the tide of history that we should surf on, not swim against. Might as well start preparing now.
DeepSeek just topped Chatbot Arena, my go-to vibe checker in the wild, and two other independent benchmarks that couldn’t be hacked in advance (Artificial-Analysis, HLE).
Last year, there were serious discussions about limiting OSS models by some compute threshold. Turns out it was nothing but our Silicon Valley hubris. It’s a humbling wake-up call to us all that open science has no boundary. We need to embrace it, one way or another.
Many tech folks are panicking about how much DeepSeek is able to show with so little compute budget. I see it differently—with a huge smile on my face. Why are we not happy to see improvements in the scaling law? DeepSeek is unequivocal proof that one can produce unit intelligence gain at 10x less cost, which means we shall get 10x more powerful AI with the compute we have today and are building tomorrow. Simple math! The AI timeline just got compressed.
Here’s my 2025 New Year resolution for the community:
No more AGI/ASI urban myth spreading.
No more fearmongering.
Put our heads down and grind on code.
Open source, as much as you can.
Acceleration is the only way forward.
context: @DrJimFan works at Nvidia
IF we have been getting, and will keep getting, strong scaling-law improvements, then:
OpenAI’s plan to continue acquiring far more training compute even into 2029 is either lies or a mistake
we’ll get very interesting times quite soon
offense-defense balances and multi-agent-system dynamics seem like good research directions, if you can research fast and have reason to believe your research will be implemented in a useful way
EDIT: I no longer fully endorse the crossed-out bullet point. Details in replies to this comment.
Disagree on pursuit of compute being a mistake in one of those worlds but not the other. Either way you are going to want as much inference as possible during key strategic moments.
This seems even more critically important if you are worried your competitors will have algorithms nearly as good as yours.
[Edit: roon posted the same thought on xitter the next day https://x.com/tszzl/status/1883076766232936730
roon @tszzl
if the frontier models are commoditized, compute concentration matters even more
if you can train better models for fewer flops, compute concentration matters even more
compute is the primary means of production of the future and owning more will always be good
12:57 AM · Jan 25, 2025

roon @tszzl
imo, open source models are a bit of a red herring on the path to acceptable asi futures. free model weights still don’t distribute power to all of humanity, they distribute it to the compute rich
https://x.com/MikePFrank/status/1882999933126721617
Michael P. Frank @MikePFrank
Since R1 came out, people are talking like the massive compute farms deployed by Western labs are a waste, BUT THEY’RE NOT — don’t you see? This just means that once the best of DeepSeek’s clever cocktail of new methods are adopted by GPU-rich orgs, they’ll reach ASI even faster. ]
Agreed. However, in the fast world the game is extremely likely to end before you get to use 2029 compute.
EDIT: I’d be very interested to hear an argument against this proposition, though.
I don’t know if the plan is to have the compute from Stargate become available in incremental stages, or all at once in 2029.
I expect timelines are shorter than that, but I’m not certain. If I were in OpenAI’s shoes, I’d want to hedge my bets. 2026 seems plausible. So does 2032. My peak expectation is sometime in 2027, but I wouldn’t want to go all-in on that.
I am almost totally positive that the plan is not that.
If planning for 2029 is cheap, then it probably makes sense under a very broad class of timeline expectations.
If it is expensive, then the following applies to the hypothetical presented by the tweet:
The timeline evoked in the tweet seems extremely fast and multipolar. I’d expect planning for 2029 compute scaling to make sense only if the current paradigm gets stuck at ~AGI capability level (i.e., a very good scaffolding for a model similar to, but a bit smarter than, o3). This is because if it scales further than that, it will do so fast (requiring little compute, as the tweet suggests). If capabilities arbitrarily better than o4-with-good-scaffolding are compute-cheap to develop, then things almost certainly get very unpredictable before 2029.