Hot (?) take: the USG shooting itself in the foot as it pertains to AI is good actually, and we should not risk interrupting it.
Like, okay, there are different ways the USG could shoot itself in the foot:
1. Something that doesn’t slow the US down but speeds up others (e.g., OpenAI open-sourcing its AI).
2. Something that slows down the US but speeds up others (warning others away from using Huawei chips, selling off chips, discouraging immigration, misplacing data centers).
3. Something that just slows the US down (screwing up the nuclear-energy buildout).
4. Something that slows everyone down.
(1) is obviously bad. But (3) and (4) are great[1].
And (2) is also potentially good, because the others would use the resources worse, or wouldn’t use them for AGI acceleration at all.
Like, take China. Its mindset is famously that of a “fast follower”, with dicey attempts at innovation being internally unpopular[2]; and the Chinese AI researchers probably know as much about the world-domination fantasies motivating American CEOs as they do about the AGI doom (i.e., barely anything, reportedly). So there’s neither willingness nor motivation to race to AGI there. … Unless the US AGI labs succeed at manufacturing that race. Which, come to think of it, they would stop trying to do if they started believing they’d lose it.
So the US AGI labs losing access to the raw resources that could be converted into AI progress (chips, energy, talent) is good in my book.
There’s a potential argument here that it would be better to have the US AGI companies ahead, because they’re more likely to get AGI right due to being more safety-conscious, or would offer us more opportunities to reform them into being properly safety-conscious. I don’t think much of that argument. I would rather have, e.g., 3 more years until AGI than bet on fringe possibilities like those.
There’s also an argument that keeping the raw resources in Western hands would make it easier to ban AGI research (by controlling supply chains and/or negotiating an international ban with China) if we do manage to wake the USG up to the omnicide risk. This is a more solid argument… But still something I’d trade away for timelines a few years longer.
[1] As they pertain to slowing down AI progress, I mean. Obviously they can be parts of overall terrible-for-the-world policies like tariffs or restricting immigration.
[2] DeepSeek is not evidence against this vision but rather its confirmation: they did not innovate, only reverse-engineered and optimized.
I disagree on DeepSeek and innovation. Yes, R1 is obviously a reaction to o1, but its MoE model is pretty innovative, and it is Llama 4 that obviously copied DeepSeek. That said, I agree innovation is unpopular in China. But from interviews of DeepSeek founder Liang Wenfeng, we know DeepSeek was explicitly an attempt to overcome China’s unwillingness to innovate.
DeepSeek-V3’s MoE architecture is unusual in having high granularity: 8 active experts rather than the usual 1-2. Llama 4 Maverick doesn’t do that[1]. The closest thing is the recent Qwen3-235B-A22B, which also has 8 active experts.
[1] From the release blog post: “As an example, Llama 4 Maverick models have 17B active parameters and 400B total parameters. … MoE layers use 128 routed experts and a shared expert. Each token is sent to the shared expert and also to one of the 128 routed experts.”
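To make the granularity contrast concrete, here is a minimal sketch of token-choice MoE routing in PyTorch. Everything in it (the MoELayer class, the dimensions, the expert counts) is illustrative rather than any lab’s actual code: one configuration routes each token to 1 of 128 routed experts plus a shared expert, in the style the Maverick quote describes, while the other activates 8 smaller experts per token, in the fine-grained style attributed to DeepSeek-V3.

```python
# Illustrative sketch only; class names, dimensions, and expert counts are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Each token is routed to its top-k experts; a shared expert (if present)
    additionally processes every token, as in the Maverick description above."""

    def __init__(self, d_model, d_ff, n_experts, top_k, shared_expert=False):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.shared = (
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            if shared_expert
            else None
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # per-token expert picks
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        if self.shared is not None:
            out = out + self.shared(x)                   # shared expert sees every token
        return out

# Coarse routing: top-1 of 128 routed experts, plus a shared expert.
maverick_like = MoELayer(d_model=64, d_ff=256, n_experts=128, top_k=1, shared_expert=True)
# Fine-grained routing: many smaller experts, 8 active per token
# (DeepSeek-V3 also uses a shared expert).
deepseek_like = MoELayer(d_model=64, d_ff=32, n_experts=64, top_k=8, shared_expert=True)

tokens = torch.randn(4, 64)
print(maverick_like(tokens).shape, deepseek_like(tokens).shape)  # both: (4, 64)
```

The knob the thread is arguing about is `top_k` together with expert size: “granularity” is how many (and how small) the active experts are per token, with the per-token active-parameter budget held roughly fixed.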
I would roughly punt it into the category of “optimization”, not “innovation”. “Innovation” is something like transformers, instruct-training, or RL-on-CoTs. MoE scaling is an incremental-ish improvement.
Or, to put it another way: it’s an innovation in the field of compute-optimal algorithms/machine learning. It’s not an AI innovation.
But from interviews of DeepSeek founder Liang Wenfeng, we know DeepSeek was explicitly an attempt to overcome China’s unwillingness to innovate
Yes, and we have yet to see them succeed. And with the CCP having apparently turned its sights on them, that attempt may already have been thoroughly murdered.