Non-US, non-Chinese AI companies need a currently implausible level of funding to keep scaling at the frontier, while Chinese AI companies need large scale-up systems that aren’t being sold to them. Having enough total FLOP/s wouldn’t help if it comes in the form of servers that can’t fit large models: H100/H200/B200 (NVL8) rather than GB200/GB300 Oberon (NVL72) or future Kyber racks. It remains unclear when Huawei will actually start producing its large scale-up systems (CloudMatrix/Atlas) in significant quantities. It is even less clear when it could produce quantities matching frontier US compute while also matching the system-level performance of the individual scale-up systems needed to run large models.
It is also unclear whether Chinese labs, or American open-source labs like Reflection and Arcee, can come up with algorithmic innovations to leapfrog the closed-source frontier.
You can’t sustainably leapfrog the closed-source frontier with algorithmic innovations. The closed-source frontier will adopt the same innovations and leapfrog you back, because it will scale them with more compute and to larger models, resulting in even higher quality.
So far, Chinese models remain at a scale of roughly 1T total params, which fits in the older 8-chip servers. But the 2025-2026 Oberon racks enable models with 20-25T total params, the 2028 Kyber racks enable 400T total param models, and the 8x Kyber Feynman systems of 2029-2030 enable models with 1-3 quadrillion params. These models absolutely can’t run on the 8-chip servers currently used to run 1T total param models, no matter how many such servers you have. My guess is that it’s actually useful to scale models almost as far as that hardware allows, and that it’s not as expensive as it sounds once the large scale-up systems are available.
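The memory arithmetic behind these limits can be sketched as follows. The model size that fits is bounded by the total HBM of one scale-up domain divided by bytes per parameter. This is a rough sketch under stated assumptions. The per-chip HBM figures are approximate public specs. The `ScaleUpSystem` and `max_params_trillions` names, and the choice of FP8 (1 byte) or FP4 (0.5 bytes) per weight, are illustrative assumptions rather than anything from the text above. The sketch ignores KV cache and activations unless you lower `weight_fraction`.

```python
# Rough sketch (assumptions, not from the original text): per-chip HBM values
# are approximate public specs, and only model weights are counted unless
# weight_fraction < 1 reserves room for KV cache and activations.
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleUpSystem:
    name: str
    chips: int               # accelerators in one high-bandwidth scale-up domain
    hbm_gb_per_chip: float   # HBM per accelerator, in GB (10**9 bytes)


def max_params_trillions(system: ScaleUpSystem,
                         bytes_per_param: float,
                         weight_fraction: float = 1.0) -> float:
    """Upper bound (in trillions) on total params whose weights fit in one domain."""
    total_bytes = system.chips * system.hbm_gb_per_chip * 1e9
    return total_bytes * weight_fraction / bytes_per_param / 1e12


# Hypothetical example systems with approximate HBM capacities.
SYSTEMS = [
    ScaleUpSystem("8x H100 server", 8, 80),
    ScaleUpSystem("8x H200 server", 8, 141),
    ScaleUpSystem("GB300 NVL72 rack", 72, 288),
]

if __name__ == "__main__":
    for s in SYSTEMS:
        fp8 = max_params_trillions(s, bytes_per_param=1.0)
        fp4 = max_params_trillions(s, bytes_per_param=0.5)
        print(f"{s.name}: ~{fp8:.1f}T params at FP8, ~{fp4:.1f}T at FP4 (weights only)")
```

Under these assumptions, an 8x H200 server holds about 1.1T params at FP8, which is consistent with the roughly 1T scale mentioned above. A GB300 NVL72 rack holds about 20.7T at FP8, or about 41.5T at FP4 before any room is set aside for KV cache. The bound is per scale-up domain. Adding more 8-chip servers raises total FLOP/s but leaves this number unchanged.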