Trainium is mostly a joke, but I do agree about the Chinese firms moving away from Nvidia dependence. They will also likely have sufficient capital, but will ultimately lack data (though they may be able to make up for it with the insane talent they have? If timelines end up long, I can easily see China pulling ahead simply due to their prior focus on education and talent paying off long-term).
I think it can help AWS with price-performance for the narrow goal of giant pretraining runs, where capex on training systems might soon be the primary constraint on scaling. For reasoning training (if it does scale), building a single training system is less relevant; the usual geographically distributed inference buildout that hyperscalers are doing anyway would be about as suitable. And the 400K-chip Rainier system indicates that Trainium works well enough to ramp (serving as a datapoint in addition to on-paper specifications).
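A rough way to see why the single-system question matters for pretraining but not for reasoning training: data-parallel pretraining has to exchange something like full gradients every optimizer step, while RL-style reasoning training mostly ships prompts and sampled completions between sites. A minimal back-of-envelope sketch in Python; every number here (parameter count, step time, rollout volume) is my illustrative assumption, not from the comment:

```python
# Illustrative sketch: cross-site bandwidth needed for synchronous pretraining
# vs. for shipping RL rollout traffic. All constants are assumptions.

PARAMS = 1e12            # assumed dense-equivalent parameter count
BYTES_PER_GRAD = 2       # bf16 gradients
STEP_TIME_S = 10.0       # assumed optimizer step time
GRAD_BYTES = PARAMS * BYTES_PER_GRAD

# Data-parallel pretraining exchanges roughly full gradients every step.
pretrain_bw_gbps = GRAD_BYTES * 8 / STEP_TIME_S / 1e9
print(f"pretraining cross-site bandwidth: ~{pretrain_bw_gbps:,.0f} Gbit/s sustained")

# RL-style reasoning training mostly ships tokens; assume 1M tokens/s of
# rollout traffic at ~4 bytes/token.
rollout_bw_gbps = 1e6 * 4 * 8 / 1e9
print(f"rollout traffic between sites:   ~{rollout_bw_gbps:.3f} Gbit/s")
```

Under these assumptions the gap is around five orders of magnitude, which is why the inference fleet a hyperscaler is building anyway can serve reasoning training, while pretraining still wants one tightly coupled system.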
Chinese firms … will ultimately lack data
I don’t think there is a meaningful distinction for data: all natural text data is running out around 2027-2029 anyway, due to the data inefficiency of MoE. No secret stashes at Google or Meta are going to substantially help, since even 10T-100T tokens won’t change the game.
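To make "won't change the game" concrete, here is a minimal sketch assuming the standard Chinchilla-style compute-optimal rule of thumb (C ≈ 6·N·D with D ≈ 20·N, so C scales roughly with D²) and ~100T tokens of usable natural text. Every constant is my assumption, not from the thread, and a dense rule of thumb is used even though MoE makes the data picture worse:

```python
import math

def compute_from_tokens(d_tokens):
    """FLOPs supported by d_tokens at a Chinchilla-style compute optimum."""
    n_params = d_tokens / 20          # assumed ~20 tokens per parameter
    return 6 * n_params * d_tokens    # C ≈ 6*N*D

baseline = 100e12                     # assumed ~100T tokens of usable text
for stash in (10e12, 100e12):         # a hypothetical private stash
    gain = compute_from_tokens(baseline + stash) / compute_from_tokens(baseline)
    print(f"+{stash/1e12:.0f}T tokens -> {gain:.2f}x compute-optimal compute "
          f"({math.log10(gain):.2f} OOMs)")
```

Since compute scales roughly with the square of the token count, even doubling the entire data stock buys well under one order of magnitude of compute-optimal scale, against a training-compute trajectory that has been growing by roughly an order of magnitude every year or two.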
You’re right about text, but Google has privileged access to YouTube (a significant fraction of all video ever recorded by humans).