I think it can help AWS with price-performance for the narrow goal of giant pretraining runs, where capex on training systems might soon be the primary constraint on scaling. For reasoning training (if it does scale), building a single training system is less relevant; the usual geographically distributed inference buildout that hyperscalers are doing anyway would be about as suitable. And the 400K-chip Rainier system indicates that it works well enough to ramp (serving as a datapoint in addition to the on-paper specification).
Chinese firms … will ultimately lack data
I don’t think there is a meaningful distinction for data; all natural text data is running out anyway around 2027-2029 due to the data inefficiency of MoE. No secret stashes at Google or Meta are going to substantially help, since even 10T-100T additional tokens won’t change the game.
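To make the 2027-2029 claim concrete, here is a rough back-of-envelope sketch (my own numbers, not from this thread): Chinchilla-style compute-optimal token counts, with a higher tokens-per-active-parameter ratio standing in for MoE data inefficiency, compared against an assumed stock of usable text. The 2024 compute figure, growth rate, token ratios, and ~500T text stock are all guesses for illustration.

```python
# Back-of-envelope sketch (assumed numbers, not from the thread above):
# when do compute-optimal token counts exceed the stock of usable natural text?

def optimal_tokens(compute_flops: float, tokens_per_param: float) -> float:
    # Chinchilla-style: C ~= 6 * N * D with D = tokens_per_param * N,
    # so D = sqrt(C * tokens_per_param / 6).
    return (compute_flops * tokens_per_param / 6) ** 0.5

compute_2024 = 5e26      # assumed FLOPs for a 2024 frontier pretraining run
growth_per_year = 3.5    # assumed growth in training compute per year
text_stock = 5e14        # assumed ~500T usable text tokens (with some repetition)

for year in range(2024, 2031):
    c = compute_2024 * growth_per_year ** (year - 2024)
    dense = optimal_tokens(c, tokens_per_param=20)  # dense Chinchilla ratio
    moe = optimal_tokens(c, tokens_per_param=60)    # MoE: more tokens per active param
    note = "  <-- MoE runs out of text" if moe > text_stock else ""
    print(f"{year}: dense ~{dense:.1e} tokens, MoE ~{moe:.1e} tokens{note}")
```

With these made-up inputs the MoE curve crosses ~500T tokens around 2027-2028, and adding an extra 10T-100T tokens to the stock moves that crossing point by well under a year, which is the sense in which the stashes wouldn't change the game.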
You’re right about text, but Google has privileged access to YouTube (a significant % of all video ever recorded by humans).