Once a model is trained, it needs to be served. If both xAI and OpenAI have their ~6T total param models already trained in Jan 2026, xAI will have enough NVL72 systems to serve the model to all its users at a reasonable speed and price (and so they will), while OpenAI won't have that option at all (short of restricting demand somehow, probably with rate limits or higher prices).
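For a rough sense of why the serving unit matters here, a back-of-envelope sketch of the memory side (all numbers below are illustrative assumptions on my part, including the per-GPU HBM figures and the FP8/overhead choices, not figures from the post):

```python
# Hedged back-of-envelope: can a ~6T total-param model sit inside one
# fast-interconnect serving domain, or must it be sharded across many older nodes?

TOTAL_PARAMS = 6e12      # ~6T total params (MoE, so active params are much smaller)
BYTES_PER_PARAM = 1      # assume FP8 weights
OVERHEAD = 1.3           # rough multiplier for KV cache, activations, fragmentation

weights_tb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e12
needed_tb = weights_tb * OVERHEAD

# Approximate pooled HBM per serving unit (assumed round numbers):
nvl72_hbm_tb = 72 * 0.186      # NVL72 rack: 72 GPUs x ~186 GB HBM each
h100_node_hbm_tb = 8 * 0.080   # older 8x H100 node: ~0.64 TB

print(f"weights: {weights_tb:.1f} TB, with overhead: ~{needed_tb:.1f} TB")
print(f"fits in one NVL72 domain ({nvl72_hbm_tb:.1f} TB HBM): {needed_tb <= nvl72_hbm_tb}")
print(f"8x H100 nodes needed just to hold it: {needed_tb / h100_node_hbm_tb:.0f}")
```

Under these assumptions the model fits in a single NVL72 coherent domain, but would have to be spread over roughly a dozen older 8-GPU nodes stitched together with slower interconnect, which is what makes serving it at reasonable speed and price impractical without the new systems.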
A meaningful fraction of the buildout of a given system is usually online several months to a year before the bulk of it; that's when the first news about cloud access appears. If new inference hardware makes more total params practical than previous hardware did, and scaling of the amount of hardware (between hardware generations) is still underway, then even a fraction of the new hardware buildout will be comparable to the bulk of the old hardware (which is busy anyway) in the FLOPs and training steps that RL can get out of it, adjusting for better efficiency. And slow rollouts for RL (on old hardware) increase batch sizes and decrease the total number of training steps that fit into a few months, which could also be important.
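A hedged sketch of that last tradeoff (all numbers are made-up illustrations, not estimates of any lab's setup): RL steps are gated by rollout generation, so a fixed wall-clock budget buys fewer, larger-batch steps when the model decodes slowly on old hardware.

```python
# Illustrative only: how rollout speed trades off against RL step count
# within a fixed wall-clock budget, assuming each step waits on a batch of rollouts.

WALL_CLOCK_DAYS = 90                    # "a few months" of RL training
BUDGET_S = WALL_CLOCK_DAYS * 86400

def rl_steps(rollout_tokens, tokens_per_sec_per_replica, replicas, step_overhead_s=60):
    """Steps that fit in the budget, and the batch (rollouts per step) needed to keep replicas busy."""
    rollout_time = rollout_tokens / tokens_per_sec_per_replica  # latency of one rollout
    step_time = rollout_time + step_overhead_s                  # rollouts dominate step time
    steps = BUDGET_S / step_time
    batch = replicas                                            # one rollout per replica per step
    return steps, batch

# Old hardware: many inference replicas, but slow decoding of a big-total-param model.
old = rl_steps(rollout_tokens=30_000, tokens_per_sec_per_replica=20, replicas=4000)
# New hardware: only a fraction of the fleet is online, but decoding is much faster.
new = rl_steps(rollout_tokens=30_000, tokens_per_sec_per_replica=120, replicas=800)

for name, (steps, batch) in [("old", old), ("new", new)]:
    print(f"{name}: ~{steps:,.0f} steps, batch of ~{batch} rollouts per step")
```

With these assumed numbers the slower hardware gets a few thousand steps with a very large batch, while the faster hardware gets several times more steps at a smaller batch, which is the sense in which even a fraction of the new buildout can matter for RL.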
serving to users might not be that important of a dimension
The new hardware that becomes available before the bulk of the buildout is not uniquely useful for serving users, because there isn't enough of it to serve a flagship model with more total params to all users. So it does seem to make sense to use it for RL training of a new model, which will then be served to users on the bulk of the same hardware once enough of it is built.