Google TPUs have been competitive with Nvidia GPUs for years, but until very recently Google did not sell its TPUs, only renting them out via GCP; it is now starting to actually sell them as well.
Other GPUs and custom silicon like Trainium are used for inference these days, but training is almost exclusively done on Nvidia GPUs and Google TPUs. It is relatively easy to take a trained model and run inference with it on different chips, as we see with open-weight models running on Apple silicon and DeepSeek models being served on Huawei Ascend chips.
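To illustrate how little is involved on the software side, here is a minimal sketch (assuming PyTorch and Hugging Face transformers; the model name is just an example open-weight checkpoint, not one singled out in the text) of running the same trained weights on whichever backend happens to be available:

```python
# Minimal sketch: inference with an open-weight model on whatever
# accelerator is present (Nvidia CUDA, Apple silicon via MPS, or CPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

if torch.cuda.is_available():
    device = "cuda"   # Nvidia GPU
elif torch.backends.mps.is_available():
    device = "mps"    # Apple silicon GPU
else:
    device = "cpu"

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # example open-weight checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to(device)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same checkpoint runs unchanged on any of these backends; only the device selection differs, which is why porting inference to new chips is far easier than porting training.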
I still expect the majority of inference done today to run on Nvidia GPUs, a notable portion on TPUs, and a non-negligible amount on other chips (I haven't actually estimated this, though). Very roughly, I think that in 1-3 years Nvidia will face a lot of competition for inference compute, though perhaps not much competition for training compute apart from TPUs, since CUDA is somewhat of a hard-to-overcome moat.