The Anthropic announcement says “up to one million TPUs”, and the Ironwood announcement claims 4.6e15 FP8 FLOP/s per chip. A 2-die GB200 chip produces 5e15 dense FP8 FLOP/s, and there are about 400K chips in the 1 GW phase of the Abilene system.
Thus if the Anthropic contract is for TPUv7 Ironwood, their 1 GW system will have about 2x the FLOP/s of the Abilene 1 GW system. Since the per-chip FLOP/s are similar, the advantage comes from fitting roughly 2.5x more chips into the same power budget, that is, better FLOP/s per watt (probably because Ironwood is 3nm, while Blackwell is 4nm, which is a minor refinement of 5nm). It's not clear, though, that the Anthropic contract is for a single system the way Abilene is, that is, datacenters with sufficient bandwidth between them to act as one training cluster. But Google has had a lot of time to set up inter-datacenter networking, so this is plausible even for a collection of somewhat distant datacenter buildings. If it isn't a single system, then it's only good for RLVR and inference, not for the largest pretraining runs.
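A quick back-of-the-envelope check of that 2x figure (a sketch using only the rough public figures quoted above; the Anthropic count assumes the full "up to one million" TPUs, which is an upper bound):

```python
# Rough FP8 FLOP/s comparison of the two ~1 GW systems.
# All numbers are approximate public claims, not measured values.

ironwood_chips = 1_000_000   # "up to one million TPUs" (upper bound)
ironwood_flops = 4.6e15      # claimed FP8 FLOP/s per Ironwood chip

gb200_chips = 400_000        # ~chips in the 1 GW Abilene phase
gb200_flops = 5.0e15         # dense FP8 FLOP/s per 2-die GB200 chip

anthropic_total = ironwood_chips * ironwood_flops  # ~4.6e21 FLOP/s
abilene_total = gb200_chips * gb200_flops          # ~2.0e21 FLOP/s

print(f"Anthropic system: {anthropic_total:.1e} FP8 FLOP/s")
print(f"Abilene system:   {abilene_total:.1e} FP8 FLOP/s")
print(f"Ratio: {anthropic_total / abilene_total:.1f}x")  # ~2.3x
```

The ratio comes out to ~2.3x, consistent with "about 2x", and since both systems sit at ~1 GW, the gap is entirely a chips-per-watt difference.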
The reason things like this could happen is that OpenAI needed to give the go-ahead for the Abilene system in 2024, when securing a 1 GW Ironwood system from Google plausibly wasn't in the cards, and in any case they wouldn't want to depend too much on Google, since GDM is a competitor (and the Microsoft relationship was already souring). Anthropic, on the other hand, still has enough AWS backing to make some dependence on Google less crucial, and they only needed to learn recently about the feasibility of a 1 GW system from Google, rather than committing back in 2024. Perhaps OpenAI will get a 1-2 GW system from Google as well at some point, but by then Nvidia Rubin (not to mention Rubin Ultra) is not necessarily worse than Google's next thing.