Musings on Reported Cost of Compute (Oct 2025)
There are many ways in which costs of compute get reported. A 1 GW datacenter site costs $10-15bn in infrastructure (buildings, cooling, power), plus $30-35bn in compute hardware (servers, networking, labor), assuming Nvidia GPUs. The useful life of the infrastructure is about 10-15 years, and with debt financing a developer only needs to ensure it’s paid off over those 10-15 years, which comes out to $1-2bn per year. For the compute hardware, the useful life is taken as about 5 years, which gives $6-7bn per year. Operational expenses (electricity, maintenance) add about $2.0-2.5bn per year.
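A minimal sketch of that annualization arithmetic, using the rough ranges above (how much of the $1-2bn/year reflects interest on the debt financing is my assumption, not something stated):

```python
# Back-of-envelope annualization for a 1 GW site; all figures in $bn,
# as (low, high) ranges taken from the estimates above.
infra_capex = (10, 15)          # buildings, cooling, power
hardware_capex = (30, 35)       # servers, networking, labor
opex_per_year = (2.0, 2.5)      # electricity, maintenance

infra_life_years = (10, 15)     # useful life of the infrastructure
hardware_life_years = 5         # useful life of the compute hardware

# Straight-line payoff; the $1-2bn/year in the text is a bit higher than this,
# presumably because debt financing adds interest.
infra_per_year = (infra_capex[0] / infra_life_years[1],
                  infra_capex[1] / infra_life_years[0])        # ~(0.7, 1.5)
hardware_per_year = (hardware_capex[0] / hardware_life_years,
                     hardware_capex[1] / hardware_life_years)  # (6.0, 7.0)

print(infra_per_year, hardware_per_year, opex_per_year)
```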
In total, 1 GW of compute costs about $9-11bn per year. But whoever paid the compute hardware capex needs the payments to keep coming for 5 years, so a contract for 1 GW of compute will be 5 years long: a single contract for at least $45-55bn, which might become $55-65bn once the cloud provider’s profit is included.
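Continuing the sketch for the total yearly cost and the implied contract size (the $55-65bn figure with margin is the estimate above, not something derived here):

```python
# Total yearly cost of 1 GW and the implied 5-year contract, in $bn.
total_per_year = (1 + 6 + 2.0, 2 + 7 + 2.5)    # infra + hardware + opex, roughly the $9-11bn above
contract_at_cost = (5 * total_per_year[0],
                    5 * total_per_year[1])      # ~(45, 57) over 5 years
contract_with_margin = (55, 65)                 # once cloud-provider profit is added

print(total_per_year, contract_at_cost, contract_with_margin)
```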
Thus 1 GW of compute could get reported as $10bn (infrastructure capex; or alternatively the compute costs for the AI company in a calendar year), as $30bn (compute hardware capex without labor costs), as $45bn (infrastructure plus compute hardware capex), or as $60bn (the total cost of the contract between the AI company and the cloud provider over 5 years).
A $300bn contract might then mean about 5 GW of total capacity, while a $27bn datacenter site could equally mean 2 GW of total capacity, depending on what is being counted. And a $300bn contract doesn’t mean that the 5 GW of capacity will be built immediately: if, for example, only 2 GW is built initially, the AI company only needs to be able to pay about $25bn per year (for 5 years), with the other 3 GW contingent on its continued growth.
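A sketch of how those headline numbers map to capacity, assuming the ~$60bn per GW over 5 years from above:

```python
# Mapping headline contract values to capacity, in $bn, assuming ~$60bn
# per GW of compute over a 5-year contract (i.e. ~$12bn per GW-year).
cost_per_gw_5yr = 60
headline_contract = 300
implied_capacity_gw = headline_contract / cost_per_gw_5yr   # 5 GW

# If only 2 GW of that capacity is built initially:
initial_gw = 2
yearly_payment = initial_gw * cost_per_gw_5yr / 5           # ~$24bn/year, roughly the $25bn above

print(implied_capacity_gw, yearly_payment)
```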
Non-Nvidia Hardware
The Nvidia servers (1 GW all-in is about 5500 GB200 NVL72 servers with 72 chips each, or 400K chips in total) take up about $20bn of capex, so if Nvidia’s margin of about 70% applies to this part (it’s probably less, since GPUs are not all of the server), it comes out to $14bn per GW, or $2.8bn per year in a 5-year contract between an AI company and a cloud provider, about 25% of the contract’s yearly cost. This suggests that non-Nvidia compute might only cost up to 25% less, all else equal (which it isn’t), even though GPUs are usually portrayed as the majority of the cost of compute, and Nvidia’s margin as the majority of the cost of the GPUs.
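The margin estimate as a sketch (the 70% margin, the $20bn server share of capex, and the ~$11bn GW-year denominator are the rough figures used above, not precise data):

```python
# How much of a GW-year of compute is Nvidia's margin, in $bn.
racks = 5500                        # GB200 NVL72 systems per GW (all-in power)
chips = racks * 72                  # ~400K chips
nvidia_server_capex = 20            # Nvidia servers' share of the ~$30-35bn hardware capex
nvidia_margin = 0.70                # likely an overestimate for the whole server

margin_per_gw = nvidia_server_capex * nvidia_margin   # ~$14bn per GW
margin_per_gw_year = margin_per_gw / 5                # ~$2.8bn per GW-year
share_of_yearly_cost = margin_per_gw_year / 11        # ~25% of a ~$11bn GW-year

print(chips, margin_per_gw, margin_per_gw_year, share_of_yearly_cost)
```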
Thus a TPU contract for “tens of billions of dollars” and “over a gigawatt of capacity” admits an interpretation where it’s a ~5-year contract at ~$12bn per year for ~1.2 GW of compute (in total power, not just IT power).
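For concreteness, one way those numbers could fit together (the $60bn total and the ~$10bn per GW-year TPU price are assumptions, not reported figures):

```python
# One possible reading of "tens of billions" for "over a gigawatt", in $bn.
assumed_contract_total = 60
years = 5
per_year = assumed_contract_total / years            # $12bn/year
assumed_tpu_cost_per_gw_year = 10                    # somewhat below the ~$12bn Nvidia figure
capacity_gw = per_year / assumed_tpu_cost_per_gw_year   # ~1.2 GW

print(per_year, capacity_gw)
```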
Model Sizes in 2026
If the new contract is for TPUv7 Ironwood, in 2026 Anthropic will have 1 GW of compute with 49 TB of HBM per 256-chip pod. This is comparable to OpenAI’s Abilene site, which is 1 GW of compute with 14-20 TB of HBM per GB200/GB300 NVL72 rack, and will also be ready at this capacity in 2026. Currently Anthropic has access to Trainium 2 Ultra servers of AWS’s Project Rainier with 6 TB of HBM per 64-chip server, while OpenAI’s capacity is probably mostly in 8-chip Nvidia servers with 0.64-1.44 TB of HBM per server.
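Roughly how those HBM-per-scale-up-domain figures come about, using approximate per-chip HBM capacities (the per-chip numbers below are my assumptions from public specs, not from the text or the contract announcements):

```python
# HBM per scale-up domain, in TB, as (chip count) x (per-chip HBM in GB) / 1000.
scale_up_domains = {
    "TPUv7 Ironwood pod (256 chips x 192 GB)":   256 * 192 / 1000,  # ~49 TB
    "GB200 NVL72 rack (72 chips x ~186 GB)":      72 * 186 / 1000,  # ~13 TB
    "GB300 NVL72 rack (72 chips x 288 GB)":       72 * 288 / 1000,  # ~21 TB
    "Trainium 2 Ultra server (64 chips x 96 GB)": 64 *  96 / 1000,  # ~6 TB
    "8x H100 server (8 chips x 80 GB)":            8 *  80 / 1000,  # 0.64 TB
    "8x B200 server (8 chips x 180 GB)":           8 * 180 / 1000,  # 1.44 TB
}

for name, tb in scale_up_domains.items():
    print(f"{name}: {tb:.2f} TB HBM")
```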
Feasible model size scales with HBM per server/rack/pod that serves as a scale-up world (especially for reasoning models and their training), so 2026 brings not just 10x more compute than 2024 (400K chips in GB200/GB300 NVL72 servers instead of 100K H100s), but also 10x larger models (in total parameter count). As the active params for compute optimal models scale with the square root of compute, this enables more sparsity in MoE models than 2024 compute did, and the number of active params for the larger models will no longer be constrained in practice by the 8-chip Nvidia servers (with 100K H100s, compute optimality asked for about 1T active params, which is almost too much for servers with 1 TB of HBM, and MoE models ask for multiple times more total params).
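To make the square-root step explicit: assuming Chinchilla-style compute-optimal scaling (training compute C ≈ 6·N·D, with tokens D scaling in proportion to active params N, so N ∝ √C), 10x more compute buys only ~3.2x more active params, while the larger HBM pools allow ~10x more total params, so the total/active sparsity ratio can grow by roughly 3x. A minimal sketch under those assumptions:

```python
import math

# Sparsity headroom from 10x compute plus 10x HBM, under the assumption that
# compute-optimal active params scale as sqrt(compute) (Chinchilla-style).
compute_ratio = 10                               # 2026 vs 2024 training compute
active_param_ratio = math.sqrt(compute_ratio)    # ~3.2x more active params
total_param_ratio = 10                           # 10x larger models fit in the bigger HBM pools

sparsity_ratio_growth = total_param_ratio / active_param_ratio   # total/active grows ~3.2x
print(f"{active_param_ratio:.1f}x active params, {sparsity_ratio_growth:.1f}x sparsity")
```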
I don’t understand how energy is still an appropriate unit for measuring compute capacity when there are two different chip paradigms. Do Nvidia cards and Ironwood TPUs give the exact same performance for the same energy input? What exactly are the differences in capacity to train/deploy models between the 1 GW capacity Anthropic will have and the 1 GW OpenAI will have? I looked into this a bit and it seems like TPUs are explicitly designed for inference only, is that accurate? I feel like compiling this kind of information somewhere would be a good idea, since it’s all rather opaque, technical, and obfuscated by press releases that seek to push a “look at our awesome 11 figure chip deal” narrative rather than provide actual transparency about capacity.
I think it’s a fair assumption that they are close enough. If they weren’t, why on Earth would someone still be using whichever happened to be the vastly more inefficient option?