Musings on Reported Cost of Compute (Oct 2025)
There are many ways in which costs of compute get reported. A 1 GW datacenter site costs $10-15bn in infrastructure (buildings, cooling, power), plus $30-35bn in compute hardware (servers, networking, labor), assuming Nvidia GPUs. The useful life of the infrastructure is about 10-15 years, and with debt financing a developer only needs to ensure it's paid off over those 10-15 years, which comes out to $1-2bn per year. For the compute hardware, the useful life is taken as about 5 years, which gives $6-7bn per year. Operational expenses (electricity, maintenance) are about $2.0-2.5bn per year.
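For concreteness, a minimal back-of-envelope sketch of that annualization, taking midpoints of the ranges above (rough assumptions from this paragraph, not vendor figures):

```python
# Annualized cost of a 1 GW site, midpoints of the ranges quoted above.
infra_capex_bn = 12.5        # buildings, cooling, power ($10-15bn)
hardware_capex_bn = 32.5     # servers, networking, labor ($30-35bn)
opex_bn_per_year = 2.25      # electricity, maintenance ($2.0-2.5bn/yr)

infra_life_years = 12.5      # infrastructure lasts ~10-15 years
hardware_life_years = 5      # compute hardware lasts ~5 years

infra_per_year = infra_capex_bn / infra_life_years            # ~$1bn/yr
hardware_per_year = hardware_capex_bn / hardware_life_years   # ~$6.5bn/yr
total_per_year = infra_per_year + hardware_per_year + opex_bn_per_year

print(f"total: ~${total_per_year:.1f}bn per year")  # ~$9.8bn, within the $9-11bn range below
```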
In total, 1 GW of compute costs about $9-11bn per year. But whoever paid the compute hardware capex needs the payments to continue for 5 years, so a contract for 1 GW of compute will be 5 years long, which makes it a single contract for at least $45-55bn, and that might become $55-65bn to allow a profit for the cloud provider.
Thus 1 GW of compute could get reported as $10bn (infrastructure capex; or alternatively the compute costs for the AI company in a calendar year), as $30bn (compute hardware capex without labor costs), as $45bn (infrastructure plus compute hardware capex), or as $60bn (the total cost of the contract between the AI company and the cloud provider over 5 years).
A $300bn contract then might mean about 5 GW of total capacity, while a $27bn datacenter site could at the same time mean 2 GW of total capacity. And a $300bn contract doesn't mean that the 5 GW of capacity will be built immediately: if, for example, only 2 GW is built initially, that requires the AI company to be capable of paying about $25bn per year (for 5 years), with the other 3 GW being contingent on the AI company's continuing growth.
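A sketch of that mapping, using the same rough per-GW figures as above (the specific contract and site numbers are the ones quoted in this paragraph):

```python
# Mapping headline dollar figures back to capacity, midpoints from above.
infra_capex_bn_per_gw = 13.5      # $10-15bn of infrastructure per GW
contract_bn_per_gw_5yr = 60       # $55-65bn for a 5-year, 1 GW contract

print(300 / contract_bn_per_gw_5yr)    # $300bn contract       -> ~5 GW
print(27 / infra_capex_bn_per_gw)      # $27bn datacenter site -> ~2 GW
print(2 * contract_bn_per_gw_5yr / 5)  # 2 GW built first -> ~$24bn/yr owed by the AI company
```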
Non-Nvidia Hardware
The Nvidia servers (1 GW all-in is about 5500 GB200 NVL72 servers with 72 chips each, or 400K chips in total) take up about $20bn of capex, so if Nvidia's margin of about 70% applies to this part (it's probably less, since GPUs are not all of the server), it comes out to $14bn per GW, or $2.8bn per year in a 5-year contract between an AI company and a cloud provider, about 25% of the annual contract cost. This suggests that non-Nvidia compute might only cost up to 25% less, all else equal (which it isn't), even though GPUs are usually portrayed as the majority of the cost of compute, and Nvidia's margin as the majority of the cost of the GPUs.
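The margin share works out as follows (a sketch under the same assumptions; the per-year contract figure is the $55-65bn contract spread over 5 years):

```python
# Nvidia margin as a share of a 5-year, 1 GW compute contract (rough assumptions).
server_capex_bn = 20           # GB200 NVL72 servers in a 1 GW site
nvidia_margin = 0.70           # upper bound; GPUs are not all of the server
contract_bn_per_year = 12      # ~$55-65bn contract over 5 years

margin_bn = server_capex_bn * nvidia_margin       # ~$14bn per GW
margin_bn_per_year = margin_bn / 5                # ~$2.8bn per year
print(margin_bn_per_year / contract_bn_per_year)  # ~0.23, i.e. roughly 25%
```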
Thus a TPU contract for “tens of billions of dollars” and “over a gigawatt of capacity” admits an interpretation where it’s a ~5-year contract at ~$12bn per year for ~1.2 GW of compute (in total power, not just IT power).
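Checking that interpretation against the per-GW costs above (a sketch; the ~$12bn/yr and ~1.2 GW figures are the interpretation, not announced numbers):

```python
# Does "tens of billions" over ~5 years square with ~1.2 GW of TPU compute?
annual_payment_bn = 12
years = 5
capacity_gw = 1.2

total_contract_bn = annual_payment_bn * years    # $60bn, i.e. "tens of billions"
per_gw_5yr_bn = total_contract_bn / capacity_gw  # ~$50bn per GW over 5 years
print(per_gw_5yr_bn)  # vs. $55-65bn for the Nvidia case, i.e. roughly 10-25% cheaper
```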
Model Sizes in 2026
If the new contract is for TPUv7 Ironwood, in 2026 Anthropic will have 1 GW of compute with 49 TB of HBM per 256-chip pod. This is comparable to OpenAI’s Abilene site, which is 1 GW of compute with 14-20 TB of HBM per GB200/GB300 NVL72 rack, and will also be ready at this capacity in 2026. Currently Anthropic has access to Trainium 2 Ultra servers of AWS’s Project Rainier with 6 TB of HBM per rack, while OpenAI’s capacity is probably mostly in 8-chip Nvidia servers with 0.64-1.44 TB of HBM per server.
Feasible model size scales with HBM per server/rack/pod that serves as a scale-up world (especially for reasoning models and their training), so 2026 brings not just 10x more compute than 2024 (400K chips in GB200/GB300 NVL72 servers instead of 100K H100s), but also 10x larger models (in total parameter count). As the active params for compute optimal models scale with the square root of compute, this enables more sparsity in MoE models than 2024 compute did, and the number of active params for the larger models will no longer be constrained in practice by the 8-chip Nvidia servers (with 100K H100s, compute optimality asked for about 1T active params, which is almost too much for servers with 1 TB of HBM, and MoE models ask for multiple times more total params).
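A sketch of the square-root scaling, anchored on the ~1T-active-params-at-100K-H100s point above (the ~5e26 FLOPs figure for that cluster appears later in the post; everything here is approximate):

```python
import math

# Compute-optimal active params scale with the square root of training compute.
# Anchor: ~1T active params is compute optimal at ~5e26 FLOPs (100K H100s).
anchor_flops = 5e26
anchor_active = 1e12

def optimal_active_params(flops):
    return anchor_active * math.sqrt(flops / anchor_flops)

# ~10x the compute from a 1 GW GB200/GB300 NVL72 site in 2026:
print(f"{optimal_active_params(10 * anchor_flops):.1e}")  # ~3.2e12, i.e. ~3T active params
```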
Do we have reason to assume that MoE models will stretch to the entire rack memory capacity minus KV cache in batched inference? I ruminated earlier that Rubin Ultra could theoretically run 300T sparse models, but recently it looks like it's possible to squeeze much more from far fewer parameters and iterate much faster.
This data about compute spend at OpenAI suggests this trend: https://epoch.ai/data-insights/openai-compute-spend
At the same time, some Chinese open source models are downscaling in parameters. The most recent MiniMax M2 is much smaller than MiniMax M1. Qwen-Next 80B-A3B is close in performance to Qwen-3 235B-A22B. I wouldn’t be surprised if the same happened with DeepSeek-V4.
It’s not that models won’t grow, but it feels like intelligence density per parameter and non-degrading sparsity thresholds will be tested in the search for maximum training and inference efficiency.
Experiments on smaller models (and their important uses in production) will continue; the reasons they should continue don't affect the feasibility of there also being larger models. But currently there are reasons that larger models strain the feasibility of inference and RLVR, so they aren't as good as they could be, and cost too much to use. Also, a lot of use seems to be in input tokens (Sonnet 4.5 via OpenRouter processes 98% of tokens as input tokens), so the unit economics of input tokens remains important, and that's driven by the number of active params, a reason to still try to keep them down even when they are far from being directly constrained by inference hardware or training compute.
Prefill (input tokens) and pretraining are mostly affected by the number of active params, adding more total params on top doesn’t make it worse (but improves model quality). For generation (decoding, output tokens) and RLVR, what matters is the time to pass total params and KV cache through compute dies (HBM in use divided by HBM bandwidth), as well as latency for passing through the model to get to the next token (which doesn’t matter for prefill). So you don’t want too many scale-up worlds to be involved, or else it would take too much additional time to move between them, and you don’t care too much if the total amount of data (total params plus KV cache) doesn’t change significantly. So if you are already using 1-2 scale-up worlds (8-chip servers for older Nvidia chips, 72-chip racks for GB200/GB300 NVL72, not-too-large pods for TPUs), and ~half of their HBM is KV cache, you don’t lose too much from filling more of the other half with total params.
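As a rough model of the decode-side constraint described here (a sketch; the bandwidth and size numbers are illustrative assumptions, not measured specs):

```python
# Per-step decode time is roughly (bytes of total params + KV cache in use)
# divided by the scale-up world's aggregate HBM bandwidth.
def decode_seconds_per_step(total_params, bytes_per_param, kv_cache_bytes,
                            hbm_bandwidth_bytes_per_s):
    bytes_streamed = total_params * bytes_per_param + kv_cache_bytes
    return bytes_streamed / hbm_bandwidth_bytes_per_s

# Illustrative: a 10T-total-param model in ~1-byte weights plus ~4 TB of batched
# KV cache, on a rack with ~500 TB/s of aggregate HBM bandwidth (assumed figure).
t = decode_seconds_per_step(10e12, 1, 4e12, 500e12)
print(f"~{t*1e3:.0f} ms per generated token (for the whole batch)")  # ~28 ms
```

Under this model, when KV cache already takes about half the HBM, filling much of the other half with total params at most roughly doubles the per-step time, which is the sense in which you don't lose too much.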
It’s not a quantitative estimate, as the number of scale-up worlds and the fractions used up by KV cache and total params could vary, but when HBM per scale-up world goes up 10x, this suggests that total param counts might also go up 10x, all else equal. And the reason they are likely to actually go there is that even at 5e26 FLOPs (100K H100s, 150 MW), compute optimal number of active params is already about 1T. So if the modern models (other than GPT-4.5, Opus 4, and possibly Gemini 2.5 Pro) have less than 1T total params, they are being constrained by hardware in a way that’s not about the number of training FLOPs. If this constraint is lifted, the larger models are likely to make use of that.
For the Chinese models, the total-to-active ratio (sparsity) is already very high, but they don’t have enough compute to make good use of too many active params in pretraining. So we are observing this phenomenon of the number of total params filling the available HBM, despite the number of active params remaining low. With 1 GW datacenter sites, about 3T active params become compute optimal, so at least 1T active params will probably be in use for the larger models. Which asks for up to ~30T total params, so hardware will likely still be constraining them, but it won’t be constraining the 1T-3T active params themselves anymore.
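One way to see the hardware ceiling implied here (a sketch; the half-for-KV-cache split and 1 byte per weight are assumptions):

```python
# Ceiling on total params from a TPUv7 pod's HBM, per the 49 TB figure above.
pod_hbm_bytes = 49e12
kv_cache_fraction = 0.5   # assume ~half of HBM goes to batched KV cache
bytes_per_param = 1       # FP8/INT8-ish weights

max_total_params = pod_hbm_bytes * (1 - kv_cache_fraction) / bytes_per_param
print(f"~{max_total_params/1e12:.0f}T total params per pod")  # ~24-25T, short of the ~30T asked for
```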
Thanks for the detailed response. If we assume that continued progress requires using much more compute on RL than pretraining, this could favor model sizes much smaller than Chinchilla optimal. It’s possible that future models will be trained with 99% RL and will employ some sort of continuous learning architecture.
I believe the number of active parameters will be determined by achieving similar load on compute and memory, which will depend on the current level of attention efficiency and the context necessary to hit agent ability targets. I don't know how sparse very large (>20T) models could be, but they could probably be extremely sparse (maybe 0.1% or less, where most of the volume is taken by a router).
I think it’s possible that current and planned AI compute designs are not even close to optimal for training methods that may be developed within a year or so. Assuming something like DeepSeek DSA is relatively close to maximum efficiency, and RL pathways produced by agents will stretch to many millions of tokens, it may be the case that HBM will finally be too slow, and using SRAM will be the only way forward.
There’s also a completely different consideration. For example, inclusionAI/Ring-1T has 1T total and 50B active parameters, and the creators claim it can achieve silver in IMO using a multi-agent framework, with plans to train it further to achieve gold. And it’s only 50B active.
As you said, we’re probably entering compute ranges that enable training models with many T active parameters (assuming RL will be in roughly one-to-one proportion to pretraining). The question is whether pattern density in our environment is currently large enough to make use of it.
Compute optimality makes sense for RL as much as for pretraining. A model that’s too large won’t see much data, and a model that’s too small won’t be able to learn well even from the correspondingly larger amount of data. So it’s a quantitative question of where compute optimality for RLVR happens to be (compared to pretraining). The papers are still hashing out stable/scalable RL training, rather than the tradeoffs of RL training constrained by fixed total compute (under varying model size specifically).
I don’t understand how energy is still an appropriate unit for measuring compute capacity when there are two different chip paradigms. Do Nvidia cards and Ironwood TPUs give the exact same performance for the same energy input? What exactly are the differences in capacity to train/deploy models between the 1 GW capacity Anthropic will have and the 1 GW OpenAI will have? I looked into this a bit and it seems like TPUs are explicitly designed for inference only, is that accurate? I feel like compiling this kind of information somewhere would be a good idea, since it’s all rather opaque, technical, and obfuscated by press releases that seek to push a “look at our awesome 11 figure chip deal” narrative rather than provide actual transparency about capacity.
The Anthropic announcement says “up to one million TPUs”, and the Ironwood announcement claims 4.6e15 FP8 FLOP/s per chip. A 2-die GB200 chip produces 5e15 dense FP8 FLOP/s, and there are about 400K chips in the 1 GW phase of the Abilene system.
Thus if the Anthropic contract is for TPUv7 Ironwood, their 1 GW system will have about 2x the FLOP/s of the Abilene 1 GW system (probably because Ironwood is 3nm, while Blackwell is 4nm, which is a minor refinement of 5nm). Though it’s not clear that the Anthropic contract is for one system in the sense that Abilene is, that is, datacenters with sufficient bandwidth between them. But Google had a lot of time to set up inter-datacenter networking, so this is plausible even for collections of somewhat distant datacenter buildings. If this isn’t the case, then it’s only good for RLVR and inference, not for the largest pretraining runs.
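The 2x figure comes from multiplying these out (chip counts and per-chip rates as quoted above):

```python
# Peak FP8 FLOP/s of the two 1 GW systems, using the per-chip figures quoted above.
ironwood = 1_000_000 * 4.6e15   # "up to one million TPUs" at 4.6e15 FP8 FLOP/s each
abilene = 400_000 * 5e15        # ~400K GB200 chips at 5e15 dense FP8 FLOP/s each

print(ironwood / abilene)  # ~2.3, i.e. roughly 2x
```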
The reason things like this could happen is that OpenAI needed to give the go-ahead for the Abilene system in 2024, when securing a 1 GW Ironwood system from Google plausibly wasn’t in the cards, and in any case they wouldn’t want to depend on Google too much, because GDM is a competitor (and the Microsoft relationship was already souring). On the other hand, Anthropic still has enough AWS backing to make some dependence on Google less crucial, and they only needed to learn recently about the feasibility of a 1 GW system from Google. Perhaps OpenAI will be getting a 1-2 GW system from Google as well at some point, but then Nvidia Rubin (not to mention Rubin Ultra) is not necessarily worse than Google’s next thing.
I think it’s a fair assumption that they are close enough. If they weren’t, why on Earth would someone still be using whichever happened to be the vastly more inefficient option?
Because it’s what they can get. A factor of two or more in compute is plausibly less important than a delay of a year.
This may or may not be the case, but the argument for why it can’t be very different fails.
Well, within reason that can happen; I am not saying the metric is going to be perfect. But it’s probably a decent first-order approximation, because that logic can’t stretch forever. If instead of a factor of 2 it was a factor of 10, the trade-off would probably not be worth it.
Data. Find out the answer.
https://www.wevolver.com/article/tpu-vs-gpu-a-comprehensive-technical-comparison
Looks like they are within 2x of the H200s, albeit with some complexity in the details.
Thanks! I guess my original statement came off a bit too strong, but what I meant is that while there is a frontier for trade-offs (maybe the GPUs’ greater flexibility is worth the 2x energy cost?), I didn’t expect the gap to be orders of magnitude. That’s good enough for me, with the understanding that any such estimates will never be particularly accurate anyway and just give us a rough idea of how much compute these companies are actually fielding. What they squeeze out of that will depend on a bunch of other details anyway, so scale is the best we can guess.