Building a frontier AI datacenter costs significantly more than its servers and networking alone. The buildings and the power aren’t a minor cost, because older infrastructure mostly can’t be reused. This is similar to how a training system needs to be built before we can talk about the much lower cost of 4 months of its time.
Apparently Crusoe’s part in the Stargate Abilene datacenters is worth $15bn. That covers only the buildings, power (substations and gas generators), and cooling, not the servers and networking (Oracle is taking care of that). With 400K chips in GB200 NVL72 racks (which is 5.6K racks), at maybe $4M per rack, or $5M per rack including external-to-racks networking[1] ($70K per chip all-in on compute hardware), the compute hardware comes to about $27bn. The $15bn for the non-compute parts of the datacenters is comparable to that figure.
Counting the non-compute parts makes the funding burden significantly higher ($7.5M per rack or $105K per chip), so the Stargate Abilene site alone would cost about $40-45bn and not only $25-30bn. I’m guessing the buildings and the power infrastructure are not usually counted because they last a long time, so the relatively small time cost of using them (such as paying for electricity, not for building power plants) becomes somewhat insignificant compared to the cost of compute hardware, which also needs to be refreshed more frequently. But the new datacenters have a much higher power density (power and cooling requirements per rack), so they can’t use much of the existing long-lived infrastructure. It becomes necessary to build that infrastructure at the same time, which means securing funding not only for the unprecedented amount of compute hardware but also, simultaneously, for all the rest.
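The arithmetic behind these figures can be reproduced with a short sketch. All inputs are the rough estimates quoted above (400K chips, 72 chips per rack, $5M per rack including networking, $15bn for non-compute), and the function name `site_costs` is only illustrative. Without intermediate rounding the results land slightly above the rounded numbers in the text.

```python
# Back-of-the-envelope check of the Abilene figures quoted above.
# All inputs are the post's rough estimates, not reported prices.

def site_costs(chips=400_000, chips_per_rack=72,
               rack_usd=5e6, non_compute_usd=15e9):
    """Return the rough rack count and cost breakdown for one site."""
    racks = chips / chips_per_rack
    compute_usd = racks * rack_usd              # servers plus networking
    total_usd = compute_usd + non_compute_usd   # plus buildings, power, cooling
    return {
        "racks": racks,
        "compute_usd": compute_usd,
        "total_usd": total_usd,
        "usd_per_rack": total_usd / racks,
        "usd_per_chip": total_usd / chips,
    }

costs = site_costs()
for name, value in costs.items():
    print(f"{name}: {value:,.0f}")
```

Changing `rack_usd` to 4e6 shows how sensitive the total is to the per-rack price, which is itself a guess.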
The implication for the compute scaling slowdown timeline (no AGI and merely $2-4 trillion AI companies) is that funding constraints would result in about 30% less compute in the short term (2025-2030). But as power requirements stop growing, the buildings/cooling/power part will again become only a small fraction of the overall cost of refreshing the compute hardware, and the feasible amount of compute will gradually recover that 30% in the medium term (perhaps 2030-2035). This leaves the longer term projections (2035-2045) unchanged, meaning ~2000x of scaling in 2029-2045, on top of the current much faster funding-fueled ~2000x of scaling in 2022-2028.
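One way to read these numbers, as a rough sketch: if the all-in cost per chip is $105K instead of $70K, a fixed budget buys about a third fewer chips, which is in the neighborhood of the 30% above. The two ~2000x periods also imply very different annual growth rates. The year counts (6 and 16) are an assumption about how to read the date ranges, and the function names are illustrative.

```python
# Rough sketch of the shortfall and growth-rate arithmetic.
# Year counts are an assumption: each range is taken as end year minus start year.

def compute_shortfall(compute_usd_per_chip=70e3, all_in_usd_per_chip=105e3):
    """Fraction of compute lost when a fixed budget must also cover non-compute costs."""
    return 1 - compute_usd_per_chip / all_in_usd_per_chip

def annual_growth(total_factor, years):
    """Constant yearly multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1 / years)

shortfall = compute_shortfall()
fast = annual_growth(2000, 2028 - 2022)   # funding-fueled period
slow = annual_growth(2000, 2045 - 2029)   # slowdown period

print(f"shortfall: {shortfall:.0%}")
print(f"2022-2028: about {fast:.2f}x per year")
print(f"2029-2045: about {slow:.2f}x per year")
```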
[1] This is anchored to the reference design for a 1024-chip HGX H100 system, where the 8-chip servers are priced at $33.8K per chip and external-to-servers networking is $8.2K per chip, or about 25% on top of the price of servers.
I found this analysis refreshing and would like to see more on the GPU depreciation costs.
If better GPUs are developed, existing GPUs will go down in value quickly, perhaps by 25% to 50% per year. This seems like a really tough expense and supply chain to manage.
I’d expect most of the other infrastructure costs to depreciate much more slowly, as you mention.
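To get a feel for what 25% to 50% per year would mean, here is a minimal sketch. It assumes the value drops by a constant fraction of its current value each year (geometric depreciation), which is only one possible model, and the 4-year horizon and function name are illustrative.

```python
# Minimal sketch: residual value under constant-rate (geometric) depreciation.
# The rates are the guessed range from the comment above, not measured data.

def residual_fraction(annual_rate, years):
    """Fraction of the original value left after the given number of years."""
    return (1 - annual_rate) ** years

for rate in (0.25, 0.50):
    left = residual_fraction(rate, 4)
    print(f"{rate:.0%} per year: {left:.1%} of value left after 4 years")
```

Under this assumption, hardware keeps roughly a third of its value after 4 years at the low end of the range and about 6% at the high end.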
This means that a straightforward flops-per-USD comparison between home computer GPU cards and datacenters is misleading. If someone already has a GPU card, they already have a computer and a house where this computer stays “for free.” But if someone needs to scale, they have to pay for housing and mainframes.
Such comparisons of old 2010s GPUs with more modern ones are used to show the slow rate of hardware advances, but they don’t take into account the hidden costs of owning older GPUs.
Why does the building cost so much? Is this more than other buildings of similar size?