The 50M H100-equivalent compute by 2030 figure tweeted by Musk is on trend (assuming a 2028 slowdown) and might cost about $300bn in total (for the training systems built in 2025-2030 for one AI company, including the buildings and power infrastructure).
If the current trend of compute scaling continues to 2028, there will be 160x more compute per training system than in the 100K H100 systems of 2024. Such a system will require 5 GW of power and cost about $140bn in compute hardware, plus an additional $60bn in buildings, power, and cooling infrastructure[1].
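A quick back-of-envelope sketch of where the 160x comes from, using the 2000x 2022-2028 scale-up figure from later in this comment and assuming growth is roughly uniform in log-space:

```python
# Back-of-envelope: 2000x training compute scale-up over 2022-2028 implies
# an annual growth factor, which over 2024-2028 reproduces the ~160x figure.
annual_growth = 2000 ** (1 / 6)           # ~3.55x per year
scale_2028 = annual_growth ** 4           # 2024 -> 2028
h100e_2028 = 100_000 * scale_2028         # per-system H100 equivalents
print(f"{scale_2028:.0f}x, {h100e_2028 / 1e6:.1f}M H100e")  # ~159x, ~15.9M
```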
However, if the slowdown starts earlier while still targeting an eventual spend of $100bn per year, and a 5 GW frontier AI training system isn't yet built in 2028-2029 (which seems plausible), building it in 2030 would use the next generation of compute hardware, which will be about 2x more performant at an approximately unchanged cost. That means 320x more compute than the 100K H100 systems of 2024, or 32M H100-equivalent compute. Adding the preceding generations of frontier AI training systems built for the same company, say 2 GW in 2028 and 1 GW in 2026, gives about 40M H100 equivalents, which is the same as 50M given the error bars on these estimates (and we get 50M directly if the slowdown only starts between 2028 and 2030). Summing the costs of the older systems as well, we get to about $300bn (or $450bn if a 5 GW system is built in 2028, and then another one in 2030).
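A minimal sketch of that sum, assuming per-GW efficiency doubles with each 2-year hardware generation (the same assumption made for the 2030 system above; extending it back to 2026 is my extrapolation):

```python
# Summing H100 equivalents across frontier training systems built for one
# company, assuming per-GW efficiency doubles with each 2-year generation.
h100e_per_gw_2028 = 16e6 / 5              # 5 GW in 2028 = 160x 100K H100s
systems_h100e = [
    5 * h100e_per_gw_2028 * 2,            # 2030: 5 GW, next-gen hardware -> 32M
    2 * h100e_per_gw_2028,                # 2028: 2 GW -> 6.4M
    1 * h100e_per_gw_2028 / 2,            # 2026: 1 GW, previous gen -> 1.6M
]
print(f"{sum(systems_h100e) / 1e6:.0f}M H100e")  # ~40M
```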
[1] Let's start with the anchor of Stargate Abilene: $15bn in 2026 for 1.2 GW (which seems consistent in cost per MW with other similar announcements). The power actually necessary for its 400K Blackwell chips, together with everything else, looks more like 900 MW.
Rubin Ultra racks of 2028 are 600 kW per rack, 4.5x up from the current 130 kW per rack, so the total area needed for a 5 GW training system in 2028 might be only 2x greater than that of the 1 GW training systems from 2026. Scaling the Abilene anchor by building area suggests about $30bn, while scaling by power suggests about $70bn; my guess of $60bn falls between the two.
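A rough sketch of those two scalings of the Abilene anchor; the power-based figure depends on whether nameplate or actual power is used, and the ~$70bn presumably reflects rounding between the two:

```python
# Scaling the $15bn Abilene anchor (1.2 GW nameplate, ~0.9 GW actual) to a
# 5 GW 2028 system; all three figures are rough.
anchor = 15.0                             # $bn, Stargate Abilene in 2026
by_area = anchor * 2                      # ~2x building area -> $30bn
by_nameplate_power = anchor * 5 / 1.2     # ~$62bn
by_actual_power = anchor * 5 / 0.9        # ~$83bn
print(by_area, round(by_nameplate_power), round(by_actual_power))  # 30.0 62 83
```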
By power, do you mean the cost of electrical equipment etc.? The cost of the energy itself is relatively small. The average price of electricity in the US is $0.13/kWh, which is $36.11/GJ. So even if you had a 5 GW datacenter running continuously for a year, the energy cost is only $5.7bn.
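A quick check of that figure:

```python
# Quick check of the $5.7bn annual energy cost figure.
price = 0.13                              # $/kWh, US average
kwh = 5e6 * 24 * 365                      # 5 GW = 5e6 kW, running all year
print(f"${price * kwh / 1e9:.1f}bn")      # $5.7bn
```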
Power infrastructure that might need to be built includes gas generators or power plants, substations, and whatever the buildings themselves need. Generators are apparently added even when not strictly necessary on paper, as backup power. They are also faster to set up than GW-scale grid interconnection, so they could be important for these sudden giant factories, where nobody is quite sure 4 years in advance that they will actually be built at a given scale.
Datacenter infrastructure friction and cost will probably both smooth out the slowdown and disappear as a funding constraint for AI companies in the years following the slowdown. Compute hardware is rotated every few years, so at some point you don't need new datacenters and accompanying infrastructure to set up a new generation of compute hardware; you just reuse an existing datacenter site that hosted old hardware. Also, any related datacenters that don't yet have sufficient inter-site dark fiber will at some point set it up, so even increasing the scale will become less dependent on having everything at one site. This makes the infrastructure costs a much smaller fraction of the cost of a frontier AI training system, and the friction goes away.
The infrastructure or even hardware costs don't in principle need to be paid by the AI company upfront, but either the market as a whole or the specific AI company (as a tenant) needs to sufficiently assure the developer (which builds and owns the non-IT infrastructure) and the cloud provider (which installs and owns the compute hardware) to commit to the project. My sense is that estimates for the cost of a year of GPU-time for frontier compute end up at about a third of the cost of the compute hardware. So access to a new $200bn training system with $140bn worth of compute hardware (which only remains cutting edge for 2 years) will cost the tenant about $45bn per year, even though the total capital expenditure is $100bn per year during the initial infrastructure buildout. In later years after the slowdown (when new infrastructure no longer needs to be built as much), it's still $70bn per year to keep installing the newest hardware somewhere, so that some datacenter site will end up having it available.
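A hedged sketch of that arithmetic; the 2-year buildout and 2-year hardware refresh cadence are my reading of the figures above, not stated exactly:

```python
# Rent vs. capex for a $200bn training system with $140bn of compute hardware.
system_cost = 200.0                       # $bn, full training system
hardware_cost = 140.0                     # $bn, cutting edge for ~2 years
rent_per_year = hardware_cost / 3         # ~1/3 of hardware cost -> ~$47bn (~$45bn)
buildout_capex = system_cost / 2          # built over ~2 years -> $100bn/year
steady_capex = hardware_cost / 2          # hardware refresh only -> $70bn/year
print(rent_per_year, buildout_capex, steady_capex)  # ~46.7 100.0 70.0
```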
Thus a few years after the slowdown, we get about 2x more compute supported by the same level of funding (going from $100bn per year to $45bn per year for the same compute, or keeping to $100bn per year for 2x the compute). But since 2x in compute corresponds to about 2 years of compute hardware price-performance progress, and the relevant anchor is the 2000x training compute scale-up of 2022-2028, this is just playing with about 2 years within the 2028-2045 period, when another 2000x compute scale-up happens, mostly through increasing price-performance of compute and a level of growth similar to that of the current tech giants in the past. So not a crucial update.
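As a sanity check on the "about 2 years" claim, assuming the 2028-2045 scale-up of 2000x is roughly uniform in log-space:

```python
import math

# If another 2000x scale-up happens over 2028-2045 mostly via improving
# price-performance, the implied doubling time makes a one-time 2x worth
# roughly 2 years of progress.
years = 2045 - 2028                       # 17
doublings = math.log2(2000)               # ~11
print(f"{years / doublings:.1f} years per 2x of compute")  # ~1.5
```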