When funding stops increasing, the current pace of 3.55x per year (fueled by increasing funding) regresses to the pace of improvement in price-performance of compute of 1.4x per year, which is 3.7x slower. If the $140bn training systems of 2028 do get built, they’ll each produce about 1.5e22 BF16 FLOP/s of compute, enough to train models for about 5e28 BF16 FLOPs.
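(A quick sanity check on the quoted numbers: getting from 1.5e22 FLOP/s to ~5e28 FLOP requires some assumption about utilization and training duration; neither is stated here, so the ~40% and ~100 days below are illustrative only.)

```python
flops_per_s = 1.5e22        # BF16 FLOP/s of a $140bn 2028 training system (from the quote)
utilization = 0.4           # assumed compute utilization (not stated in the quote)
days = 100                  # assumed training duration (not stated in the quote)

total_flop = flops_per_s * utilization * days * 86_400
print(f"{total_flop:.1e}")  # ~5.2e28 BF16 FLOP, consistent with the quoted ~5e28
```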
This is a nice way to break it down, but I think it might have weird dependencies, e.g. chip designer profit margins.
Instead of:
training run investment ($) x hardware price performance (FLOP/$) = training compute (FLOP)
Another possible breakdown is:
hardware efficiency per unit area (FLOP/s/mm^2) x global chip production (mm^2) x global share of chips used in training run (%) x training time (s) = training compute (FLOP)
This gets directly at the supply side of compute. It's basically 'Moore's law x AI chip production x share of chips used'. In my model the factors for the next three years are 1.35x x 1.65x x 1.5x ~= 3.4x/year, which matches your 3.55x/year pretty closely. Where we differ slightly, I think, would be in the later compute slowdown.
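A minimal sketch of that multiplication (the factor values are the ones given above):

```python
moores_law = 1.35        # hardware efficiency per unit area, x/year
chip_production = 1.65   # AI chip production, x/year
share_of_chips = 1.5     # share of chips used in the largest training run, x/year

supply_side_growth = moores_law * chip_production * share_of_chips
print(supply_side_growth)  # ~3.34x/year, i.e. the ~3.4x quoted above, close to the post's 3.55x/year
```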
Under my model there are also one-time gains happening in AI chip production and share of chips used (as a result of the one-time spending gains in your model). Chip production has one-time gains because AI only uses 5-10% of TSMC leading nodes and is using up spare capacity as fast as packaging/memory can be produced. Once this caps out, I think the 1.65x will default to something like 1.2-1.4x as it gets bottlenecked on fab expansion (assuming, as you said, an investment slowdown). 'Share of chips used' growth goes to 1x by definition.
Even taking the lower end of that estimate would mean that 'Moore's law' hardware gains would have to slow down ~2x to 1.16x/year to match your 1.4x number. I do think hardware gains will slow somewhat, but 1.16x is below what I would bet. Taking my actual medians, I think I'm at something like 1.3x (production) x 1.2x (hardware) = 1.56x/year, so more like a 2.8x slowdown, not a 3.7x slowdown.
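For concreteness, the 'Nx slowdown' figures compare growth exponents (equivalently, doubling times), not ratios of the yearly multipliers; a quick check:

```python
import math

# Slowdown factor in log space: ratio of growth exponents (or of doubling times).
def slowdown(old_rate, new_rate):
    return math.log(old_rate) / math.log(new_rate)

print(slowdown(3.55, 1.4))   # ~3.7x slowdown (the post's numbers)
print(slowdown(3.55, 1.56))  # ~2.8x slowdown (1.3x production * 1.2x hardware)
print(slowdown(1.35, 1.16))  # ~2.0x slowdown in the hardware factor alone
```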
So resolving the discrepancy, it seems like my model is basically saying that your model overestimates the slowdown because it assumes profit margins stay fixed, whereas under slowing investment growth these should collapse? That feels like it doesn't fully explain it though, since it seems like it should be a one-time fall (albeit a big one). Maybe in the longer term (i.e., post-2035) I agree with you more and my 1.3x production number is too bullish.
Avoidable chip margins are maybe 60% at most (of all-in training system cost)[1], which saves 2.8 years out of the 22.5 years it takes to reach 2,000x at 1.4x per year (and Google with their TPUs is already somewhat past this, so it should be concerning if this becomes a crux of the argument). If the cost reduction happens quickly, it instead concentrates the faster scaling into 2022-2029, making the contrast with 2030-2050 starker.
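Spelling out that arithmetic (removing a 60% margin is a one-time ~2.5x improvement in compute per dollar):

```python
import math

years_to_2000x = math.log(2000) / math.log(1.4)        # ~22.6 years at 1.4x/year
one_time_gain = 1 / (1 - 0.6)                          # ~2.5x from removing a 60% margin
years_saved = math.log(one_time_gain) / math.log(1.4)  # ~2.7 years
print(years_to_2000x, years_saved)
```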
The wafers are getting more complicated to produce, so the cost per unit area probably won't be going down, and I don't see changes in total wafer supply contributing to the size of individual training systems that are constrained by cost.
Introduction of advanced packaging plausibly helps short term with non-chip costs that scale with the number of chips (we get fewer chips per FLOP/s), and similarly for higher power and chip density within a rack (Rubin Ultra is 600 kW per rack). But this also precedes 2030-2050.
(I worry about accounting for any specific “one-time” factors that affect Moore’s law, because a lot of it is probably built out of such factors. Different people will be aware of different factors at different times, and therefore consider them “one-time”.)
The reference design for a 1024-chip HGX H100 system has $33.8K per chip for the actual servers, and $8.2K per chip for networking. So if various non-IT expenses on a 100K H100 campus are on the order of $500M (land, buildings, cooling, power, construction), and networking 100K H100s costs a bit more than letting the 1K parts remain unconnected, let’s say another $2K per chip, we get about $49K per chip (which is to say, $4.9bn per 100K H100s). If 90% (!) of the H100 server is gross margin, it saves us $30.4K per chip, or 62%.
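A minimal sketch of that per-chip arithmetic (the extra $2K for campus-scale networking is the assumption stated above):

```python
server_per_chip = 33.8e3         # HGX H100 server cost per chip (reference design)
network_per_chip = 8.2e3         # networking per chip in the 1024-chip reference design
campus_per_chip = 500e6 / 100e3  # ~$500M non-IT campus costs spread over 100K chips
extra_network_per_chip = 2e3     # assumed extra cost of connecting the 1K-chip parts at 100K scale

total_per_chip = server_per_chip + network_per_chip + campus_per_chip + extra_network_per_chip
margin_saved = 0.9 * server_per_chip                # if 90% of the server price is gross margin
print(total_per_chip)                               # ~$49K per chip, ~$4.9bn per 100K H100s
print(margin_saved, margin_saved / total_per_chip)  # ~$30.4K per chip, ~62%
```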
To be clear, I don’t think the profit margin is the only thing that explains the discrepancy.
I think the relevant question is more like: under my method, is 1.3x (production) x 1.2x (hardware) = 1.56x/year realistic over 2-3 decades, or am I being too bullish? You could ask an analogous thing about your method (i.e., is 1x investment and 1.4x price-performance realistic over the next 2-3 decades?). They're two different ways of looking at it that should converge.
If I'm not being too bullish with my numbers (and being too bullish is very plausible, e.g., it could easily be 1.2x, 1.2x), then I'd guess the discrepancy with your method comes from it being muddled with economic factors (not just chip designer profit margins but supply/demand factors affecting costs across the entire supply chain, down to things like how much miscellaneous equipment costs and salaries for engineers). Maybe 1x investment is too low, and it should instead be multiplied by inflation and GDP growth?
The largest capex projects (credible-at-the-time announcements / money raised / assets minus debts) of the dot-com bubble were about $10-14bn nominal around 2000 (Global Crossing, Level 3 Communications), or $19-26bn in 2025 dollars. Now that it’s 25 years later, we have hyperscalers that can afford ~$80bn capex per year, which is 3x-4x more than $19-26bn.
Similarly, a 2029-2030 slowdown would render a lot of AI companies or AI cloud companies irrelevant or bankrupt, and the largest projects of 2027-2029 won’t be the baseline for gradual growth towards 2050 for the most powerful companies of 2050 (except possibly for Google). I’m looking at 22 years instead of 25, it’s unclear how well AI profits can be captured by a single company, and because of inference the cost of revenue is higher than for software. So let’s say capex per year (CPI-adjusted, in 2025 dollars) for the largest AI companies in 2050 is 3x the largest projects of 2027-2029, which I was estimating at $140bn. Anchoring to Nvidia’s cadence of 2 years per major hardware refresh, a training system can eat two years’ worth of capex, so in total we get $840bn for a 2050 training system (in 2025 dollars).
This is 5 years ahead of the projection from the post that holds the cost of a training system constant (at 1.4x per year for CPI-adjusted price-performance), so maybe the second 2,000x of scaling should be reached by 2045 instead.
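The arithmetic behind the $840bn and the 5 years, spelled out:

```python
import math

cost_2027_29 = 140e9              # largest training systems of 2027-2029, 2025 dollars
cost_2050 = 3 * cost_2027_29 * 2  # 3x higher yearly capex, two years of it per system
print(cost_2050)                  # ~$840bn

# At 1.4x/year price-performance, spending 6x more money buys what constant spending
# would only reach this many years later:
years_ahead = math.log(cost_2050 / cost_2027_29) / math.log(1.4)
print(years_ahead)                # ~5.3 years, hence ~2045 instead of ~2050
```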
Yeah, sounds reasonable, and that would match up with my 1.56x/year number. So to summarize, we both think this is roughly plausible for 2028-2045?
1.3x/year (compute production) x 1.2x/year (compute efficiency) ~= 1.55x/year (compute available)
1.1x/year (investment) x 1.4x/year (price performance) ~= 1.55x/year (compute available)
So a 3x slowdown compared to the 2022-2028 trend (~3.5x/year).
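Checking that the two decompositions do converge (slowdown again measured as a ratio of growth exponents):

```python
import math

supply_side = 1.3 * 1.2   # compute production x compute efficiency, ~1.56x/year
demand_side = 1.1 * 1.4   # investment x price-performance, ~1.54x/year
trend_2022_2028 = 3.5

for rate in (supply_side, demand_side):
    print(rate, math.log(trend_2022_2028) / math.log(rate))  # ~2.8-2.9, i.e. roughly a 3x slowdown
```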
With the caveat that I’m not expecting a gradual increase in investment at 1.1x/year, but instead wailing and gnashing of teeth in the early 2030s, followed by a faster increase in investment from a lower baseline. And this is all happening within the hypothetical where RLVR doesn’t work out at scale and there are no other transformative advancements all the way through 2045. I don’t particularly expect this hypothetical to become reality, but can’t confidently rule it out (hence this post).
I don’t have a sense of the supply-side picture; it seems more relevant for the global inference buildout, and I don’t see how it anchors the capex of an individual company. The fraction an individual company contributes to the global chip market doesn’t seem like a meaningful number to me, more like a ratio of two unrelated numbers (as long as it’s not running towards the extremes, which it isn’t in this case).
Interesting point. But could I check why you and @Vladimir_M are confident Moore’s law continues at all?
I’d have guessed maintaining the rate of gains in hardware efficiency will require exponentially increasing chip R&D spending and researcher hours.
But if total spending on chips has plateaued, then the R&D spending of Nvidia et al. will also have plateaued, which I think would imply that hardware efficiency gains drop to near zero.
It’s an anchor, something concrete to adjust predictions around; the discussion in this thread is about the implications of the anchor rather than its strength (so being confident in it isn’t really implied). Moore’s law in its transistor-count-per-die form has mostly stopped, but the historical trend seems to be surviving in its price-performance form (which should really be about compute per datacenter-level total cost of ownership). So maybe it keeps going as it did for decades, and specific predictions for what would keep Moore’s law going at any given time were always hard, even as it did continue. Currently this might be about advanced packaging (making the parts of a datacenter outside the chips cheaper per transistor).
If Moore’s law stops even for price-performance, then the AI scaling slowdown in 2030-2050 gets even stronger than what this post explores. Also, growth in compute spending probably doesn’t completely plateau (progress in adoption alone would feed growth for many years), and that to some extent compensates for compute not getting cheaper as fast as it used to (if that happens).
You should actually tag @Vladimir_Nesov instead of Vladimir M, as Vladimir Nesov was the original author.