Avoidable chip margins are maybe 60% at most (of the all-in training system cost)[1], which saves 2.8 years out of the 22.5 years needed to reach 2,000x at 1.4x per year (and Google with their TPUs is already somewhat past this, so it should be concerning if this becomes a crux of the argument). If the cost reduction happens quickly, it instead concentrates the faster scaling into 2022-2029, making the contrast with 2030-2050 starker.
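As a sanity check on those year counts, here's a minimal back-of-envelope sketch, assuming a 60% cost cut is equivalent to a 2.5x compute-per-dollar gain and 1.4x/year price-performance:

```python
import math

price_perf_growth = 1.4   # assumed price-performance improvement per year
target_scaleup = 2000     # total scale-up of training compute at fixed cost
margin_cut = 0.60         # avoidable margin as a fraction of all-in cost

years_to_target = math.log(target_scaleup) / math.log(price_perf_growth)
# cutting 60% of the cost buys 1 / (1 - 0.60) = 2.5x more compute for the same money
compute_gain = 1 / (1 - margin_cut)
years_saved = math.log(compute_gain) / math.log(price_perf_growth)

print(round(years_to_target, 1))  # ~22.6 years to reach 2,000x
print(round(years_saved, 1))      # ~2.7 years saved (close to the 2.8 above)
```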
The wafers are getting more complicated to produce, so the cost per unit area probably won't be going down, and I don't see changes in total wafer supply contributing to the size of individual training systems constrained by cost.
Introduction of packaging plausibly helps in the short term with non-chip costs that scale with the number of chips (we get fewer chips per FLOP/s), and similarly for higher power and chip density within a rack (Rubin Ultra is 600 kW per rack). But this also precedes 2030-2050.
(I worry about accounting for any specific “one-time” factors that affect Moore’s law, because a lot of it is probably built out of such factors. Different people will be aware of different factors at different times, and will therefore consider them “one-time”.)
The reference design for a 1024-chip HGX H100 system has $33.8K per chip for the actual servers, and $8.2K per chip for networking. So if various non-IT expenses on a 100K H100 campus are on the order of $500M (land, buildings, cooling, power, construction), which is another $5K per chip, and networking 100K H100s costs a bit more than letting the 1K-chip parts remain unconnected, let’s say another $2K per chip, we get about $49K per chip (which is to say, $4.9bn per 100K H100s). If 90% (!) of the H100 server price is gross margin, it saves us $30.4K per chip, or 62%.
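A minimal sketch of that arithmetic, using the per-chip estimates from the paragraph above (the $2K stitching cost is the same guess as above, not a measured figure):

```python
# Per-chip cost breakdown for a 100K-H100 campus, all in thousands of USD.
server_per_chip    = 33.8   # HGX H100 server cost per chip (1024-chip reference design)
network_per_chip   = 8.2    # networking within the 1K-chip reference design
campus_per_chip    = 5.0    # ~$500M of non-IT campus costs spread over 100K chips
extra_net_per_chip = 2.0    # guessed extra cost of stitching 1K-chip parts into 100K

all_in_per_chip = server_per_chip + network_per_chip + campus_per_chip + extra_net_per_chip
margin_saving = 0.90 * server_per_chip   # if 90% of the server price is gross margin

print(f"all-in: ${all_in_per_chip:.1f}K per chip")              # ~$49K
print(f"margin saving: ${margin_saving:.1f}K per chip "
      f"({margin_saving / all_in_per_chip:.0%} of all-in)")     # ~$30.4K, ~62%
```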
To be clear, I don’t think the profit margin is the only thing that explains the discrepancy.
I think the relevant question is more like: under my method, is 1.3x (production) x 1.2x (hardware) = 1.56x/year realistic over 2-3 decades, or am I being too bullish? You could ask an analogous thing about your method (i.e., is 1x investment and 1.4x price-performance realistic over the next 2-3 decades?). Two different ways of looking at it that should converge.
If I’m not being too bullish with my numbers (and being too bullish is very plausible; e.g., it could easily be 1.2x, 1.2x), then I’d guess the discrepancy with your method comes from it being muddled with economic factors (not just chip designer profit margins but supply/demand factors affecting costs across the entire supply chain, down to things like how much random equipment costs and salaries for engineers). Maybe 1x investment is too low; maybe it should be multiplied by inflation and GDP growth?
The largest capex projects (credible-at-the-time announcements / money raised / assets minus debts) of the dot-com bubble were about $10-14bn nominal around 2000 (Global Crossing, Level 3 Communications), or $19-26bn in 2025 dollars. Now that it’s 25 years later, we have hyperscalers that can afford ~$80bn of capex per year, which is 3x-4x more than $19-26bn.
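Spelled out as a quick sketch (the ~1.86x cumulative CPI factor from 2000 to 2025 is my assumption; the rest is from the figures above):

```python
# Dot-com era largest projects vs. current hyperscaler capex,
# assuming a cumulative CPI factor of roughly 1.86x from 2000 to 2025.
cpi_factor = 1.86
dotcom_nominal = (10, 14)                                  # $bn, ~2000
dotcom_2025 = [x * cpi_factor for x in dotcom_nominal]     # in 2025 dollars
hyperscaler_capex = 80                                     # ~$bn per year today

print([round(x) for x in dotcom_2025])                     # [19, 26]
print([round(hyperscaler_capex / x, 1) for x in dotcom_2025])  # [4.3, 3.1] -> 3x-4x
```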
Similarly, a 2029-2030 slowdown would render a lot of AI companies and AI cloud companies irrelevant or bankrupt, and the largest projects of 2027-2029 won’t be the baseline for gradual growth towards 2050 for the most powerful companies of 2050 (except possibly for Google). I’m looking at 22 years instead of 25, it’s unclear how well AI profits can be captured by a single company, and because of inference the cost of revenue is higher than for software. So let’s say capex per year (CPI-adjusted, in 2025 dollars) for the largest AI companies in 2050 is 3x the largest projects of 2027-2029, which I was estimating at $140bn. Anchoring to Nvidia’s cadence of 2 years per major hardware refresh, a training system can eat two years’ worth of capex, so in total we get $840bn for a 2050 training system (in 2025 dollars).
This is 5 years ahead of the projection from the post that assumes zero change in training system cost (at 1.4x per year for CPI-adjusted price-performance), so maybe the second 2,000x of scaling should be reached by 2045 instead.
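A quick check of the $840bn figure and the “5 years ahead” claim, using the numbers above:

```python
import math

baseline_project = 140              # $bn, largest 2027-2029 project (estimate from above)
capex_2050 = 3 * baseline_project   # $420bn/year for the largest AI company in 2050
system_2050 = 2 * capex_2050        # two years of capex -> $840bn training system

# With price-performance improving 1.4x/year, how many extra years of scaling
# does a 6x larger budget correspond to?
years_ahead = math.log(system_2050 / baseline_project) / math.log(1.4)
print(system_2050, round(years_ahead, 1))   # 840, ~5.3 -> about 5 years ahead
```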
Yeah, sounds reasonable; that would match up with my 1.56x/year number. So to summarize, we both think this is roughly plausible for 2028-2045?
1.3x/year (compute production) x 1.2x/year (compute efficiency) ~= 1.55x/year (compute available)
1.1x/year (investment) x 1.4x/year (price performance) ~= 1.55x/year (compute available)
So roughly a 3x slowdown in the growth rate compared to the 2022-2028 trend (~3.5x/year).
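To be explicit, the slowdown factor here is a ratio of exponential growth rates (logs of the yearly multipliers), not of the multipliers themselves:

```python
import math

fast = 3.5    # ~2022-2028 trend in compute available, x/year
slow = 1.55   # projected 2028-2045 trend, x/year

# Ratio of exponential growth rates: how many "slow" years match one "fast" year.
slowdown = math.log(fast) / math.log(slow)
print(round(slowdown, 1))   # ~2.9 -> roughly a 3x slowdown
```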
With the caveat that I’m not expecting a gradual increase in investment at 1.1x/year, but instead wailing and gnashing of teeth in the early 2030s, followed by a faster increase in investment from a lower baseline. And this is all happening within the hypothetical where RLVR doesn’t work out at scale and there are no other transformative advancements all the way through 2045. I don’t particularly expect this hypothetical to become reality, but I can’t confidently rule it out (hence this post).
I don’t have a sense of the supply-side picture; it seems more relevant for the global inference buildout, and I don’t see how it anchors the capex of an individual company. The fraction an individual company contributes to the global chip market doesn’t seem like a meaningful number to me, more like a ratio of two unrelated numbers (as long as it’s not running towards the extremes, which it isn’t in this case).