All relevant algorithmic techniques will remain scalable: beating the established techniques doesn't contradict the bitter lesson, because a newer, more efficient technique that defeats the established ones with less compute will do even better with more compute. And giant datacenter campuses (enabling more training compute) will remain scarce, even if inference compute becomes more plentiful. It would still take 1-2 years to convert even country-budget-level funding into available compute and catch up with the compute available to the incumbents, who by then will have already reimplemented your new algorithmic technique at a grander scale.
The introduction of RLVR and long reasoning (OpenAI o1, DeepSeek R1) illustrates how a new technique is quickly taken up by all the players (they aren't meaningfully locked into the older ways of making AIs) and is most effectively turned into capabilities by the AI companies with the most compute. The 1 GW datacenter campuses that will be completed in 2026 already cost about $45bn (non-compute infrastructure plus the capital cost of compute hardware) and take 1-2 years to build, which gives the AI companies that already have this kind of compute enough time to copy and scale any new algorithmic innovation. The hypothetical new mountains that make transformers obsolete, or architectural improvements that use little compute yet leapfrog the capabilities of leading models, are liable to follow the same path: scaled by the incumbents to their levels of available compute faster than the smaller innovators could build comparable compute and keep competing.
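As a rough sanity check on that timing argument, here is a minimal Python sketch using the ~$45bn-per-1-GW figure and the 1-2 year build time quoted above; the $100bn funding example and the assumption that money converts linearly into campus capacity are illustrative, not claims from the text.

```python
# Rough timing/cost check for the catch-up argument. The $45bn per 1 GW
# campus and the 1-2 year build time are the figures quoted above; treating
# funding as linearly convertible into campus capacity is an assumption.

COST_PER_GW_BN = 45.0          # non-compute infrastructure + compute hardware
BUILD_TIME_YEARS = (1.0, 2.0)  # time to stand up a new campus

def entrant_position(funding_bn: float) -> dict:
    """Capacity a new entrant's funding buys, and when it becomes available
    relative to incumbents who already have ~1 GW online today."""
    return {
        "campus_gw": round(funding_bn / COST_PER_GW_BN, 1),
        "online_in_years": BUILD_TIME_YEARS,  # the incumbents' head-start window
    }

# Even $100bn (country-budget scale) buys ~2.2 GW that only arrives in 1-2 years,
# by which point incumbents have reimplemented the technique at their own scale.
print(entrant_position(100.0))
```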
So this already locks out almost all potential entrants in 2025-2026, and the barrier will approach the limits of what tech giants and global capital can manage over the next few years, as the datacenter campuses serving individual AI companies approach 5 GW in scale (about $200bn in total, $140bn of it compute hardware). Then possibly about 10 GW in the 2030s (if there's still no intelligence explosion, merely multi-trillion dollar AI companies), as their physical scale stops increasing, so that the cost of non-compute infrastructure no longer needs to be paid again every 2 years with each new datacenter campus built for the next generation of compute hardware at a greater scale than the one from 2 years before.
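To make the amortization point concrete, here is a small sketch comparing spend per hardware generation while campuses keep growing (a new, larger campus roughly every 2 years) versus after physical scale plateaus (only the compute hardware gets refreshed). The per-GW split is backed out from the 5 GW figures above ($200bn total, $140bn hardware, hence ~$60bn non-compute infrastructure); the generation-by-generation schedule is an illustrative assumption.

```python
# Illustrative spend per ~2-year hardware generation. Per-GW costs are backed
# out from the 5 GW figures above: $200bn total, $140bn compute hardware,
# hence ~$60bn (~$12bn/GW) non-compute infrastructure. (The text quotes ~$45bn
# at 1 GW, slightly above this linear split.) The growth schedule is assumed.

HARDWARE_BN_PER_GW = 140 / 5   # ~$28bn of compute hardware per GW
INFRA_BN_PER_GW = 60 / 5       # ~$12bn of non-compute infrastructure per GW

def generation_cost(campus_gw: float, new_infra_gw: float) -> float:
    """Cost of one ~2-year generation: hardware for the whole campus, plus
    non-compute infrastructure only for newly built capacity."""
    return campus_gw * HARDWARE_BN_PER_GW + new_infra_gw * INFRA_BN_PER_GW

# While campuses keep growing, each generation pays for an even bigger
# campus's infrastructure on top of all its hardware:
growing = [generation_cost(1, 1), generation_cost(5, 5), generation_cost(10, 10)]

# Once physical scale plateaus (~10 GW), later generations only refresh hardware:
plateau = [generation_cost(10, 0) for _ in range(3)]

print(growing)  # ~[40, 200, 400] $bn per generation while scale keeps increasing
print(plateau)  # ~[280, 280, 280] $bn per generation: hardware refresh only
```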