There are very significant efficiency gains from larger scale-up world sizes. It’s 2-3x faster generation per request (and so 2-3x more training steps in RLVR), or 2-3x higher throughput per chip at the same speed per request (which is like having 2-3x more chips), for the same chip but with different scale-up world size (8xB200 vs. GB200 NVL72).
So Anthropic’s access to Trainium 2 Ultra racks plausibly gave them more access to compute in some regimes (such as for experimenting with RLVR on larger models) than OpenAI had with their 8-chip Nvidia servers, at least starting late 2025, probably months earlier at a meaningful scale for R&D than when they got to flagship model inference scale and reduced prices for Opus 4. (Though your point is probably more about what happened prior to late or even not-late 2025.)
There are very significant efficiency gains from larger scale-up world sizes. It’s 2-3x faster generation per request (and so 2-3x more training steps in RLVR), or 2-3x higher throughput per chip at the same speed per request (which is like having 2-3x more chips), for the same chip but with different scale-up world size (8xB200 vs. GB200 NVL72).
So Anthropic’s access to Trainium 2 Ultra racks plausibly gave them more access to compute in some regimes (such as for experimenting with RLVR on larger models) than OpenAI had with their 8-chip Nvidia servers, at least starting late 2025, probably months earlier at a meaningful scale for R&D than when they got to flagship model inference scale and reduced prices for Opus 4. (Though your point is probably more about what happened prior to late or even not-late 2025.)