ryan_greenblatt comments on Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro

ryan_greenblatt 3 Sep 2025 15:25 UTC
LW: 6 AF: 5
- I expect lower cost per rollout on average, because AI companies are doing RL on a bunch of smaller tasks and aren't necessarily using tons of reasoning tokens on most envs. Also, API prices are marked up relative to what companies actually pay for compute, which can easily add a factor of 3. If we are just looking at the hardest agentic software engineering environments, then this closes the gap a decent amount.
- I expect spending on RL environments to be more like 10x lower than RL training compute rather than similar (and I wouldn’t be surprised by a large gap), because it’s hard to massively scale up spending on RL envs effectively in a short period of time, while we already have a scaled-up industrial process for buying more compute.
I’m more sympathetic to “companies will spend this much on some high-quality RL envs” than “the typical RL env will be very expensive”, but I think some disagreement remains.
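
To make the arithmetic in the two bullets concrete, here is a minimal sketch in Python. The function names and the input dollar figures are hypothetical placeholders chosen for illustration, not estimates from this comment; only the two factors (a roughly 3x API markup and a roughly 10x gap between compute and env spending) come from the text above.

```python
def adjusted_rollout_cost(api_priced_cost: float, markup: float = 3.0) -> float:
    """Convert a rollout cost computed at API prices into an approximate
    internal compute cost, assuming API prices carry a `markup` factor."""
    return api_priced_cost / markup


def env_spend(rl_compute_spend: float, ratio: float = 10.0) -> float:
    """Estimate RL environment spending as `ratio` times lower than
    RL training compute spending."""
    return rl_compute_spend / ratio


# Placeholder inputs (assumptions for illustration only).
api_priced_cost = 30.0               # dollars per rollout at API prices
rl_compute_spend = 1_000_000_000.0   # dollars of RL training compute

print(adjusted_rollout_cost(api_priced_cost))  # 10.0
print(env_spend(rl_compute_spend))             # 100000000.0
```

Both factors are rough and multiply through directly, so swapping in a different markup or ratio changes the outputs proportionally.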