Logan Zoellner comments on Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro

Logan Zoellner 9 Sep 2025 7:43 UTC
2 points
−2
There are no new ideas only new datasets
Currently all LLMs are terrible at computer use. Part of this is an ergonomics problem (GPT agent is frequently blocked from viewing websites, and I still don’t trust it enough to e.g. give it my street address and credit card number). But when I give it a graphically demanding task that is 100% doable in the browser, it still falls absolutely flat on its face.
What is needed for RL to succeed is something like an internet-scale dataset of graphically demanding tasks with objective success criteria. Sooner or later someone is going to put together a dataset like “here are all 150k games on Steam with a simple yes/no that tells us whether or not the AI beat the game.” And when that happens, I strongly suspect RL will suddenly start working.
Alternatively, companies like Figure are planning to deploy thousands of robots in the real world with more or less the same idea: create a huge training set of actual physical reality (as opposed to just text + multimedia).
Once a proper dataset is in place, I expect we will not see the slow, gradual progress indicated by the METR chart, but rather a huge all-at-once leap (on par with when we first started properly applying RL to math).