LLMs don’t suffer from negative transfer, and might even show positive transfer between tasks (getting better at one task doesn’t make them worse at others). Most negative transfer visible in practice is really opportunity cost: focusing on one area leads to neglecting others. So it’s mostly about specialized data collection (including development of RLVR environments, or generation of synthetic “textbook” data), and that data can then be used in general models that do all the tasks simultaneously.
In terms of business, the question is where the teams that produce task-specific data end up working. They could simply sell the data to AI companies to be incorporated into general models, and might even become parts of those companies. Post-training open-weights models for a single task mostly produces an inferior product: the model will be worse than a general model at everything else, while the general model could do that particular task just as well (if it had the training data).
A better product might be possible with the smallest/cheapest task-specialized models, where negative transfer actually does start to appear: you can get such a model to some level of capability in any one area, but not in multiple areas at the same time. It’s unclear whether this remains true for the models of 2026-2029 (when the “smallest/cheapest” models will be significantly larger than what counts as “smallest/cheapest” today), in particular because the prevailing standard of quality might rise to meet the falling cost of inferencing larger models, making models that are small by today’s standards unappealing.
So if the smallest economically important models get large enough, negative transfer might disappear, and there won’t be a technical reason to specialize models, as long as all the task-specific data for all the tasks is in the hands of one company. AI companies that produce foundation models are necessarily quite rich, because they need access to large amounts of training compute (in 2026, training compute already costs about $30bn per 1 GW system for the compute hardware alone, which works out to at least $15bn per year in the long term, and likely more since AI growth is not yet done). So they’ll likely manage to get access to good task-specific data for most economically important topics, by acquiring other companies if necessary, at which point the smaller task-specific post-training companies mostly don’t have a moat: their product is neither cheaper nor better than the general models of the big AI companies.
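The jump from ~$30bn of hardware per GW to "at least $15bn per year" can be sanity-checked with straight-line amortization. A minimal sketch; the roughly two-year refresh cycle below is my assumption, not a figure from the discussion:

```python
# Sanity check of the compute-cost arithmetic in the comment above.
# Assumption (not from the source): hardware is amortized straight-line
# over an aggressive ~2-year refresh cycle.

def annualized_hardware_cost(capex_bn: float, useful_life_years: float) -> float:
    """Yearly cost implied by a one-time hardware capex, straight-line."""
    return capex_bn / useful_life_years

capex_per_gw_bn = 30.0    # from the text: ~$30bn per 1 GW of compute hardware
assumed_life_years = 2.0  # assumption: ~2-year refresh cycle

yearly = annualized_hardware_cost(capex_per_gw_bn, assumed_life_years)
print(f"${yearly:.0f}bn per year")  # consistent with "at least $15bn per year"
```

A longer assumed hardware lifetime would lower the yearly figure (e.g. three years gives $10bn/year), which is why the $15bn reads as a lower bound only under fast refresh.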
These are good points. I’m uncertain about which models will form the foundation of RLaaS. But I think your point about where the task-specific data teams end up working is more important. Off the top of my head, I see three bins:
1. For a lot of programming tasks, big AI companies already have lots of expertise and users in-house, so I expect them to dominate code generation.
2. For some tasks, like writing marketing copy, LLMs are already good enough. There’s no business in training models further here.
3. Most interesting are tasks that require lots of tacit knowledge or iteration. For example, getting to self-driving cars required more than a decade of iterating on algorithms and data. I imagine many corporations will privately put a lot of effort into making AI work on their specific problems. Physical tasks in specialized trades are another example.
For tasks in #3, the question is whether to join up with the big AI companies, or develop your own solution to the problem and keep it private.