RL-as-a-Service will outcompete AGI companies (and that’s good)
Companies drive AI development today. There are two stories you could tell about the mission of an AI company:
AGI: AI labs will stop at nothing short of Artificial General Intelligence. With enough training and iteration AI will develop a general ability to solve any (feasible) task. We can leverage this general intelligence to solve any problem, including how to make a profit.
Reinforcement Learning-as-a-Service (RLaaS)[1]: AI labs have an established process for training language models to attain high performance on clean datasets. By painstakingly creating benchmarks for problems of interest, they can solve any given problem with RL leveraging language models as a general-purpose prior. This is essentially a version of the CAIS model.
Both visions are ambitious in the sense that they aim to solve every problem. But RLaaS is more conservative because it tackles each problem separately and relies on some human effort to build datasets. RLaaS requires many models of limited capability honed on specific problems. AGI requires one model with high performance on many tasks.
So, which will dominate the market? I argue that RLaaS has both a better business case and creates less existential risk. It should be promoted.
Why RLaaS will win
RLaaS has proven performance
We already know that training a model on enough task data is sufficient to get high performance on that task. This has been true for the last few decades in machine learning, but has come into focus with language models. AI companies improved model performance across dozens of benchmarks using RL and data from related tasks. The RLaaS model is proven.
The argument that general-purpose reasoning ability will transfer to many domains is more tenuous. Performance gains on e.g. math problems do transfer to other domains, but only in a limited fashion. We’ve seen dramatic improvements on IMO performance, but this hasn’t translated into dramatic gains in other fields.
This is not to say that better reasoning can’t produce broad performance gains, just that for a conservative investor today, training a model on well-defined tasks is a safer bet.
RLaaS might cost less
It’s safe to assume that a general-purpose model will be more complicated than a specialized model. So inference costs will be higher per task. However, that may not be a problem if the general-purpose model has much higher performance than the specialized model.[2]
Something I’m less certain about is the per-task R&D cost. To solve a particular task, building AGI requires substantially more investment than RLaaS, but amortized across enough tasks, the costs may be lower.[3]
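To make the amortization argument concrete, here is a toy break-even calculation. It is only a sketch under simple assumptions: each approach has a one-off fixed R&D cost plus a constant marginal R&D cost per task, and all figures are made-up units rather than estimates. The function names and parameters are hypothetical, chosen for illustration.

```python
import math
from typing import Optional


def per_task_cost(fixed_rd: float, marginal_per_task: float, n_tasks: int) -> float:
    """Average R&D cost per task when a fixed investment is spread over n_tasks."""
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    return fixed_rd / n_tasks + marginal_per_task


def break_even_tasks(
    agi_fixed: float,
    agi_marginal: float,
    rlaas_marginal: float,
    rlaas_fixed: float = 0.0,
) -> Optional[int]:
    """Smallest task count at which the general approach is no more expensive.

    Assumed (illustrative) cost model:
        total_agi(n)   = agi_fixed   + n * agi_marginal
        total_rlaas(n) = rlaas_fixed + n * rlaas_marginal
    Returns None if the general approach never catches up.
    """
    saving_per_task = rlaas_marginal - agi_marginal
    if saving_per_task <= 0:
        return None  # no per-task saving, so the fixed cost is never recovered
    extra_fixed = agi_fixed - rlaas_fixed
    if extra_fixed <= 0:
        return 1  # cheaper from the very first task
    return math.ceil(extra_fixed / saving_per_task)


# Made-up units: a large up-front cost with cheap per-task work,
# versus no up-front cost with expensive per-task work.
n = break_even_tasks(agi_fixed=1000.0, agi_marginal=1.0, rlaas_marginal=11.0)
print(n)  # 100
print(per_task_cost(1000.0, 1.0, n), per_task_cost(0.0, 11.0, n))  # 11.0 11.0
```

Under this model, the specialized approach is cheaper below the break-even task count and the general approach is cheaper above it. Which side of that line the real economy falls on is exactly the uncertain part.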
RLaaS is harder to copy
If data scaling drives model performance, you can carve out a safe niche by building your own private dataset for task X. By fine-tuning a model on that data, you now have the best model for completing task X. Competitors would have to go through the same trial-and-error process, which may not be worthwhile if you have a first-mover advantage.
Building AGI, by contrast, is harder to control. For one, there’s always the risk that a more capable model is misaligned and simply escapes.
Even with an aligned model, it’s not clear that AGI can be kept under wraps. Consider how quickly things like model release dates and algorithms diffuse in the AI industry. If the key is a handful of clever tricks, those details can be leaked pretty easily. And by virtue of being so valuable, there are stronger incentives to steal AGI.
Merely knowing that you created AGI may be enough for others to retrace your steps. It’s a lot easier to invent something when you already know it’s possible.
AGI is inherently harder to control than a niche dataset.
RLaaS has lower misalignment risk
Present-day models trained on defined tasks are aligned with their users and creators. While these models may be used for malicious purposes, they pose little risk on their own. RLaaS is roughly aligned.[4]
However, the AGI model doesn’t have the same assurances. Future AI systems trained under a different paradigm and operating in an open-ended fashion may be misaligned. This adds a substantial downside to developing and using such models.
Conclusion: RLaaS is better and should be promoted
RLaaS has proven performance, may cost less, is harder to copy, and is safer. If this holds, most AI companies will pivot away from pursuing AGI and towards RLaaS.
That’s good news because it promises a switch to a safer mode of AI development. To the degree that we can promote such a transition, RLaaS should be encouraged. In fact, I’m intentionally using the buzzword “RLaaS” for this reason.
Of course, RLaaS is not without risks; misuse of specialized models is a near-term concern. In the future, the concatenation of specialized models may create or assist general intelligences.
But on balance, a transition to the RLaaS model would reduce AI risk and delay the arrival of AGI.
- ^This article is the first place I encountered the term. 
- ^Though during deployment, it may make more sense to train a specialized model on outputs of the general model to save on inference costs. 
- ^Another possible problem with this story is if AGI can complete tasks that aren’t composed of smaller subtasks, unlocking unforeseen value that can’t be achieved with RLaaS. I’m skeptical: for example, I can’t think of a task that can’t be completed by organizing enough smart people to work on subproblems. But it’s worth mentioning. 
- ^Models are aligned in practice, but are they aligned in theory? I think we’re approaching an understanding of why neural networks generalize both in distribution and out of distribution. Informally, training chisels cognitive grooves into an agent. Results like the above make me hopeful that prosaic alignment is possible with models trained in the current paradigm. 
LLMs don’t suffer from negative transfer, and might even have positive transfer between tasks (getting better at one task doesn’t make them worse at other tasks). Most negative transfer visible in practice is about opportunity cost, where focusing in one area leads to neglecting other areas. So it’s mostly about specialized data collection (including development of RLVR environments, or generation of synthetic “textbook” data), and that data can then be used in general models that can do all the tasks simultaneously.
In terms of business, the question is where the teams working on task-specific data are working. They could just be selling the data to the AI companies to be incorporated in the general models, and these teams might even become parts of those AI companies. Post-training open weights models for a single task mostly produces an inferior product, because the model will be worse than a general model at everything else, while the general model could do this particular task just as well (if it had the training data).
A better product might be possible with the smallest/cheapest task-specialized models where there actually does start to be negative transfer and you can get them at some level of capability in any one area, but not in multiple areas at the same time. It’s unclear if this remains a thing with models of 2026-2029 (when the “smallest/cheapest” models will be significantly larger than what is considered “smallest/cheapest” today), in particular because the prevailing standard of quality might grow into the lower cost of inferencing larger models, making the models that are small by today’s standards unappealing.
So if the smallest economically important models get large enough, negative transfer might disappear, and there won’t be a technical reason to specialize models, as long as you have all the task specific data for all the tasks in the hands of one company. AI companies that produce foundation models are necessarily quite rich, because they need access to large amounts of training compute (2026 training compute is already about $30bn per 1 GW system for compute hardware alone, which is at least $15bn per year in the long term, but likely more since AI growth is not yet done). So it’s likely that they’ll manage to get access to good task specific data for most of the economically important topics, by acquiring other companies if necessary, at which point the smaller task specific post-training companies mostly don’t have a moat, because their product is neither cheaper nor better than the general models of the big AI companies.
These are good points. I’m uncertain about what models will form the foundation of RLaaS. But I think your point about where the task-specific data teams are working is more important. Off the top of my head, I think there are 3 bins:
1. For a lot of programming tasks, big AI companies already have lots of expertise and users in-house, so I expect them to dominate production of code generation.
2. For some tasks, like writing marketing copy, LLMs are already good enough. There’s no business case for training models further here.
3. Most interesting are tasks that require lots of tacit knowledge or iteration. For example, getting to self-driving cars required a decade plus of iterating on algorithms and data. I imagine lots of corporations will privately put a bunch of effort into making AI work on their specific problems. Physical tasks in specialized trades are another example.
For tasks in #3, the question is whether to join up with the big AI companies, or develop your own solution to the problem and keep it private.
Can you provide some examples that you think are well-suited to RLaaS? Getting high-quality data to train on is a highly nontrivial task and one of the bottlenecks for general models too.
I can imagine a consulting service that helps companies turn their proprietary data into useful training data, which they then use to train a niche model. I guess you could call that RLaaS, though it’s likely to be more of a distilling and fine-tuning of a general model.
I would count your consulting service as RLaaS essentially. I’ll admit, RLaaS is a buzzword that obscures a lot. “Have AI researchers and domain experts iterate on current AI models until they are performant at a particular task” would be more accurate. Things I think this model will apply to:
Anything involving robots. Consider the journey to self driving cars with lots of human data collection, updating the hardware, cleaning the dataset, and tweaking algorithms. Any physical manipulation task that has to be economically competitive will need a lot of input from experts. Factory managers will need robots that operate under idiosyncratic requirements. It’ll take time to iron out the kinks.
To a lesser extent, repetitive internal company processes will need some fine-tuning. Filling out forms specific to a company, filing reports in the local format, etc. Current LLMs can probably do this with 90% success, but pushing that to 99% is valuable and will take a little work.
Research-heavy domains. The stuff covered in publications is 10% of the knowledge you need to do science. I expect LLM research assistants to need adjustment for things like “write code using all these niche software packages”, “this is the important information we need from this paper”, “results from this lab are BS so ignore them”.
My priors are that reality is detailed and getting a general purpose technology like modern AI to actually work in a particular domain takes some iteration. That’s my key takeaway from that METR study:
https://www.lesswrong.com/posts/m2QeMwD7mGKH6vDe2/?commentId=T5MNnpneEZho2CuZS
The world is not automatically divided up into lots of separate tasks.
If you divide tasks into too many small pieces, too many little buckets, many important problems can fall through the gaps.
For example, if you use RL to train a plumbing robot and separately train an electrician robot, then neither of these robots can solve the problem that you get an electric shock whenever you turn on the tap.
If you train on a few huge buckets, then you have 1 robot that does everything, and that’s basically an AGI again.
And in this RL-as-a-service model, wouldn’t there be people doing RL for AI research?
So, when this model gets good enough, someone can just say “build an AGI” and get one. Because all tasks are being automated, and that includes the task of building AGI.
Actually, RL is based on trial and error. It would be hard to train an AI researcher without giving it the opportunity to run arbitrary code in training.
A big one is any task that has a lot of abstraction leaks, where you can’t neatly use APIs to factor out problems:
While I don’t entirely reject the use of parallelism, and do tend to be more optimistic about how much can be parallelized and how useful parallelization is, I also don’t agree with the claim that you can organize enough smart people to work on subproblems for anywhere close to 100% of a problem. Serial speeds are still a bottleneck, though not too restrictive a one (at least if we don’t assume superhuman coordination).
The main reason I don’t expect this world is that I expect data-rich but compute-poor models to be useful only on benchmarks by default, because it’s way too easy to overfit with small models, and I remember reading a paper that showed that scaled-up generalist models had far less overfitting/teaching to the test than small models.
Even for modern LLMs, data leakage is an issue, but it’s even worse for small models, so I expect much worse results for other companies trying to train their own small models by using lots of data (except in domains where this is easy to verify, but at that point a generalist model also works.)