Can you provide some examples that you think are well-suited to RLaaS? Getting high-quality data to train on is a highly nontrivial task and one of the bottlenecks for general models too.
I can imagine a consulting service that helps companies turn their proprietary data into useful training data, which they then use to train a niche model. I guess you could call that RLaaS, though it’s likely to be more of a distilling and fine-tuning of a general model.
I would count your consulting service as RLaaS essentially. I’ll admit, RLaaS is a buzzword that obscures a lot. “Have AI researchers and domain experts iterate on current AI models until they are performant at a particular task” would be more accurate. Things I think this model will apply to:
Anything involving robots. Consider the journey to self-driving cars: lots of human data collection, updating the hardware, cleaning the dataset, and tweaking algorithms. Any physical manipulation task that has to be economically competitive will need a lot of input from experts. Factory managers will need robots that operate under idiosyncratic requirements. It'll take time to iron out the kinks.
To a lesser extent, repetitive internal company processes will need some fine-tuning. Filling out forms specific to a company, filing reports in the local format, etc. Current LLMs can probably do this with 90% success, but pushing that to 99% is valuable and will take a little work.
Research-heavy domains. The stuff covered in publications is 10% of the knowledge you need to do science. I expect LLM research assistants to need adjustment for things like “write code using all these niche software packages”, “this is the important information we need from this paper”, “results from this lab are BS so ignore them”.
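The 90%-to-99% point above is worth making concrete with some quick arithmetic (my illustrative numbers, not from the thread): each failed task presumably needs human correction, so the residual failure rate, not the success rate, is what drives cost.

```python
# Illustrative sketch: going from 90% to 99% task success cuts the
# human-correction burden by 10x, since failures drop from 1-in-10
# to 1-in-100. Task volume below is a hypothetical example.

def expected_failures(success_rate: float, n_tasks: int) -> float:
    """Expected number of tasks requiring human correction."""
    return (1.0 - success_rate) * n_tasks

n = 10_000  # hypothetical monthly task volume
at_90 = expected_failures(0.90, n)
at_99 = expected_failures(0.99, n)

print(f"at 90%: {at_90:.0f} corrections/month")
print(f"at 99%: {at_99:.0f} corrections/month")
print(f"reduction: {at_90 / at_99:.1f}x")
```

In other words, each extra "nine" of reliability is another 10x drop in cleanup work, which is why the last stretch of fine-tuning can justify a dedicated service.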
My priors are that reality is detailed and getting a general purpose technology like modern AI to actually work in a particular domain takes some iteration. That’s my key takeaway from that METR study:
https://www.lesswrong.com/posts/m2QeMwD7mGKH6vDe2/?commentId=T5MNnpneEZho2CuZS