Another possible problem with this story is that AGI might complete tasks that aren't composed of smaller subtasks, unlocking unforeseen value that can't be achieved with RLaaS. I'm skeptical of this: for example, I can't think of a task that can't be completed by organizing enough smart people to work on subproblems. But it's worth mentioning.
A big one is any task that has a lot of abstraction leaks, where you can’t neatly use APIs to factor out problems:
I’m generally skeptical that anything in the vicinity of factored cognition will achieve both sufficient safety and sufficient capability simultaneously, for reasons similar to Eliezer’s here. For example, I’ll grant that a team of 10 people can design a better and more complex widget than any one of them could by themselves. But my experience (from having been on many such teams) is that the 10 people all need to be explaining things to each other constantly, such that they wind up with heavily-overlapping understandings of the task, because all abstractions are leaky. And you can’t just replace the 10 people with 100 people spending 10× less time, or the project will absolutely collapse, crushed under the weight of leaky abstractions and unwise-in-retrospect task-splittings and task-definitions, with no one understanding what they’re supposed to be doing well enough to actually do it. In fact, at my last job, it was not at all unusual for me to find myself sketching out the algorithms on a project and sketching out the link budget and scrutinizing laser spec sheets and scrutinizing FPGA spec sheets and nailing down end-user requirements, etc. etc. Not because I’m individually the best person at each of those tasks—or even very good!—but because sometimes a laser-related problem is best solved by switching to a different algorithm, or an FPGA-related problem is best solved by recognizing that the real end-user requirements are not quite what we thought, etc. etc. And that kind of design work is awfully hard unless a giant heap of relevant information and knowledge is all together in a single brain.
While I don’t entirely unendorse the use of parallelism, and do tend to be more optimistic on how much can be parallelized and how useful parallelization is, I also don’t agree with the claim that you can organize enough smart people to work on subproblems for anywhere close to 100% of a problem, and serial speeds are still a bottleneck (but not too restrictive of a bottleneck) (at least if we don’t assume superhuman coordination).
My main reason for not expecting this world is that I expect data-rich but compute-poor models to be useful only on benchmarks by default, because it's far too easy to overfit with small models, and I remember reading a paper showing that scaled-up generalist models exhibited far less overfitting/teaching-to-the-test than small models did.
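As a minimal sketch of the failure mode I have in mind (my own toy setup, not from the paper I'm half-remembering): a high-capacity model trained on a narrow "benchmark" distribution can score near-perfectly there while falling apart just outside it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# "Benchmark" distribution: x in [0, 1]; deployment distribution: x in [1, 2].
x_bench = rng.uniform(0, 1, (200, 1))
x_deploy = rng.uniform(1, 2, (200, 1))
f = lambda x: np.sin(3 * x).ravel()

# A narrowly trained model memorizes the benchmark region.
model = DecisionTreeRegressor().fit(x_bench, f(x_bench))

bench_err = np.mean((model.predict(x_bench) - f(x_bench)) ** 2)
deploy_err = np.mean((model.predict(x_deploy) - f(x_deploy)) ** 2)
print(f"benchmark MSE: {bench_err:.3f}, off-benchmark MSE: {deploy_err:.3f}")
# The gap (near-zero vs. large) is the failure mode I expect from
# data-rich-but-compute-poor specialist models.
```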
Even for modern LLMs, data leakage is an issue, and it's even worse for small models, so I expect much worse results from other companies trying to train their own small models on lots of data (except in domains where outputs are easy to verify, but at that point a generalist model works too).
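To ground what I mean by leakage: the standard mitigation is an n-gram-overlap decontamination pass between training data and eval items, roughly like the sketch below. The choice of n and the toy strings are my illustrative choices, not any particular lab's pipeline.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """All length-n token windows in the text, lowercased."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(train_doc: str, eval_item: str, n: int = 8) -> bool:
    """Flag an eval item if any of its n-grams appears verbatim in training data."""
    return bool(ngrams(eval_item, n) & ngrams(train_doc, n))

train_doc = "the quick brown fox jumps over the lazy dog near the river bank"
eval_item = "the quick brown fox jumps over the lazy dog near the river"
print(contaminated(train_doc, eval_item))  # True: this eval item leaked
```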