We have seen LLMs scale to impressively general performance. This does not mean they will soon reach human level, because intelligence is not just a knob that needs to be turned further; it comprises qualitatively distinct functions.
Deep learning AI research isn’t concerned with inventing AIs; it’s concerned with inventing AI training processes. For human minds, the relevant training process is natural selection, which doesn’t have nearly as many qualitatively distinct functions as human minds do. Scaling lets the same training process produce more qualitatively distinct functions in a resulting model.
For straightforward scaling to have a chance of working, training processes need to scale at all (which historically they often didn’t), the trained models need to be able to represent the computations that would exhibit the capabilities (long reasoning traces are probably sufficient to bootstrap general intelligence, given the right model weights), and the feasible range of scaling needs to actually encounter the new capabilities.
It’s unknown which capabilities will be encountered as we scale from the 20 MW systems that trained GPT-4 to 5 GW training systems. This level of scaling will happen soon, by 2028-2030, regardless of whether it succeeds in producing new capabilities. If it fails to show significant progress, 2030-2040 will be slower without new ideas, though research funding is certainly skyrocketing and there are still whole orchards of low-hanging fruit. So scaling of training systems concentrates the probability of significant capability progress into the next few years, whatever that probability is, relative to the several years after that.
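To make the size of that jump concrete, here’s a rough back-of-envelope sketch; the hardware-efficiency multiplier is an illustrative assumption, not a figure from this discussion:

```python
# Back-of-envelope: how much more training compute a 5 GW system buys
# over the ~20 MW class of system said to have trained GPT-4.

gpt4_power_mw = 20        # from the discussion: ~20 MW training system
future_power_mw = 5_000   # 5 GW training system

power_ratio = future_power_mw / gpt4_power_mw  # 250x more power

# Assumption: accelerators also get more useful FLOP/s per watt over the
# same period; 3x is a made-up round number for illustration only.
assumed_efficiency_gain = 3

compute_ratio = power_ratio * assumed_efficiency_gain

print(f"Power ratio:   {power_ratio:.0f}x")
print(f"Compute ratio: {compute_ratio:.0f}x (under the assumed efficiency gain)")
```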
Yes, I agree with almost all of that, particularly the framing in terms of training processes. Technically, deep learning research is concerned with inventing AIs, but lately it does so through inventing AI training processes.
The only part I either don’t understand or don’t agree with is the claim that scaling lets the same training process produce more qualitatively distinct functions in a resulting model. Though a simple training process can certainly find diverse functions, I don’t think the current paradigm will get all of the ones needed for AGI.
Scaling of pretraining makes System 1 thinking stronger, while o1/R1-like training might end up stitching together a generally applicable System 2 that wouldn’t need to emerge fully formed from iteratively applying System 1. If the resulting model is sufficiently competent to tinker with AI training processes, that might be all it takes for it to quickly fix the remaining gaps in capability. In particular, if it’s able to generate datasets and run some RL post-training to make further progress on particular problems, this might be a good enough crutch while online learning is absent and original ideas can only form through fiddly, problem-specific RL post-training that needs to be set up first.
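A minimal sketch of that “crutch” loop, just to make its shape concrete; every function and data structure here is a hypothetical placeholder, not a real API, and the whole thing assumes the model can in fact set up its own problem-specific RL post-training:

```python
# Hypothetical sketch: absent online learning, the model closes a capability
# gap by writing its own dataset and running problem-specific RL post-training.
# All functions are toy placeholders standing in for the real capabilities.

def evaluate(model, problem):
    """Stand-in for checking whether the model already handles the problem."""
    return problem in model["solved"]

def generate_dataset(model, problem):
    """Stand-in for the model generating its own training data for the problem."""
    return [f"worked example for {problem}"]

def rl_post_train(model, problem, dataset):
    """Stand-in for a fiddly, problem-specific RL post-training run."""
    return {"solved": model["solved"] | {problem}}

def close_capability_gap(model, problem, max_rounds=3):
    """The crutch loop: evaluate, generate data, post-train, repeat."""
    for _ in range(max_rounds):
        if evaluate(model, problem):
            return model
        dataset = generate_dataset(model, problem)
        model = rl_post_train(model, problem, dataset)
    return model

model = close_capability_gap({"solved": set()}, "some particular problem")
print(evaluate(model, "some particular problem"))  # True
```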
It might go that way, but I don’t see strong reasons to expect it.
It’s an argument about long reasoning traces having sufficient representational capacity to bootstrap general intelligence, not a forecast that the bootstrapping will actually occur. It’s about a necessary condition for straightforward scaling to have a chance of getting there, at an unknown level of scale.