Yes, I agree with almost all of that, particularly:
> Technically, deep learning research is concerned with inventing AIs, but lately through inventing AI training processes.
The only part I either don’t understand or don’t agree with is:
Though a simple training process can certainly find diverse functions, I don’t think the current paradigm will get all of the ones needed for AGI.
> Scaling of pretraining makes System 1 thinking stronger, and o1/R1-like training might end up stitching together a functional, generally applicable System 2 that wouldn’t need to emerge fully formed as a result of iteratively applying System 1. If the resulting model is sufficiently competent to tinker with AI training processes, that might be all it takes for it to quickly fix the remaining gaps in capability. In particular, if it’s able to generate datasets and run some RL post-training in order to make further progress on particular problems, this might be a good enough crutch while online learning is absent and original ideas can only form as a result of fiddly, problem-specific RL post-training that needs to be set up first.
It might go that way, but I don’t see strong reasons to expect it.
It’s an argument about long reasoning traces having sufficient representational capacity to bootstrap general intelligence, not forecasting that the bootstrapping will actually occur. It’s about a necessary condition for straightforward scaling to have a chance of getting there, at an unknown level of scale.