For me, a specific crux is the scaling laws of R1-like training: what happens when you try to do much more of it, which inputs to the process become important constraints, and how much they matter. This question has been extensively gestured at but not yet described quantitatively; all the public reproductions of long reasoning training did only one iteration on top of some pretrained model, and even o3 isn't currently known to be based on the same pretrained model as o1.
The AI 2027 story leans heavily on RL training taking off promptly, and it's possible the authors are echoing insider rumors grounded in reality, but from my point of view it's too early to tell. I guess in a few months to a year there should be enough public data to say something, but then again, a quantitative scaling model for MoE (compared to dense) was only published in January 2025, even though MoE was already key to the original GPT-4 trained in 2022.
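For concreteness, the kind of quantitative description that exists for pretraining but not yet for R1-like RL training is a Chinchilla-style loss law (Hoffmann et al., 2022). As a sketch of the established pretraining form, with N model parameters and D training tokens:

    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where E is the irreducible loss and the paper's fitted constants are roughly E ≈ 1.69, α ≈ 0.34, β ≈ 0.28. The missing piece is whether reasoning RL admits a law of this shape at all, and over which inputs (pretrained-model scale, RL compute, task data are candidate axes, not claims from any published result).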