I have computed time horizon trends for more general software engineering tasks (i.e. with a bigger context) and my preliminary results point towards a logistic trend, i.e. the exponential is already tapering off. However, I am still pretty uncertain about that.
I predict this is basically due to noise, or at best a very short-lived trend, similar to the purported faster trend where RL scaling allows a doubling time of 4 months on certain tasks, which is basically driven by good scaffolding (which is what RL-on-CoTs was mostly shown to be) rather than the creation of new capabilities.
Very possible.
I plan to watch this a bit longer and also analyse how the trend changes with repo size.
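For anyone who wants to try a similar check, here is a rough sketch of one way to tell "still exponential" apart from "tapering off". It is not the analysis described above: the function name `curvature_check`, the 7-month doubling time, the ceiling of 1000 minutes, and all data points are made up for illustration. The idea is that a pure exponential is a straight line in log space, so a clearly negative quadratic coefficient in a fit of log2(horizon) against time hints at a logistic-like bend. With noisy real data the sign alone would not be conclusive.

```python
import numpy as np

def curvature_check(years, horizons_minutes):
    """Fit log2(horizon) against time with a line and with a quadratic.

    Returns (doubling_time_months, quadratic_coefficient). The doubling
    time comes from the linear fit; the quadratic coefficient is roughly
    zero for a pure exponential and negative when growth is tapering.
    """
    t = np.asarray(years, dtype=float)
    t = t - t.mean()  # centre the time axis for a better-conditioned fit
    y = np.log2(np.asarray(horizons_minutes, dtype=float))
    slope, _ = np.polyfit(t, y, 1)    # doublings per year
    quad, _, _ = np.polyfit(t, y, 2)  # curvature in log space
    return 12.0 / slope, quad

# Synthetic, noise-free data (hypothetical values, not measurements).
years = np.arange(2019, 2026, 0.5)

# Pure exponential with an assumed doubling time of 7 months.
exp_horizons = 2.0 ** ((years - 2019) * 12 / 7)

# Logistic curve with the same early growth rate and an assumed ceiling.
rate = np.log(2) * 12 / 7
logistic_horizons = 1000.0 / (1.0 + np.exp(-rate * (years - 2023)))

exp_doubling, exp_quad = curvature_check(years, exp_horizons)
log_doubling, log_quad = curvature_check(years, logistic_horizons)

print(f"exponential: doubling {exp_doubling:.2f} months, curvature {exp_quad:.4f}")
print(f"logistic:    doubling {log_doubling:.2f} months, curvature {log_quad:.4f}")
```

Running the same function separately on buckets of tasks grouped by repo size would be one simple way to see whether the bend depends on context size.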