I doubt that METR’s graph stays linear (on a log-task-length vs. date scale). I accomplish long tasks by tackling a series of small tasks. Both these small tasks and the administrative task of figuring out what to do next (and what context I need a refresher on to accomplish it) are less than a day long. So at some point I expect a team of agents (disguised as a monolith), each succeeding only at short individual tasks, to pass a critical mass of competence and become capable of much longer tasks.
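For concreteness, here is a minimal toy sketch of what I mean by a team of agents disguised as a monolith: one short administrative step that picks the next subtask (and the context to refresh), one short execution step, looped. Every name here (`plan_next_step`, `run_subtask`, `subtasks_for`) is a hypothetical stand-in; in a real system each step would be a model call.

```python
# Toy sketch (hypothetical names throughout): a long task driven by
# repeated short steps -- a planning step and an execution step --
# so no single step exceeds the short-task horizon.

def subtasks_for(goal: str) -> list[str]:
    # Toy decomposition; a real system would generate this on the fly.
    return [f"{goal}: step {i}" for i in range(1, 4)]

def plan_next_step(goal: str, done: list[str]) -> str | None:
    # Stand-in for the administrative step: inspect progress, pick the
    # next small subtask, decide what context to refresh. <1 day of work.
    remaining = [s for s in subtasks_for(goal) if s not in done]
    return remaining[0] if remaining else None

def run_subtask(subtask: str) -> str:
    # Stand-in for the execution step: also <1 day of work.
    return subtask

def solve_long_task(goal: str) -> list[str]:
    """Loop short planner/executor calls until the planner says done."""
    done: list[str] = []
    while (subtask := plan_next_step(goal, done)) is not None:
        done.append(run_subtask(subtask))
    return done

print(solve_long_task("write report"))  # ['write report: step 1', ...]
```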
It doesn’t follow from an AI being able to do the components of a task that the AI can do the task itself. This is because the ability to carry out the subcomponents of a task does not entail the knowledge that these are the subcomponents of the task.
I do think that each subsequent doubling of time-horizons will be in some sense easier than the last. But this is counteracted by the fact that RL becomes more difficult as task lengths increase. One can imagine it being difficult to have a doubling time of, say, 2 months, when the AI is learning to do tasks of length 8 months. It might take the AI longer than 2 months just to spit out an attempt at one of the tasks! I think it’s an open question which of these two forces is stronger.
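To make that tension concrete, here is a back-of-the-envelope extrapolation (the starting horizon and hours-per-work-month are made-up assumptions, not METR data): with a fixed 2-month doubling time, there is eventually a point where a single attempt at a frontier-length task takes longer than the doubling time itself.

```python
# Back-of-the-envelope (assumed numbers, not METR data): a horizon that
# doubles every 2 months eventually implies tasks whose single attempt
# takes longer than the doubling time itself.
doubling_months = 2
horizon_hours = 1.0           # assumed starting horizon
hours_per_work_month = 160    # rough full-time month

for month in range(0, 37, doubling_months):
    attempt_months = horizon_hours / hours_per_work_month
    note = "  <- one attempt outlasts the doubling time" if attempt_months > doubling_months else ""
    print(f"month {month:2d}: horizon {horizon_hours:6.0f}h (~{attempt_months:4.1f} work-months){note}")
    horizon_hours *= 2
```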
Doesn’t the (relatively short) task my manager does, of breaking projects into component tasks for me to do, entail knowledge of the specific subcomponents? Is there a particular reason to believe that this task won’t be solved by an AI that otherwise knows how to accomplish tasks of similar length?
And automated adaptation (continual learning, test-time training) should enable a lot of serial time, which would overcome even issues with splitting a problem into subproblems (it’s not necessarily possible to solve a 10-year problem in 2 years with any number of competent researchers and managers). So to the extent in-context learning implements continual learning, the presence of any visible bounds on time horizons in capabilities indicates and quantifies limitations of how well it actually does implement continual learning. A genuine advancement in continual learning might well immediately do away with any time horizons entirely.