I’ve never been compelled by talk about continual learning, but I do like thinking in terms of time horizons. One notion of singularity that we can think of in this context is:
Escape velocity: The point at which models improve at more than unit dh/dt, i.e. gain more than one unit of horizon h per unit of wall-clock time t.
Then, by modeling some ability to regenerate or continuously deploy improved models, you can predict this point. I’m very surprised I haven’t seen this mentioned before; has someone written about this? The closest thing that comes to mind is T. Davidson’s ASARA SIE.
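To make this concrete, here is a toy sketch (not METR’s actual model) that assumes horizon grows exponentially, h(t) = h0 · 2^(t/T); the parameters h0 and doubling time T are made up for illustration. Escape velocity is the point where dh/dt reaches one hour of horizon per hour of wall-clock time:

```python
import math

def escape_velocity_time(h0_hours: float, doubling_hours: float) -> float:
    """Time t (hours) at which dh/dt = 1 for h(t) = h0 * 2**(t/T).

    dh/dt = (h0 * ln 2 / T) * 2**(t/T), so dh/dt = 1 exactly when
    2**(t/T) = T / (h0 * ln 2), i.e. t = T * log2(T / (h0 * ln 2)).
    """
    T = doubling_hours
    return T * math.log2(T / (h0_hours * math.log(2)))

# Made-up parameters: a 1-hour horizon today, doubling every ~7 months.
h0 = 1.0
T = 7 * 30 * 24  # ~7 months, in hours
t_star = escape_velocity_time(h0, T)
horizon_at_t_star = h0 * 2 ** (t_star / T)  # works out to T / ln 2

print(f"escape velocity after ~{t_star / (365 * 24):.1f} years,")
print(f"at a horizon of ~{horizon_at_t_star / (30 * 24):.1f} months")
```

A side effect of the exponential form: the crossing always happens when the horizon reaches T/ln 2, so under this toy model the doubling time alone fixes the horizon length at which escape velocity occurs.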
Of course, the METR methodology is likely to break down well before this point, so it’s empirically not very useful. But I like this framing! Conceptually, there will be some point where models can robustly take over R&D work, including training and inventing new infrastructure (skills, MCPs, caching protocols, etc.). If they also know when to revisit their previous work, they can then work productively over arbitrary time horizons. Escape velocity is a nice concept to have in our toolbox for thinking about R&D automation. It’s a weaker notion than Davidson’s SIE.
The time horizon metric measures AIs on a scale of task difficulty, where difficulty is calibrated by how long it takes humans to complete the tasks. In principle there are 30-year tasks on that scale, the tasks that take humans 30 years to complete. If we were to ask about human ability to complete such tasks, it would turn out that they can. Thus the time horizon metric would say that the human time horizon is at least 30 years. More generally, an idealized time horizon metric would rate (some) humans as having infinite time horizons (essentially tautologically), and it would similarly rate AIs if they performed uniformly at human level (without being spiky relative to humans).
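As a rough sketch of how such a metric is operationalized: the METR-style horizon is the task length at which a fitted success curve crosses 50%. Assuming a logistic fit in log task length (the parameters a and b below are hypothetical, and the closed form is just algebra on that assumed fit):

```python
import math

def fitted_success_prob(log2_task_hours: float, a: float, b: float) -> float:
    """Logistic success probability as a function of log2 task length."""
    return 1.0 / (1.0 + math.exp(-(a - b * log2_task_hours)))

def time_horizon(a: float, b: float) -> float:
    """Task length (hours) at which fitted success probability is 50%.

    p = 0.5 exactly when a - b * log2(h) = 0, i.e. h = 2**(a / b).
    """
    return 2.0 ** (a / b)

# Hypothetical fit: a = 3, b = 1 gives a 50%-success horizon of 8 hours.
print(time_horizon(3.0, 1.0))

# If success probability never falls with task length (b <= 0 while p > 0.5),
# there is no 50% crossing and the measured horizon is unbounded -- the sense
# in which an idealized metric would assign humans an infinite horizon.
```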
(To expand the argument in response to the disagree react on the other comment. I don’t have concrete hypotheses for things one might disagree about here, so there must be a basic misunderstanding, hopefully mine.)
I won’t speak for Jacob Pfau, but the easy answer for why infinite time horizons don’t exist is simply that we have finite memory capacity, so tasks that require more than a certain amount of memory simply aren’t doable.
You can at the very best (though already I’m required to deviate from real humans by assuming infinite lifespans) have time horizons that are exponential in your memory capacity. Once you go beyond 2^B steps, where B is your bits of memory, you must revisit a previous state and loop, meaning that if a task requires more than 2^B units of time to solve, you will never be able to complete it.
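The pigeonhole step of this argument can be checked directly: a deterministic agent with B bits of state must revisit some state, and hence loop, within 2^B steps. A minimal sketch (the particular update rule below is arbitrary and assumed):

```python
def first_repeat_step(step, initial_state: int, bits: int) -> int:
    """Run a deterministic transition on a `bits`-bit state and return the
    number of steps until some state recurs.  By pigeonhole this is at most
    2**bits: a deterministic agent with B bits of memory loops within 2**B
    steps, so it can never finish a task needing more time than that.
    """
    mask = (1 << bits) - 1
    seen = {initial_state}
    state, t = initial_state, 0
    while True:
        state = step(state) & mask
        t += 1
        if state in seen:
            return t
        seen.add(state)

# Any deterministic update rule works; this one is arbitrary.
B = 10
steps = first_repeat_step(lambda s: (5 * s + 3) % (1 << B), 1, B)
assert steps <= 2 ** B
print(f"{B}-bit state repeated after {steps} <= {2 ** B} steps")
```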
Thought-provoking, thanks! First off, whether or not humanity (or current humans, or some distribution over humans) has already achieved escape velocity does not directly undercut the utility of escape velocity as a concept for AI takeoff. Certainly, I’d say the collective of humans has achieved some leap to universality in a sense that the bag of current near-SotA LLMs has not! And in particular, it’s perfectly reasonable to take humans as the reference that defines the unit of dh/dt for AI improvement.
Ok, now on to the hard part (speculating somewhat beyond the scope of my original post). Is there a nice notion of time horizon that generalizes METR’s and lets us say something about when humanity has achieved escape velocity? I can think of two versions.
The easy way out is to punt to some stronger reference class of beings to get our time horizons, and measure dh/dt for humanity against this baseline. Now the human team is implicitly time-limited by the stronger being’s bound, and we count false positives against the humans even if humanity could eventually self-correct.
Another idea is to notice that there’s some class of intractable problems on which current human progress looks like either (1) random search or (2) entirely indirect, instrumental progress, e.g. self-improvement, general tool building, etc. In these cases, there may be a sense in which we’re exponentially slower than task-capable agent(s), and we should be considered incapable of completing such tasks. I imagine some Millennium Prize problems, astronomical engineering, etc. would reasonably be considered beyond us on this view.
Overall, I’m not particularly happy with these generalizations. But I still like having ‘escape velocity for AI auto-R&D’ as a threshold!
Humans already have an infinite horizon in this sense (under some infeasible idealized methodology). Horizon metrics quantify progress towards overcoming an obstruction that exists in current LLMs, so that we might see the gradual impact of even things like brute scaling.
Humans are not yet the singularity, and there might be more obstructions in LLMs that prevent them from doing better than humans. Perhaps at some point there will be a test-time scaling method that doesn’t seem to have a bound on time horizons given enough test-time compute, but even then it’s possible that feasible hardware wouldn’t make it either faster or more economical than humans at very long horizon lengths.
(Thanks for expanding! Will return to write a proper response to this tomorrow)