There’s something that I think is usually missing from time-horizon discussions, which is that the human brain seems to operate on a very long time horizon for entirely different reasons. The story for LLMs looks like this: LLMs become better at programming tasks, therefore they become capable of doing (in a relatively short amount of time) tasks that would take increasingly longer for humans to do. Humans, instead, can just do stuff for a lifetime, and we don’t know where the cap is, and our brain has ways to manage its memories depending on how often they are recalled, and probably other ways to keep itself coherent over long periods. It’s a completely different sort of thing! This makes me think that the trend here isn’t very “deep”. The line will continue to go up as LLMs become better and better at programming, and then it will slow down as capability gains generally slow, due to training-compute bottlenecks and limited inference-compute budgets. On the other hand, I think it’s pretty dang likely that we get a drastic trend break in the next few years (i.e., the graph essentially loses its relevance) when we crack the actual mechanisms and capabilities related to continuous operation: continual learning, clever memory management, and similar things that we might be completely missing at the moment even as concepts.
If in-context learning from long context can play the role of continual learning, AIs with essentially current architectures may soon match human capabilities in this respect. Long contexts will get more feasible as hardware gets more HBM in a scale-up world (a collection of chips with good networking that can act as a very large composite chip for some purposes). This year we are moving from the 0.64-1.44 TB of 8-chip servers to the 14 TB of GB200 NVL72, next year it’s 20 TB with GB300 NVL72, and then in 2028 Rubin Ultra NVL576 will have 147 TB (100x what was feasible in early 2025, if you are not Google).
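As a rough sanity check on what that HBM buys in context length, here is a back-of-the-envelope sketch. The ~1 MB of KV cache per token is an assumption, not a measured figure; real numbers depend on model size, attention variant (MQA/GQA), and quantization, and model weights also eat into the same HBM budget:

```python
# Back-of-the-envelope: how many tokens of KV cache fit in a scale-up world's HBM.
# ASSUMPTION: ~1 MB of KV cache per token for a large dense model (illustrative;
# GQA/MQA and quantization can shift this by an order of magnitude). Weights
# also occupy HBM, so the real usable fraction is smaller.

KV_BYTES_PER_TOKEN = 1_000_000  # assumed, not a measured figure

hbm_capacities_tb = {
    "8-chip server (early 2025, high end)": 1.44,
    "GB200 NVL72 (2025)": 14,
    "GB300 NVL72 (2026)": 20,
    "Rubin Ultra NVL576 (2028)": 147,
}

for system, tb in hbm_capacities_tb.items():
    tokens = tb * 1e12 / KV_BYTES_PER_TOKEN
    print(f"{system}: ~{tokens / 1e6:.1f}M tokens of KV cache")
```

Under that (generous) assumption, Rubin Ultra NVL576 would hold on the order of 150M tokens of KV cache in one scale-up world, which is what makes the multi-year-context arithmetic below look feasible at all.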
At 40K tokens per day (as a human might read or think), 1 month is only 1.2M tokens, and 3 years is only 44M tokens, which is merely 44x more than what is currently being offered. One issue of course is that this gives up some key AI advantages: it doesn’t offer a straightforward way of combining learnings from many parallel instances into one mind. And it’s not completely clear that it will work at all, but there is also no particular reason that it won’t, or that it will even need anything new invented, other than more scale and some RLVR that incentivises making good use of contexts this long.
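The arithmetic behind those figures, spelled out (the 40K tokens/day rate and the ~1M-token current context window are the assumptions here):

```python
# Lifetime token budgets at a human-like rate, per the paragraph above.
TOKENS_PER_DAY = 40_000        # assumed human-like reading/thinking rate
CURRENT_CONTEXT = 1_000_000    # roughly what long-context models offer today

month = 30 * TOKENS_PER_DAY              # 1.2M tokens
three_years = 3 * 365 * TOKENS_PER_DAY   # ~43.8M tokens

print(f"1 month: {month / 1e6:.1f}M tokens")
print(f"3 years: {three_years / 1e6:.1f}M tokens "
      f"({three_years / CURRENT_CONTEXT:.0f}x current contexts)")
```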