If in-context learning from long context can play the role of continual learning, AIs with essentially current architectures may soon match human capabilities in this respect. Long contexts will become more feasible as hardware gains more HBM in a scale-up world (a collection of chips with good networking that can act as a very large composite chip for some purposes). This year we are moving from the 0.64-1.44 TB of 8-chip servers to the 14 TB of GB200 NVL72; next year it's 20 TB with GB300 NVL72; and then in 2028 Rubin Ultra NVL576 will have 147 TB (100x what was feasible in early 2025, if you are not Google).
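As a quick sanity check on that ~100x multiple, here is a minimal sketch using only the per-system HBM figures cited above (nothing beyond those numbers is assumed):

```python
# HBM per scale-up world (TB), figures as cited in the text.
hbm_tb = {
    "8-chip server (early 2025, high end)": 1.44,
    "GB200 NVL72 (2025)": 14,
    "GB300 NVL72 (2026)": 20,
    "Rubin Ultra NVL576 (2028)": 147,
}

baseline = hbm_tb["8-chip server (early 2025, high end)"]
for system, tb in hbm_tb.items():
    print(f"{system}: {tb} TB ({tb / baseline:.0f}x early-2025 baseline)")
# Rubin Ultra NVL576 comes out at ~102x, matching the ~100x claim.
```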
At 40K tokens per day (as a human might read or think), 1 month is only 1.2M tokens, and 3 years is only 44M tokens, which is merely 44x more than what is currently being offered. One issue, of course, is that this gives up some key AI advantages: it doesn't offer a straightforward way of combining learnings from many parallel instances into one mind. And it's not completely clear that this will work at all, but there is also no particular reason to think that it won't, or that it will need anything new invented, other than more scale and some RLVR that incentivises making good use of contexts this long.
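To make the token arithmetic explicit, here is a minimal sketch; the ~1M-token baseline for "what is currently being offered" is my assumption, chosen to be consistent with the 44x figure:

```python
# Token budget for human-timescale in-context "continual learning".
TOKENS_PER_DAY = 40_000       # as a human might read or think
CURRENT_CONTEXT = 1_000_000   # assumption: roughly 1M-token contexts offered today

one_month = 30 * TOKENS_PER_DAY            # 1.2M tokens
three_years = 3 * 365 * TOKENS_PER_DAY     # ~43.8M tokens

print(f"1 month:  {one_month / 1e6:.1f}M tokens")
print(f"3 years:  {three_years / 1e6:.1f}M tokens "
      f"({three_years / CURRENT_CONTEXT:.0f}x a ~1M-token context)")
```

On these assumptions, 3 years of human-pace experience is a ~44x scale-up over today's contexts, a gap that is large but not obviously out of reach given the hardware trajectory above.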