And newer approaches[8] seem to eliminate the problem entirely, making it a contingent engineering limitation rather than a fundamental critique of LLMs.
I've only skimmed it, but I think the two main contributions are a way to generate useful synthetic text from a limited corpus, so that pretraining can keep improving even when new real data is scarce (paper), and an automated AI research loop design (paper), which is a bit fancier and more automated, but similar to what Karpathy has recently been posting about.
If you’re looking for examples of LLMs that do weight-based continual learning / belief updating, the Titans and Nested Learning papers might be a better fit (I described them in a recent post), or, if you want something closer to a vanilla transformer architecture, E2E-TTT.