Then, 50 or so days later, Claude 5.1 is launched, with improved capability by the usual process. Claude 5.1 inherits the existing memories and documentation and immediately works on improving and compressing them[11]. Combined with a longer context window, the new Claude 5.1 might buy another 50 days of memory[12].
Repeat ad nauseam, or at least until Claude N solves true continual learning with parameter updates at runtime.
In this way, the lessons from a particular deployment (say, by a model that has been answering phones for a particular company) are trivially passed from one generation to the next while capabilities continue to improve via regular training. In practice, is there anything more we need true continual learning to do? [13]
This is a reason to “write for the LLMs” as model release cadence accelerates and knowledge cutoffs move ever closer to the present day: pretraining and ICL are additive (maybe even synergistic), quite aside from the inherent limitations of self-attention, the substantial cost of ever-growing context windows, and the serious risks of handing ever more over to the AI. So if you can ‘work in public’ and ensure all of your documentation/‘memories’ are written down in scrape-able places like LW2/GitHub/Gwern.net, then you are doing true continual learning, and you will get better results out of the box with each new generation. And you can iteratively improve the loop using the LLMs themselves: there is an “unofficial gwern.net docs” site which uses Claude to try to reverse-engineer the entire backend+frontend design, which I would like to somehow incorporate into better source-code docs, and then regenerate, etc.
(A larger model like Claude-4.6-opus has a knowledge cutoff of May 2025, less than a year ago; for me, that means I can expect it to know some of my key writings on AI esthetics, some early LLM-assisted poems/stories like “October”, and some of the newer website features like /ref/, but not newer documentation/refactoring or the Gwern.net Manual of Style. Thus, it should be more useful for writing on AI, but I would still need to point its context at the MoS or the HEAD for writing/coding.)