Playing chess at the level of the best humans, using a similar amount of data. Maybe learning to solve IMO-level problems, in-context. More relevant is becoming fluent in novel specialized theory (or empirical facts) being developed as part of the current project, or gaining tacit knowledge in the form of skills that can’t be usefully written down as notes to be looked up without effectively learning their contents as skills, the way one would learn to play world-class chess after looking at some games and learning the rules (treated as external training data, likely prompting things like generation of more internal synthetic training data or of RLVR tasks and RL environments).
In-context learning is in principle sufficient, especially with true recurrence (I don’t mean SSMs, where activations are still climbing the layers and computations have bounded depth; as with chain-of-thought reasoning, recurrence needs to pass information back to the same layer as the sequence progresses; block-level recurrence with blocks of variable length is likely appropriate for this, to both retain fast prefill and pretrain the updating of recurrent state for deep reasoning). But observing the first credible steps of scaling pretraining since the original Mar 2023 GPT-4 (in Opus 4 and Gemini 3 Pro), LLMs don’t seem to be on track to start learning difficult skills in-context. They can learn the rules of chess in-context, but can’t learn to play at the level of the best human chess players in-context, and it doesn’t seem that they will get sufficiently better at this even as scaling of pretraining advances further, before it runs out of useful natural text data.
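To make the block-level recurrence idea concrete, here is a minimal toy sketch, not any real architecture. The simplifying assumptions are mine: fixed rather than variable block length, a single tanh linear map standing in for each transformer layer, and a mean-pooled block summary as the recurrent state. The point it illustrates is the topology: the state computed at the *top* of one block is fed back to the *bottom* layer of the next block, so effective computational depth grows with sequence length instead of being bounded by layer count.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8        # model width
BLOCK = 4    # tokens per block (fixed here; variable-length in the text above)
LAYERS = 3

# Toy "layers": fixed random linear maps + nonlinearity, standing in for
# transformer blocks (no attention, no training -- topology is the point).
Ws = [rng.normal(0, 0.1, (D, D)) for _ in range(LAYERS)]

def run_block(tokens, state):
    # Inject the previous block's state at the bottom layer, so information
    # loops back to the same depth as the sequence progresses, rather than
    # only climbing the stack once as in a plain feed-forward pass.
    h = tokens + state                 # broadcast recurrent state over tokens
    for W in Ws:
        h = np.tanh(h @ W)
    new_state = h.mean(axis=0)         # block summary carried to next block
    return h, new_state

def forward(seq):
    state = np.zeros(D)
    outputs = []
    for start in range(0, len(seq), BLOCK):
        out, state = run_block(seq[start:start + BLOCK], state)
        outputs.append(out)
    return np.concatenate(outputs), state

seq = rng.normal(size=(12, D))
outs, final_state = forward(seq)       # 3 blocks -> depth 3 * LAYERS for state
```

Because prefill within a block is still parallel (only the per-block state hand-off is sequential), this shape is compatible with the fast-prefill property mentioned above; an SSM, by contrast, would update state token-by-token without ever routing it back down the layer stack.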