(I don’t mind saying this because it is obvious to anyone following the literature who has watched prior blessings of scale happen, and in particular, how each subfield copes with the realization that their problem was never a real one and all their clever ideas only mattered at scales which are quickly becoming OOMs irrelevant; and the continual-learning people already are going through the stages of grief, so a throwaway LW comment from me makes no difference.)
If you are trying to model DL capabilities, you should just assume continual-learning is already solved for all intents and purposes at GPT-4 scale (and note, for example, OA’s revealed preferences in terms of training models from scratch vs further training old checkpoints) until you see an extremely compelling empirical demonstration to the contrary. We don’t see it much overtly, simply because fullblown ‘finetuning’ is often not easy, and is much more expensive, and can be replaced to a considerable degree by tricks like retrieval or better prompts when your underlying model is really smart.
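The "retrieval or better prompts" substitute mentioned above can be sketched minimally. This is a toy illustration, not anyone's production system: it uses a bag-of-words count as a stand-in for a real embedding model, retrieves the most similar passages by cosine similarity, and prepends them to the prompt so a frozen model can act on "new" knowledge without any weight updates.

```python
from collections import Counter
import math

def embed(text):
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus passages most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def build_prompt(query, corpus):
    """Prepend retrieved context so a frozen model can use it in-context."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "GPT-4 was released by OpenAI in March 2023.",
    "The mitochondria is the powerhouse of the cell.",
    "Continual learning studies training on non-stationary data streams.",
]
print(build_prompt("When was GPT-4 released?", corpus))
```

A real deployment would swap `embed` for a learned embedding model and a vector index, but the shape of the trick is the same: update the retrieval corpus, not the weights.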
Fascinating, thanks for the research. Your analysis makes sense and seems to indicate that for most situations, prompt engineering is always the first plan of attack and often works well enough. Then, a step up from there, OpenAI/etc. would most likely experiment with fine-tuning or RLHF as it relates to a specific business need. To train a better chatbot and fill in any gaps, they would probably get more bang for their buck by simply fine-tuning on a large dataset that matched their needs. For example, if they wanted better mathematical reasoning, they'd probably pay people to generate detailed scratchwork and fine-tune on the whole dataset in batch, rather than set up an elaborate "tutor" framework. Continual learning itself would be mainly applicable for research into whether the model spontaneously develops a sense of self, or for seeing whether it helps with the specific case of long-term planning and agency. These are things the general public is fascinated with, but they don't yet seem to be the most promising direction for improving a company's bottom line.
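The "pay people for scratchwork, then fine-tune in batch" approach described above mostly comes down to data preparation. A hedged sketch: the scratchwork example below is invented, and the chat-style `messages` record is the general shape fine-tuning APIs such as OpenAI's accept as JSONL uploads (exact field names should be checked against the provider's current docs).

```python
import json

# Hypothetical annotated example: a problem plus a worker's worked solution.
scratchwork = [
    {
        "problem": "What is 17 * 24?",
        "steps": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408",
        "answer": "408",
    },
]

def to_chat_example(item):
    """Turn one annotated problem into a chat-format training record."""
    return {
        "messages": [
            {"role": "user", "content": item["problem"]},
            {"role": "assistant",
             "content": f"{item['steps']}\nAnswer: {item['answer']}"},
        ]
    }

def write_jsonl(items, path):
    """One JSON object per line -- the usual fine-tuning upload format."""
    with open(path, "w") as f:
        for item in items:
            f.write(json.dumps(to_chat_example(item)) + "\n")

write_jsonl(scratchwork, "math_scratchwork.jsonl")
```

The point is that this is a one-time batch job over the collected dataset, with no "tutor" loop interleaving training and interaction.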
Continual learning is a blessing of scale: https://www.reddit.com/r/mlscaling/search?q=continual+learning&restrict_sr=on&include_over_18=on