Agreed. But:

1. I was commenting on your “Moreover, the diversity and comprehensiveness of the dataset a language model is trained on will limit the capabilities it can actually attain in deployment. I.e. that a particular upper bound exists in principle, does not mean it will be realised in practice.”: I think that in practice what’s realisable will be limited at least as much by the structure of the model as by how it’s trained. So it’s not just “no matter how fancy a model we build, some plausible training methods will not enable it to do this” but also “no matter how fancy a training method we use, some plausible architectures will not be able to do this”, and that seemed worth making explicit.
2. In between “current versions of GPT” and “absolutely anything that is in some sense trying to predict text” it seems like there’s an interesting category of “things with the same general sort of structure as current LLMs but maybe trained differently”.
(I worry a little that a definition of “language model” much less restrictive than that may end up including literally everything capable of using language, including us and hypothetical AGIs specifically designed to be AGIs.)
“no matter how fancy a training method we use, some plausible architectures will not be able to do this”, and that seemed worth making explicit.
Fair enough. I’ll try to add a fragment to the post making this argument (at a high level of generality; I’m too ignorant about the details of LLM architectures to describe such limitations in concrete terms).
(I worry a little that a definition of “language model” much less restrictive than that may end up including literally everything capable of using language, including us and hypothetical AGIs specifically designed to be AGIs.)
I’m using “language model” here to refer to systems optimised solely for the task of predicting text.