Hmm, there might be relevant limitations based on the structure of the model, but those limitations seem to be peculiar to the model under consideration. They don’t seem to generalise to arbitrary systems selected for minimising predictive loss on text prediction.
That is, I don’t think they’re a fundamental limitation of language models, and it was the limits of language models I mostly wanted to explore in this post.
Agreed. But:
1. I was commenting on your “Moreover, the diversity and comprehensiveness of the dataset a language model is trained on will limit the capabilities it can actually attain in deployment. I.e. that a particular upper bound exists in principle, does not mean it will be realised in practice.” I think that in practice what’s realisable will be limited at least as much by the structure of the model as by how it’s trained. So it’s not just “no matter how fancy a model we build, some plausible training methods will not enable it to do this” but also “no matter how fancy a training method we use, some plausible architectures will not be able to do this”, and that seemed worth making explicit.
2. In between “current versions of GPT” and “absolutely anything that is in some sense trying to predict text” it seems like there’s an interesting category of “things with the same general sort of structure as current LLMs but maybe trained differently”.
(I worry a little that a definition of “language model” much less restrictive than that may end up including literally everything capable of using language, including us and hypothetical AGIs specifically designed to be AGIs.)
“no matter how fancy a training method we use, some plausible architectures will not be able to do this”, and that seemed worth making explicit.
Fair enough. I’ll try to add a passage to the post making this argument (at a high level of generality; I’m too ignorant about LLM architecture details to describe such limitations in concrete terms).
(I worry a little that a definition of “language model” much less restrictive than that may end up including literally everything capable of using language, including us and hypothetical AGIs specifically designed to be AGIs.)
I’m using “language model” here to refer to systems optimised solely for the task of predicting text.