The nearest thing to an argument seems to be this: “But texts do not in fact contain meaning within themselves. If they did, you’d be able to read texts in a foreign language and understand them perfectly.” … which seems to me to be missing a vital step, where you explain how to get from “texts contain meaning” to “anyone looking at a text will understand its meaning even if they don’t know the language it’s in.”
This is confused. Who’s saying “texts contain meaning”? It’s not me.
Perhaps you can take a look at the Wikipedia entry for the conduit metaphor. It explains why the idea of texts ‘containing’ meaning is incoherent.
No one is saying “texts contain meaning” (though I think something a bit like it is true), but you were saying that if texts contained meaning then we’d all be able to understand texts in languages we don’t know, and I’m saying that that seems Just Plain Wrong to me; I don’t see how you get from “texts contain meaning” (which you claim implies that we should all be able to understand everything) to “we should all be able to understand everything” (which is the thing you claim is implied). Those things seem to me to have nothing to do with one another.
I think texts have meaning. “Contain” is imprecise. I agree that the “conduit” metaphor can be misleading. If that Wikipedia page shows that it’s incoherent, as opposed to merely being a metaphor, then I am not sure where. I think that maybe when you say “if texts contain meaning then …” you mean something like “if it were literally the case that meaning is some sort of substance physically contained within texts then …”; something like that might be true, but it feels strawman-ish to me; who actually believes that “texts contain meaning” in any sense literal enough to justify that conclusion?
Again, “meaning” has too many meanings, so let me be more explicit about what I think about what texts do and don’t have/contain/express/etc.
For most purposes, “text T has meaning M” cashes out as something like “the intended audience, on reading text T, will come to understand M”.
Obviously this is audience-dependent and context-dependent.
In some circumstances, the same string of characters can convey very different meanings to different audiences, because it happens to mean very different things in different languages, or because of irony, or whatever.
Even for a particular audience in a particular context, “the” meaning of a text can be uncertain or vague.
When we ask about the meaning of a text, there are at least two related but different things we may be asking about. We may be considering the text as having a particular author and asking what they, specifically, intended on this occasion. Or we may be considering it more abstractly and asking what it means, which is roughly equivalent to asking what a typical person saying that thing would typically mean.
Because text can be ambiguous, “the” meaning of a text is really more like a probability distribution for the author’s intention (meaning the intention of the actual author for the first kind of meaning, and of something like all possible authors for the second kind).
How broad and uncertain that probability distribution is depends on both the text and its context.
Longer texts typically have the effect of narrowing the distribution for the meaning of any given part of the text—each part of the text provides context for the rest of it. (Trivial example: if I say “That was great” then I may or may not be being sarcastic. If I say “That was great. Now we’re completely screwed.” then the second sentence makes it clear that the first sentence was sarcastic. Not all examples are so clear-cut.)
Greater ignorance on the part of the audience broadens the distribution. In an extreme case, a single sentence typically conveys approximately zero information to someone who doesn’t know the language it’s in.
Extra text can outweigh ignorance. (Trivial example: if I make up a word and say something using my made-up word, you probably won’t get much information from that. But if I first say what the word means, it’s about as good as if I’d avoided the neologism altogether. Again, not all examples are so clear-cut.)
I think that a very large amount of extra text can be enough to outweigh even the ignorance of knowing nothing about the language the text is in; if you have millions of books’ worth of text, I think that typically for at least some parts of that large body of text the probability distribution will be very narrow.
In cases where the probability distribution is very narrow, I think it is reasonable to say things like “the meaning of text T is M” even though of course it’s always possible for someone to write T with a very different intention.
In such cases, I think it is reasonable to say this even if in some useful sense there was no original author at all (e.g., the text was produced by an LLM and we do not want to consider it an agent with intentions). This has to be the second kind of “meaning”, which as mentioned above can if you like be cashed out in terms of the first kind: “text T means M” means “across all cases where an actual author says T, it’s almost always the case that they intended M”.
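The narrowing described above can be made concrete with a toy Bayesian sketch (the candidate meanings and all the probabilities here are invented purely for illustration): treat "the meaning of the text" as a distribution over the author's possible intentions, multiply in a likelihood for each new piece of context, and watch the entropy of the distribution drop.

```python
from math import log2

def normalize(dist):
    """Scale probabilities so they sum to 1."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def update(dist, likelihood):
    """Bayesian update: multiply prior by the likelihood of the new context."""
    return normalize({k: dist[k] * likelihood.get(k, 0.0) for k in dist})

def entropy(dist):
    """Shannon entropy in bits; lower means a narrower distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Toy prior over intended meanings of "That was great".
prior = {"sincere praise": 0.6, "sarcasm": 0.4}

# Likelihood of the follow-up "Now we're completely screwed."
# under each candidate meaning (again, numbers invented for illustration).
context = {"sincere praise": 0.05, "sarcasm": 0.9}

posterior = update(prior, context)
print(entropy(prior), entropy(posterior))  # entropy drops: the distribution narrows
print(max(posterior, key=posterior.get))
```

The same mechanism covers the other points above: more text means more likelihood factors and (typically) a narrower posterior, while audience ignorance corresponds to flatter, less informative likelihoods.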
You may of course disagree with any of that, think that I haven’t given good enough reasons to believe any of it, etc., but I hope you can agree that I am not claiming or assuming that meaning is some sort of substance physically contained within text :-).
...who actually believes that “texts contain meaning” in any sense literal enough to justify that conclusion?
Once it’s pointed out that “texts contain meaning” is a metaphor, no one believes it, but they continue to believe it. So, stop relying on the metaphor. Why do you insist that it can be salvaged? It can’t.
It feels to me as if you are determined to think the worst of those who aren’t agreeing with you, here. (B: Obviously gjm thinks that texts contain meaning in the same sort of way as a bag of rice contains rice! G: No, of course I don’t think that, that would be stupid. B: Well, obviously you did think that until I pointed it out, whereas now you incoherently continue to believe it while saying you don’t!) Why not consider the possibility that someone might be unconvinced by your arguments for reasons other than stupidity and pigheadedness?
Anyway. I’m not sure in what sense I’m “insisting that it can be salvaged”. If you think that the term “meaning” is unsalvageable, or that talking about meaning being “in” things, is a disastrous idea because it encourages confusion … well, maybe don’t write an article and give it a title about “meaning in LLMs”?
Maybe I’m misunderstanding what you mean by “insisting that it can be salvaged”. What specific thing are you saying I’m insisting on?
[EDITED to add:] I thought of a way in which the paragraph before last might be unfair: perhaps you think it’s perfectly OK to talk about meaning being “in” minds but not to talk about meaning being “in” texts, in which case asking whether meaning is “in” LLMs would be reasonable since LLMs are more like minds than they are like texts. And then part of your argument is: they aren’t mind-like enough, especially in terms of relationships to the external world, for meaning to be “in” them. Fair enough. Personally, I think that all talk of meaning being “in” things is metaphorical, that it’s more precise to think of “mean” as a verb and “meaning” specifically as its present participle, that “meaning” in this sense is a thing that people (and maybe other person-like things) do from time to time—but also that there’s a perfectly reasonable derived notion of “meaning” as applied to the utterances and other actions that people carry out as part of the process of “meaning”, so that it is not nonsense to say that a text “means” something, either as produced by a particular person or in the abstract. (And so far as I can currently tell none of this depends on holding on to the literally-false implications of any dubious metaphor.)

It feels to me as if you are determined to think the worst of those who aren’t agreeing with you, here.
The feeling is mutual.
Believe it or not I pretty much knew everything you’ve said in this exchange long before I made the post. I’ve thought this through with considerable care. In the OP I linked to a document where I make my case with considerably more care, GPT-3: Waterloo or Rubicon? Here be Dragons.
What I think is that LLMs are new kinds of things and that we cannot rely on existing concepts in trying to understand them. Common-sense terms such as “meaning” are particularly problematic, as are “think” and “understand.” We need to come up with new concepts and new terms. That is difficult. As a practical matter we often need to keep using the old terms.
The term “meaning” has been given a quasi-technical …. meaning? Really? Can we use that word at all? That word is understood by a certain diffuse community of thinkers in a way that pretty much excludes LLMs from having it, nor do they have “understanding” nor do they “think.” That’s a reasonable usage. That usage is what I have in mind.
Some in that community, however, are also saying that LLMs are “stochastic parrots”, though it’s not at all clear just what exactly they’re talking about. But they clearly want to dismiss them. I think that’s a mistake. A big mistake.
They aren’t human, they certainly don’t relate to the world in the way humans do, and they produce long strings of very convincing text. They’re doing a very good imitation of understanding, of thinking, of having meaning. But that phrasing, “good imitation of...”, is just a work-around. What is it that they ARE doing? As far as I can tell, no one knows. But talking about “meaning” with respect to LLMs where the term is implicitly understood to be identical to “meaning” with respect to humans, no, that’s not getting us anywhere.
We are in agreement on pretty much everything in this last comment.
Your Waterloo/Rubicon article does the same thing as your shorter post here: flatly asserts that since GPT-3 “has access only to those strings”, therefore “there is no meaning there. Only entanglement”, without troubling to argue for that position. (Maybe you think it’s obvious; maybe you just aren’t interested in engaging with people who disagree with enough of your premises for it to be false; maybe you consider that the arguments are all available elsewhere and don’t want to repeat them.)
Your article correctly points out that “Those words in the corpus were generated by people conveying knowledge of, attempting to make sense of, the world. Those strings are coupled with the world, albeit asynchronously”. I agree that this is important. But I claim it is also important that the LLM is coupled, via the corpus, to the world, and hence its output is coupled, via the LLM, to the world. The details of those couplings matter in deciding whether it’s reasonable to say that the LLM’s output “means” something; we are not entitled to claim that “there is no meaning there” without getting to grips with that.
In the present discussion you are (sensibly and intelligently) saying that terms like “meaning”, “think”, and “understand” are problematic, that the way in which we have learned to use them was formed in a world where we were the only things meaning, thinking, and understanding, and that it’s not clear how best to apply those terms to new entities like LLMs that are somewhat but far from exactly like us, and that maybe we should avoid them altogether when talking about such entities. All very good. But in the Waterloo/Rubicon article, and also in the OP here, you are not quite so cautious; “there is no meaning there”, you say; “from first principles it is clear that GPT-3 lacks understanding and access to meaning”; what it has is “simulacra of understanding”. I think more-cautious-BB is wiser than less-cautious-BB: it is not at all clear (and I think it is probably false) that LLMs are unable in principle to learn to behave in ways externally indistinguishable from the ways humans behave when “meaning” and “understanding” and “thinking”, and if they do then I think asking whether they mean/understand/think will be like asking whether aeroplanes fly or whether submarines swim. (We happen to have chosen opposite answers in those two cases, and clearly it doesn’t matter at all.)
But I claim it is also important that the LLM is coupled, via the corpus, to the world, and hence its output is coupled, via the LLM, to the world.
What? The corpus is coupled to the world through the people who wrote the various texts and who read and interpret them. Moreover that sentence seems circular. You say, “its output is coupled...” What is the antecedent of “its”? It would seem to be the LLM. So we have something like, “The output of the LLM is coupled, via the LLM, to the world.”
I’m tired of hearing about airplanes (and birds) and submarines (and fish). In all cases we understand more or less the mechanics involved. We can make detailed comparisons and talk about similarities and differences. We can’t do that with humans and LLMs.
It goes: world <-> people <-> corpus <-> LLM <-> LLM’s output.
There is no circularity in “The output of the LLM is coupled, via the LLM, to the world” (which is indeed what I meant).
I agree that we don’t understand LLMs nearly as well as we do planes and submarines, nor human minds nearly as well as the locomotory mechanics of birds and fish. But even if we had never managed to work out how birds fly, and even if planes had been bestowed upon us by a friendly wizard and we had no idea how they worked either, it would be reasonable for us to say that planes fly even though they do it by means very different to birds.
I care about many things, but one that’s important here is that I care about understanding the world. For instance, I am curious about the capabilities (present and future) of AI systems. You say that from first principles we can tell that LLMs trained on text can’t actually mean anything they “say”, that they can have only a simulacrum of understanding, etc. So I am curious about (1) whether this claim tells us anything about what they can actually do as opposed to what words you choose to use to describe them, and (2) if so whether it’s correct.
Another thing I care about is clarity of thought and communication (mine and, to a lesser but decidedly nonzero extent, other people’s). So to whatever extent your thoughts on “meaning”, “understanding”, etc., are about that more than they’re about what LLMs can actually do, I am still interested, because when thinking and talking about LLMs I would prefer not to use language in a systematically misleading way. (At present, I would generally avoid saying that an LLM “means” whatever strings of text it emits, because I don’t think there’s anything sufficiently mind-like in there, but I would not avoid saying that in many cases those strings of text “mean” something, for reasons I’ve already sketched above. My impression is that you agree about the first of those and disagree about the second. So presumably at least one of us is wrong, and if we can figure out who then that person can improve their thinking and/or communication a bit.)
Back to the actual discussion, such as it is. People do in fact typically read the output of LLMs, but there isn’t much information flow back to the LLMs after that, so that process is less relevant to the question of whether and how and how much LLMs’ output is coupled to the actual world. That coupling happens via the chain I described two comments up from this one: world → people → corpus → LLM weights → LLM output. It’s rather indirect, but the same goes for a lot of what humans say.
Several comments back, I asked what key difference you see between that chain (which in your view leaves no scope for the LLM’s output to “mean” anything) and one that goes world → person 1 → writing → person 2 → person 2′s words (which in your view permits person 2′s words to mean things even when person 2 is talking about something they know about only indirectly via other people’s writing). It seems to me that if the problem is a lack of “adhesion”—contact with the real world—then that afflicts person 2 in this scenario in the same way as it afflicts the LLM in the other scenario: both are emitting words whose only connection to the real world is an indirect one via other people’s writing. I assume you reckon I’m missing some important point here; what is it?
That’s a bunch of stuff, more than I can deal with at the moment.
On the meaning of “meaning,” it’s a mess, and people in various disciplines have been arguing about it for three-quarters of a century or more at this point. You might want to take a look at a longish comment I posted above, if you haven’t already. It’s a passage from another article, where I make the point that terms like “think” don’t really tell us much at all. What matters to me at this point are the physical mechanisms, and those terms don’t convey much about those mechanisms.
On LLMs, GPT-4 now has plug-ins. I recently saw a YouTube video about the Wolfram Alpha plug-in. You ask GPT-4 a question, it decides to query Wolfram Alpha, and sends a message. Alpha does something and sends the result back to GPT-4, which presents the result to you. So now we have Alpha interpreting messages from GPT-4 and GPT-4 interpreting messages from Alpha. How reliable is that circuit? Does it give the human user what they want? How does “meaning” work in that circuit?
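The shape of that circuit can be sketched as a simple loop. Everything here is a hypothetical stand-in—`model_decide` is not a real LLM and `query_tool` is a tiny expression evaluator rather than Wolfram Alpha—but it shows where the interpretation steps sit: the model must interpret the user, the tool must interpret the model, and the model must interpret the tool’s reply before anything reaches the user.

```python
def model_decide(question):
    """Hypothetical stand-in for the LLM deciding whether to call a tool.
    Returns a tool query, or None to answer directly."""
    if any(ch.isdigit() for ch in question):
        return question  # arithmetic-looking questions get routed to the tool
    return None

def query_tool(query):
    """Hypothetical stand-in for Wolfram|Alpha: here, a tiny expression evaluator."""
    try:
        # Toy only: evaluating arbitrary strings is unsafe with real input.
        return str(eval(query, {"__builtins__": {}}))
    except Exception:
        return "tool could not interpret the query"

def answer(question):
    """The full circuit: user -> model -> tool -> model -> user."""
    tool_query = model_decide(question)
    if tool_query is None:
        return "answered directly (no tool call)"
    tool_result = query_tool(tool_query)
    # The model must now interpret the tool's reply for the user:
    return f"the tool says: {tool_result}"

print(answer("2 + 3 * 4"))
print(answer("what is meaning?"))
```

Each arrow in the loop is a place where the circuit can fail: the model can mis-route the question, the tool can misparse the query, or the model can misreport the result—which is exactly why the reliability questions above are open ones.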
I first encountered the whole business of meaning in philosophy and literary criticism. So, you read Dickens’ A Tale of Two Cities or Frank Herbert’s Dune, whatever. It’s easy to say those texts have meaning. But where does that meaning come from? When you read those texts, the meaning comes from you. When I read them, it comes from me. What about the meanings the authors put into them? You can see where I’m going with this. Meaning is not like wine, which can be poured from one glass to another and remain the same. Well, literary critics argued about that one for decades. The issue’s never really been settled. It’s just been dropped, more or less.
ChatGPT produces text, lots of it. When you read one of those texts, where does the meaning come from? Let’s ask a different question. People are now using output from LLMs as a medium for interacting with one another. How is that working out? Where can LLM text be useful and where not? What’s the difference? Those strike me as rather open-ended questions for which we do not have answers at the moment.
I think it’s clear that when you read a book the meaning is a product of both you and the book, because if instead you read a different book you’d arrive at different meaning, and different people reading the same book get to-some-extent-similar meanings from it. So “the meaning comes from you” / “the meaning comes from me” is too simple. It seems to me that generally you get more-similar meanings when you keep the book the same and change the reader than when you keep the reader the same and change the book, though of course it depends on how big a change you make in either case, so I would say more of the meaning is in the text than in the reader. (For the avoidance of doubt: no, I do not believe that there’s some literal meaning-stuff that we could distil from books and readers and measure. “In” there is a metaphor. Obviously.)
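That comparison—hold the book fixed and vary the reader, versus hold the reader fixed and vary the book—can be given a toy quantitative reading (all numbers invented for illustration): represent each reading as a point determined by a book component plus a reader component, and compare the variance of the readings under each kind of change.

```python
from statistics import pvariance

# Toy model: meaning(reader, book) is a single number, the sum of a
# book-specific component and a smaller reader-specific component.
# The component sizes are invented purely for illustration.
book_component = {"A Tale of Two Cities": 10.0, "Dune": 20.0}
reader_component = {"Alice": 0.5, "Bob": -0.3, "Carol": 0.1}

def meaning(reader, book):
    return book_component[book] + reader_component[reader]

# Vary the reader, hold the book fixed:
across_readers = pvariance([meaning(r, "Dune") for r in reader_component])

# Vary the book, hold the reader fixed:
across_books = pvariance([meaning("Alice", b) for b in book_component])

# In this toy, readings vary far more across books than across readers --
# the sense in which "more of the meaning is in the text than in the reader".
print(across_readers, across_books)
```

Nothing hangs on the particular numbers; the point is only that “more of the meaning is in the text” can be cashed out as a claim about which kind of variation changes the reading more, rather than as a claim about literal meaning-stuff.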
I agree that there are many questions to which we don’t have answers, and that more specific and concrete questions may be more illuminating than very broad and vague ones like “does the text emitted by an LLM have meaning?”.
I don’t know how well the GPT/Wolfram|Alpha integration works (I seem to remember reading somewhere that it’s very flaky, but maybe they’ve made it better), but I suggest that to whatever extent it successfully results in users getting information that’s correct on account of Alpha’s databases having been filled with data derived from how the world actually is, and its algorithms having been designed to match how mathematics actually works, that’s an indication that in some useful sense some kind of meaning is being (yes, metaphorically) transmitted.
...the question of whether or not the language produced by LLMs is meaningful is up to us. Do you trust it? Do WE trust it? Why or why not?
That’s the position I’m considering. If you understand “WE” to mean society as a whole, then the answer is that the question is under discussion and is undetermined. But some individuals do seem to trust the text from certain LLMs, at least under certain circumstances. For the most part I trust the output of ChatGPT and GPT-4, though I have considerably less experience with GPT-4. I know that both systems make mistakes of various kinds, including what is called “hallucination.” It’s not clear to me that that differentiates them from ordinary humans, who make mistakes and often say things without foundation in reality.
This is confused. Who’s saying “texts contain meaning”? It’s not me.
Perhaps you can take a look at the Wikipedia entry for the conduit metaphor. It explains why the idea of texts ‘containing’ meaning is incoherent.
No one is saying “texts contain meaning” (though I think something a bit like it is true), but you were saying that if texts contained meaning then we’d all be able to understand texts in languages we don’t know, and I’m saying that that seems Just Plain Wrong to me; I don’t see how you get from “texts contain meaning” (which you claim implies that we should all be able to understand everything) to “we should all be able to understand everything” (which is the thing you claim is implied). Those things seem to me to have nothing to do with one another.
I think texts have meaning. “Contain” is imprecise. I agree that the “conduit” metaphor can be misleading. If that Wikipedia page shows that it’s incoherent, as opposed to merely being a metaphor, then I am not sure where. I think that maybe when you say “if texts contain meaning then …” you mean something like “if it were literally the case that meaning is some sort of substance physically contained within texts then …”; something like that might be true, but it feels strawman-ish to me; who actually believes that “texts contain meaning” in any sense literal enough to justify that conclusion?
Again, “meaning” has too many meanings, so let me be more explicit about what I think about what texts do and don’t have/contain/express/etc.
For most purposes, “text T has meaning M” cashes out as something like “the intended audience, on reading text T, will come to understand M”.
Obviously this is audience-dependent and context-dependent.
In some circumstances, the same string of characters can convey very different meanings to different audiences, because it happens to mean very different things in different languages, or because of irony, or whatever.
Even for a particular audience in a particular context, “the” meaning of a text can be uncertain or vague.
When we ask about the meaning of a text, there are at least two related but different things we may be asking about. We may be considering the text as having a particular author and asking what they, specifically, intended on this occasion. Or we may be considering it more abstractly and asking what it means, which is roughly equivalent to asking what a typical person saying that thing would typically mean.
Because text can be ambiguous, “the” meaning of a text is really more like a probability distribution for the author’s intention (meaning the intention of the actual author for the first kind of meaning, and of something like all possible authors for the second kind).
How broad and uncertain that probability distribution is depends on both the text and its context.
Longer texts typically have the effect of narrowing the distribution for the meaning of any given part of the text—each part of the text provides context for the rest of it. (Trivial example: if I say “That was great” then I may or may not be being sarcastic. If I say “That was great. Now we’re completely screwed.” then the second sentence makes it clear that the first sentence was sarcastic. Not all examples are so clear-cut.)
Greater ignorance on the part of the audience broadens the distribution. In an extreme case, a single sentence typically conveys approximately zero information to someone who doesn’t know the language it’s in.
Extra text can outweigh ignorance. (Trivial example: if I make up a word and say something using my made-up word, you probably won’t get much information from that. But if I first say what the word means, it’s about as good as if I’d avoided the neologism altogether. Again, not all examples are so clear-cut.)
I think that a very large amount of extra text can be enough to outweigh even the ignorance of knowing nothing about the language the text is in; if you have millions of books’ worth of text, I think that typically for at least some parts of that large body of text the probability distribution will be very narrow.
In cases where the probability distribution is very narrow, I think it is reasonable to say things like “the meaning of text T is M” even though of course it’s always possible for someone to write T with a very different intention.
In such cases, I think it is reasonable to say this even if in some useful sense there was no original author at all (e.g., the text was produced by an LLM and we do not want to consider it an agent with intentions). This has to be the second kind of “meaning”, which as mentioned above can if you like be cashed out in terms of the first kind: “text T means M” means “across all cases where an actual author says T, it’s almost always the case that they intended M”.
You may of course disagree with any of that, think that I haven’t given good enough reasons to believe any of it, etc., but I hope you can agree that I am not claiming or assuming that meaning is some sort of substance physically contained within text :-).
Once it’s pointed out that “texts contain meaning” is a metaphor, no one believes it, but they continue to believe it. So, stop relying on the metaphor. Why do you insist that it can be salvaged? It can’t.
It feels to me as if you are determined to think the worst of those who aren’t agreeing with you, here. (B: Obviously gjm thinks that texts contain meaning in the same sort of way as a bag of rice contains rice! G: No, of course I don’t think that, that would be stupid. B: Well, obviously you did think that until I pointed it out, whereas now you incoherently continue to believe it while saying you don’t!) Why not consider the possibility that someone might be unconvinced by your arguments for reasons other than stupidity and pigheadedness?
Anyway. I’m not sure in what sense I’m “insisting that it can be salvaged”. If you think that the term “meaning” is unsalvageable, or that talking about meaning being “in” things, is a disastrous idea because it encourages confusion … well, maybe don’t write an article and give it a title about “meaning in LLMs”?
Maybe I’m misunderstanding what you mean by “insisting that it can be salvaged”. What specific thing are you saying I’m insisting on?
[EDITED to add:] I thought of a way in which the paragraph before last might be unfair: perhaps you think it’s perfectly OK to talk about meaning being “in” minds but not to talk about meaning being “in” texts, in which case asking whether is “in” LLMs would be reasonable since LLMs are more like minds than they are like texts. And then part of your argument is: they aren’t mind-like enough, especially in terms of relationships to the external world, for meaning to be “in” them. Fair enough. Personally, I think that all talk of meaning being “in” things is metaphorical, that it’s more precise to think of “mean” as a verb and “meaning” specifically as its present participle, that “meaning” in this sense is a thing that people (and maybe other person-like things) do from time to time—but also that there’s a perfectly reasonable derived notion of “meaning” as applied to the utterances and other actions that people carry out as part of the process of “meaning”, so that it is not nonsense to say that a text “means” something, either as produced by a particular person or in the abstract. (And so far as I can currently tell none of this depends on holding on to the literally-false implications of any dubious metaphor.)
The feeling is mutual.
Believe it or not I pretty much knew everything you’ve said in this exchange long before I made the post. I’ve thought this through with considerable care. In the OP I linked to a document where I make my case with considerably more care, GPT-3: Waterloo or Rubicon? Here be Dragons.
What I think is that LLMs are new kinds of things and that we cannot rely on existing concepts in trying to understand them. Common-sense terms such as “meaning” are particularly problematic, as are “think” and “understand.” We need to come up with new concepts and new terms. That is difficult. As a practical matter we often need to keep using the old terms.
The term “meaning” has been given a quasi-technical …. meaning? Really? Can we use that word at all? That word is understood by a certain diffuse community of thinkers in a way that pretty much excludes LLMs from having it, nor do they have “understanding” nor do they “think.” That’s a reasonable usage. That usage is what I have in mind.
Some in that community, however, are also saying that LLMs are “stochastic parrots”, though it’s not at all clear just what exactly they’re talking about. But they clearly what to dismiss them. I think that’s a mistake. A big mistake.
They aren’t human, they certainly don’t relate to the world in the way humans do, and they produce long strings of very convincing text. They’re doing a very good imitation of understanding, thinking, of having meaning. But that phrasing, “good imitation of...”, is just a work-around. What is it that they ARE doing? As far as I can tell, no one knows. But talking about “meaning” with respect to LLMs where the term is implicitly understood to be identical to “meaning” with respect to humans, no, that’s not getting us anywhere.
We are in agreement on pretty much everything in this last comment.
Your Waterloo/Rubicon article does the same thing as your shorter post here: flatly asserts that since GPT-3 “has access only to those strings”, therefore “there is no meaning there. Only entanglement”, without troubling to argue for that position. (Maybe you think it’s obvious; maybe you just aren’t interested in engaging with people who disagree with enough of your premises for it to be false; maybe you consider that the arguments are all available elsewhere and don’t want to repeat them.)
Your article correctly points out that “Those words in the corpus were generated by people conveying knowledge of, attempting to make sense of, the world. Those strings are coupled with the world, albeit asynchronously”. I agree that this is important. But I claim it is also important that the LLM is coupled, via the corpus, to the world, and hence its output is coupled, via the LLM, to the world. The details of those couplings matter in deciding whether it’s reasonable to say that the LLM’s output “means” something; we are not entitled to claim that “there is no meaning there” without getting to grips with that.
In the present discussion you are (sensibly and intelligently) saying that terms like “meaning”, “think”, and “understand” are problematic, that the way in which we have learned to use them was formed in a world where we were the only things meaning, thinking, and understanding, and that it’s not clear how best to apply those terms to new entities like LLMs that are somewhat but far from exactly like us, and that maybe we should avoid them altogether when talking about such entities. All very good. But in the Waterloo/Rubicon article, and also in the OP here, you are not quite so cautious; “there is no meaning there”, you say; “from first principles it is clear that GPT-3 lacks understanding and access to meaning”; what it has is “simulacra of understanding”. I think more-cautious-BB is wiser than less-cautious-BB: it is not at all clear (and I think it is probably false) that LLMs are unable in principle to learn to behave in ways externally indistinguishable from the ways humans behave when “meaning” and “understanding” and “thinking”, and if they do then I think asking whether they mean/understand/think will be like asking whether aeroplanes fly or whether submarines swim. (We happen to have chosen opposite answers in those two cases, and clearly it doesn’t matter at all.)
What? The corpus is coupled to the world through the people who wrote the various texts and who read and interpret them. Moreover that sentence seems circular. You say, “its output is coupled...” What is the antecedent of “its”? It would seem to be the LLM. So we have something like, “The output of the LLM is coupled, via the LLM, to the world.”
I’m tired of hearing about airplanes (and birds) and submarines (and fish). In all cases we understand more or less the mechanics involved. We can make detailed comparisons and talk about similarities and differences. We can’t do that with humans and LLMs.
It goes: world <-> people <-> corpus <-> LLM <-> LLM’s output.
There is no circularity in “The output of the LLM is coupled, via the LLM, to the world” (which is indeed what I meant).
I agree that we don’t understand LLMs nearly as well as we do planes and submarines, nor human minds nearly as well as the locomotory mechanics of birds and fish. But even if we had never managed to work out how birds fly, and even if planes had been bestowed upon us by a friendly wizard and we had no idea how they worked either, it would be reasonable for us to say that planes fly even though they do it by means very different to birds.
Um, err, at this point, unless someone actually reads the LLM’s output, that output goes nowhere. It’s not connected to anything.
So, what is it you care about? Because at this point this conversation strikes me as just pointless thrashing about with words.
I care about many things, but one that’s important here is that I care about understanding the world. For instance, I am curious about the capabilities (present and future) of AI systems. You say that from first principles we can tell that LLMs trained on text can’t actually mean anything they “say”, that they can have only a simulacrum of understanding, etc. So I am curious about (1) whether this claim tells us anything about what they can actually do as opposed to what words you choose to use to describe them, and (2) if so whether it’s correct.
Another thing I care about is clarity of thought and communication (mine and, to a lesser but decidedly nonzero extent, other people’s). So to whatever extent your thoughts on “meaning”, “understanding”, etc., are about that more than they’re about what LLMs can actually do, I am still interested, because when thinking and talking about LLMs I would prefer not to use language in a systematically misleading way. (At present, I would generally avoid saying that an LLM “means” whatever strings of text it emits, because I don’t think there’s anything sufficiently mind-like in there, but I would not avoid saying that in many cases those strings of text “mean” something, for reasons I’ve already sketched above. My impression is that you agree about the first of those and disagree about the second. So presumably at least one of us is wrong, and if we can figure out who then that person can improve their thinking and/or communication a bit.)
Back to the actual discussion, such as it is. People do in fact typically read the output of LLMs, but there isn’t much information flow back to the LLMs after that, so that process is less relevant to the question of whether and how and how much LLMs’ output is coupled to the actual world. That coupling happens via the chain I described two comments up from this one: world → people → corpus → LLM weights → LLM output. It’s rather indirect, but the same goes for a lot of what humans say.
Several comments back, I asked what key difference you see between that chain (which in your view leaves no scope for the LLM’s output to “mean” anything) and one that goes world → person 1 → writing → person 2 → person 2′s words (which in your view permits person 2′s words to mean things even when person 2 is talking about something they know about only indirectly via other people’s writing). It seems to me that if the problem is a lack of “adhesion”—contact with the real world—then that afflicts person 2 in this scenario in the same way as it afflicts the LLM in the other scenario: both are emitting words whose only connection to the real world is an indirect one via other people’s writing. I assume you reckon I’m missing some important point here; what is it?
That’s a bunch of stuff, more than I can deal with at the moment.
On the meaning of “meaning,” it’s a mess, and people in various disciplines have been arguing about it for three-quarters of a century or more at this point. You might want to take a look at a longish comment I posted above, if you haven’t already. It’s a passage from another article, where I make the point that terms like “think” don’t really tell us much at all. What matters to me at this point are the physical mechanisms, and those terms don’t convey much about those mechanisms.
On LLMs, GPT-4 now has plug-ins. I recently saw a YouTube video about the Wolfram Alpha plug-in. You ask GPT-4 a question; it decides to query Wolfram Alpha and sends a message. Alpha does something and sends the result back to GPT-4, which presents it to you. So now we have Alpha interpreting messages from GPT-4 and GPT-4 interpreting messages from Alpha. How reliable is that circuit? Does it give the human user what they want? How does “meaning” work in that circuit?
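To make the circuit concrete, here is a toy sketch of that round trip. Everything in it is invented for illustration (the function names `llm_decide` and `wolfram_alpha`, the routing rule, the toy arithmetic evaluator); the real plug-in protocol is far richer, and how reliably it works is exactly the open question:

```python
# Toy model of the user -> GPT-4 -> Wolfram Alpha -> GPT-4 -> user circuit.
# All names and logic here are hypothetical stand-ins, not the real protocol.

def llm_decide(user_question: str) -> dict:
    """Stand-in for GPT-4 deciding whether to delegate to a tool."""
    if any(ch.isdigit() for ch in user_question):
        return {"tool": "wolfram_alpha", "query": user_question}
    return {"tool": None, "answer": "(answered from the model's own weights)"}

def wolfram_alpha(query: str) -> str:
    """Stand-in for Alpha: just evaluates simple arithmetic expressions."""
    return str(eval(query))  # toy evaluator; real Alpha parses natural language

def answer(user_question: str) -> str:
    """The full circuit: user -> LLM -> (maybe) tool -> LLM -> user."""
    decision = llm_decide(user_question)
    if decision["tool"] == "wolfram_alpha":
        result = wolfram_alpha(decision["query"])
        # GPT-4 would rephrase the tool's result; here we just wrap it.
        return f"Wolfram Alpha says: {result}"
    return decision["answer"]

print(answer("2 + 3"))  # routed through the tool stand-in
```

The interesting question is where “meaning” lives in each hop of this chain: each component merely transforms strings, yet the user at the end may come away knowing something true about the world.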
I first encountered the whole business of meaning in philosophy and literary criticism. So, you read Dickens’ A Tale of Two Cities or Frank Herbert’s Dune, whatever. It’s easy to say those texts have meaning. But where does that meaning come from? When you read those texts, the meaning comes from you. When I read them, it comes from me. What about the meanings the authors put into them? You can see where I’m going with this. Meaning is not like wine, which can be poured from one glass to another and remain the same. Well, literary critics argued about that one for decades. The issue’s never really been settled. It’s just been dropped, more or less.
ChatGPT produces text, lots of it. When you read one of those texts, where does the meaning come from? Let’s ask a different question. People are now using output from LLMs as a medium for interacting with one another. How is that working out? Where can LLM text be useful and where not? What’s the difference? Those strike me as rather open-ended questions for which we do not have answers at the moment.
And so on....
I think it’s clear that when you read a book the meaning is a product of both you and the book, because if instead you read a different book you’d arrive at different meaning, and different people reading the same book get to-some-extent-similar meanings from it. So “the meaning comes from you” / “the meaning comes from me” is too simple. It seems to me that generally you get more-similar meanings when you keep the book the same and change the reader than when you keep the reader the same and change the book, though of course it depends on how big a change you make in either case, so I would say more of the meaning is in the text than in the reader. (For the avoidance of doubt: no, I do not believe that there’s some literal meaning-stuff that we could distil from books and readers and measure. “In” there is a metaphor. Obviously.)
I agree that there are many questions to which we don’t have answers, and that more specific and concrete questions may be more illuminating than very broad and vague ones like “does the text emitted by an LLM have meaning?”.
I don’t know how well the GPT/Wolfram|Alpha integration works (I seem to remember reading somewhere that it’s very flaky, but maybe they’ve made it better), but I suggest that to whatever extent it successfully results in users getting information that’s correct on account of Alpha’s databases having been filled with data derived from how the world actually is, and its algorithms having been designed to match how mathematics actually works, that’s an indication that in some useful sense some kind of meaning is being (yes, metaphorically) transmitted.
I’ve just posted something at my home blog, New Savanna, in which I consider the idea that