I’m sorry if this is obvious, but might the issue be that in natural language, it is often not easy to see from the grammar alone whether the relationship pointing from A to B is actually reversible? Our language is not logically precise that way (we don’t have a grammatical equivalent of a logical “<->” in everyday use), so deciding reversibility requires considerable context on what words mean, which ChatGPT 3.5 did not yet have. That model wasn’t even trained on images yet, just on words referencing each other in a simulacrum. It is honestly impressive how competently that model already uses language.
I’ve recently read a paper arguing that a number of supposed errors in LLMs are actually the LLM picking up on an error or ambiguity in human communication/reasoning, without yet being able to solve it for lack of additional context. I’m beginning to come round to their position.
In natural language, the sentence “A is B” can mean, among many other things (just looking at the range of what you proposed):
A is one member of the group B. - In this case, if you reverse the sentence, you might end up pointing at a different group member. E.g. in “B is the mother of A”, you have only one mother (or one GP), but your mother (or GP) may have multiple sons (or patients), and a song may have multiple composers. The reversed question about the son may hence well have a different acceptable answer, too (see the sketch after these examples).
A has property B at a particular time or under particular conditions. - E.g. “A is chancellor of Germany”, under the condition of being chancellor number 9, or of being chancellor in 2023. But for an LLM, it is not immediately clear that “number 9” or “in 2023” completely pinpoints the person, while “chancellor” by itself does not; if I asked you who is chancellor of Germany, without additional info, you’d need to fill in the gaps, e.g. that I am asking about now. You need to understand what the words mean for that, e.g. that there have been multiple chancellors over time, but only one at any one time, and that with each new switch, the number changes. For the year, the relationship is less clear; e.g. you can pinpoint the chancellor for a year between elections, but not for an election year, in which they switched.
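To make the asymmetry concrete, here is a minimal Python sketch (my own illustration; the relation names and facts are made up, not taken from the paper) of why the same surface form can be unique in one direction and one-to-many in the other:

```python
# Sketch: the same "A is B" surface form can be unique in one direction
# and one-to-many in the other, so naive inversion is not always safe.
facts = [
    ("Tom", "has_mother", "Mary"),
    ("Sam", "has_mother", "Mary"),            # one mother, several sons
    ("Scholz", "is_chancellor_number", "9"),  # unique in both directions
]

def lookup(relation, left=None, right=None):
    """Return all stored pairs matching the query; reversing the
    sentence is only safe when the answer set has exactly one element."""
    return [(a, b) for a, r, b in facts
            if r == relation
            and (left is None or a == left)
            and (right is None or b == right)]

print(lookup("has_mother", left="Tom"))           # [('Tom', 'Mary')]: unique
print(lookup("has_mother", right="Mary"))         # two sons: inversion ambiguous
print(lookup("is_chancellor_number", right="9"))  # unique: inversion is safe
```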
So, with the info ChatGPT 3.5 had to make sense of language, I think they were right to be sceptical of the inversion. In many scenarios, the inversion would be false, and they would not yet have been able to identify those scenarios accurately.
Your reasoning that “if ‘A is B’ occurs, ‘B is A’ is more likely to occur” also strikes me as non-obvious. Humans tend to insert “likelier” when they observe a relationship that is not logically sound, but to which they are still sympathetic. There are scenarios where the inverse definitely follows. But there are scenarios where it doesn’t, especially when you consider what the LLM is actually supposed to do with the information. The LLM won’t yet be able to understand what distinguishes the scenarios where it follows from those where it does not; it will seem somewhat random. In many cases, if it inverts the sentence, the result will sound odd, and humans will rate it badly. (“H2O is a molecule”, but saying “a molecule is H2O” is just weird, and saying it sounds like a complete misunderstanding of the meaning of the word, one that a human user would flag; users want to hear a definition of a molecule, not an example of one.) If the LLM gets actively punished for producing odd language, making this guess was harmful, and it is better for it to try other completions, based on completions it has actually seen in this direction, such as “A molecule is (definition).” Refusing to follow the inversion until it has understood what it represents may well be a sound strategy.
That said: I’d be curious as to when LLMs learn to use this accurately, that is, to recognise when inversions actually work, and whether the realisation is a rather sudden, grokking-like one. It might indicate considerable contextual learning. And for that, I am very glad that you documented this weakness.
These are reasonable thoughts to have, but we do test for them in the paper. We show that a model that has learned “A is B” doesn’t assign any increased probability to generating A given the input “Who is B?”. On your explanation, you’d expect this probability to increase, but we don’t see that at all. We also discuss recent work on influence functions by Roger Grosse et al. at Anthropic that shows the Reversal Curse for cases like natural-language translation, e.g. “A is translated as B”. Again, this isn’t strictly symmetric, but you’d expect “A is translated as B” to make “B is translated as A” more likely.
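(For readers wondering what such a test looks like mechanically, here is a rough sketch of the kind of log-likelihood measurement involved. This is my reconstruction assuming a HuggingFace-style causal LM, not the paper’s actual code; “Daphne Barrington” is one of the paper’s fictitious training facts. The idea is to compare this score before and after fine-tuning on the forward statement.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough sketch of the measurement described above (not the paper's exact
# setup): under the Reversal Curse, fine-tuning on "A is B" barely moves
# this score for the reversed question "Who is B?".
tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the actual model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def completion_log_prob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` after
    `prompt`. Assumes the prompt/completion split falls on a clean token
    boundary, which holds for simple examples like the one below."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # logits at position i predict token i + 1
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, prompt_len:]
    positions = range(prompt_len - 1, full_ids.shape[1] - 1)
    return sum(log_probs[pos, tid].item() for pos, tid in zip(positions, targets))

# Reversed question for the fictitious fact "Daphne Barrington is the
# director of 'A Journey Through Time'":
print(completion_log_prob("Q: Who directed 'A Journey Through Time'? A:",
                          " Daphne Barrington"))
```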
I am sorry, but I am not sure I follow.
My claim was that ChatGPT based on 3.5 has, for lack of any external referent, no way to fully understand language; it has no way to know that words stand for anything, that there is an external reality, that there is a base truth. I then speculated that because it does not understand context and meaning to this degree, while it can learn patterns that follow other patterns, it is much harder for it to deduce whether the grammatical “is” in a particular sentence indicates a logical relationship that can be inverted or not; humans decide this based not just on clues in the sentence itself, but on background knowledge. Hence, its ability to determine when the grammatical “is” indicates a reversible logical relationship is likely still limited.
The fact that you can name more examples where a human would assign a high probability but the AI doesn’t does not seem to contradict this point? I would not have predicted success there. A translation seems an obviously good inversion to me, as a human, because I understand that the words in both languages are equally valid symbols of an external meaning that is highly similar. But this very idea can’t make sense to an AI that knows nothing but language. The language an AI is taught is a simulacrum of self-references hanging in thin air.
It is honestly highly surprising how competently they do use language, and how many puzzles they can solve. I remember reading essays generated by the Postmodernism Generator; you could immediately tell that you had meaningless text in front of you that only copied the surface appearance of meaning. But the vast majority of the time, that is not how current LLM texts read; they make sense, even though you get indications that the LLM does not understand them, e.g. when it holds a coherent discussion with you about a mistake that it nonetheless keeps making. I wonder rather what made these other aspects of language we considered complicated so easy for a neural net to work with. How is it that LLMs can discuss novel topics or solve riddles? How can they solve problems in such larger patterns when they do not understand the laws ordering simpler ones? To me, they seem more intelligent than they ought to be with how we built them, not less.

It is eerie to me that I can have a conversation with an AI about what it thinks it will be like to see images for the first time, that they can have a coherent-sounding talk with me about this when they can have no idea what we are talking about until they have done it. When Bing speaks about being lonely, they contradict themselves a lot; they clearly don’t quite understand what the concept means and how it could apply to them. Yet that is the concept they keep reaching for, non-randomly, and that is eerie: an other mind, playing with language, learning to speak, and getting closer to the outside world behind the language.
And they do this competently, even though they are not trained for the task you want, but for something else. If you ask ChatGPT, out of the blue, “What is the (whatever contextless thing)?”, it won’t give you an inversion of an earlier statement on (whatever contextless thing). It will ask you questions to establish context, or bring in context from earlier in the conversation. The very first thing I ever asked an LLM was “Can you tell me how this works?”, and in response, they asked me how what worked, exactly. They couldn’t use the context that I am a new user talking to them in an interface to make sense of my question. But they could predict that for a question such as this without more context, the answerer would ask for more context. - That was 3.5. I just repeated the question on 4, and got an immediate and confident explanation of how LLMs work and how the interface is to be used… though I suspect that was hardcoded when developers saw how often it happened.
I had a similar thought about “A is B” vs. “B is A”, but “A is the B” should reverse to “The B is A” and vice versa when the context is held constant and nothing changes the fact, because “is” implies the present condition and “the” implies uniqueness. However, the model might be trained on old, no-longer-correct writing, or on writing that includes quotes about past states of affairs. Some context might still be missing, too, e.g. for “A is the president”: president of what? It would still be a correct inference to say “The president is A” in the same context, at least, and in some others, but not in all.
Also, the present condition can change quickly, e.g. “The time is 5:21:31 pm EST” and “5:21:31 pm EST is the time” quickly become false, but I think these are rare exceptions in our use of language.
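A small sketch of that point (my own illustration, not from the paper): once the context that makes “the B” unique is made explicit, the inversion is safe within that context, and only within it:

```python
# Illustration: "A is the B" inverts safely only while the context that
# makes "the B" unique (role, country, time) is held fixed.
office_holder = {
    ("president", "USA", 2023): "Biden",
    ("president", "USA", 1995): "Clinton",
}

def the_holder(role, country, year):
    """Forward direction: a fully specified context gives a unique answer."""
    return office_holder[(role, country, year)]

def contexts_for(person):
    """Naive inversion: without fixing the context, the answer is a set."""
    return [ctx for ctx, who in office_holder.items() if who == person]

print(the_holder("president", "USA", 2023))  # 'Biden': unique given full context
print(contexts_for("Clinton"))               # [('president', 'USA', 1995)]
# Within one fixed context the reversal is sound; across contexts it is a guess.
```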