I disagree that the Reversal Curse demonstrates a fundamental lack of sophistication in the model’s knowledge. As Neel Nanda explained, it’s not surprising that current LLMs store A → B but not B → A, since they’re basically lookup tables, and this is certainly an important limitation. However, I think it is mainly due to a lack of computational depth. LLMs can perform that kind of deduction when the information is external: if you prompt one with who Tom Cruise’s mom is, it can then answer who Mary Lee Pfeiffer’s son is. If the LLM already knew the first part, you could simply prompt it to answer the first question before asking the second. I suspect that a recurrent model like the Universal Transformer could perform the A → B to B → A deduction internally, but for now LLMs must do such multi-step computations externally, via a chain-of-thought. In other words, an LLM can deduce new things, just not in a single forward pass or during backpropagation. If that doesn’t count, then no other demonstration of multi-step reasoning in LLMs counts either. This deduced knowledge is usually discarded, but it can be made permanent with retrieval or fine-tuning. So I think it’s wrong to say that this entails a fundamental barrier to wielding new knowledge.
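To make the point concrete, here is a toy sketch (plain Python, not any real LLM API) of the asymmetry being described: a "model" whose weights store only the forward fact A → B, like a lookup table, but which can answer the reverse question once the forward answer is produced externally and fed back in as context, i.e. a two-step chain-of-thought. The dictionary and strings are illustrative assumptions, not an actual model.

```python
# Toy 'weights': only the forward direction A -> B is stored,
# mimicking the lookup-table behavior behind the Reversal Curse.
FORWARD_FACTS = {"Who is Tom Cruise's mother?": "Mary Lee Pfeiffer"}

def model_answer(prompt: str, context: str = "") -> str:
    """Answer from stored forward facts, or from in-context information."""
    # Out-of-context: only the stored direction is retrievable.
    if prompt in FORWARD_FACTS:
        return FORWARD_FACTS[prompt]
    # In-context: once the forward fact is in the context window,
    # the reverse question becomes easy.
    if "Mary Lee Pfeiffer" in context and "Tom Cruise" in context:
        return "Tom Cruise"
    return "I don't know."

# The reverse question fails with no context (the 'curse')...
print(model_answer("Who is Mary Lee Pfeiffer's son?"))  # -> I don't know.

# ...but succeeds when the deduction is done externally in two steps:
step1 = model_answer("Who is Tom Cruise's mother?")
context = f"Tom Cruise's mother is {step1}."
print(model_answer("Who is Mary Lee Pfeiffer's son?", context))  # -> Tom Cruise
```

The second call is the external chain-of-thought: the forward fact is retrieved first, placed in context, and only then can the reverse be answered.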
As Nanda also points out, the reversal curse only holds out-of-context: in-context, models have no problem with it and answer perfectly easily. So the analogy is false here, because he’s eliding the distinction between in-context reasoning and knowledge learned at training time. Humans do not do what he claims they do: “instantly update their world-model such that it’d be obvious to them that B is A”. At least, not in terms of permanent learning, as opposed to in-context reasoning.
For example, I can tell you that Tom Cruise’s mother is named ‘Mary Lee Pfeiffer’ (thanks to that post), but I cannot tell you who ‘Mary Lee Pfeiffer’ is out of the blue, any more than I can spontaneously and fluently sing the alphabet song backwards. But—like an LLM—I can easily do both now that I have read your comment and the string “if you prompt it with who Tom Cruise’s mom is, it can then answer who Mary Lee Pfeiffer’s son is” is in my context (working/short-term memory). I expect, however, that despite my ability to do so as I write this comment, if you ask me in a month ‘who is Mary Lee Pfeiffer?’ I will stare blankly at you and guess ‘...a character on Desperate Housewives, maybe?’
It will take several repetitions, even optimally spaced, before I have a good chance of answering ‘ah yes, she’s Tom Cruise’s mother’ without any context. Because I do not ‘instantly update my world-model such that it’d be obvious to me that [Mary Lee Pfeiffer] is [the mother of Tom Cruise]’.