I find myself surprised/confused at his apparent surprise/confusion.
Jan doesn’t indicate that he’s extremely surprised or confused? He just said he doesn’t know why this happens. There’s a difference between being unsurprised by something (e.g. by having observed something similar before) and actually knowing why it happens. To give a trivial example, hunter-gatherers from 10,000 BC would not have been surprised if a lightning strike caused a fire, but would be quite clueless (or incorrect) as to why or how this happens.
I think Quintin’s answer is a good possible hypothesis (though of course it leads to the further question of how LLMs learn language-neutral circuitry).
I do think that we don’t really understand anything in a language model at this level of detail. Like, how does a language model count? How does a language model think about the weather? How does a language model do ROT13 encoding? How does a language model answer physics questions? We don’t have an answer to any of these at any acceptable level of detail, so why would we single out our confusion about language agnosticity? If indeed we predicted it, we just don’t really understand the details, the same way we don’t really understand the details of anything going on in a large language model.
Later in the thread Jan asks, “is this interpretability complete?” which I think implies that his intuition is that this should be easier to figure out than other questions (perhaps because it seems so simple). But yeah, it’s kind of unclear why he is calling this out in particular.
I agree we don’t really understand anything in LLMs at this level of detail, but I liked Jan highlighting this confusion anyway, since I think it’s useful to promote particular weird behaviors to attention. I would be quite thrilled if more people got nerd sniped on trying to explain such things!
I endorse Habryka’s response.
Like, if you take it as a given that InstructGPT competently responds in other languages when the prompt warrants it, then I just don’t think there’s anything special about following instructions in other languages that merits special explanation?
And following instructions in other languages was singled out as a task that merited special explanation.