[Question] Is InstructGPT Following Instructions in Other Languages Surprising?

On Twitter Jan Leike asks:

> With the InstructGPT paper we found that our models generalized to follow instructions in non-English even though we almost exclusively trained on English.
>
> We still don’t know why.
>
> I wish someone would figure this out.

I find myself surprised/confused at his apparent surprise/confusion.

Had someone asked me what I thought was going on, my default response would have been something like: “following instructions is a natural abstraction of a task”. Hence models trained to follow instructions in English generalising to other languages is a natural example of goal/capabilities generalisation.

It’s like being surprised that an AI taught to drive red cars can drive blue cars as well [capabilities generalisation]. Or that an AI taught to reach a particular location on a level where a coin happens to sit still heads to that location when the coin is absent [goal generalisation].

In CoinRun, there were multiple plausible natural generalisations of the goal (the location on the level, the coin), but for instruction following in English, there seems to be only one natural generalisation of the goal. (I don’t really see any other intuitively plausible generalisation of “following instructions in English” when the model switches to a Chinese context.)

Conditional on InstructGPT competently responding in languages other than English, I think we should just expect it to generalise its goal to the new context.


Is this just hindsight bias? Would I really have been surprised/confused in the counterfactual where InstructGPT didn’t follow instructions in other languages? I don’t actually know that I would have been.

Maybe I would have just generated some other reason why we shouldn’t have expected instruction following to generalise.

But hindsight bias aside, this is a prediction of Wentworth’s natural abstractions hypothesis (NAH) as I understand it (I think he would endorse this prediction), and I am a genuine believer in the NAH, so I don’t think it’s just hindsight bias.