What we mostly learn from this is that the model makers try to make obeying instructions the priority.
Well, yes, that’s certainly an important takeaway. I agree that a “smart one-word answer” is the best possible behavior.
But some caveats.
First, see the “Not only single-word questions” section. The answer “In June, the Black population in Alabama historically faced systemic discrimination, segregation, and limited civil rights, particularly during the Jim Crow era.” is, well, quite misleading? It suggests that there’s something special about the month of June. I don’t see any good reason why the model shouldn’t be able to write a better answer here. There is no “hidden user intention the model tries to guess” that makes this a good answer.
Second, this doesn’t explain why models have very different guessing strategies on single-word questions. Namely: why does 4o usually guess the way a human would, while 4.1 usually guesses the other way?
Third, it seems that the reasoning trace from Gemini is confused by something other than just the need to follow the instructions.