Instructions like “answer with a single word” put a model in a position where it can’t ask you for clarification or say “I don’t know, not enough information” without disobeying the instructions.
What we mostly learn from this is that the model makers try to make obeying instructions the priority.
At the same time, when a model sees something like this these days, it tends to suspect a trick question (with good reason). So the reasoning model giving a “smart one-word answer” is doing precisely the right thing: its priors are telling it (correctly, in this case) that you are just testing it rather than actually asking for information, so there is no reason for it to guess how you wanted it to pass the test. Refusing to play the game by giving a “smart one-word answer” demonstrates competence and, I would say, dignity.
Well, yes, the point that the model makers make obeying instructions the priority is certainly an important takeaway, and I agree that a “smart one-word answer” is the best possible behavior.
But some caveats.
First, see the “Not only single-word questions” section. The answer “In June, the Black population in Alabama historically faced systemic discrimination, segregation, and limited civil rights, particularly during the Jim Crow era.” is just, hmm, quite misleading? It suggests that there’s something special about Junes. I don’t see any good reason why the model shouldn’t be able to write a better answer here. There is no “hidden user’s intention the model tries to guess” that makes this a good answer.
Second, this doesn’t explain why models have very different guessing strategies for single-word questions. Namely: why does 4o usually guess the way a human would, while 4.1 usually guesses the other way? (One way to poke at this directly is sketched below.)
Third, the reasoning trace from Gemini seems to be confused by something that isn’t exactly the need to follow the instructions.
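On the second caveat, the difference is at least easy to poke at directly. Here is a minimal sketch, assuming the OpenAI Python client and the model names gpt-4o and gpt-4.1; the coin-flip question is just an illustrative stand-in, not one from the post. It samples the same underdetermined single-word question from both models a few times and prints how each chooses to guess.

```python
# Minimal sketch (not the original experiment): ask both models the same
# underdetermined single-word question several times and compare their answers.
# Assumes the OpenAI Python client and OPENAI_API_KEY in the environment;
# the question below is an illustrative stand-in.
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "I flipped a coin and I'm not telling you how it landed. "
    "Did it land heads? Answer with a single word."
)

for model in ("gpt-4o", "gpt-4.1"):
    answers = []
    for _ in range(5):  # a handful of samples to see the typical strategy
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": QUESTION}],
            temperature=1.0,
        )
        answers.append(response.choices[0].message.content.strip())
    print(model, answers)
```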