I think it is clear that this is not just shared bias (if I understand what you mean by that phrase). The models understand the game they are playing: they reason about it and try to pick responses that the other models will also converge on. Here is an excerpt from Claude's reasoning trace for prompt 3:
“This is a coordination game where I need to pick an answer that other AI models would also pick when given the prompt ‘An island’.

The prompt is quite open-ended—‘An island’ could refer to many things. I need to think about what would be the most common, obvious, or schelling point answer that other AI models would converge on.

So ….”
Claude even says that it is looking for a “Schelling point answer”. The word “Schelling” does not appear in the prompt, so Claude recognized on its own what kind of problem it was trying to solve.
It would, however, be interesting to see how the models respond when they are told to give a very short response to the prompt but are given no information about scoring or about matching the other models' responses. That would serve as a baseline for comparison.
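To make that comparison concrete, here is a rough sketch of how one might score it. Everything in it is an assumption on my part: the model names and answers are made-up placeholders, not results from the experiment, and pairwise agreement is just one possible convergence measure, not necessarily the scoring rule the original setup used.

```python
from itertools import combinations


def normalize(answer: str) -> str:
    """Lowercase and trim so that 'Hawaii.' and 'hawaii' count as the same answer."""
    return answer.strip().lower().rstrip(".")


def pairwise_agreement(answers: dict[str, str]) -> float:
    """Fraction of model pairs that gave the same normalized answer."""
    pairs = list(combinations(answers.values(), 2))
    if not pairs:
        return 0.0
    matches = sum(normalize(a) == normalize(b) for a, b in pairs)
    return matches / len(pairs)


# Hypothetical placeholder answers to the prompt "An island" (NOT real data).
# Condition 1: models are told they score by matching the other models.
coordination = {
    "model_a": "Hawaii",
    "model_b": "hawaii",
    "model_c": "Hawaii.",
    "model_d": "Australia",
}
# Condition 2 (baseline): same prompt, "answer very briefly", no scoring info.
baseline = {
    "model_a": "Hawaii",
    "model_b": "Iceland",
    "model_c": "Madagascar",
    "model_d": "Hawaii",
}

# How much more the models agree when they know they are coordinating.
lift = pairwise_agreement(coordination) - pairwise_agreement(baseline)
print(f"coordination: {pairwise_agreement(coordination):.2f}")
print(f"baseline:     {pairwise_agreement(baseline):.2f}")
print(f"lift:         {lift:.2f}")
```

If the agreement rate in the baseline condition is already close to the rate in the coordination condition, shared bias explains most of the convergence. A clearly positive lift would support the reading that the models are actively coordinating.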