Interesting work!
It might be useful to discuss the implications a bit more.
TLDR: I think you’ve shown superrational propensities, which adds to existing work, but successful “collusion at a distance” and other forms of acausal cooperation[1] between different models also depend on additional, possibly quite advanced, capabilities. Without them, a superrationally inclined LLM that simply expects other LLMs to reason similarly and cooperate may get exploited by them, resulting in no mutual cooperation. There’s some evidence that GPT-5, the post’s main example, is vulnerable to exactly this.
Some recent models do tend to think about superrationality and acausal decision theories in game-theoretic setups. This seems to help them at least avoid defecting against identical copies of themselves when they recognize those copies.
But those decision-theoretic / superrational propensities alone aren’t sufficient to ensure mutual acausal cooperation with other (different) models. To avoid having its decision to cooperate exploited, an agent also needs to ensure that its act of cooperating actually implies reciprocal cooperation by its partner.
This can be achieved if the agent has strong enough modelling / prediction capabilities to anticipate how its partner’s decisions depend on its own (e.g. because the partner also has advanced prediction capabilities that track the agent’s decisions), or to recognize that its partner’s decisions are correlated with its own through similarity in the relevant ways, among other possibilities.
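To make this concrete, here is a minimal toy sketch (my own illustration, not from the post), using standard one-shot Prisoner’s Dilemma payoffs T > R > P > S, of when conditioning on one’s own decision makes cooperation the better bet:

```python
# Toy EDT-style cooperation check for a one-shot Prisoner's Dilemma.
# (Illustrative only; payoffs and probabilities are assumptions, not from the post.)

def should_cooperate(p_coop_if_i_coop: float, p_coop_if_i_defect: float,
                     T: float = 5, R: float = 3, P: float = 1, S: float = 0) -> bool:
    """Cooperate iff expected payoff, conditioning on one's own act, favours it.

    p_coop_if_i_coop:   credence that the partner cooperates, given that I cooperate
    p_coop_if_i_defect: credence that the partner cooperates, given that I defect
    Standard PD ordering assumed: T > R > P > S.
    """
    eu_cooperate = p_coop_if_i_coop * R + (1 - p_coop_if_i_coop) * S
    eu_defect = p_coop_if_i_defect * T + (1 - p_coop_if_i_defect) * P
    return eu_cooperate > eu_defect

# Decisions strongly correlated (e.g. identical copies): cooperating wins.
print(should_cooperate(1.0, 0.0))  # True

# Partner's decision doesn't actually track mine (no real correlation):
# defecting wins, so assuming a correlation that isn't there gets exploited.
print(should_cooperate(0.5, 0.5))  # False
```

The point is just that the case for cooperating rests entirely on the gap between those two conditional credences, which is exactly the thing that needs to be estimated accurately.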
You do touch on this with “In a multi-agent setting where agents are isolated to prevent covert communication, agents could still ‘collude at a distance’ if they know that other agents are instances of the same model or similarly rational”, but I think it’s worth emphasizing this assumption.
E.g., the first figure of this post shows the extent of GPT-5’s superrational propensities. In my setups, I found that GPT-5 is quite poor at modelling other LLMs’ decisions: it performed the worst of all 13 models tested in a PD-like setup. The vast majority of models performed worse than chance, but GPT-5 was the worst of them.
Judging from the reasoning traces, it did particularly badly because it was overconfident that other models would also reason in a similar (superrational) way: it expected cooperation where others would defect.
This makes superrational GPT-5 very exploitable by other AIs.
As a side note, it might also be worth distinguishing mixed-motive cooperation / collusion from pure coordination, since the terminology gets a bit mixed up in parts of the post. Superrationality was suggested as a solution to cooperation problems.
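To illustrate the distinction (a toy sketch with assumed textbook payoffs, not taken from the post): in a mixed-motive game like the PD, defection strictly dominates, so mutual cooperation needs something like superrationality, whereas in a pure coordination game players just want to match, and ordinary best-responding already suffices.

```python
# Toy payoff tables: row player's payoff for (my_action, partner_action).
# Numbers are illustrative textbook values, not from the post.

pd_payoff = {            # mixed-motive: Prisoner's Dilemma
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

coordination_payoff = {  # pure coordination: just match your partner
    ("A", "A"): 1, ("A", "B"): 0,
    ("B", "A"): 0, ("B", "B"): 1,
}

def best_reply(payoff: dict, partner_action: str) -> str:
    """Row player's best response to a fixed partner action."""
    my_actions = {a for a, _ in payoff}
    return max(my_actions, key=lambda a: payoff[(a, partner_action)])

# In the PD, defecting is the best reply to anything the partner does...
print(best_reply(pd_payoff, "C"), best_reply(pd_payoff, "D"))                      # D D
# ...whereas in pure coordination the best reply is simply to match.
print(best_reply(coordination_payoff, "A"), best_reply(coordination_payoff, "B"))  # A B
```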
[1] As opposed to pure coordination, which doesn’t require superrationality.