Neat post; I think this is an important distinction. It seems right that more homogeneity means less risk of bargaining failure, though I’m not yet sure by how much.
> Cooperation and coordination between different AIs is likely to be very easy as they are likely to be very structurally similar to each other if not share basically all of the same weights
In what ways does having similar architectures or weights help with cooperation between agents with different goals? A few things that come to mind:
Having similar architectures might make it easier for agents to verify things about one another, which may reduce problems of private information and inability to credibly commit to negotiated agreements. But of course increased credibility is a double-edged sword as far as catastrophic bargaining failure is concerned, as it may make agents more likely to commit to carrying out coercive threats.
Agents with more similar architectures / weights will tend to have more similar priors / ways of modeling their counterparts, as well as more similar notions of fairness in bargaining, which reduces the risk of bargaining failure (see the toy sketch after this list). But as systems are modified or used to produce successor systems, they may be independently tuned to do things like represent their principal in bargaining situations. This tuning may introduce important divergences in whatever default priors or notions of fairness were present in the initial mostly-identical systems. I don’t have much intuition for how large these divergences would be relative to those in a regime that started out more heterogeneous.
If a technique for reducing bargaining failure only works if all of the bargainers use it (e.g., surrogate goals), then homogeneity could make it much more likely that all bargainers used the technique. On the other hand, it may be that such techniques would not be introduced until after the initial mostly-identical systems were modified / successor systems produced, in which case there might still need to be coordination on common adoption of the technique.
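To make the fairness-divergence point concrete, here’s a toy sketch (my own construction, not something from the post): two agents split a fixed surplus, and each demands the share its own fairness notion prescribes. The specific demand rules and the 0.7 weight below are arbitrary assumptions, chosen only to make the demands incompatible.

```python
# Toy model of bargaining failure from divergent fairness notions.
# All numbers and demand rules here are illustrative assumptions.

PIE = 1.0           # total surplus to divide
DISAGREEMENT = 0.0  # what each agent gets if bargaining breaks down

def demand_symmetric() -> float:
    """Fairness notion 1: an even 50/50 split."""
    return PIE / 2

def demand_weighted(own_weight: float = 0.7) -> float:
    """Fairness notion 2: a split weighted by, say, perceived contribution.
    The 0.7 is arbitrary; it just makes the two demands jointly infeasible."""
    return PIE * own_weight

demand_a = demand_symmetric()  # A demands 0.5
demand_b = demand_weighted()   # B demands 0.7

if demand_a + demand_b <= PIE:
    print(f"Compatible demands: A gets {demand_a}, B gets {demand_b}")
else:
    # If both agents insist on their own notion of a fair share, no
    # agreement is feasible and each gets only the disagreement payoff.
    print(f"Incompatible demands ({demand_a} + {demand_b} > {PIE}): "
          f"both get {DISAGREEMENT}")
```

Mostly-identical systems would presumably share a single demand rule and land in the compatible branch; independent tuning by different principals is one way the rules, and hence the demands, could drift apart.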
Also, the correlated success / failure point seems to apply to bargaining as well as alignment. For instance, multiple mesa-optimizers may be more likely under homogeneity, and if these have different mesa-objectives (perhaps due to being tuned by principals with different goals), then catastrophic bargaining failure may be more likely.
[I work at CAIF and CLR]
Thanks for this!
I recommend making it clearer that CAIF is not focused on s-risk and is not formally affiliated with CLR (except for overlap in personnel). While it’s true that there is significant overlap in CLR’s and CAIF’s research interests, CAIF’s mission is much broader than CLR’s (“improve the cooperative intelligence of advanced AI for the benefit of all”), and its founders + leadership are motivated by concern about a variety of catastrophic risks from AI.
Also, “foundational game theory research” isn’t an accurate description of CAIF’s scope. CAIF is interested in a variety of fields relevant to the cooperative intelligence of advanced AI systems. While this includes game theory and decision theory, I expect that a majority of CAIF’s resources (measured in both grants and staff time) will be directed at machine learning, and that we’ll also support work from the social and natural sciences. See Open Problems in Cooperative AI and CAIF’s recent call for proposals for a better sense of the kinds of work we want to support.
[ETA] I don’t think “foundational game theory research” is an accurate description of CLR’s scope, either, though I understand how public writing could give that impression. It is true that several CLR researchers have worked, and are currently working, on foundational game & decision theory research. But CLR researchers work on a variety of things. Much of our recent technical and strategic work on cooperation is grounded in more prosaic models of AI (though to be fair much of this is not yet public; there are some forthcoming posts that hopefully make this clearer, which I can link back to when they’re up). Other topics include risks from malevolent actors and AI forecasting.
[Edit 14/9] Some of these “forthcoming posts” are up now.