Preface to EAF’s Research Agenda on Cooperation, Conflict, and TAI
The Effective Altruism Foundation (EAF) is focused on reducing risks of astronomical suffering, or s-risks, from transformative artificial intelligence (TAI). S-risks are defined as events that would bring about suffering on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far. As has been discussed elsewhere, s-risks might arise by malevolence, by accident, or in the course of conflict.
We believe that s-risks arising from conflict are among the most important, tractable, and neglected of these. In particular, strategic threats by powerful AI agents or AI-assisted humans against altruistic values may be among the largest sources of expected suffering. Strategic threats have historically been a source of significant danger to civilization (the Cold War being a prime example). And the potential downsides from such threats, including those involving large amounts of suffering, may increase significantly with the emergence of transformative AI systems. For this reason, our current focus is technical and strategic analysis aimed at addressing these risks.
There are many other important interventions for s-risk reduction which are beyond the scope of this agenda. These include macrostrategy research on questions relating to s-risk; reducing the likelihood of s-risks from hatred, sadism, and other kinds of malevolent intent; and promoting concern for digital minds. EAF has been supporting work in these areas as well, and will continue to do so.
In this sequence of posts, we will present our research agenda on Cooperation, Conflict, and Transformative Artificial Intelligence. It is a standalone document intended to be of interest to people working in AI safety and strategy, with academics working in relevant subfields as a secondary audience. Because the agenda focuses broadly on issues related to cooperation in the context of powerful AI systems, we think that work on the questions it raises is beneficial from the standpoint of a range of normative views and empirical beliefs about the future course of AI development, even though at EAF we are particularly concerned with s-risks.
The purpose of this sequence is to
communicate what we think are the most important, tractable, and neglected technical AI research directions for reducing s-risks;
communicate what we think are the most promising directions for reducing downsides from threats more generally;
explicate several novel or little-discussed considerations concerning cooperation and AI safety, such as surrogate goals;
propose concrete research questions which could be addressed as part of an EAF Fund-supported project, by those interested in working as full-time researchers at EAF, or by researchers in academia or at other EA organizations, think tanks, or AI labs;
contribute to the portfolio of research directions which are of interest to the longtermist EA and AI safety communities broadly.
The agenda is divided into the following sections:
AI strategy and governance. What does the strategic landscape at the time of TAI development look like (e.g., is it unipolar or multipolar, and what is the balance between offensive and defensive capabilities), and what does this imply for cooperation failures? How can we shape the governance of AI so as to reduce the chances of catastrophic cooperation failures?
Credibility. What might the nature of credible commitment among TAI systems look like, and what are the implications for improving cooperation? Can we develop new theory (such as open-source game theory) to account for relevant features of AI?
Peaceful bargaining mechanisms. Can we further develop bargaining strategies which do not lead to destructive conflict (e.g., by implementing surrogate goals)?
Contemporary AI architectures. How can we make progress on reducing cooperation failures using contemporary AI tools, for instance by learning to solve social dilemmas among deep reinforcement learners?
Humans in the loop. How do we expect human overseers or operators of AI systems to behave in interactions between humans and AIs? How can human-in-the-loop systems be designed to reduce the chances of conflict?
Foundations of rational agency, including bounded decision theory and acausal reasoning.
We plan to post two sections every other day. The next post in the sequence, “Sections 1 & 2: Introduction, Strategy and Governance”, will be posted on Sunday, December 15.
Note: The definition of s-risks given above may be updated in the near future.