I propose a taxonomy of 5 possible worlds for multi-agent theory, inspired by Impagliazzo’s 5 possible worlds of complexity theory (and also the Aaronson-Barak 5 worlds of AI):
Nihiland: There is not even a coherent uni-agent theory, not to mention multi-agency. I find this world quite unlikely, but leave it here for the sake of completeness (and for the sake of the number 5). Closely related is antirealism about rationality, which I have criticized in the past. In this world it is not clear whether the alignment problem is well-posed at all.
Discordia: There is a coherent uni-agent theory, but no coherent theory of multi-agency. This world is conceivable, since the current understanding of multi-agency is much worse than the understanding of “solitary” agents. In this world, negative-sum conflicts and coordination failures are probably ubiquitous (even among arbitrarily sophisticated agents), because there is no principle of rationality that rules them out. Acausal trade is probably not a thing, or at least rare and fragile. In the context of value learning, there might be no principled way to deal with uncertainty (which could otherwise be regarded as a bargaining problem). There is also no principled solution to multi-user alignment.
Linguistica: There is a coherent theory of multi-agency, but agents are inevitably divided into “types” such that only interactions between agents of the same type have strong guarantees. (The world is so named because we can metaphorically think of the types as different “languages”.) An example of how this might happen is reflective oracles, where the type corresponds to the choice of fixed point. Acausal trade probably exists[1], but is segregated by type. Alignment is complicated by the need to specify or learn the human type.
Economica: There is a coherent uni-type theory of multi-agency, but this theory involves desiderata that can only be motivated by multi-agency. Explicitly thinking about multi-agency is necessary to construct the full theory of agents[2]. In this world, the Yudkowskian hope for ubiquitous strong cooperation guarantees can be justified, and acausal trade might be very common. Figuring out the multi-agent theory, and not just the uni-agent fragment, is probably important for alignment, or at least necessary in order to avoid leaving huge gains from trade on the table.
Harmonia: There is a coherent uni-type theory of multi-agency, and this theory can be derived entirely from desiderata that can be motivated without invoking multi-agency at all. There is no special “multi-agent sauce”: any sufficiently rational agents automatically have strong guarantees in the multi-agent setting. Explicitly understanding multi-agency is arguably still important for dealing with uncertainty in value learning, and for dealing with multi-user alignment. (And also in order to know that we are in this world.)
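As a rough sketch, the five worlds can be read as the outcomes of a short sequence of yes/no questions. The Python below encodes that reading. The name `classify_world` and its four boolean parameters are my own illustrative labels rather than part of any established formalism, and the sketch deliberately ignores borderline worlds.

```python
from enum import Enum


class World(Enum):
    NIHILAND = "Nihiland"
    DISCORDIA = "Discordia"
    LINGUISTICA = "Linguistica"
    ECONOMICA = "Economica"
    HARMONIA = "Harmonia"


def classify_world(
    coherent_uni_agent: bool,
    coherent_multi_agent: bool,
    uni_type: bool,
    needs_multi_agent_desiderata: bool,
) -> World:
    """Map four yes/no answers to one of the five worlds (illustrative only)."""
    if not coherent_uni_agent:
        # No coherent theory even of a single agent.
        return World.NIHILAND
    if not coherent_multi_agent:
        # Single agents are understood, but multi-agency is not.
        return World.DISCORDIA
    if not uni_type:
        # Strong guarantees hold only between agents of the same type.
        return World.LINGUISTICA
    if needs_multi_agent_desiderata:
        # One theory for all agents, but it needs multi-agent desiderata.
        return World.ECONOMICA
    # One theory for all agents, derivable from uni-agent desiderata alone.
    return World.HARMONIA
```

Each question only matters if all the previous ones were answered "yes", which is why the worlds form a chain rather than a grid.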
For simplicity, I’m ignoring what is arguably an “orthogonal” axis: to what extent the “correct” multi-agent theory implies acausal cooperation even under favorable conditions. I believe that, outside of Nihiland and Discordia, it probably does, but the alternative hypothesis is also tenable.
On the border between Linguistica and Economica, there are worlds with strong guarantees for agents of the same type and medium-strength guarantees for agents of different types (where “medium-strength” is still stronger than “achieve maximin payoff”: the latter is already guaranteed in infra-Bayesianism). This blurs the boundary, but I would consider such a world to be Linguistica if even slightly different types have much weaker guarantees (or if there is no useful notion of “slightly different types”) and Economica if there is continuous graceful degradation, as in Yudkowsky’s subjective fairness proposal.
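To make the “maximin payoff” baseline concrete, here is a minimal sketch that computes the pure-strategy maximin action for the row player of a small matrix game. The function `pure_maximin` and the Prisoner's Dilemma payoff numbers are assumptions chosen for illustration. Restricting to pure strategies is a simplification, and this is not how infra-Bayesianism itself is formulated.

```python
from typing import Sequence, Tuple


def pure_maximin(payoffs: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (row index, guaranteed payoff) of the pure maximin action.

    payoffs[i][j] is the row player's payoff when it plays row i
    and the opponent plays column j.
    """
    # Worst-case payoff of each row, over all opponent actions.
    worst_cases = [min(row) for row in payoffs]
    # Pick the row whose worst case is best.
    best_row = max(range(len(worst_cases)), key=worst_cases.__getitem__)
    return best_row, worst_cases[best_row]


# Hypothetical Prisoner's Dilemma payoffs for the row player.
# Rows: 0 = cooperate, 1 = defect. Columns: opponent cooperates, opponent defects.
PD_ROW_PAYOFFS = [
    [3.0, 0.0],
    [5.0, 1.0],
]

action, guaranteed = pure_maximin(PD_ROW_PAYOFFS)
print(action, guaranteed)  # 1 1.0
```

In this toy game the maximin guarantee is 1 (mutual defection), whereas mutual cooperation yields 3. A guarantee “stronger than maximin” would have to secure some of that gap.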