This is a fantastic set of definitions, and it is definitely useful. That said, I want to add something to what you said near the end. I think the penultimate point needs further elaboration. I’ve spoken about “multi-agent Goodhart” in other contexts, and discussed why I think it’s a fundamentally hard problem, but I don’t think I’ve really clarified how I think this relates to alignment and takeoff. I’ll try to do that below.
Essentially, I think that the question of multipolarity versus individual or collective takeoff is critical, as (to me) it is the most worrying scenario for alignment.
Individual or collective vs. Multipolar takeoff
Individual takeoff implies that a coherent, agentic system is being improved or accelerating, where takeoff could be defined by either economic growth, where a single company or system accounts for a majority of humanity’s economic output, or otherwise be a foom or similar scenario. Collective takeoff would imply that a set of agentic systems are accelerating in ways that are (in the short term) non-competitive. If humanity as a whole does benefit widely from greatly increased economic growth, at some point even doubling in a year, yet there is no single dominant system, this would be a collective takeoff.
Multipolar takeoff, however, is a scenario where systems are actively competing in some domain. It seems plausible that competition of this sort would provide incentives for rapid improvement that could impact even non-agentic systems like Drexler’s CAIS. Alternatively, or additionally, improvement could be enabled by feedback from competition with peer or near-peer systems. (This seems to be the way humans developed intelligence, and so it seems a-priori worrying.) In either case, this type of takeoff could involve zero or negative sum interaction between systems. If a single winner emerged quickly enough to prevent destructive competition, it would be the “evolutionary” winner, with goals being aligned with success. For that reason, it seems implausible to me that it would be aligned with humanity’s interests as a whole. If no winner emerged, it seems that convergent instrumental goals combined with rapidly increasing capabilities would lead to at best a Hansonian Em-scenario, where systems respect property and other rights, but all available resources would be directed towards competition, and systems would be expanded to take over resources until the marginal cost of expansion equals marginal benefit. It seems implausible that in a takeoff scenario, competition reaching this point would leave significant resources for the remainder of humanity, likely at least wasting our cosmic endowment. If the competition turned negative sum, there could be even faster races to the bottom, leading to worse consequences.
This is a fantastic set of definitions, and it is definitely useful. That said, I want to add something to what you said near the end. I think the penultimate point needs further elaboration. I’ve spoken about “multi-agent Goodhart” in other contexts, and discussed why I think it’s a fundamentally hard problem, but I don’t think I’ve really clarified how I think this relates to alignment and takeoff. I’ll try to do that below.
Essentially, I think that the question of multipolarity versus individual or collective takeoff is critical, as (to me) it is the most worrying scenario for alignment.
Individual or collective vs. Multipolar takeoff
Individual takeoff implies that a coherent, agentic system is being improved or accelerating, where takeoff could be defined by either economic growth, where a single company or system accounts for a majority of humanity’s economic output, or otherwise be a foom or similar scenario. Collective takeoff would imply that a set of agentic systems are accelerating in ways that are (in the short term) non-competitive. If humanity as a whole does benefit widely from greatly increased economic growth, at some point even doubling in a year, yet there is no single dominant system, this would be a collective takeoff.
Multipolar takeoff, however, is a scenario where systems are actively competing in some domain. It seems plausible that competition of this sort would provide incentives for rapid improvement that could impact even non-agentic systems like Drexler’s CAIS. Alternatively, or additionally, improvement could be enabled by feedback from competition with peer or near-peer systems. (This seems to be the way humans developed intelligence, and so it seems a-priori worrying.) In either case, this type of takeoff could involve zero or negative sum interaction between systems. If a single winner emerged quickly enough to prevent destructive competition, it would be the “evolutionary” winner, with goals being aligned with success. For that reason, it seems implausible to me that it would be aligned with humanity’s interests as a whole. If no winner emerged, it seems that convergent instrumental goals combined with rapidly increasing capabilities would lead to at best a Hansonian Em-scenario, where systems respect property and other rights, but all available resources would be directed towards competition, and systems would be expanded to take over resources until the marginal cost of expansion equals marginal benefit. It seems implausible that in a takeoff scenario, competition reaching this point would leave significant resources for the remainder of humanity, likely at least wasting our cosmic endowment. If the competition turned negative sum, there could be even faster races to the bottom, leading to worse consequences.