Homogeneity vs. heterogeneity in AI takeoff scenarios
Special thanks to Kate Woolverton for comments and feedback.
There has been a lot of work and discussion on the speed and continuity of AI takeoff scenarios. I do think these are important variables, but in my opinion they matter less than many of the other axes along which takeoff scenarios could differ.
In particular, one axis on which different takeoff scenarios can differ that I am particularly interested in is their homogeneity—that is, how similar are the different AIs that get deployed in that scenario likely to be? If there is only one AI, or many copies of the same AI, then you get a very homogenous takeoff, whereas if there are many different AIs trained via very different training regimes, then you get a heterogenous takeoff. Of particular importance is likely to be how homogenous the alignment of these systems is—that is, are deployed AI systems likely to all be equivalently aligned/misaligned, or some aligned and others misaligned? It’s also worth noting that a homogenous takeoff doesn’t necessarily imply anything about how fast, discontinuous, or unipolar the takeoff might be—for example, you can have a slow, continuous, multipolar, homogenous takeoff if many different human organizations are all using AIs and the development of those AIs is slow and continuous but the structure and alignment of all of them are basically the same (a scenario which in fact I think is quite plausible).
I expect a relatively homogenous takeoff, for the following reasons:
I expect that the amount of compute necessary to train the first advanced AI system will vastly exceed the amount necessary to run it. Once you’ve trained an advanced AI system, you will therefore have the resources necessary to deploy many copies of it, and doing so will be much cheaper than training an entirely new system for each application. Even in a CAIS-like scenario, I expect that most of what you’ll be doing to create new services is fine-tuning existing ones rather than doing entirely new training runs.
I expect training compute requirements to be high enough that, for most organizations, it will be far cheaper to simply buy, license, or use a copy of the first advanced AI from the organization that built it than to train a competing system of their own.
For those organizations that do choose to compete (because they’re a state actor that’s worried about the national security issues involved in using another state’s AI, for example), I think it is highly likely that they will attempt to build competing systems in basically the exact same way as the first organization did, since the cost of a failed training run is likely to be very high and so the most risk-averse option is just to copy exactly what was already shown to work. Furthermore, even if an organization isn’t trying to be risk averse, they’re still likely to be building off of previous work in a similar way to the first organization such that the results are also likely to be fairly similar. More generally, I expect big organizations to generally take the path of least resistance, which I expect to be either buying or copying what already exists with only minimal changes.
Once you start using your first advanced AI to help you build more advanced AI systems, if your first AI system is relatively competent at doing alignment work, then you should get a second system which has similar alignment properties to the first. Furthermore, to the extent that you’re not using your first advanced AI to help you build your second, you’re likely to still be using similar techniques, which will likely have similar alignment properties. This is especially true if you’re using the first system as a base to build future ones (e.g. via fine-tuning). As a result, I think that homogeneity is highly likely to be preserved as AI systems are improved during the takeoff period.
EDIT: Eventually, you probably will start to get more risk-taking behavior as the barrier to entry gets low enough for building an equivalent to the first advanced AI and thus a larger set of actors become capable of doing so. By that point, however, I expect the state-of-the-art to be significantly beyond the first advanced AI such that any systems created by such smaller, lower-resourced, more risk-taking organizations won’t be very capable relative to the other systems that already exist in that world—and thus likely won’t pose an existential risk.
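The cost argument in the first two reasons above can be made concrete with a toy calculation. This is only a sketch: the function names, the optional fine-tuning term, and every number below are hypothetical and unitless, chosen to show the shape of the comparison rather than to estimate real costs.

```python
def cost_of_copies(train_cost, run_cost, n_apps, finetune_cost=0.0):
    """Train one system, then fine-tune and run one copy per application."""
    return train_cost + n_apps * (finetune_cost + run_cost)


def cost_of_retraining(train_cost, run_cost, n_apps):
    """Train and run an entirely new system for every application."""
    return n_apps * (train_cost + run_cost)


# Hypothetical figures (assumptions, not estimates): training is far more
# expensive than either running or fine-tuning a trained system.
TRAIN, RUN, FINETUNE, N_APPS = 1_000_000.0, 10.0, 1_000.0, 50

copies = cost_of_copies(TRAIN, RUN, N_APPS, FINETUNE)
retrain = cost_of_retraining(TRAIN, RUN, N_APPS)
print(f"copies: {copies:,.0f}  retrain: {retrain:,.0f}  ratio: {retrain / copies:.1f}x")
```

Under these assumptions, the gap between the two strategies grows roughly in proportion to the number of applications, as long as training dominates the per-copy costs. If training were cheap relative to running a system, the argument would weaken accordingly.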
If you accept that takeoff is likely to be homogenous, however, I think a number of far-reaching consequences follow, including:
It’s unlikely for there to exist both aligned and misaligned AI systems at the same time—either all of the different AIs will be aligned to approximately the same degree or they will all be misaligned to approximately the same degree. As a result, scenarios involving human coalitions with aligned AIs losing out to misaligned AI coalitions are relatively unlikely, which rules out some of the ways in which the strategy-stealing assumption might fail.
Cooperation and coordination between different AIs is likely to be very easy, since they are likely to be structurally very similar to each other, if not sharing basically all of the same weights. As a result, x-risk scenarios involving AI coordination failures or s-risk scenarios involving AI bargaining failures (at least those that don’t involve acausal trade) are relatively unlikely.
It’s unlikely you’ll get a warning shot for deceptive alignment. If the first advanced AI system is deceptive and that deception is missed during training, then once it’s deployed, the different deceptively aligned systems will likely be able to coordinate with each other relatively easily to defect simultaneously and ensure that their defection is unrecoverable (e.g. Paul’s “cascading failures”).
Homogeneity makes the alignment of the first advanced AI system absolutely critical (in a similar way to fast/discontinuous takeoff without the takeoff actually needing to be fast/discontinuous), since whether the first AI is aligned or not is highly likely to determine/be highly correlated with whether all future AIs built after that point are aligned as well. Thus, homogenous takeoff scenarios demand a focus on ensuring that the first advanced AI system is actually sufficiently aligned at the point when it’s first built rather than relying on feedback mechanisms after the first advanced AI’s development to correct issues.
Regardless, I’d very much like to see more discussion of the extent to which different people expect homogenous vs. heterogenous takeoff scenarios, similar to the existing discussion of slow vs. fast and continuous vs. discontinuous takeoffs. In my opinion, this is a very important axis on which takeoff scenarios can differ, and one that I haven’t seen much discussion of.