We Need Holistic AI Macrostrategy
AI Macrostrategy is the study of high-level questions about how to prioritize the use of resources on the current margin in order to achieve good AI outcomes. AI macrostrategy seems important if it is tractable. However, while few people are working on estimating particular parameters relevant to macrostrategy, even fewer are working on developing holistic macrostrategic models that combine estimates for different parameters to guide our actions. Moreover, while macrostrategy was less tractable in the past, recent developments (especially increased evidence for <10 year timelines) have made macrostrategy substantially more tractable. Thus, using the importance/tractability/neglectedness heuristics from EA, I conclude that on current margins macrostrategy should be a top priority.
Thanks to Chinmay Deshpande, Carson Ezell, Nikola Jurkovic, and others for helping me to develop many of the ideas in this post, and to Thomas Larsen for both doing this and helping to directly edit the post.
Speculative, but I think the arguments are pretty straightforward and so I have >70% confidence in the main conclusion that more macrostrategy work should be done on current margins relative to other kinds of alignment work.
What is AI Macrostrategy?
AI Macrostrategy (henceforth just macrostrategy) is the study of high-level questions about how to prioritize the use of resources to achieve good AGI outcomes on the current margin.
Macrostrategic work can be divided broadly into two categories:
Parameter estimates: attempts to forecast key variables such as timelines, takeoff speeds, and the difficulty of aligning AGI.
Holistic macrostrategy: attempts to combine these estimates and other pieces of data into a coherent, action-guiding model of AI alignment.
For examples, see the central macrostrategic questions that Holden lists in this post.
Importance of Macrostrategy
I think that attempting to answer macrostrategic questions is extremely important for four primary reasons.
Importance of Prioritization in Heavy Tailed Domains
It is widely accepted among Effective Altruists that the distribution of impactfulness among different cause areas is heavy-tailed. While I expect the distribution of impactfulness among different AI interventions to be less heavy-tailed than the distribution among cause areas in general, I still expect it to be at least somewhat heavy-tailed, with the best interventions being >2 orders of magnitude more effective in expectation than the median intervention. It is therefore critical to identify the best interventions rather than settling for interventions that seem vaguely pointed in the direction of solving alignment/making AGI go well, and identifying them requires some kind of macrostrategic model. Thus, the basic heuristic that prioritization is important in heavy-tailed domains already suggests that macrostrategy is quite important.
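To make the ">2 orders of magnitude" claim concrete, here is a minimal sketch that assumes (purely for illustration; the post does not commit to any particular distribution) that intervention effectiveness is log-normally distributed. Under that assumption, the ratio between a top quantile and the median depends only on the log-scale spread `sigma`, so we can ask how heavy the tail has to be for the claim to hold. The function names and the choice of the 99.9th percentile as "the best interventions" are hypothetical.

```python
import math
from statistics import NormalDist


def ratio_to_median(sigma: float, quantile: float) -> float:
    """Ratio of a given quantile to the median of a log-normal distribution.

    For a log-normal with log-scale standard deviation `sigma`, the median is
    exp(mu) and the q-th quantile is exp(mu + sigma * z_q), so the ratio is
    exp(sigma * z_q) and does not depend on mu.
    """
    z = NormalDist().inv_cdf(quantile)  # standard normal quantile z_q
    return math.exp(sigma * z)


def sigma_for_ratio(target_ratio: float, quantile: float) -> float:
    """Log-scale spread needed for `quantile` to be `target_ratio` x the median."""
    z = NormalDist().inv_cdf(quantile)
    return math.log(target_ratio) / z


# Illustrative spreads only; none of these numbers are estimates from the post.
for sigma in (0.5, 1.0, 1.5, 2.0):
    top = ratio_to_median(sigma, 0.999)
    print(f"sigma={sigma:.1f}: 99.9th percentile is {top:,.0f}x the median")

needed = sigma_for_ratio(100.0, 0.999)
print(f"sigma needed for a 100x gap at the 99.9th percentile: {needed:.2f}")
```

The point of the sketch is only that a moderately wide spread on a log scale is enough to put the best interventions two orders of magnitude above the median, which is what makes prioritization so valuable in such a domain.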
Achieving The Best Long Term Outcomes Requires Macrostrategy
In addition to the distribution of the impactfulness of different AI interventions, the distribution of value across possible long run futures is also likely to be heavy tailed. This is because if what happens in the long run future will be controlled by powerful optimizers such as superintelligent AI, then due to the fact that tails come apart, most of the expected value relative to a particular utility function lies in futures where the powerful optimizers controlling the future in question are optimizing that specific utility function (or something extremely close to it). As a result, if you have consequentialist values, you should be focused on trying to achieve futures where the powerful optimizers are very closely aligned with your values.
The “value lock-in” hypothesis is the claim that the long run future will be dictated by the values which we build into the first superintelligent AI(s) which we create, as they will be the optimizers controlling the long run future. If the value lock-in hypothesis is true, then in conjunction with the heavy-tailedness of long run outcomes, it implies that it is very important to influence exactly which values the AI optimizes for. While aligning an AI in the sense of creating an AI which does not cause existential doom is a narrow target, aligning an AI with the utilitarian-ish values characteristic of LW/EA types is a narrower target still. For example, extremely “conservative” values which prioritize staying on Earth and turning the world into a “contemporary utopia” (i.e. people have lives ~as good as very rich people nowadays) would avoid humanity literally going extinct. However, from a total utilitarian perspective, such futures are ~valueless compared to futures where we expand outwards and create many flourishing beings.
Moreover, while ensuring that AGI is sufficiently aligned that it does not cause existential doom seems to me like a primarily technical question, increasing the odds that it is aligned with the “best” values seems to depend more on macrostrategic questions. For example, while making a robustly corrigible AI and making a sovereign AI that has aligned values are both considered “solutions to alignment” from a technical perspective, they have significantly different implications for what the future will actually look like from a macrostrategic perspective. Similarly, this framework draws attention to the importance of trying to increase the probability that the lab which first creates AGI has highly aligned values, either by trying to directly influence the values of AGI developers or by trying to slow down progress at less aligned labs. It also speaks in favor of trying to accelerate progress at more aligned labs, but other considerations speak against such interventions because they worsen race dynamics and reduce the amount of time we have to find a technical solution to alignment. Overall, I expect that if we are able to find a “solution” to alignment in the sense of a low-cost way to reliably “aim” an AI at a desired objective, this alone would make the worst case misalignment scenarios unlikely. But it would be far from sufficient for realizing the best outcomes, which depends on many other more macrostrategic questions.
We Need Cooperation/Good “Last Mile Work” From AGI Developers
Third, in my view it is pretty unlikely that we develop a complete solution to alignment prior to the development of AGI that can be straightforwardly implemented by the developer of the first AGI regardless of the details of what AGI actually looks like. Instead, I expect that solving alignment will require whoever actually builds AGI to work out the details of an alignment scheme that works given the architecture they are using, the size of their model, their willingness to pay an alignment tax, etc. While certain parts of such a scheme such as a solution to ELK seem more independent of these details, I suspect that a full solution cannot be worked out until more of them are known. Indeed, some key sub-problems such as responding to the efforts of an AGI to hide its thoughts from our interpretability tools may only come up while it is being trained. Overall, I expect alignment outcomes to be significantly if not primarily determined by the quality of the “last mile” work done by the first AGI developer and other actors in close cooperation with them in the ~2 years prior to the development of AGI. This is doubly true if the solution to alignment involves using AIs to help with alignment research, as these systems do not exist yet and will likely only exist when AGI is already relatively close.
If this model of alignment success is true, it increases the importance of macrostrategic work. This is because influencing the quality of the aforementioned last mile work depends on macrostrategic questions such as timelines, takeoff speeds, which actors seem likely to develop AGI, how seriously the actors in question take alignment, and what the first AGI will look like.
It’s Important To Have Plans
Finally, at a higher level of abstraction, I agree with Yudkowsky and Bensinger that it is important when working on any complicated problem with lots of potential failure modes to have a concrete-ish idea of what the “path to victory” looks like. Right now, it seems to me like lots of people have a vague idea that it is important to do alignment research, but do not have an (even vague) end-to-end model of how their work will actually end up being used by an organization that is developing AGI in such a way that it causes that AI to be aligned. This is bad.
Neglectedness Of (Holistic) Macrostrategy
In addition to being highly important, holistic macrostrategy is also highly neglected. In the past two years we have seen some excellent work done to estimate important macrostrategic parameters, most notably Ajeya Cotra’s bioanchors report. Such work is an essential part of the broader research area of macrostrategy.
I think that this work is necessary but insufficient for figuring out holistic macrostrategy, which in turn is very useful for selecting better actions right now, as many of the ways in which macrostrategy informs our actions depend on how different parameters interact. For example, there is an important relationship between timelines and which research agendas seem promising, as we should not invest in research agendas that seem promising but are likely to take longer than our timeline estimates allow. Similarly, what kind of system the first AGI will be has implications for takeoff speeds, which in turn have implications for whether certain oversight-based research agendas might work, etc. Overall, I think coming to action-guiding macrostrategic conclusions generally requires integrating multiple different parameters into a single model.
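As a toy illustration of how two parameters can interact, the sketch below combines a timeline distribution with the time each research agenda needs, and scores an agenda by its value multiplied by the probability that it finishes before AGI arrives. Everything here is a hypothetical assumption made for the example: the agenda names, the payoff numbers, the two timeline distributions, and the scoring rule itself. It is not a model proposed in this post, only a demonstration that the ranking of agendas can flip when the timeline estimate changes.

```python
from __future__ import annotations


def prob_finishes_in_time(years_needed: int, timeline_pmf: dict[int, float]) -> float:
    """Probability that AGI arrives no earlier than `years_needed` from now."""
    return sum(p for arrival, p in timeline_pmf.items() if arrival >= years_needed)


def expected_payoff(years_needed: int, value_if_done: float,
                    timeline_pmf: dict[int, float]) -> float:
    """Toy score: payoff if the agenda completes, weighted by finishing in time."""
    return value_if_done * prob_finishes_in_time(years_needed, timeline_pmf)


def best_agenda(agendas: dict[str, tuple[int, float]],
                timeline_pmf: dict[int, float]) -> str:
    """Name of the agenda with the highest toy score under the given timelines."""
    return max(agendas, key=lambda name: expected_payoff(*agendas[name], timeline_pmf))


# Hypothetical agendas: name -> (years needed, value if completed).
AGENDAS = {"fast_modest": (4, 1.0), "slow_ambitious": (15, 2.5)}

# Hypothetical timeline distributions: years until AGI -> probability.
SHORT_TIMELINES = {5: 0.3, 10: 0.4, 20: 0.3}
LONG_TIMELINES = {10: 0.1, 20: 0.3, 40: 0.6}

for label, pmf in (("short", SHORT_TIMELINES), ("long", LONG_TIMELINES)):
    print(f"{label} timelines -> prioritize {best_agenda(AGENDAS, pmf)}")
```

Neither parameter alone settles the question: the more ambitious agenda only wins when the timeline estimate leaves it enough room to finish.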
My sense is that relatively little of this “holistic macrostrategy” work has been done. I would estimate that while there are hundreds of people working on alignment in some capacity, there are fewer than 20 who are primarily trying to form holistic macrostrategic models. Moreover, while some people (particularly those within the AI governance community) seem to have more comprehensive models, these models do not seem to me to be widely propagated throughout the broader AI safety community. Thus, I suspect that there is lots of low hanging fruit in both developing macrostrategic models and deriving their implications for how we should act.
Tractability Of Macrostrategy
For a while, I thought that macrostrategy was largely intractable. My view was that there were certain basic insights that seem pretty straightforward to me and others in the community (“misalignment is a greater x-risk than misuse,” “the marginal AGI developer is generally bad”), and that many other, more detailed parameters were very difficult to estimate due to being too far in the future.
However, I believe that macrostrategy has become much more tractable for two reasons. Firstly, the success of scaling has increased the plausibility of short (5-10 year) timelines. Because forecasting on 5-10 year timelines is much easier than forecasting on 20+ year timelines, this makes macrostrategy much more tractable. Moreover, I believe that key parts of the macrostrategic picture have started to come into focus. For example:
In addition to making forecasting easier, short timelines have significant direct macrostrategy implications (e.g. don’t rely on solutions that will likely take >15 years).
The success of large language models also suggests that they (or something similar to them) may lead to AGI. This gives us lots of information about what kinds of threat models are most likely and what kinds of alignment solutions are likely to work. For example, the fact that language models are perhaps better understood as “simulators” rather than as “agents” makes research agendas based on automating alignment research more promising: we may not have to fully align language models for them to be powerful enough to meaningfully contribute to safety research while not posing the x-risk that more agentic systems would.
If scaling works, then AGI is primarily bottlenecked by compute. This gives bigger developers with the capital to buy a lot of compute a massive advantage, making it likely that one of them will develop AGI. However, there are only a handful of actors that are focused on AGI and large enough to plausibly develop AGI if compute is the key resource (DAO + FAIR, Google Brain). Moreover, fewer such actors than I would have expected have seriously entered the AGI race in the last 5 years, suggesting to me that one of those specific actors will develop AGI.
A decreasing likelihood that China will win the AGI race. Between the CHIPS Act, short timelines, and its recent political dysfunction, China seems increasingly unlikely to beat the US to AGI.
The emergence of (in my opinion) plausibly successful research agendas. Alignment is still an extremely new field, and so I would not be surprised if nothing we are currently doing ends up solving the “central” part of the alignment problem. But I think that some promising research agendas such as ELK have been formulated relatively recently, and that this provides important information for how difficult alignment is likely to be and how to prioritize between different agendas.
People may disagree with me on a number of these specific points, but overall, I think it is pretty clear that the macrostrategic picture is much clearer now than it was a couple of years ago. Moreover, because it is easier to map territory when it is already partially known, this makes macrostrategy more tractable.
Overall, holistic macrostrategy is highly important and much more tractable than it was in the past. However, it remains quite neglected. Thus, I am extremely excited about the marginal person doing holistic macrostrategy work, probably more so than about them doing technical safety work. I also think that people who are not doing macrostrategy should pay more attention to it and have more detailed models for how what they are doing fits into the project of actually making AGI go well.