Power dynamics as a blind spot or blurry spot in our collective world-modeling, especially around AI

Where I’m coming from

***Epistemic status: personal experience***

In a number of prior posts, and in ARCHES, I’ve argued that multi-principal/multi-agent (multi/multi) dynamics among powerful AI systems deserve more attention from an existential-safety perspective.

In general, I have found it much harder to convince thinkers in and around LessWrong’s readership to attend to multi/multi dynamics than to convince, say, generally morally conscious AI researchers who are not (yet) closely associated with the effective altruism or rationality communities.

Because EA/​rationality discourse is particularly concerned with maintaining good epistemic processes, I think it would be easy to conclude from this state of affairs that

  • multi/​multi dynamics are not important (because communities with great concern for epistemic process do not care about them much), and

  • AI researchers who do care about multi/​multi dynamics have “bad epistemics” (e.g., because they have been biased by institutionalized trends).

In fact, more than one LessWrong reader has taken these positions with me in private conversation, in good faith (I’m almost certain).

In this post, I wish to share an opposing concern: that the EA and rationality communities have become systematically biased to ignore multi/​multi dynamics, and power dynamics more generally.

A history of systemic avoidance

***Epistemic status: self-evidently important considerations based on somewhat-publicly verifiable facts/​trends.***

Our neglect of multi/​multi dynamics has not been coincidental. For a time, influential thinkers in the rationality community intentionally avoided discussions of multi/​multi dynamics, so as to avoid contributing to the sentiment that the development and use of AI technology would be driven by competitive (imperfectly cooperative) motives. (FWIW, I also did this sometimes.) The idea was that we — the rationality community — should avoid developing narratives that could provoke businesses and state leaders into worrying about whose values would be most represented in powerful AI systems, because that might lead them to go to war with each other, ideologically or physically.

Indeed, there was a time when this community — particularly the Singularity Institute — represented a significant share of public discourse on the future of AI technology, and it made sense to be thoughtful about how to use that influence. Eliezer recently wrote (in a semi-private group, but with permission to share):

The vague sense of assumed common purpose, in the era of AGI-alignment thinking from before Musk, was a fragile equilibrium, one that I had to fight to support every time some wise fool sniffed and said “Friendly to who?”. Maybe somebody much weaker than Elon Musk could and inevitably would have smashed that equilibrium with much less of a financial investment, reducing Musk’s “counterfactual impact”. Maybe I’m an optimistic fool for thinking that this axis didn’t just go from 0%-in-practice to 0%-in-practice. But I am still inclined to consider people a little responsible for the thing that they seem to have proximally caused according to surface appearances. That vague sense of common purpose might have become stronger if it had been given more time to grow and be formalized, rather than being smashed.

That ship has now sailed. Perhaps it was right to worry that our narratives could trigger competition between states and companies, or perhaps the competitive dynamic was bound to emerge anyway and it was hubristic to think ourselves so important. Either way, carefully avoiding questions about multi/​multi dynamics on LessWrong or The Alignment Forum will not turn back the clock. It will not trigger an OpenAI/​DeepMind merger, nor unmake statements by the US or Chinese governments concerning the importance of AI technology. As such, it no longer makes sense to worry that “we” will accidentally trigger states or big companies to worry more-than-a-healthy-amount about who will control future AI systems, at least not simply by writing blog posts or papers about the issue.

Multi-stakeholder concerns as a distraction from x-risk

*** Epistemic status: personal experience, probably experienced by many other readers ***

I’ve also been frustrated many times by the experience of trying to point out that AI could be an existential threat to humanity, only to be met with a response that side-steps the existential threat and says something like, “Yes, but who gets to decide how safe it should be?” or “Yes, but whose values will it serve if we do make it safe?”. This happened especially often in grad school, around 2010–2013, when talking to other grad students. The multi-stakeholder considerations were almost always raised in ways that seemed to systematically distract conversation away from x-risk, rather than as action-oriented considerations of how to assemble a multi-stakeholder solution to x-risk.

This led me to build up a kind of resistance toward people who wanted to ask questions about multi/multi dynamics. However, by 2015, I had started to think that multi-stakeholder dynamics were going to play into x-risk, even at a technical level. But when I point this out to other x-risk-concerned people, it often feels like I’m met with the same kind of immune response that I used to have toward people with multi-stakeholder concerns about AI technology.

Beyond the inertia resulting from this “immune” response, I think other factors may be contributing to our collective blind spot around multi/multi dynamics, e.g., a collective aversion to politics.

Aversion to thinking about political forces

*** Epistemic status: observation + speculation ***

Politics is the Mind-Killer (PMK) is one of the most heavily quoted posts on LessWrong. By my count of the highly-rated posts that cite it, the post has a “LessWrong h-index” of 32, i.e., at least 32 posts citing it each have a karma score of 32 or higher.
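(To spell out the metric: the “h-index” here is the largest h such that at least h of the citing posts each have karma of at least h. Below is a minimal Python sketch of that computation, using made-up karma values purely for illustration rather than the actual scores of the citing posts.)

```python
def h_index(karma_scores):
    """Largest h such that at least h of the scores are >= h."""
    scores = sorted(karma_scores, reverse=True)
    h = 0
    for rank, karma in enumerate(scores, start=1):
        if karma >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical karma values, for illustration only (not the real citing posts' scores):
example_karma = [210, 150, 96, 75, 40, 33, 32, 12]
print(h_index(example_karma))  # -> 7 for this made-up list
```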

The PMK post does not directly advocate for readers to avoid thinking about politics; in fact, it says “I’m not saying that I think we should be apolitical, or even that we should adopt Wikipedia’s ideal of the Neutral Point of View.” However, it also begins with the statement “People go funny in the head when talking about politics”. If a person doubts their own ability not to “go funny in the head”, the PMK post — and concerns like it — could lead them to avoid thinking about or engaging with politics as a way of preventing themselves from “going funny”.

Furthermore, one can imagine this avoidance leading to an under-development of mental and social habits for thinking and communicating clearly about political issues, conceivably making the problem worse for some individuals or groups. This has been remarked on before, in the post “Has ‘politics is the mind-killer’ been a mind-killer?”.

Finally, these effects could easily combine to create a cultural filter that selects heavily for people who dislike interacting with political forces in discourse, or who find it difficult to do so.

*** Epistemic status: personal experience ***

The above considerations are reflective of my experience. For instance, at least five people that I know and respect as intellectuals within this community have shared with me that they find it difficult to think about topics that their friends or co-workers disagree with them about.

“Alignment” framings from MIRI’s formative years

*** Epistemic status: publicly verifiable facts ***

Aligning Superintelligence with Human Interests: A Technical Research Agenda (soares2015aligning) was written at a time when laying out precise research objectives was an important step in establishing MIRI as an institution focused primarily on research rather than movement-building (after re-branding from SingInst). The problem of aligning a single agent with a single principal is a conceptually simple starting point, and any good graduate education in mathematics will teach you that for the purpose of understanding something confusing, it’s always best to start with the simplest non-trivial example.

*** Epistemic status: speculation ***

Over time, friends and fans of MIRI may have over-defended this problem framing, in the course of defending MIRI itself as a fledgling research institution against social/political pressures to dismiss AI as a source of risk to humanity. For instance, back in 2015, folks like Andrew Ng were in the habit of publicly claiming that worrying about AGI was “like worrying about overpopulation on Mars” (The Register; Wired), so it was often more important to say “No, that doesn’t make sense; it’s possible for powerful AI systems to be misaligned with what we want” than to address the more nuanced point that we’re also going to want a lot of resilient socio-technical solutions to account for disagreements about how powerful systems should be used.

Corrective influences from the MIRI meme-plex

*** Epistemic status: publicly verifiable facts ***

Not all influences from the MIRI-sphere have pushed us away from thinking about multi-stakeholder issues. For instance, Inadequate Equilibria (yudkowsky2017inadequate) is clearly an effort to help this community think about multi-agent dynamics, which could help with thinking about politics. Reflective Oracles (fallenstein2015reflective) are another case of this, but in a highly technical context that probably (unfortunately, in my view) didn’t have much effect on broader rationality-community discourse.

The “problems are only real when you can solve them” problem

***Epistemic status: personal experience / reflections***

It seems to me that many people tend to ignore a given problem until it becomes sufficiently plausible that the problem can be solved. Moreover, I think their «ignore the problem» mental operation often goes as far as «believing the problem doesn’t exist». I saw MIRI facing this for years when trying to point to AI friendliness or alignment as a problem. Frequently people would ask “But what would a solution look like?”, and absent a solution, they’d tag the problem as “not a real problem” rather than just “a difficult problem”.

I think the problem of developing AI tech to enable cooperation-rather-than-conflict is in a similar state right now. Open Problems in Cooperative AI (dafoe2020cooperative) is a good start at carving out well-defined problems, and I’m hoping a lot more work will follow in that vein.

Conclusion

*** Epistemic status: personal reflections ***

I think it’s important to consider the possibility that a filter bubble with a fair amount of inertia has formed as a result of our collective efforts to defend AI x-risk as a real and legitimate concern, and to consider what biases or weaknesses are most likely to be present in that filter bubble now. Personally, I think we’ve developed such a blind spot around technical issues with multi-principal/multi-agent AI interaction, though since Open Problems in Cooperative AI (dafoe2020cooperative) it might be starting to clear up. Similarly, aversion to political thinking may be weakening our collective ability to understand, discuss, and reach consensus on the state of the political world, particularly surrounding AI.