Planned summary for the Alignment Newsletter:

The first scenario outlined in <@What failure looks like@> stems from a failure to specify what we actually want, so that we instead build AI systems that pursue proxies of what we want. As AI systems become responsible for more of the economy, human values become less influential relative to the proxy objectives the AI systems pursue, and as a result we lose control over the future. This post aims to clarify whether such a scenario leads to _lock-in_, where we are stuck with the state of affairs and cannot correct it to get “back on course”. It identifies five factors that make lock-in more likely:
1. _Collective action problems:_ Many human institutions will face competitive (short-term) pressures to deploy AI systems with bad proxies, even if it isn’t in humanity’s long-term interest.
2. _Regulatory capture:_ Influential people (such as CEOs of AI companies) may benefit from AI systems that optimize proxies, and so oppose measures to fix the issue (e.g. by banning such AI systems).
3. _Ambiguity:_ There may be genuine ambiguity about whether it is better to have these proxy-optimizing AI systems, even from a long-term perspective, especially because all clear, easy-to-define metrics will likely be improving (since those are exactly the metrics that can be turned into proxy objectives).
4. _Dependency:_ AI systems may become so embedded in society that society can no longer function without them.
5. _Opposition:_ The AI systems themselves may oppose any fixes we propose.
We can also look at historical precedents. Climate change has been exacerbated by factors 1-3, though if it does lead to lock-in, that will be “because of physics”, unlike the case with AI. The agricultural revolution, which arguably made human life significantly worse, nevertheless persisted thanks to its productivity gains (factor 1) and the loss of hunter-gatherer skills (factor 4). When the British colonized New Zealand, the Māori people lost significant control over their future: each individual chief needed guns to avoid being outcompeted by chiefs who had them (factor 1), trading with the British genuinely made the Māori better off initially (factor 3), and eventually the British turned to manipulation, confiscation, and conflict (factor 5).
With AI in particular, we might expect that an increase in misinformation and echo chambers will exacerbate ambiguity (factor 3), and that, due to AI's general-purpose nature, dependency (factor 4) may be an especially large risk.
The post also suggests some future directions for estimating the _severity_ of lock-in for this failure mode.
Planned opinion:
I think this topic is important and the post did it justice. Factors 4 and 5 (dependency and opposition) capture the main reasons I expect lock-in, with factors 1-3 as less important but still relevant mechanisms. I also really liked the analogy with the British colonization of New Zealand: it felt like it was in fact quite analogous to how I’d expect this sort of failure to play out.
Random note: initially I thought this post was part 1 of N, and only later did I realize the “part 1” was a modifier to “what failure looks like”. That’s partly why it wasn’t summarized till now—I was waiting for future parts to show up.