Yudkowsky on AGI ethics

A Cornell computer scientist recently wrote on social media:

[...] I think the general sense in AI is that we don’t know what will play out, but some of these possibilities are bad, and we need to start thinking about it. We are plagued by highly visible people ranging from Musk to Ng painting pictures ranging from imminent risk to highly premature needless fear, but that doesn’t depict the center of gravity, which has noticeably shifted to thinking about the potential bad outcomes and what we might do about it. (Turning close to home to provide an example of how mainstream this is becoming, at Cornell two AI professors, Joe Halpern and Bart Selman, ran a seminar and course last semester on societal and ethical challenges for AI, and only just a few weeks ago we had a labor economist speak in our CS colloquium series about policy ideas targeting possible future directions for CS and AI, to an extremely large and enthusiastic audience.)

To which Eliezer Yudkowsky replied:

My forecast of the net effects of “ethical” discussion is negative; I expect it to be a cheap, easy, attention-grabbing distraction from technical issues and technical thoughts that actually determine okay outcomes. [...]

The ethics of bridge-building is to not have your bridge fall down and kill people and there is a frame of mind in which this obviousness is obvious enough. How not to have the bridge fall down is hard.

This may be surprising coming from the person who came up with coherent extrapolated volition, co-authored the Cambridge Handbook of Artificial Intelligence chapter on “The Ethics of Artificial Intelligence,” etc. The relevant background comes from Eliezer’s writing on the minimality principle:

[W]hen we are building the first sufficiently advanced Artificial Intelligence, we are operating in an extremely dangerous context in which building a marginally more powerful AI is marginally more dangerous. The first AGI ever built should therefore execute the least dangerous plan for preventing immediately following AGIs from destroying the world six months later. Furthermore, the least dangerous plan is not the plan that seems to contain the fewest material actions that seem risky in a conventional sense, but rather the plan that requires the least dangerous cognition from the AGI executing it. Similarly, inside the AGI itself, if a class of thought seems dangerous but necessary to execute sometimes, we want to execute the least instances of that class of thought required to accomplish the overall task.

E.g., if we think it’s a dangerous kind of event for the AGI to ask “How can I achieve this end using strategies from across every possible domain?” then we might want a design where most routine operations only search for strategies within a particular domain, and events where the AI searches across all known domains are rarer and visible to the programmers. Processing a goal that can recruit subgoals across every domain would be a dangerous event, albeit a necessary one, and therefore we want to do less of it within the AI.
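To make the shape of that design a little more concrete, here is a minimal toy sketch, not drawn from Yudkowsky’s writing: every class and function name below (Strategy, RestrictedPlanner, CrossDomainAuditLog, and the domain-keyed strategy sources) is an illustrative assumption. The only point it captures is the structural one from the quote: single-domain search is the routine code path, and cross-domain search is a separate, rarer operation that leaves a visible record for the programmers.

```python
# Illustrative toy sketch only. All names here are hypothetical; nothing
# in this snippet comes from an actual AGI design or existing library.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Strategy:
    domain: str        # which domain of knowledge the strategy draws on
    description: str   # human-readable summary of the strategy


@dataclass
class CrossDomainAuditLog:
    """Append-only record of every cross-domain search, for programmer review."""
    entries: List[str] = field(default_factory=list)

    def record(self, goal: str, domains: List[str]) -> None:
        self.entries.append(f"CROSS-DOMAIN SEARCH for {goal!r} over {domains}")


class RestrictedPlanner:
    def __init__(self, strategy_sources: Dict[str, Callable[[str], List[Strategy]]]):
        # One strategy generator per domain, e.g. {"chemistry": ..., "logistics": ...}
        self.strategy_sources = strategy_sources
        self.audit_log = CrossDomainAuditLog()

    def plan_within_domain(self, goal: str, domain: str) -> List[Strategy]:
        """The routine, default operation: consult only one domain's strategies."""
        return self.strategy_sources[domain](goal)

    def plan_across_all_domains(self, goal: str) -> List[Strategy]:
        """The rare operation: consult every known domain.

        Deliberately a separate, auditable code path rather than the default,
        so that each use is an explicit, visible event.
        """
        domains = list(self.strategy_sources)
        self.audit_log.record(goal, domains)
        strategies: List[Strategy] = []
        for domain in domains:
            strategies.extend(self.strategy_sources[domain](goal))
        return strategies
```

The sketch is purely about code structure: the cross-domain path is a distinct operation that leaves an audit trail instead of being a default the system drifts into. Whether anything like this remains meaningful at the level of cognition Yudkowsky is describing is exactly the open technical question.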

So the technical task of figuring out how to build a robust, minimal AGI system that is well-aligned with its operators’ intentions is very different from “AI ethics”; and the tendency to conflate the two has plausibly diverted a lot of thought and attention into much broader (or much narrower) issues, when that effort could more profitably have gone into the alignment problem itself.

One part of doing the absolute bare world-saving minimum with a general-purpose reasoning system is steering clear of any strategies that require the system to do significant moral reasoning (or to implement less-than-totally-airtight moral views held by its operators). Just execute the simplest, most straightforward concrete sequence of actions, using the least dangerous varieties and the smallest quantity of AGI cognition needed for success.

Another way of putting this view is that nearly all of the effort should be going into solving the technical problem, “How would you get an AI system to do some very modest concrete action requiring extremely high levels of intelligence, such as building two strawberries that are completely identical at the cellular level, without causing anything weird or disruptive to happen?”

Where obviously it’s important that the system not do anything severely unethical in the process of building its strawberries; but if your strawberry-building system requires its developers to have a full understanding of meta-ethics or value aggregation in order to be safe and effective, then you’ve made some kind of catastrophic design mistake and should start over with a different approach.