Extrapolating values without outsourcing

I first took note of “Coherent Extrapolated Volition” in 2006. I thought it was a brilliant idea, an exact specification of how to arrive at a better future: Figure out exactly how it is that humans make their existing choices, idealize that human decision procedure according to its own criteria, and then use the resulting “renormalized human utility function” as the value system of an AI. The first step is a problem in cognitive neuroscience, the second step is a conceptual problem in reflective decision theory, and the third step is where you make the Friendly AI.
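
To fix ideas, here is a minimal Python skeleton of that three-step division of labour. Every name in it is a hypothetical placeholder of mine, and each function body stands for an open research problem rather than an existing algorithm.

```python
# Skeleton of the three-step program described above. Each function is a
# placeholder for an unsolved research problem, not an existing method.

def model_human_choice(neuro_and_behavioural_data):
    """Step 1 (cognitive neuroscience): a naturalistic account of how
    humans actually make their existing choices."""
    raise NotImplementedError

def idealize(decision_procedure):
    """Step 2 (reflective decision theory): revise the procedure according
    to its own criteria, yielding the 'renormalized' utility function."""
    raise NotImplementedError

def build_friendly_ai(value_system):
    """Step 3 (AI design): implement the extrapolated values as the value
    system of the AI."""
    raise NotImplementedError

def extrapolate_values(data):
    """The whole pipeline, as a composition of the three steps."""
    return build_friendly_ai(idealize(model_human_choice(data)))
```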

For some reason, rather than pursuing this research program directly, people interested in CEV talk about using simulated human beings (“uploads”, “ems”, “whole-brain emulations”) to do all the hard work. Paul Christiano just made a post called “Formalizing Value Extrapolation”, but it’s really about formalizing the safe outsourcing of value extrapolation to a group of human uploads; all the details of how value extrapolation is actually performed (e.g. the three steps listed above) are left completely unspecified. Another recent article proposed that making an AI with a submodule based on models of its makers’ opinions is the fast way to Friendly AI. It’s also been suggested to me that simulating human thinkers and running them for centuries of subjective time, until they reach agreement on the nature of consciousness, is a way to tackle that problem; and clearly the same “solution” could be applied to any other aspect of FAI design, strategy, and tactics.

Whatever its merit as a thought experiment, in my opinion this idea of outsourcing the hard work to simulated humans has zero practical value, and we would be much better off if the minuscule sub-sub-culture of people interested in creating Friendly AI didn’t think in this way. Daydreaming about how they’d solve the problem of FAI in Permutation City is a recipe for irrelevance.

Suppose we were trying to make a “C. elegans-friendly AI”. The first thing we would do is take the first step mentioned above: figure out the C. elegans utility function or decision procedure. Then we would have to decide how to aggregate utility across multiple individuals. Then we would make the AI. Performing this task for H. sapiens is a lot more difficult, and qualitatively new factors enter at the first and second steps, but I don’t see why it is fundamentally different, different enough that we need to engage in the rigmarole of delegating the task to uploaded human beings. It shouldn’t be necessary, and we probably won’t even get the chance: by the time we have hardware and neuro-expertise sufficient to emulate a whole human brain, we will most likely have nonhuman AI anyway.
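
To make the aggregation step concrete, here is a toy Python example under invented assumptions: three worms, three candidate outcomes, and a simple utilitarian sum as the aggregation rule. Everything in it, from the numbers to the choice of rule, is a placeholder for what would actually have to be worked out.

```python
# Toy illustration of the aggregation step for a population of worms:
# given per-individual utility functions over a few outcomes, pick the
# outcome that maximizes total utility. The numbers are invented, and
# the simple "sum" rule is itself a placeholder: choosing the right
# aggregation rule is part of the problem, not a given.

outcomes = ["follow_gradient_A", "follow_gradient_B", "stay_put"]

# Hypothetical per-worm utilities (imagine them fitted from behavioural data).
individual_utilities = [
    {"follow_gradient_A": 0.9, "follow_gradient_B": 0.4, "stay_put": 0.1},
    {"follow_gradient_A": 0.3, "follow_gradient_B": 0.8, "stay_put": 0.2},
    {"follow_gradient_A": 0.7, "follow_gradient_B": 0.6, "stay_put": 0.1},
]

def group_utility(outcome):
    """Utilitarian sum across individuals; a median or bargaining-based
    rule would define a different group utility function."""
    return sum(u[outcome] for u in individual_utilities)

best = max(outcomes, key=group_utility)
print(best)  # the option the aggregated utility function would endorse
```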

A year ago, I wrote: “My expectation is that the presently small fields of machine ethics and neuroscience of morality will grow rapidly and will come into contact, and there will be a distributed research subculture which is consciously focused on determining the optimal AI value system in the light of biological human nature. In other words, there will be human minds trying to answer this question long before anyone has the capacity to direct an AI to solve it. We should expect that before we reach the point of a Singularity, there will be a body of educated public opinion regarding what the ultimate utility function or decision method (for a transhuman AI) should be, deriving from work in those fields which ought to be FAI-relevant but which have yet to engage with the problem. In other words, they will be collectively engaging with the problem before anyone gets to outsource the necessary research to AIs.”

I’ll also link to my previous post about “practical Friendly AI”; what I’m doing here is going into a fraction more detail about how you arrive at the Friendly value system. There, I basically said that you get a committee together and figure it out, which is clearly an inadequate recipe, but in that article I was focused more on sketching the nature of an organization and a plan that would have some chance of genuinely creating FAI in the real world. Here, I’ll say that working out the Friendly value system consists of three stages: constructing a naturalistic explanation of how human decision-making occurs; determining the core essentials of that process and applying its own metamoral criteria to arrive at a “renormalized” decision procedure, one that has been idealized according to human cognition’s own preferences (“our wish if we knew more, thought faster, were more the people we wished we were”); and then implementing that decision procedure within an AI, which is where all the value-neutral parts of AI research come into play, such as AGI theory, the theory of value stability under self-modification, and so on. That is the sort of “value extrapolation” that we should be “formalizing”, and preparing to carry out in real life.
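
As a purely illustrative companion to the second stage, here is a toy Python sketch of what “applying a decision procedure’s own criteria until it stops endorsing revisions” could look like. The representation (a few weighted value dimensions) and the meta-criterion (normalizing the weights) are stand-ins I have invented for illustration; discovering the real ones is the research program itself.

```python
# Toy sketch of the "renormalization" stage: repeatedly apply a decision
# procedure's own meta-criterion for revising itself until no further
# revision is endorsed. Both the representation and the meta-criterion
# are invented stand-ins; the real ones are what the research must find.

from dataclasses import dataclass

@dataclass(frozen=True)
class DecisionProcedure:
    # Stand-in for "human values as found by cognitive neuroscience":
    # weights over a few crude value dimensions.
    weights: tuple

    def endorsed_revision(self):
        """Placeholder metamoral criterion: endorse moving the weights to
        their normalized form. Returns None at a fixed point."""
        total = sum(self.weights)
        normalized = tuple(round(w / total, 6) for w in self.weights)
        return None if normalized == self.weights else DecisionProcedure(normalized)

def extrapolate(procedure):
    """Iterate self-endorsed revisions to a fixed point: the 'renormalized'
    decision procedure that would then be given to the AI as its values."""
    while (revised := procedure.endorsed_revision()) is not None:
        procedure = revised
    return procedure

print(extrapolate(DecisionProcedure(weights=(2.0, 1.0, 1.0))))
# -> DecisionProcedure(weights=(0.5, 0.25, 0.25))
```

The only point of the sketch is the shape of the computation: an iteration driven by the procedure’s own standards, terminating at a fixed point that then serves as the value system implemented in the AI.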