What a practical plan for Friendly AI looks like

I have seen too many discussions of Friendly AI, here and elsewhere (e.g. in comments at Michael Anissimov’s blog), detached from any concrete idea of how to do it. Sometimes the issue is the lack of code, demos, or a practical plan from SIAI. SIAI is seen as a source of wishful thinking about magic machines that will solve all our problems for us, or as a place engaged in a forever quest for nebulous mathematical vaporware such as “reflective decision theory”. You will get singularity enthusiasts who say, it’s great that SIAI has given the concept of FAI visibility, but enough with the philosophy, let’s get coding! … does anyone know where to start? And you will get singularity skeptics who say, unfriendly AI is a bedtime ghost story for credulous SF fans, wake me up when SIAI actually ships a product. Or, within this subculture of rationalist altruists who want to do the optimal thing, you’ll get people saying, I don’t know if I should donate, because I don’t see how any of this is supposed to happen.

So in this post I want to sketch what a “practical” plan for Friendly AI looks like. I’m not here to advocate this plan—I’m not saying this is the right way to do it. I’m just providing an example of a plan that could be pursued in the real world. Perhaps it will also allow people to better understand SIAI’s indirect approach.

I won’t go into the details of financial or technical logistics. If we were talking about how to get to the moon from Earth, the following plan would be along the lines of “Make a chemical-powered rocket big enough to get you there.” Once you have that concept, you still have a lot of work to do, but you are at least on the right track, compared to people who want to make a teleportation device or a balloon that goes really high.

But I will make one remark about how the idea of Friendly AI is framed. At present, it is discussed in conjunction with a whole cornucopia of science fiction notions such as immortality, conquering the galaxy, omnipresent wish-fulfilling super-AIs, good and bad Jupiter-brains, mind uploads in heaven and hell, and so on. Similarly, we have all these thought experiments: guessing games with omniscient aliens, decision problems in a branching multiverse, “torture versus dust specks”. Whatever the ultimate relevance of such ideas, it is clearly possible to divorce the notion of Friendly AI from all of them. If an FAI project were trying to garner mass support, it would first need to be comprehensible, and the simple approach would be to present it as an exercise in creating artificial intelligence that does the right thing. Nothing about utopia; nothing about dystopia caused by unfriendly AI; nothing about godlike superintelligence; just the scenario, already familiar in popular culture, of robots, androids, and computers you can talk with. All that is coming, says the practical FAI project, and we are here to design these new beings so they will be good citizens, a positive rather than a negative addition to the world.

So much for how the project describes itself to the world at large. What are its guiding technical conceptions? What is the specific proposal that will allow educated skeptics to conclude that this might get off the ground? Remember that there are two essential challenges to overcome: the project has to create intelligence, and it has to create ethical intelligence. In our existing discussions, these are the problems of “AGI” (artificial general intelligence) and “FAI” (friendly artificial intelligence).

There is a very simple approach which, like the idea of a chemical-powered rocket that gets you to the moon, should be enough to get you to FAI once it is elaborated in sufficient detail. It can be seen by stripping away some of the complexities peculiar to SIAI’s strategy, complexities which tend to dominate the discussion. The basic idea should also be thoroughly familiar. We are to conceive of the AI as having two parts, a goal system and a problem-solving system. AGI is achieved by creating a problem-solving system of sufficient power and universality; FAI is achieved by specifying the right goal system.
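To make the two-part division concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than a proposed design: the `GoalSystem` and `ProblemSolver` classes, the toy one-dimensional world, and the brute-force planner are all assumptions of the sketch. The only thing it shows is the separation itself: the same problem solver pursues whatever goal system it is handed.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]  # hypothetical toy world state: (position, items collected)
Action = str


@dataclass
class GoalSystem:
    """What the agent wants: a score over world states (higher is better)."""
    utility: Callable[[State], float]


@dataclass
class ProblemSolver:
    """How the agent gets it: a goal-agnostic planner over a world model."""
    actions: Sequence[Action]
    transition: Callable[[State, Action], State]
    horizon: int = 3

    def plan(self, start: State, goal: GoalSystem) -> List[Action]:
        best_plan: List[Action] = []
        best_score = goal.utility(start)
        # Exhaustive search over every action sequence up to the horizon;
        # shorter plans are tried first and kept unless strictly beaten.
        for length in range(1, self.horizon + 1):
            for seq in product(self.actions, repeat=length):
                state = start
                for action in seq:
                    state = self.transition(state, action)
                score = goal.utility(state)
                if score > best_score:
                    best_plan, best_score = list(seq), score
        return best_plan


def toy_transition(state: State, action: Action) -> State:
    """Hypothetical world: move left/right; an item can be collected at position 2."""
    x, items = state
    if action == "left":
        return (x - 1, items)
    if action == "right":
        return (x + 1, items)
    if action == "collect" and x == 2:
        return (x, items + 1)
    return state


solver = ProblemSolver(actions=["left", "right", "collect"], transition=toy_transition)
reach = GoalSystem(utility=lambda s: -abs(s[0] - 2))  # goal: stand at position 2
gather = GoalSystem(utility=lambda s: s[1])           # goal: hold as many items as possible

print(solver.plan((0, 0), reach))   # ['right', 'right']
print(solver.plan((0, 0), gather))  # ['right', 'right', 'collect']
```

Swapping `reach` for `gather` changes what the agent does without touching the planner, which is the sense in which AGI and FAI become separate problems under this framing. A real problem solver would, of course, need something far better than exhaustive search.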

SIAI, in discussing the quest for the right goal system, emphasizes the difficulties of this process and the unreliability of human judgment. Their idea of a solution is to use artificial intelligence to neuroscientifically deduce the actual algorithmic structure of human decision-making, and to then employ a presently nonexistent branch of decision theory to construct a goal system embodying ideals implicit in the unknown human cognitive algorithms.

The practical approach would not bother with this attempt to outsource the task of designing the AI’s morality to a presently nonexistent neuromathematical cognitive bootstrap process. While fully cognizant of the fact that value is complex, as eloquently attested by Eliezer in many speeches, the practical FAI project would nonetheless choose the AI’s goal system in the old-fashioned way, by human deliberation and consensus. You would get a team of professional ethicists, some worldly people like managers, some legal experts in the formulation of contracts, and together you would hammer out a mission statement for the AI. Then you would get your programmers and your cognitive scientists to implement that goal condition in such a way that the symbols have the meanings they are supposed to have. End of story.

So far, all we’ve done is to make a wish. We’ve decided, after appropriate deliberation, what to wish for, and we have found a way to represent it in symbols. All that means nothing if we can’t create AGI, the problem solver with at least a human level of intelligence. Here again, SIAI comes in for a lot of criticism, from two angles: it’s said to have no ideas about how to create AGI, and it’s said to actively discourage work on AGI, on the grounds that we need to solve the FAI problem first. Instead, critics say, it discusses only hopelessly impractical models of cognition, like AIXI and exact Bayesian inference, which are mostly of theoretical interest.

Our practical FAI project has “solved” FAI by simply coming to an agreement on what to wish for, and by studying with legalistic care how to avoid pitfalls and loopholes in the finer details of the wish; but what is its approach to the hard technical problem of AGI? The answer is, first of all, heuristics and incremental improvement. Projects like Lenat’s Cyc are on the right track. A newborn AI has to be seeded with useful knowledge, including useful knowledge of problem-solving methods. It doesn’t have time to discover such things entirely unaided. We should not imagine AGI developing just from a simple architecture, like Schmidhuber’s Gödel machine, but from a basic architecture plus a large helping of facts and heuristics which are meant to give it a head start.
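Here is a toy version of “a basic architecture plus a head start”, built on a made-up puzzle: reach a target integer from a start value using the operations `+1` and `*2`. The generic architecture is blind breadth-first search, and the seeded knowledge is one hand-written method, `work_backwards`, which the solver tries first. The puzzle, the function names, and the try-seeded-methods-then-search policy are all hypothetical, assume non-negative integers, and are not taken from Cyc or any other real system.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

Plan = List[str]  # a sequence of operations, each "+1" or "*2"
# A seeded "method" either solves the problem outright or returns None.
Method = Callable[[int, int], Optional[Plan]]


def apply_plan(start: int, plan: Plan) -> int:
    """Run a plan forward from start and return the resulting value."""
    value = start
    for op in plan:
        value = value + 1 if op == "+1" else value * 2
    return value


def blind_search(start: int, target: int) -> Tuple[Plan, int]:
    """Generic breadth-first search; returns (shortest plan, nodes expanded)."""
    frontier = deque([(start, [])])
    seen = {start}
    expanded = 0
    while frontier:
        value, plan = frontier.popleft()
        expanded += 1
        if value == target:
            return plan, expanded
        for op, nxt in (("+1", value + 1), ("*2", value * 2)):
            # Neither operation decreases a non-negative value, so
            # anything past the target is a dead end.
            if nxt <= target and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [op]))
    raise ValueError("target unreachable from start")


def work_backwards(start: int, target: int) -> Optional[Plan]:
    """Seeded know-how: from the target, undo a doubling whenever possible."""
    if target < start:
        return None  # this method does not apply
    ops: Plan = []
    value = target
    while value > start:
        if value % 2 == 0 and value // 2 >= start:
            ops.append("*2")
            value //= 2
        else:
            ops.append("+1")
            value -= 1
    return ops[::-1]


def solve(start: int, target: int, methods: List[Method]) -> Tuple[Plan, str]:
    """Try seeded methods first; fall back to blind search if none applies."""
    for method in methods:
        plan = method(start, target)
        if plan is not None and apply_plan(start, plan) == target:
            return plan, "seeded"
    plan, _ = blind_search(start, target)
    return plan, "search"


plan, how = solve(1, 1000, [work_backwards])
print(how, apply_plan(1, plan))  # seeded 1000
```

The seeded method skips the search entirely when it applies, while the blind search stays available for problems that no seeded method covers. That fallback is what keeps the design general-purpose rather than a lookup table.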

So fine: the practical approach to AGI isn’t a search for a single killer concept; it’s a matter of incrementally increasing the power of a general-purpose problem solver with many diverse ingredients in its design, so that it becomes more and more capable and independent. Ben Goertzel’s approach to AGI exhibits the sort of eclectic pluralism that I have in mind. Still, we do need a selling point, something which shows that we’re different, that we’re aiming for the stars and we have a plan to get there.

Here, I want to use Steve Omohundro’s paper “The Basic AI Drives” in a slightly unusual way. The paper lists a number of behaviors that should be exhibited by a sufficiently sophisticated AI: it will try to model its own operation, clarify its goals, protect them from modification, protect itself from destruction, acquire resources and use them efficiently… The twist I propose is that Omohundro’s list of drives should be used as a design specification. If your goal is AGI, then you want a cognitive architecture that will exhibit these emergent behaviors. They offer a series of milestones for your theorists and developers: a criterion of progress, and a set of intermediate goals sufficient to bridge the gap between a blank-slate beginning and an open-ended problem solver.
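Used as a design specification, the drives could become something like an acceptance-test checklist. In the sketch below, the behavior names, the scores in `[0, 1]`, the 0.8 threshold, and the assumption that milestones should be met in the listed order are all hypothetical. The post specifies only the drives themselves; how one would actually test an architecture for them is left open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical evaluation record: behavior name -> score in [0, 1], as
# produced by some test harness that this sketch does not define.
Trace = Dict[str, float]


@dataclass(frozen=True)
class Milestone:
    drive: str
    check: Callable[[Trace], bool]


def threshold(behavior: str, level: float = 0.8) -> Callable[[Trace], bool]:
    """A milestone is met when the behavior's score reaches `level`."""
    return lambda trace: trace.get(behavior, 0.0) >= level


# The drives as listed in the post; treating them as an ordered sequence
# is an assumption of this sketch.
MILESTONES: List[Milestone] = [
    Milestone("models its own operation", threshold("self_model")),
    Milestone("clarifies its goals", threshold("goal_clarification")),
    Milestone("protects its goals from modification", threshold("goal_preservation")),
    Milestone("protects itself from destruction", threshold("self_preservation")),
    Milestone("acquires resources", threshold("resource_acquisition")),
    Milestone("uses resources efficiently", threshold("efficiency")),
]


def progress(trace: Trace, milestones: List[Milestone] = MILESTONES) -> Tuple[int, Optional[str]]:
    """Return how many milestones are met in order, and the next unmet drive."""
    for i, milestone in enumerate(milestones):
        if not milestone.check(trace):
            return i, milestone.drive
    return len(milestones), None


trace = {"self_model": 0.9, "goal_clarification": 0.85, "goal_preservation": 0.4}
print(progress(trace))  # (2, 'protects its goals from modification')
```

The returned pair gives the team both a measure of progress and the next intermediate goal to work on.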

That’s the whole plan. It’s an anticlimax, I know, for anyone who might have imagined that there was a magic formula for superintelligence coming at the end of this post. But I do claim that what I have described is the skeleton of a plan which can be fleshed out, and which, if it were fleshed out and pursued, would produce goal-directed AGI. Whether the project as I have described it would really produce “friendly” AI is another matter. Anyone versed in the folk wisdom about FAI should be able to point out multiple points of potential failure. But for people who just don’t see how FAI is supposed to happen at all, I hope this makes it a little clearer how it might be pursued in the real world.