Neutral AI

Unfriendly AI has goal conflicts with us. Friendly AI (roughly speaking) shares our goals. How about an AI with no goals at all?

I’ll call this “neutral AI”. Cyc is a neutral AI. It has no goals, no motives, no desires; it is inert unless someone asks it a question. It then has a set of routines it uses to try to answer the question. It executes these routines, and terminates, whether the question was answered or not. You could say that it has the temporary goal of answering the question; a minimal sketch of this question-answer-terminate pattern appears after the two questions below. We then have two important questions:

  1. Is it possible (or feasible) to build a useful AI that operates like this?

  2. Is an AI built in this fashion significantly less dangerous than one with goals?
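
Before taking up those questions, here is a minimal sketch of the question-answer-terminate pattern described above. The names are hypothetical and correspond to no real system (certainly not to Cyc’s internals); the point is only that nothing persists between questions.

```python
# A minimal sketch of the "neutral AI" interaction pattern: no persistent
# goals or state; a fixed set of routines is run against one question, and
# the system then stops whether or not an answer was found.
from typing import Callable, Optional

Routine = Callable[[str], Optional[str]]   # takes a question, returns an answer or None


def answer_question(question: str, routines: list[Routine]) -> Optional[str]:
    for routine in routines:
        answer = routine(question)
        if answer is not None:
            return answer     # answered; nothing further is attempted
    return None               # not answered; the system simply terminates
```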

Many people have answered the first question “no”. This would probably include Hubert Dreyfus (based on a Heideggerian analysis of semantics, which was actually very good, but whose conclusions I would call misguided because Dreyfus mistook “what AI researchers do today” for “what is possible using a computer”), Phil Agre, Rodney Brooks, and anyone who describes their work as “reactive”, “behavior-based”, or “embodied cognition”. We could also point to the analogous linguistic divide. There are two general approaches to natural language understanding. One descends from generative grammars and symbolic AI, is embodied by James Allen’s book Natural Language Understanding, and sits in the “program in the knowledge” camp that would answer the first question “yes”. The other approach has more kinship with construction grammars and machine learning, is embodied by Manning & Schütze’s Foundations of Statistical Natural Language Processing, and its practitioners would be more likely to answer the first question “no”. (Eugene Charniak is noteworthy for having been prominent in both camps.)

The second question, I think, hinges on two sub-questions:

  1. Can we prevent an AI from harvesting more resources than it should for a question?

  2. Can we prevent an AI from conceiving the goal of increasing its own intelligence as a subgoal of answering a question?

The Jack Williamson story “With Folded Hands” (1947) tells how humanity was enslaved by robots that, given the order to protect humanity, became… overprotective. Or suppose a physicist asked an AI, “Does the Higgs boson exist?” You don’t want it to use the Earth to build a supercollider. These are cases of using more resources than intended to carry out an order.

You may be able to build a Cyc-like question-answering architecture that would have no risk of doing any such thing. It may be as simple as placing resource limitations on every question. The danger is that if the AI is given a very thorough knowledge base that includes, for instance, an understanding of human economics and motivations, it may syntactically construct a plan to find the answer to a question that is technically within the resource limitations imposed, for instance by manipulating humans in ways that don’t register in its cost function. This could lead to very big mistakes; but it isn’t the kind of mistake that builds on itself, like a FOOM scenario. The question is whether any of these very big mistakes would be irreversible. My intuition is that there would be a power-law distribution of mistake sizes, with a small number of irreversible mistakes. We might then figure out a reasonable way of determining our risk level.
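
As a rough sketch of the “resource limitations on every question” idea: meter every step the question-answerer takes, and give up when the per-question budget is spent. This is illustrative Python under the assumption that search steps are the only resource we meter; the functions expand and is_answer are hypothetical, domain-specific callbacks. The weakness described above shows up in the comments: costs the meter doesn’t track never count against the budget.

```python
# Sketch: a hard per-question resource budget. Every search step is charged
# against the budget, and the question is abandoned when the budget runs out.
# Caveat: only what the meter measures is limited; a plan whose real-world
# costs fall outside this cost function looks "cheap" to the limiter.

class BudgetExceeded(Exception):
    pass


class Meter:
    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self.steps = 0

    def charge(self, cost: int = 1) -> None:
        self.steps += cost
        if self.steps > self.max_steps:
            raise BudgetExceeded("resource limit reached")


def answer_within_budget(question, expand, is_answer, max_steps):
    """Bounded search: expand candidate states until an answer is found,
    the frontier is exhausted, or the per-question budget is spent."""
    meter = Meter(max_steps)
    frontier = [question]
    try:
        while frontier:
            meter.charge()              # every expansion draws on the budget
            state = frontier.pop()
            if is_answer(state):
                return state
            frontier.extend(expand(state))
    except BudgetExceeded:
        pass                            # give up rather than seek more resources
    return None                         # terminates, answered or not
```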

If the answer to the second sub-question is “yes”, then we probably don’t need to fear a FOOM from neutral AI.

The short answer is yes: there are “neutral AI architectures” that don’t currently have the risk either of harvesting too many resources or of attempting to increase their own intelligence. Many existing AI architectures are examples. (I’m thinking specifically of “hierarchical task-network planning”, which I don’t consider true planning; it only allows the piecing together of plan components that were pre-built by the programmer.) But they can’t do much. There’s a power/safety tradeoff. The question is how much power you can get in the “completely safe” region, and where the sweet spots in that tradeoff lie outside the completely safe region.
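
A toy illustration of why hierarchical task-network planning is “neutral” in this sense: the planner can only string together decompositions the programmer wrote down, so it cannot synthesize a subgoal (acquiring more hardware, say, or improving itself) that is absent from the method library. The task and method names below are invented for the example.

```python
# Toy HTN-style decomposition: tasks are reduced to primitives only through
# methods supplied by the programmer; no new subtasks can be invented.
METHODS = {
    # task -> list of alternative decompositions into subtasks/primitives
    "answer_query": [["parse_query", "search_kb", "report"]],
    "search_kb":    [["lookup_direct"], ["lookup_direct", "apply_one_rule"]],
}
PRIMITIVES = {"parse_query", "lookup_direct", "apply_one_rule", "report"}


def plan(task):
    """Return a flat list of primitive actions, or None if no method applies."""
    if task in PRIMITIVES:
        return [task]
    for decomposition in METHODS.get(task, []):
        steps = []
        for subtask in decomposition:
            sub = plan(subtask)
            if sub is None:
                break
            steps.extend(sub)
        else:
            return steps          # first decomposition that fully works out
    return None


print(plan("answer_query"))       # only combinations of the pre-built pieces
```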

If you could build an AI that did nothing but parse published articles to answer the question, “Has anyone said X?”, that would be very useful and very safe. I worked on such a program (SemRep) at NIH. It works pretty well within the domain of medical journal articles. If it could take one step more and answer, “Can you find a set of one to four statements that, taken together, imply X?”, that would be a huge advance in capability, with little if any additional risk. (I added that capability to SemRep, but no one has ever used it, and it isn’t accessible through the web interface.)
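
As a sketch of the two capabilities just described (not SemRep’s actual implementation): direct lookup of a stated relation, and a search for a short chain of up to four stated relations that together connect a subject to an object, which stands in crudely for “statements that, taken together, imply X”. The triples below are illustrative examples, not extracted data.

```python
# Hypothetical store of subject-relation-object statements extracted from
# articles. Real extraction and real inference are far harder than this.
STATEMENTS = {
    ("aspirin", "inhibits", "COX-1"),
    ("COX-1", "produces", "thromboxane A2"),
    ("thromboxane A2", "promotes", "platelet aggregation"),
}


def has_anyone_said(subj, rel, obj):
    """Capability 1: has this exact statement appeared in the literature?"""
    return (subj, rel, obj) in STATEMENTS


def find_supporting_chain(subj, obj, max_len=4):
    """Capability 2: up to max_len statements that chain subject-to-object
    from subj to obj (a crude stand-in for 'taken together, imply')."""
    def search(current, chain):
        if current == obj:
            return chain
        if len(chain) >= max_len:
            return None
        for (s, r, o) in STATEMENTS:
            if s == current and (s, r, o) not in chain:
                found = search(o, chain + [(s, r, o)])
                if found:
                    return found
        return None
    return search(subj, [])


print(has_anyone_said("aspirin", "inhibits", "COX-1"))           # True
print(find_supporting_chain("aspirin", "platelet aggregation"))  # three-statement chain
```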