FAI FAQ draft: What is Friendly AI?

I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question “What is Friendly AI?”

_____

A Friendly AI (FAI) is an artificial intelligence that benefits humanity. More specifically, Friendly AI may refer to:

  • a very powerful and general AI that acts autonomously in the world to benefit humanity.

  • an AI that continues to benefit humanity during and after an intelligence explosion.

  • a research program concerned with the production of such an AI.

  • Singularity Institute’s approach (Yudkowsky 2001, 2004) to designing such an AI:

    • Goals should be defined by the Coherent Extrapolated Volition of humanity.

    • Goals should be reliably preserved during recursive self-improvement.

    • Design should be mathematically rigorous and proof-apt.

Friendly AI is a more difficult project than often supposed. As explored in other sections, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence:

  1. Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.

  2. Literalness: a superintelligent machine will make decisions using the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety (Kringelbach & Berridge 2009; Schroeder 2004; Glimcher 2010) of what humans value. A demand like “maximize human happiness” sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.