FAI FAQ draft: What is Friendly AI?

lukeprog13 Nov 2011 19:26 UTC

3 points

I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question “What is Friendly AI?”

_____

A Friendly AI (FAI) is an artificial intelligence that benefits humanity. More specifically, Friendly AI may refer to:

a very powerful and general AI that acts autonomously in the world to benefit humanity.
an AI that continues to benefit humanity during and after an intelligence explosion.
a research program concerned with the production of such an AI.
Singularity Institute’s approach (Yudkowsky 2001, 2004) to designing such an AI:
- Goals should be defined by the Coherent Extrapolated Volition of humanity.
- Goals should be reliably preserved during recursive self-improvement.
- Design should be mathematically rigorous and proof-apt.

Friendly AI is a more difficult project than often supposed. As explored in other sections, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence:

Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.
Literalness: a superintelligent machine will make decisions using the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety (Kringelbach & Berridge 2009; Schroeder 2004; Glimcher 2010) of what humans value. A demand like “maximize human happiness” sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.

What links here?

Friendly AI FAQ drafts by lukeprog (13 Nov 2011 16:47 UTC; 4 points)

lukeprog13 Nov 2011 19:26 UTC

3 points

11 comments1 min readLW link Archive

[deleted] 13 Nov 2011 20:06 UTC
4 points
I don’t think that “benefits humanity” explains anything any more than “friendly”. Quite the contrary, I can imagine someone being friendly. I can not, however, think of anything that would benefit all of humanity.
- lukeprog 13 Nov 2011 20:39 UTC
  1 point
  Parent
  Perhaps, but I cannot briefly explain all of metaethics, especially since we haven’t solved it yet. But I’m open to suggestions on how this could be clearer while remaining brief.
- wedrifid 14 Nov 2011 9:04 UTC
  0 points
  Parent
  Eliminate acne.
Vladimir_Nesov 14 Nov 2011 13:30 UTC
3 points
The “features possessed by any superintelligence” paragraphs seem particularly vulnerable to antagonistic reading:

Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.

Superintelligent machines have mystical powers allowing them to transcend laws of physics and reality.

It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety

It will refuse to do anything unless rules are precise. It will ignore details of designers’ specifications out of spite.
steven0461 14 Nov 2011 2:49 UTC
3 points
“Superpower”, while logical enough, is a bad word to use because of its associations with low-status fiction.

ETA: for an alternative, I think just “power” would work in this context. Then you could change “powers to reshape reality” to “ability to reshape reality”.
lessdazed 13 Nov 2011 21:30 UTC
2 points

A Friendly

A Friendly AI (FAI) is an artificial intelligence that benefits humanity. It is contrasted with Unfriendly AI, which includes both Malicious AI and Uncaring AI.

Goals should be defined by the Coherent Extrapolated Volition of humanity.

Goals should be defined by the aggregating the desires of humanity in a fair way, with special attention paid to the fact that people’s current dispositions reflect untrue beliefs about the world. How to do this is an unsolved problem for which Coherent Extrapolated Volition is an incomplete outline of a theory.

Superpower

Superpower: a superintelligent machine will (would?) have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires. Such a machine’s actions would be difficult to predict because to reliably predict the way something will solve a problem, one must be at least approximately as smart as the problem solver.

Literalness

...

In general, human language and thought is designed to explain things to and interface with other humans that have similar ways of thinking. Designing an AI that does not have human failings like hate yet does understand what humans care for is an unprecedented challenge. This step, describing the goal system to the AI, builds on the solution of determining what is a fair goal system to give the AI. It conceptually precedes ensuring that the goal system remains stable under recursive self improvement.
- lukeprog 13 Nov 2011 22:14 UTC
  0 points
  Parent
  Thanks.
spuckblase 15 Nov 2011 10:56 UTC
0 points
“Literalness” is explained in sufficient detail to get a first idea of the connection to FAI, but “Superpower” is not.
kilobug 15 Nov 2011 9:05 UTC
0 points
In which stage of the FAQ is that supposed to be ? For people not used to singularity concepts, things like “intelligence explosion” will not mean much. Maybe the FAQ should first refer to a more general FAQ about the singularity ? Or include a link to it from “intelligence explosion” ?
- wedrifid 15 Nov 2011 9:21 UTC
  0 points
  Parent
  
  For people not used to singularity concepts, things like “intelligence explosion” will not mean much.
  
  Terrorist attack on MENSA!
steven0461 14 Nov 2011 3:45 UTC
0 points
I think this:

an AI that continues to benefit humanity during and after an intelligence explosion

is redundant with this:

Goals should be reliably preserved during recursive self-improvement.

and could be omitted.