Superintelligent

WikiLast edit: 8 Jun 2016 17:24 UTC by Eliezer Yudkowsky

Machine performance inside a domain (class of problems) can potentially be:

Optimal (impossible to do better)
Strongly superhuman (better than all humans by a significant margin)
Weakly superhuman (better than all the humans most of the time and most of the humans all of the time)
Par-human (performs about as well as most humans, better in some places and worse in others)
Subhuman or infrahuman (performs worse than most humans)

A superintelligence is either ‘strongly superhuman’, or else at least ‘optimal’, across all cognitive domains. It can’t win against a human at logical tic-tac-toe, but it plays optimally there. In a real-world game of tic-tac-toe that it strongly wanted to win, it might sabotage the opposing player, deploying superhuman strategies on the richer “real world” gameboard.

I. J. Good originally used ‘ultraintelligence’ to denote the same concept: “Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever.”

To say that a hypothetical agent or process is “superintelligent” will usually imply that it has all the advanced-agent properties.

Superintelligences are still bounded (if the character of physical law at all resembles the Standard Model of physics). They are (presumably) not infinitely smart, infinitely fast, all-knowing, or able to achieve every describable outcome using their available resources and options. However:

A supernova isn’t infinitely hot, but it’s still pretty warm. “Bounded” does not imply “small”. You should not try to walk into a supernova using a standard flame-retardant jumpsuit after reasoning, correctly but unhelpfully, that it is only boundedly hot.
A superintelligence doesn’t know everything and can’t perfectly estimate every quantity. However, to say that something is “superintelligent” or superhuman/optimal in every cognitive domain should almost always imply that its estimates are epistemically efficient relative to every human and human group. Even a superintelligence may not be able to exactly estimate the number of hydrogen atoms in the Sun, but a human shouldn’t be able to say, “Oh, it will probably underestimate the number by 10% because hydrogen atoms are pretty light”—the superintelligence knows that too. For us to know better than the superintelligence is at least as implausible as our being able to predict a 20% price increase in Microsoft’s stock six months in advance without any private information.
A superintelligence is not omnipotent and can’t obtain every describable outcome. But to say that it is “superintelligent” should suppose at least that it is instrumentally efficient relative to humans: We should not suppose that a superintelligence carries out any policy $π_{0}$ such that a human can think of a policy $π_{1}$ which would get more of the agent’s utility. To put it another way, the assertion that a superintelligence optimizing for utility function $U,$ would pursue a policy $π_{0},$ is by default refuted if we observe some $π_{1}$ such that, so far as we can see, $E [U | π_{0}] < E [U | π_{1}] .$ We’re not sure the efficient agent will do $π_{1}$ - there might be an even better alternative we haven’t foreseen—but we should regard it as very likely that it won’t do $π_{0} .$

If we’re talking about a hypothetical superintelligence, probably we’re either supposing that an intelligence explosion happened, or we’re talking about a limit state approached by a long period of progress.

Many/most problems in AI alignment seem like they ought to first appear at a point short of full superintelligence. As part of the project of making discourse about advanced agents precise, we should try to identify the key advanced agent property more precisely than saying “this problem would appear on approaching superintelligence”—to suppose superintelligence is usually sufficient but will rarely be necessary.

For the book, see Nick Bostrom’s book Superintelligence.

No comments.