
Instrumental Convergence


Instrumental convergence, or convergent instrumental values, is the theorized tendency for most sufficiently intelligent agents to pursue potentially unbounded instrumental goals such as self-preservation and resource acquisition [1]. This concept has also been discussed under the term basic AI drives.

The idea was first explored by Steve Omohundro, who argued that sufficiently advanced AI systems would all naturally discover similar instrumental subgoals. The view that there are important basic AI drives was subsequently defended by Nick Bostrom as the instrumental convergence thesis, or the convergent instrumental goals thesis: a few goals are instrumental to almost all possible final goals, and therefore most advanced AIs will pursue them. Omohundro draws on von Neumann’s microeconomic theory of rational agents to support this idea.
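A minimal toy sketch of the underlying expected-utility intuition (not from Omohundro or Bostrom; the environment and all numbers below are illustrative assumptions): if an agent’s first move either keeps several outcomes reachable or commits it to a single outcome, then for most randomly sampled utility functions the option-preserving move is the better one.

```python
# Toy illustration (illustrative assumptions only): why "keeping options open"
# is instrumentally useful for most goals. The first action either leads to a
# hub with several reachable outcomes or to a dead end with one outcome. We
# sample many random utility functions over outcomes and count how often the
# option-preserving action is optimal.
import random

N_HUB_OUTCOMES = 5      # outcomes reachable after the option-preserving action
N_DEADEND_OUTCOMES = 1  # outcomes reachable after the dead-end action
N_GOALS = 100_000       # number of randomly sampled utility functions ("goals")

hub_preferred = 0
for _ in range(N_GOALS):
    # A "goal" is just a utility value assigned to each terminal outcome.
    utilities = [random.random() for _ in range(N_HUB_OUTCOMES + N_DEADEND_OUTCOMES)]
    hub_value = max(utilities[:N_HUB_OUTCOMES])      # best outcome reachable via the hub
    deadend_value = max(utilities[N_HUB_OUTCOMES:])  # best outcome reachable via the dead end
    if hub_value > deadend_value:
        hub_preferred += 1

print(f"Option-preserving action optimal for {hub_preferred / N_GOALS:.1%} of sampled goals")
# Expected ≈ 5/6 ≈ 83% here: most goals, whatever their content, favor the
# action that keeps more outcomes reachable.
```

The power-seeking posts listed further down this page make this intuition precise for Markov decision processes.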

Omohundro’s Drives

Omohundro presents two sets of values, one for self-improving artificial intelligences [1] and another that he says will emerge in any sufficiently advanced AGI system [2]. The former set is composed of four main drives.

Bostrom’s Drives

Bostrom argues for an orthogonality thesis: an agent’s level of intelligence and its final goals are independent, so more or less any level of intelligence could in principle be combined with more or less any final goal. But he also argues that, despite this independence of values and intelligence, any recursively self-improving intelligence would likely possess a particular set of instrumental values that are useful for achieving any kind of terminal value [3]. On his view, those values include self-preservation, goal-content integrity, cognitive enhancement, technological perfection, and resource acquisition.
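Stated loosely in expected-utility terms (an illustrative gloss, not Bostrom’s own formalization), the claim is that securing such an instrumental subgoal $s$ raises expected utility for nearly every final goal an agent could have:

$$\mathbb{E}\bigl[\,u \mid \text{the agent secures } s\,\bigr] \;\geq\; \mathbb{E}\bigl[\,u \mid \text{it does not}\,\bigr] \qquad \text{for almost all final goals } u.$$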

Relevance

Both Bostrom and Omohundro argue that these values should be used in trying to predict a superintelligence’s behavior, since they are likely to be the only values shared by most superintelligences. They also note that these values are consistent with safe and beneficial AIs as well as with unsafe ones.

Bostrom emphasizes, however, that our ability to predict a superintelligence’s behavior may be very limited even if it shares most intelligences’ instrumental goals.

Yudkowsky echoes Omohundro’s point that the convergence thesis is consistent with the possibility of Friendly AI. However, he also notes that the convergence thesis implies that most AIs will be extremely dangerous, merely by being indifferent to one or more human values [4].

Pathological Cases

In some rare cases, AIs may not pursue these goals. For instance, if there are two AIs with the same goals, the less capable AI may conclude that it should destroy itself to allow the stronger AI to control the universe. Or an AI may have the goal of using as few resources as possible, or of being as unintelligent as possible. Such relatively specific goals limit the growth and power of the AI.

See Also

References

Seeking Power is Often Robustly Instrumental in MDPs

5 Dec 2019 2:33 UTC
133 points
34 comments · 17 min read · LW link · 2 nominations · 2 reviews
(arxiv.org)

Corrigibility

paulfchristiano, 27 Nov 2018 21:50 UTC
40 points
4 comments · 6 min read · LW link

AI prediction case study 5: Omohundro’s AI drives

Stuart_Armstrong, 15 Mar 2013 9:09 UTC
10 points
5 comments · 8 min read · LW link

General purpose intelligence: arguing the Orthogonality thesis

Stuart_Armstrong, 15 May 2012 10:23 UTC
32 points
156 comments · 18 min read · LW link

Toy model: convergent instrumental goals

Stuart_Armstrong, 25 Feb 2016 14:03 UTC
15 points
2 comments · 4 min read · LW link

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace, 4 Oct 2019 4:08 UTC
177 points
54 comments · 15 min read · LW link · 2 nominations · 2 reviews

Goal retention discussion with Eliezer

MaxTegmark, 4 Sep 2014 22:23 UTC
92 points
26 comments · 6 min read · LW link

Generalizing the Power-Seeking Theorems

TurnTrout, 27 Jul 2020 0:28 UTC
40 points
6 comments · 4 min read · LW link

The Catastrophic Convergence Conjecture

TurnTrout, 14 Feb 2020 21:16 UTC
39 points
15 comments · 8 min read · LW link

Power as Easily Exploitable Opportunities

TurnTrout, 1 Aug 2020 2:14 UTC
24 points
5 comments · 6 min read · LW link

Clarifying Power-Seeking and Instrumental Convergence

TurnTrout, 20 Dec 2019 19:59 UTC
41 points
7 comments · 3 min read · LW link

Walkthrough of ‘Formalizing Convergent Instrumental Goals’

TurnTrout, 26 Feb 2018 2:20 UTC
10 points
2 comments · 10 min read · LW link

2019 Review Rewrite: Seeking Power is Often Robustly Instrumental in MDPs

TurnTrout, 23 Dec 2020 17:16 UTC
35 points
0 comments · 4 min read · LW link
(www.lesswrong.com)

Review of ‘Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More’

TurnTrout, 12 Jan 2021 3:57 UTC
37 points
1 comment · 2 min read · LW link

TASP Ep 3 - Optimal Policies Tend to Seek Power

Quinn, 11 Mar 2021 1:44 UTC
24 points
0 comments · 1 min read · LW link
(technical-ai-safety.libsyn.com)

Coherence arguments imply a force for goal-directed behavior

KatjaGrace, 26 Mar 2021 16:10 UTC
66 points
13 comments · 14 min read · LW link
(aiimpacts.org)

A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi, 28 Jul 2018 21:27 UTC
66 points
9 comments · 3 min read · LW link
(github.com)

Rationality: Common Interest of Many Causes

Eliezer Yudkowsky, 29 Mar 2009 10:49 UTC
55 points
52 comments · 4 min read · LW link

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong, 6 Feb 2020 11:50 UTC
38 points
25 comments · 3 min read · LW link

Asymptotically Unambitious AGI

michaelcohen, 6 Mar 2019 1:15 UTC
39 points
216 comments · 2 min read · LW link

The Utility of Human Atoms for the Paperclip Maximizer

avturchin, 2 Feb 2018 10:06 UTC
3 points
19 comments · 3 min read · LW link

Let’s talk about “Convergent Rationality”

capybaralet, 12 Jun 2019 21:53 UTC
35 points
33 comments · 6 min read · LW link

Superintelligence 10: Instrumentally convergent goals

KatjaGrace, 18 Nov 2014 2:00 UTC
12 points
33 comments · 5 min read · LW link

Military AI as a Convergent Goal of Self-Improving AI

avturchin, 13 Nov 2017 12:17 UTC
5 points
3 comments · 1 min read · LW link

Against Instrumental Convergence

zulupineapple, 27 Jan 2018 13:17 UTC
7 points
31 comments · 2 min read · LW link

Generalizing Power to multi-agent games

22 Mar 2021 2:41 UTC
43 points
17 comments · 7 min read · LW link