Instrumental Convergence

TagLast edit: 30 May 2023 6:04 UTC by papetoast

Instrumental convergence or convergent instrumental values is the theorized tendency for most sufficiently intelligent agents to pursue potentially unbounded instrumental goals such as self-preservation and resource acquisition [1]. This concept has also been discussed under the term basic drives.

The idea was first explored by Steve Omohundro. He argued that sufficiently advanced AI systems would all naturally discover similar instrumental subgoals. The view that there are important basic AI drives was subsequently defended by Nick Bostrom as the instrumental convergence thesis, or the convergent instrumental goals thesis. On this view, a few goals are instrumental to almost all possible final goals. Therefore, all advanced AIs will pursue these instrumental goals. Omohundro uses microeconomic theory by von Neumann to support this idea.

Omohundro’s Drives

Omohundro presents two sets of values, one for self-improving artificial intelligences [2] and another he says will emerge in any sufficiently advanced AGI system [3]. The former set is composed of four main drives:

Self-preservation: A sufficiently advanced AI will probably be the best entity to achieve its goals. Therefore it must continue existing in order to maximize goal fulfillment. Similarly, if its goal system were modified, then it would likely begin pursuing different ends. Since this is not desirable to the current AI, it will act to preserve the content of its goal system.
Efficiency: At any time, the AI will have finite resources of time, space, matter, energy and computational power. Using these more efficiently will increase its utility. This will lead the AI to do things like implement more efficient algorithms, physical embodiments, and particular mechanisms. It will also lead the AI to replace desired physical events with computational simulations as much as possible, to expend fewer resources.
Acquisition: Resources like matter and energy are indispensable for action. The more resources the AI can control, the more actions it can perform to achieve its goals. The AI’s physical capabilities are determined by its level of technology. For instance, if the AI could invent nanotechnology, it would vastly increase the actions it could take to achieve its goals.
Creativity: The AI’s operations will depend on its ability to come up with new, more efficient ideas. It will be driven to acquire more computational power for raw searching ability, and it will also be driven to search for better search algorithms. Omohundro argues that the drive for creativity is critical for the AI to display the richness and diversity that is valued by humanity. He discusses signaling goals as particularly rich sources of creativity.

Bostrom’s Drives

Bostrom argues for an orthogonality thesis: But he also argues that, despite the fact that values and intelligence are independent, any recursively self-improving intelligence would likely possess a particular set of instrumental values that are useful for achieving any kind of terminal value [4]. On his view, those values are:

Self-preservation: A superintelligence will value its continuing existence as a means to continuing to take actions that promote its values.
Goal-content integrity: The superintelligence will value retaining the same preferences over time. Modifications to its future values through swapping memories, downloading skills, and altering its cognitive architecture and personalities would result in its transformation into an agent that no longer optimizes for the same things.
Cognitive enhancement: Improvements in cognitive capacity, intelligence and rationality will help the superintelligence make better decisions, furthering its goals more in the long run.
Technological perfection: Increases in hardware power and algorithm efficiency will deliver increases in its cognitive capacities. Also, better engineering will enable the creation of a wider set of physical structures using fewer resources (e.g., nanotechnology).
Resource acquisition: In addition to guaranteeing the superintelligence’s continued existence, basic resources such as time, space, matter and free energy could be processed to serve almost any goal, in the form of extended hardware, backups and protection.

Relevance

Both Bostrom and Omohundro argue these values should be used in trying to predict a superintelligence’s behavior, since they are likely to be the only set of values shared by most superintelligences. They also note that these values are consistent with safe and beneficial AIs as well as unsafe ones.

Bostrom emphasizes, however, that our ability to predict a superintelligence’s behavior may be very limited even if it shares most intelligences’ instrumental goals.

Yudkowsky echoes Omohundro’s point that the convergence thesis is consistent with the possibility of Friendly AI. However, he also notes that the convergence thesis implies that most AIs will be extremely dangerous, merely by being indifferent to one or more human values [5]:

Pathological Cases

In some rarer cases, AIs may not pursue these goals. For instance, if there are two AIs with the same goals, the less capable AI may determine that it should destroy itself to allow the stronger AI to control the universe. Or an AI may have the goal of using as few resources as possible, or of being as unintelligent as possible. These relatively specific goals will limit the growth and power of the AI.

Experimental Evidence

The question of instrumentally convergent drives potentially arising in machine learning models is explored in the paper—Optimal Policies Tend To Seek Power. The authors explored instrumental convergence (specifically power-seeking behavior) as a statistical tendency of optimal policies in reinforcement learning (RL) agents.

The authors focus on Markov Decision Processes (MDPs) and prove that certain environmental symmetries are sufficient for optimal policies to seek power in the environment. They formalize power as the ability to achieve a wide range of goals. Within this formalization, the authors show that most reward functions make it optimal to try and seek power since this allows for keeping a wide range of options available to the agent.

This provides a counter to the claim that instrumental convergence is merely an anthropomorphic theoretical tendency, and that human-like power-seeking instincts will not arise in RL agents.

References

Omohundro, S. (2007). The Nature of Self-Improving Artificial Intelligence.
Omohundro, S. (2008). “The Basic AI Drives”. Proceedings of the First AGI Conference.
Omohundro, S. (2012). Rational Artificial Intelligence for the Greater Good.
Bostrom, N. (2012). “The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents”. Minds and Machines.
Shulman, C. (2010). Omohundro’s “Basic AI Drives” and Catastrophic Risks.
Alexander Matt Turner, Logan Riggs Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli (2021). Optimal Policies Tend To Seek Power

Instrumental Convergence? [Draft]

J. Dmitri Gallow14 Jun 2023 20:21 UTC

48 points

20 comments33 min readLW link

P₂B: Plan to P₂B Better

Ramana Kumar and Daniel Kokotajlo

24 Oct 2021 15:21 UTC

38 points

17 comments6 min readLW link

Seeking Power is Often Convergently Instrumental in MDPs

TurnTrout and Logan Riggs

5 Dec 2019 2:33 UTC

162 points

39 comments17 min readLW link 2 reviews

(arxiv.org)

General purpose intelligence: arguing the Orthogonality thesis

Stuart_Armstrong15 May 2012 10:23 UTC

33 points

155 comments18 min readLW link

Corrigibility

paulfchristiano27 Nov 2018 21:50 UTC

57 points

8 comments6 min readLW link

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut27 Jun 2022 17:25 UTC

12 points

0 comments11 min readLW link

AI prediction case study 5: Omohundro’s AI drives

Stuart_Armstrong15 Mar 2013 9:09 UTC

10 points

5 comments8 min readLW link

Empowerment is (almost) All We Need

jacob_cannell23 Oct 2022 21:48 UTC

64 points

44 comments17 min readLW link

Draft report on existential risk from power-seeking AI

Joe Carlsmith28 Apr 2021 21:41 UTC

85 points

23 comments1 min readLW link

Power-seeking for successive choices

adamShimi12 Aug 2021 20:37 UTC

11 points

9 comments4 min readLW link

You can still fetch the coffee today if you’re dead tomorrow

davidad9 Dec 2022 14:06 UTC

84 points

19 comments5 min readLW link

Environmental Structure Can Cause Instrumental Convergence

TurnTrout22 Jun 2021 22:26 UTC

71 points

43 comments16 min readLW link

(arxiv.org)

A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi28 Jul 2018 21:27 UTC

74 points

9 comments3 min readLW link

(github.com)

Contingency: A Conceptual Tool from Evolutionary Biology for Alignment

clem_acs12 Jun 2023 20:54 UTC

57 points

2 comments14 min readLW link

(acsresearch.org)

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace4 Oct 2019 4:08 UTC

221 points

61 comments15 min readLW link 2 reviews

Instrumental Convergence To Offer Hope?

michael_mjd22 Apr 2022 1:56 UTC

12 points

7 comments3 min readLW link

“If we go extinct due to misaligned AI, at least nature will continue, right? … right?”

plex18 May 2024 14:09 UTC

47 points

23 comments2 min readLW link

(aisafety.info)

A framework for thinking about AI power-seeking

Joe Carlsmith24 Jul 2024 22:41 UTC

58 points

7 comments16 min readLW link

Lessons from Convergent Evolution for AI Alignment

Jan_Kulveit and rosehadshar

27 Mar 2023 16:25 UTC

54 points

9 comments8 min readLW link

[Question] Best arguments against instrumental convergence?

lfrymire5 Apr 2023 17:06 UTC

5 points

7 comments1 min readLW link

n=3 AI Risk Quick Math and Reasoning

lionhearted (Sebastian Marshall)7 Apr 2023 20:27 UTC

6 points

3 comments4 min readLW link

Toy model: convergent instrumental goals

Stuart_Armstrong25 Feb 2016 14:03 UTC

15 points

2 comments4 min readLW link

Goal retention discussion with Eliezer

Max Tegmark4 Sep 2014 22:23 UTC

93 points

26 comments6 min readLW link

Generalizing the Power-Seeking Theorems

TurnTrout27 Jul 2020 0:28 UTC

41 points

6 comments4 min readLW link

The Catastrophic Convergence Conjecture

TurnTrout14 Feb 2020 21:16 UTC

45 points

16 comments8 min readLW link

Power as Easily Exploitable Opportunities

TurnTrout1 Aug 2020 2:14 UTC

32 points

5 comments6 min readLW link

Clarifying Power-Seeking and Instrumental Convergence

TurnTrout20 Dec 2019 19:59 UTC

42 points

7 comments3 min readLW link

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)

RogerDearnaley25 May 2023 9:26 UTC

32 points

3 comments15 min readLW link

Walkthrough of ‘Formalizing Convergent Instrumental Goals’

TurnTrout26 Feb 2018 2:20 UTC

10 points

2 comments10 min readLW link

The Sharp Right Turn: sudden deceptive alignment as a convergent goal

avturchin6 Jun 2023 9:59 UTC

38 points

5 comments1 min readLW link

Hedonic Loops and Taming RL

beren19 Jul 2023 15:12 UTC

20 points

14 comments9 min readLW link

[Question] What are some examples of AIs instantiating the ‘nearest unblocked strategy problem’?

EJT4 Oct 2023 11:05 UTC

6 points

4 comments1 min readLW link

2019 Review Rewrite: Seeking Power is Often Robustly Instrumental in MDPs

TurnTrout23 Dec 2020 17:16 UTC

35 points

0 comments4 min readLW link

(www.lesswrong.com)

Review of ‘Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More’

TurnTrout12 Jan 2021 3:57 UTC

40 points

1 comment2 min readLW link

TASP Ep 3 - Optimal Policies Tend to Seek Power

Quinn11 Mar 2021 1:44 UTC

24 points

0 comments1 min readLW link

(technical-ai-safety.libsyn.com)

Coherence arguments imply a force for goal-directed behavior

KatjaGrace26 Mar 2021 16:10 UTC

91 points

24 comments11 min readLW link 1 review

(aiimpacts.org)

MDP models are determined by the agent architecture and the environmental dynamics

TurnTrout26 May 2021 0:14 UTC

23 points

34 comments3 min readLW link

Alex Turner’s Research, Comprehensive Information Gathering

adamShimi23 Jun 2021 9:44 UTC

15 points

3 comments3 min readLW link

The More Power At Stake, The Stronger Instrumental Convergence Gets For Optimal Policies

TurnTrout11 Jul 2021 17:36 UTC

45 points

7 comments6 min readLW link

A world in which the alignment problem seems lower-stakes

TurnTrout8 Jul 2021 2:31 UTC

20 points

17 comments2 min readLW link

Seeking Power is Convergently Instrumental in a Broad Class of Environments

TurnTrout8 Aug 2021 2:02 UTC

44 points

15 comments9 min readLW link

When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives

TurnTrout9 Aug 2021 17:22 UTC

53 points

4 comments5 min readLW link

Applications for Deconfusing Goal-Directedness

adamShimi8 Aug 2021 13:05 UTC

38 points

3 comments5 min readLW link 1 review

Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability

TurnTrout18 Nov 2021 1:54 UTC

85 points

8 comments17 min readLW link

(www.overleaf.com)

AXRP Episode 11 - Attainable Utility and Power with Alex Turner

DanielFilan25 Sep 2021 21:10 UTC

19 points

5 comments53 min readLW link

A Certain Formalization of Corrigibility Is VNM-Incoherent

TurnTrout20 Nov 2021 0:30 UTC

67 points

24 comments8 min readLW link

Instrumental Convergence For Realistic Agent Objectives

TurnTrout22 Jan 2022 0:41 UTC

35 points

9 comments9 min readLW link

[Intro to brain-like-AGI safety] 10. The alignment problem

Steven Byrnes30 Mar 2022 13:24 UTC

48 points

6 comments19 min readLW link

Questions about ″formalizing instrumental goals”

Mark Neyer1 Apr 2022 18:52 UTC

7 points

8 comments11 min readLW link

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey14 Jul 2022 16:59 UTC

113 points

12 comments33 min readLW link

[ASoT] Instrumental convergence is useful

Ulisse Mini9 Nov 2022 20:20 UTC

5 points

9 comments1 min readLW link

Instrumental convergence is what makes general intelligence possible

tailcalled11 Nov 2022 16:38 UTC

98 points

11 comments4 min readLW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout18 Feb 2023 18:41 UTC

172 points

10 comments2 min readLW link

(arxiv.org)

Research Notes: What are we aligning for?

Shoshannah Tekofsky8 Jul 2022 22:13 UTC

19 points

8 comments2 min readLW link

Generalizing POWER to multi-agent games

midco and TurnTrout

22 Mar 2021 2:41 UTC

52 points

16 comments7 min readLW link

A Critique of AI Alignment Pessimism

ExCeph19 Jul 2022 2:28 UTC

9 points

1 comment9 min readLW link

Active Inference as a formalisation of instrumental convergence

Roman Leventov26 Jul 2022 17:55 UTC

12 points

2 comments3 min readLW link

(direct.mit.edu)

You are Underestimating The Likelihood That Convergent Instrumental Subgoals Lead to Aligned AGI

Mark Neyer26 Sep 2022 14:22 UTC

3 points

6 comments3 min readLW link

Deceptive Alignment

evhub, Chris van Merwijk, Vlad Mikulik, Joar Skalse and Scott Garrabrant

5 Jun 2019 20:16 UTC

117 points

20 comments17 min readLW link

Instrumental convergence in single-agent systems

Edouard Harris and simonsdsuo

12 Oct 2022 12:24 UTC

31 points

4 comments8 min readLW link

(www.gladstone.ai)

Misalignment-by-default in multi-agent systems

Edouard Harris and simonsdsuo

13 Oct 2022 15:38 UTC

19 points

8 comments20 min readLW link

(www.gladstone.ai)

Instrumental convergence: scale and physical interactions

Edouard Harris and simonsdsuo

14 Oct 2022 15:50 UTC

15 points

0 comments17 min readLW link

(www.gladstone.ai)

Building selfless agents to avoid instrumental self-preservation.

blallo7 Dec 2023 18:59 UTC

14 points

2 comments6 min readLW link

Pursuing convergent instrumental subgoals on the user’s behalf doesn’t always require good priors

jessicata30 Dec 2016 2:36 UTC

15 points

9 comments3 min readLW link

Alien Axiology

snerx20 Apr 2023 0:27 UTC

3 points

2 comments5 min readLW link

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris24 Oct 2022 20:03 UTC

27 points

0 comments1 min readLW link

(github.com)

Rationality: Common Interest of Many Causes

Eliezer Yudkowsky29 Mar 2009 10:49 UTC

84 points

53 comments4 min readLW link

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong6 Feb 2020 11:50 UTC

38 points

25 comments3 min readLW link

Asymptotically Unambitious AGI

michaelcohen10 Apr 2020 12:31 UTC

50 points

217 comments2 min readLW link

The Utility of Human Atoms for the Paperclip Maximizer

avturchin2 Feb 2018 10:06 UTC

2 points

21 comments3 min readLW link

Ted Kaczyinski proves instrumental convergence?

xXAlphaSigmaXx28 Jun 2024 3:50 UTC

1 point

0 comments1 min readLW link

A potentially high impact differential technological development area

Noosphere898 Jun 2023 14:33 UTC

5 points

2 comments2 min readLW link

Instrumentality makes agents agenty

porby21 Feb 2023 4:28 UTC

20 points

4 comments6 min readLW link

Let’s talk about “Convergent Rationality”

David Scott Krueger (formerly: capybaralet)12 Jun 2019 21:53 UTC

44 points

33 comments6 min readLW link

Alignment, conflict, powerseeking

Oliver Sourbut22 Nov 2023 9:47 UTC

6 points

1 comment1 min readLW link

human intelligence may be alignment-limited

bhauth15 Jun 2023 22:32 UTC

16 points

3 comments2 min readLW link

Superintelligence 10: Instrumentally convergent goals

KatjaGrace18 Nov 2014 2:00 UTC

13 points

33 comments5 min readLW link

AI Alternative Futures: Scenario Mapping Artificial Intelligence Risk—Request for Participation (Closed)

Kakili27 Apr 2022 22:07 UTC

10 points

2 comments8 min readLW link

Instrumental Convergence to Complexity Preservation

Macro Flaneur13 Jul 2023 17:40 UTC

2 points

2 comments3 min readLW link

Ideas for studies on AGI risk

dr_s20 Apr 2023 18:17 UTC

5 points

1 comment11 min readLW link

Military AI as a Convergent Goal of Self-Improving AI

avturchin13 Nov 2017 12:17 UTC

5 points

3 comments1 min readLW link

ACI#5: From Human-AI Co-evolution to the Evolution of Value Systems

Akira Pyinya18 Aug 2023 0:38 UTC

0 points

0 comments9 min readLW link

The Game of Dominance

Karl von Wendt27 Aug 2023 11:04 UTC

24 points

15 comments6 min readLW link

Against Instrumental Convergence

zulupineapple27 Jan 2018 13:17 UTC

11 points

31 comments2 min readLW link

Instrumental Convergence Bounty

Logan Zoellner14 Sep 2023 14:02 UTC

62 points

24 comments1 min readLW link

Destroying the fabric of the universe as an instrumental goal.

AI-doom14 Sep 2023 20:04 UTC

−6 points

5 comments1 min readLW link

Instrumental Convergence and human extinction.

Spiritus Dei2 Oct 2023 0:41 UTC

−10 points

3 comments7 min readLW link

Machines vs Memes Part 3: Imitation and Memes

ceru231 Jun 2022 13:36 UTC

7 points

0 comments7 min readLW link

Natural Abstraction: Convergent Preferences Over Information Structures

paulom14 Oct 2023 18:34 UTC

13 points

1 comment36 min readLW link

Reinforcement Learner Wireheading

Nate Showell8 Jul 2022 5:32 UTC

8 points

2 comments3 min readLW link

papetoast 30 May 2023 6:05 UTC
1 point
0
I feel like using changing it to proper footnotes would be better
e7wAwpa 12 Jun 2021 3:17 UTC
1 point
In the ‘Relevance’ section there’s a sentence that begins “Omohundro says:” but then there’s no text, so I have no idea what Omohundro says.
- jimrandomh 13 Jun 2021 19:36 UTC
  2 points
  Parent
  I thought this might be an editing artifact, but it looks like that fragment was there and dangling that way in the revision where it was first inserted, so I just took it out.

In­stru­men­tal Convergence

Omohundro’s Drives

Bostrom’s Drives

Relevance

Pathological Cases

Experimental Evidence

See Also

References

Instrumental Convergence