The Causes of Power-seeking and Instrumental Convergence

Instrumental convergence posits that smart goal-directed agents will tend to take certain kinds of actions (e.g., gaining resources, staying alive) because those actions help achieve a wide range of goals. Many of these convergent actions involve taking power from humans, and human disempowerment seems like a key part of how AI might go very, very wrong.

But where does instrumental convergence come from? When does it occur, and how strongly? And what does the math look like?
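To make "what the math looks like" concrete, here is a minimal sketch of the POWER formalism from "Seeking Power is Often Convergently Instrumental in MDPs": POWER at a state is (roughly) the agent's normalized average optimal value there, taken over a distribution of reward functions. The toy MDP, state numbering, and function names below are illustrative assumptions, not anything from the sequence itself.

```python
import numpy as np

# Hypothetical toy MDP, deterministic transitions.
# next_states[s] lists the states reachable from s in one step.
next_states = {
    0: [1, 2, 3],            # a state with many options
    1: [1], 2: [2], 3: [3],  # absorbing loops
    4: [4],                  # a dead-end state with one option
}

def optimal_value(reward, gamma=0.9, iters=300):
    """Value iteration for a deterministic MDP with state-based reward."""
    v = np.zeros(len(reward))
    for _ in range(iters):
        v = np.array([reward[s] + gamma * max(v[t] for t in next_states[s])
                      for s in next_states])
    return v

def power(state, gamma=0.9, samples=1000, seed=0):
    """Monte Carlo estimate of POWER(state): the normalized average optimal
    value at `state`, over reward functions drawn iid uniform on [0,1]^|S|."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        reward = rng.uniform(0.0, 1.0, size=len(next_states))
        v = optimal_value(reward, gamma)
        # Normalization (1 - gamma)/gamma * (V* - R), as in Turner et al.
        total += (1 - gamma) / gamma * (v[state] - reward[state])
    return total / samples

# The many-option state has higher POWER (~0.75) than the dead end (~0.5):
print("POWER at state 0:", power(0))
print("POWER at state 4:", power(4))
```

The sequence's theorems generalize this pattern: under broad conditions, states that keep more options open have higher POWER, and for most reward functions the optimal policy steers toward them.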

Seeking Power is Often Convergently Instrumental in MDPs

Power as Easily Exploitable Opportunities

The Catastrophic Convergence Conjecture

Generalizing POWER to multi-agent games

MDP models are determined by the agent architecture and the environmental dynamics

Environmental Structure Can Cause Instrumental Convergence

A world in which the alignment problem seems lower-stakes

The More Power At Stake, The Stronger Instrumental Convergence Gets For Optimal Policies

Seeking Power is Convergently Instrumental in a Broad Class of Environments

When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives

Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability

A Certain Formalization of Corrigibility Is VNM-Incoherent

Instrumental Convergence For Realistic Agent Objectives

Parametrically retargetable decision-makers tend to seek power