Power Seeking (AI)

Tag. Last edit: 24 Oct 2022 22:49 UTC by Raemon

Power Seeking is a property an agent might have: it attempts to gain more general ability to control its environment. It is particularly relevant to AI systems and is related to Instrumental Convergence.

Instrumental convergence in single-agent systems

Edouard Harris and simonsdsuo

12 Oct 2022 12:24 UTC

33 points

4 comments, 8 min read, LW link

(www.gladstone.ai)

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris

24 Oct 2022 20:03 UTC

29 points

0 comments, 1 min read, LW link

(github.com)

Categorical-measure-theoretic approach to optimal policies tending to seek power

jacek

12 Jan 2023 0:32 UTC

31 points

3 comments, 6 min read, LW link

Power-Seeking = Minimising free energy

Jonas Hallgren

22 Feb 2023 4:28 UTC

23 points

10 comments, 7 min read, LW link

A framework for thinking about AI power-seeking

Joe Carlsmith

24 Jul 2024 22:41 UTC

70 points

15 comments, 16 min read, LW link

Power-seeking for successive choices

adamShimi

12 Aug 2021 20:37 UTC

11 points

9 comments, 4 min read, LW link

Eli’s review of “Is power-seeking AI an existential risk?”

elifland

30 Sep 2022 12:21 UTC

67 points

0 comments, 3 min read, LW link

(docs.google.com)

[AN #170]: Analyzing the argument for risk from power-seeking AI

Rohin Shah

8 Dec 2021 18:10 UTC

21 points

1 comment, 7 min read, LW link

(mailchi.mp)

Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake

TurnTrout

19 Nov 2024 18:36 UTC

40 points

5 comments, 1 min read, LW link

(turntrout.com)

No instrumental convergence without AI psychology

TurnTrout

20 Jan 2026 22:16 UTC

68 points

7 comments, 6 min read, LW link

(turntrout.com)

Power-Seeking AI and Existential Risk

Antonio Franca

11 Oct 2022 22:50 UTC

7 points

0 comments, 9 min read, LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout

18 Feb 2023 18:41 UTC

172 points

10 comments, 2 min read, LW link

(arxiv.org)

Steering Llama-2 with contrastive activation additions

Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub and TurnTrout

2 Jan 2024 0:47 UTC

125 points

29 comments, 8 min read, LW link

(arxiv.org)

Reviews of “Is power-seeking AI an existential risk?”

Joe Carlsmith

16 Dec 2021 20:48 UTC

80 points

20 comments, 1 min read, LW link

Generalizing the Power-Seeking Theorems

TurnTrout

27 Jul 2020 0:28 UTC

41 points

6 comments, 4 min read, LW link

Power-seeking can be probable and predictive for trained agents

Vika and janos

28 Feb 2023 21:10 UTC

56 points

22 comments, 9 min read, LW link

(arxiv.org)

[Linkpost] Shorter version of report on existential risk from power-seeking AI

Joe Carlsmith

22 Mar 2023 18:09 UTC

7 points

0 comments, 1 min read, LW link

Incentives from a causal perspective

tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall and Jonathan Richens

10 Jul 2023 17:16 UTC

27 points

0 comments, 6 min read, LW link

Simple Way to Prevent Power-Seeking AI

research_prime_space

7 Dec 2022 0:26 UTC

12 points

1 comment, 1 min read, LW link

The Human Alignment Problem for AIs

rife

22 Jan 2025 4:06 UTC

12 points

5 comments, 3 min read, LW link

Make Powerful Machines Verifiable

Naci Cankaya

4 Mar 2026 14:20 UTC

22 points

4 comments, 4 min read, LW link

From Human to Posthuman: Transhumanism, Anarcho-Capitalism, and AI’s Role in Global Disparity and Governance

DyingNaive

6 Nov 2024 17:41 UTC

1 point

0 comments, 1 min read, LW link

Ideas for studies on AGI risk

dr_s

20 Apr 2023 18:17 UTC

5 points

1 comment, 11 min read, LW link

You can’t fetch the coffee if you’re dead: an AI dilemma

hennyge

31 Aug 2023 11:03 UTC

1 point

0 comments, 4 min read, LW link

The Game of Dominance

Karl von Wendt

27 Aug 2023 11:04 UTC

24 points

15 comments, 6 min read, LW link

Three-Path Consilience for Dureon: Dissipative Structures Reveal the Heterogeneity of Persistence Conditions

Hiroshi Yamakawa

18 Feb 2026 11:59 UTC

10 points

0 comments, 12 min read, LW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown

16 Nov 2022 15:33 UTC

13 points

2 comments, 12 min read, LW link

(sambrown.eu)

Computational signatures of psychopathy

Cameron Berg

19 Dec 2022 17:01 UTC

30 points

3 comments, 20 min read, LW link

Natural Abstraction: Convergent Preferences Over Information Structures

paulom

14 Oct 2023 18:34 UTC

28 points

1 comment, 36 min read, LW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda

25 Dec 2021 23:07 UTC

54 points

3 comments, 28 min read, LW link
