
Power Seeking (AI)

Last edit: 24 Oct 2022 22:49 UTC by Raemon

Power-seeking is a property that agents might have, in which they attempt to gain more general ability to control their environment. It is particularly relevant to AI, and closely related to Instrumental Convergence.
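
A common formalization, sketched here from the power-seeking theorems linked below (Turner et al.), defines the POWER of a state s as its average optimal value under a distribution D over reward functions:

\mathrm{POWER}_{D}(s,\gamma) \;=\; \frac{1-\gamma}{\gamma}\,\mathbb{E}_{R\sim D}\!\left[V^{*}_{R}(s,\gamma) - R(s)\right]

where V^{*}_{R}(s,\gamma) is the optimal value of state s under reward function R and discount rate γ. On this definition, an action is power-seeking to the extent that it leads to states of higher POWER, i.e. states from which many different goals remain achievable; the linked theorems give conditions under which optimal or retargetable policies tend to choose such actions.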

Instrumental convergence in single-agent systems

12 Oct 2022 12:24 UTC
31 points
4 comments, 8 min read, LW link
(www.gladstone.ai)

Categorical-measure-theoretic approach to optimal policies tending to seek power

jacek, 12 Jan 2023 0:32 UTC
31 points
3 comments, 6 min read, LW link

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris, 24 Oct 2022 20:03 UTC
27 points
0 comments, 1 min read, LW link
(github.com)

Power-Seeking = Minimising free energy

Jonas Hallgren, 22 Feb 2023 4:28 UTC
21 points
10 comments, 7 min read, LW link

Power-Seeking AI and Existential Risk

Antonio Franca, 11 Oct 2022 22:50 UTC
6 points
0 comments, 9 min read, LW link

[Linkpost] Shorter version of report on existential risk from power-seeking AI

Joe Carlsmith, 22 Mar 2023 18:09 UTC
7 points
0 comments, 1 min read, LW link

Power-seeking for successive choices

adamShimi, 12 Aug 2021 20:37 UTC
11 points
9 comments, 4 min read, LW link

Steering Llama-2 with contrastive activation additions

2 Jan 2024 0:47 UTC
119 points
29 comments, 8 min read, LW link
(arxiv.org)

Generalizing the Power-Seeking Theorems

TurnTrout, 27 Jul 2020 0:28 UTC
41 points
6 comments, 4 min read, LW link

Reviews of “Is power-seeking AI an existential risk?”

Joe Carlsmith, 16 Dec 2021 20:48 UTC
79 points
20 comments, 1 min read, LW link

Eli’s review of “Is power-seeking AI an existential risk?”

elifland, 30 Sep 2022 12:21 UTC
67 points
0 comments, 3 min read, LW link
(docs.google.com)

[AN #170]: Analyzing the argument for risk from power-seeking AI

Rohin Shah, 8 Dec 2021 18:10 UTC
21 points
1 comment, 7 min read, LW link
(mailchi.mp)

Parametrically retargetable decision-makers tend to seek power

TurnTrout, 18 Feb 2023 18:41 UTC
166 points
9 comments, 2 min read, LW link
(arxiv.org)

Power-seeking can be probable and predictive for trained agents

28 Feb 2023 21:10 UTC
56 points
22 comments, 9 min read, LW link
(arxiv.org)

Computational signatures of psychopathy

Cameron Berg, 19 Dec 2022 17:01 UTC
28 points
3 comments, 20 min read, LW link

Simple Way to Prevent Power-Seeking AI

research_prime_space, 7 Dec 2022 0:26 UTC
12 points
1 comment, 1 min read, LW link

Risks from GPT-4 Byproduct of Recursively Optimizing AIs

ben hayum, 7 Apr 2023 0:02 UTC
73 points
9 comments, 10 min read, LW link
(forum.effectivealtruism.org)

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown, 16 Nov 2022 15:33 UTC
13 points
2 comments, 12 min read, LW link
(sambrown.eu)

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda, 25 Dec 2021 23:07 UTC
52 points
3 comments, 28 min read, LW link

Ideas for studies on AGI risk

dr_s, 20 Apr 2023 18:17 UTC
5 points
1 comment, 11 min read, LW link

Incentives from a causal perspective

10 Jul 2023 17:16 UTC
27 points
0 comments, 6 min read, LW link

The Game of Dominance

Karl von Wendt, 27 Aug 2023 11:04 UTC
24 points
15 comments, 6 min read, LW link

You can’t fetch the coffee if you’re dead: an AI dilemma

hennyge, 31 Aug 2023 11:03 UTC
1 point
0 comments, 4 min read, LW link

Natural Abstraction: Convergent Preferences Over Information Structures

paulom, 14 Oct 2023 18:34 UTC
13 points
1 comment, 36 min read, LW link

The Waluigi Effect (mega-post)

Cleo Nardo, 3 Mar 2023 3:22 UTC
617 points
188 comments, 16 min read, LW link