Value Drift

Tag. Last edit: 20 Nov 2024 21:18 UTC by Dakara

Value drift is the change, over time, in the values or goals of a person or an AI system, often in ways that weren’t originally intended.

For humans, this might happen as life experiences, personal growth, or external influences cause someone’s beliefs to evolve.

For AI, it could occur if the system starts to interpret its goals differently as it learns and interacts with the world.

Schelling fences on slippery slopes

Scott Alexander, 16 Mar 2012 23:44 UTC
644 points
250 comments, 6 min read

Understanding and avoiding value drift

TurnTrout, 9 Sep 2022 4:16 UTC
48 points
14 comments, 6 min read

Straight-edge Warning Against Physical Intimacy

Raphaëll, 23 Nov 2020 21:35 UTC
18 points
42 comments, 5 min read

Let Values Drift

Gordon Seidoh Worley, 20 Jun 2019 20:45 UTC
4 points
19 comments, 8 min read

Would I think for ten thousand years?

Stuart_Armstrong, 11 Feb 2019 19:37 UTC
28 points
13 comments, 1 min read

Predicting Parental Emotional Changes?

jefftk, 6 Jul 2022 13:50 UTC
44 points
11 comments, 2 min read
(www.jefftk.com)

Where does Sonnet 4.5's desire to “not get too comfortable” come from?

Kaj_Sotala, 4 Oct 2025 10:19 UTC
103 points
24 comments, 64 min read

[Question] Is value drift net-positive, net-negative, or neither?

MarisaJurczyk, 5 May 2019 2:37 UTC
5 points
3 comments, 1 min read

REACH Meetup – Value Drift

Raemon, 30 May 2018 4:53 UTC
4 points
0 comments, 1 min read

Mahatma Armstrong: CEVed to death.

Stuart_Armstrong, 6 Jun 2013 12:50 UTC
33 points
62 comments, 2 min read

Upcoming stability of values

Stuart_Armstrong, 15 Mar 2018 11:36 UTC
15 points
15 comments, 2 min read

Gandhi, murder pills, and mental illness

erratio, 13 Oct 2010 9:16 UTC
39 points
16 comments, 1 min read

Manifest X DC Opening Benediction - Making Friends Along the Way

JohnofCharleston, 9 Nov 2025 23:10 UTC
41 points
0 comments, 4 min read

Beyond Cosine Similarity: Auditing Knowledge Graphs via Semantic Diffraction

Sergi Garcia, 12 Jan 2026 13:42 UTC
1 point
0 comments, 1 min read

Rock bottom terminal value

ihatenumbersinusernames7, 4 Jan 2026 20:43 UTC
4 points
7 comments, 2 min read

SAAP: Is Deliberate Structural Inefficiency the Inevitable Cost of AGI Alignment?

Articus19, 30 Nov 2025 17:45 UTC
1 point
0 comments, 1 min read

SAAP: A Normative AGI Architecture for Safety using Dual-Process Control and Human Sovereignty

Articus19, 30 Nov 2025 17:58 UTC
1 point
0 comments, 1 min read

New Hackathon: Robustness to distribution changes and ambiguity

Charbel-Raphaël, 31 Jan 2023 12:50 UTC
12 points
3 comments, 1 min read