RSS

Mild Optimization

TagLast edit: 28 Dec 2020 20:55 UTC by plex

Mild optimization is an approach for mitigating Goodhart’s law in AI alignment. Instead of maximizing a fixed objective, the hope is that the agent pursues the goal in a “milder” fashion.

Further reading: Arbital page on Mild Optimization

Soft op­ti­miza­tion makes the value tar­get bigger

Jeremy Gillen2 Jan 2023 16:06 UTC
100 points
18 comments12 min readLW link

When to use quantilization

RyanCarey5 Feb 2019 17:17 UTC
65 points
5 comments4 min readLW link

Op­ti­miza­tion Reg­u­lariza­tion through Time Penalty

Linda Linsefors1 Jan 2019 13:05 UTC
11 points
4 comments3 min readLW link

Stable Poin­t­ers to Value III: Re­cur­sive Quantilization

abramdemski21 Jul 2018 8:06 UTC
19 points
4 comments4 min readLW link

Thoughts on Quantilizers

Stuart_Armstrong2 Jun 2017 16:24 UTC
2 points
0 comments2 min readLW link

Quan­tiliz­ers max­i­mize ex­pected util­ity sub­ject to a con­ser­va­tive cost constraint

jessicata28 Sep 2015 2:17 UTC
29 points
0 comments5 min readLW link

Quan­tilal con­trol for finite MDPs

Vanessa Kosoy12 Apr 2018 9:21 UTC
14 points
0 comments13 min readLW link

Steam

abramdemski20 Jun 2022 17:38 UTC
67 points
9 comments5 min readLW link

Satis­ficers want to be­come maximisers

Stuart_Armstrong21 Oct 2011 16:27 UTC
33 points
68 comments1 min readLW link

Ex­plor­ing Mild Be­havi­our in Embed­ded Agents

Megan Kinniment27 Jun 2022 18:56 UTC
21 points
4 comments18 min readLW link

Re­ward is not Ne­c­es­sary: How to Create a Com­po­si­tional Self-Pre­serv­ing Agent for Life-Long Learning

Roman Leventov12 Jan 2023 16:43 UTC
17 points
2 comments2 min readLW link
(arxiv.org)

Val­ida­tor mod­els: A sim­ple ap­proach to de­tect­ing goodharting

beren20 Feb 2023 21:32 UTC
14 points
1 comment4 min readLW link

Break­ing the Op­ti­mizer’s Curse, and Con­se­quences for Ex­is­ten­tial Risks and Value Learning

Roger Dearnaley21 Feb 2023 9:05 UTC
4 points
0 comments23 min readLW link

The Op­ti­mizer’s Curse and How to Beat It

lukeprog16 Sep 2011 2:46 UTC
95 points
82 comments3 min readLW link
No comments.