
Goodhart’s Law

Last edit: 1 Oct 2020 22:11 UTC by Ruby

Goodhart’s Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy. One form of Goodhart is demonstrated by the Soviet-era story of a factory graded on how many shoes it produced (a seemingly good proxy for productivity): the factory soon began producing enormous numbers of tiny shoes. Useless, but the numbers looked good.

Goodhart’s Law is of particular relevance to AI alignment. Suppose you have something that is generally a good proxy for “the stuff that humans care about”. It would be dangerous to have a powerful AI optimize for that proxy: in accordance with Goodhart’s Law, the proxy will break down.
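The shoe-factory failure mode can be made concrete with a toy model (a hypothetical sketch with made-up numbers, not drawn from any of the posts below): if the grade is shoe count alone, arbitrarily tiny shoes maximize the proxy while the true value, wearable shoes produced, collapses to zero.

```python
# Toy model of the shoe-factory example (all numbers hypothetical).
# The factory has a fixed leather budget and chooses a shoe size.
# Proxy metric: number of shoes produced.
# True value: number of *wearable* shoes (size must be at least 1.0).

LEATHER = 100.0  # fixed material budget, arbitrary units

def proxy(size):
    """The graded metric: shoes produced, regardless of usefulness."""
    return LEATHER / size

def true_value(size):
    """Wearable shoes produced: zero if the shoes are too small to wear."""
    return LEATHER / size if size >= 1.0 else 0.0

# Candidate policies: shoe sizes from 0.1 to 5.0.
sizes = [0.1 * k for k in range(1, 51)]

best_for_proxy = max(sizes, key=proxy)
best_for_value = max(sizes, key=true_value)

print(f"proxy optimum: size={best_for_proxy:.1f}, "
      f"shoes={proxy(best_for_proxy):.0f}, wearable={true_value(best_for_proxy):.0f}")
print(f"value optimum: size={best_for_value:.1f}, "
      f"shoes={proxy(best_for_value):.0f}, wearable={true_value(best_for_value):.0f}")
```

Patching the metric (e.g. “count only shoes of size ≥ 1.0”) just moves the gaming to the next unmeasured dimension, which is why Goodhart problems are not fixed by one more constraint.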

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting: regressional (selecting for a proxy also selects for the noise in its correlation with the goal), causal (intervening on the proxy fails because the proxy–goal correlation was not causal), extremal (worlds scoring extremely well on the proxy lie outside the regime where the proxy tracked the goal), and adversarial (another agent optimizes the proxy at the goal’s expense).
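One of these kinds, regressional Goodhart, can be sketched numerically (a hypothetical toy model, not taken from the posts below): score candidates on a proxy equal to true value plus independent noise, select the top scorers, and their average true value falls well short of their average proxy score, because hard selection on the proxy also selects for lucky noise.

```python
import random

random.seed(0)  # deterministic toy run

# Regressional Goodhart sketch: proxy = true value + independent noise.
N = 100_000
values = [random.gauss(0.0, 1.0) for _ in range(N)]       # true values
proxies = [v + random.gauss(0.0, 1.0) for v in values]    # noisy proxy scores

# Select the 100 candidates with the highest proxy scores.
top = sorted(range(N), key=lambda i: proxies[i])[-100:]

avg_proxy = sum(proxies[i] for i in top) / len(top)
avg_value = sum(values[i] for i in top) / len(top)

print(f"average proxy score of selected: {avg_proxy:.2f}")
print(f"average true value of selected:  {avg_value:.2f}")
# With equal variances, the selected candidates' true value is expected
# to be only about half their proxy score: the rest is selected noise.
```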

See Also

Goodhart Taxonomy

Scott Garrabrant, 30 Dec 2017 16:38 UTC
154 points
33 comments, 10 min read, LW link

Everything I ever needed to know, I learned from World of Warcraft: Goodhart’s law

Said Achmiz, 3 May 2018 16:33 UTC
28 points
21 comments, 6 min read, LW link
(blog.obormot.net)

Classifying specification problems as variants of Goodhart’s Law

Vika, 19 Aug 2019 20:40 UTC
67 points
5 comments, 5 min read, LW link, 2 nominations, 1 review

Specification gaming examples in AI

Vika, 3 Apr 2018 12:30 UTC
39 points
9 comments, 1 min read, LW link

Goodhart’s Curse and Limitations on AI Alignment

G Gordon Worley III, 19 Aug 2019 7:57 UTC
21 points
18 comments, 9 min read, LW link

[Question] How does Gradient Descent Interact with Goodhart?

Scott Garrabrant, 2 Feb 2019 0:14 UTC
68 points
19 comments, 4 min read, LW link

The Importance of Goodhart’s Law

blogospheroid, 13 Mar 2010 8:19 UTC
102 points
123 comments, 3 min read, LW link

Goodhart Taxonomy: Agreement

Ben Pace, 1 Jul 2018 3:50 UTC
43 points
4 comments, 7 min read, LW link

Does Bayes Beat Goodhart?

abramdemski, 3 Jun 2019 2:31 UTC
43 points
26 comments, 7 min read, LW link

New Paper Expanding on the Goodhart Taxonomy

Scott Garrabrant, 14 Mar 2018 9:01 UTC
17 points
4 comments, 1 min read, LW link
(arxiv.org)

Defeating Goodhart and the “closest unblocked strategy” problem

Stuart_Armstrong, 3 Apr 2019 14:46 UTC
40 points
15 comments, 6 min read, LW link

Using expected utility for Good(hart)

Stuart_Armstrong, 27 Aug 2018 3:32 UTC
42 points
5 comments, 4 min read, LW link

Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems

Alexandros, 10 May 2010 13:25 UTC
51 points
104 comments, 10 min read, LW link

Is Clickbait Destroying Our General Intelligence?

Eliezer Yudkowsky, 16 Nov 2018 23:06 UTC
150 points
58 comments, 5 min read, LW link

All I know is Goodhart

Stuart_Armstrong, 21 Oct 2019 12:12 UTC
28 points
23 comments, 3 min read, LW link

Reward hacking and Goodhart’s law by evolutionary algorithms

Jan_Kulveit, 30 Mar 2018 7:57 UTC
18 points
5 comments, 1 min read, LW link
(arxiv.org)

Bounding Goodhart’s Law

eric_langlois, 11 Jul 2018 0:46 UTC
33 points
2 comments, 5 min read, LW link

Constructing Goodhart

johnswentworth, 3 Feb 2019 21:59 UTC
29 points
10 comments, 3 min read, LW link

What does Optimization Mean, Again? (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 2)

Davidmanheim, 28 Jul 2019 9:30 UTC
26 points
7 comments, 4 min read, LW link

Non-Adversarial Goodhart and AI Risks

Davidmanheim, 27 Mar 2018 1:39 UTC
22 points
9 comments, 6 min read, LW link

(Some?) Possible Multi-Agent Goodhart Interactions

Davidmanheim, 22 Sep 2018 17:48 UTC
20 points
2 comments, 5 min read, LW link

Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 1)

Davidmanheim, 2 Jul 2019 15:36 UTC
28 points
5 comments, 4 min read, LW link

Embedded Agency (full-text version)

15 Nov 2018 19:49 UTC
115 points
11 comments, 54 min read, LW link

Robust Delegation

4 Nov 2018 16:38 UTC
108 points
10 comments, 1 min read, LW link

Optimization Amplifies

Scott Garrabrant, 27 Jun 2018 1:51 UTC
86 points
12 comments, 4 min read, LW link

Specification gaming: the flip side of AI ingenuity

6 May 2020 23:51 UTC
45 points
8 comments, 6 min read, LW link

Noticing the Taste of Lotus

Valentine, 27 Apr 2018 20:05 UTC
155 points
80 comments, 3 min read, LW link

Guarding Slack vs Substance

Raemon, 13 Dec 2017 20:58 UTC
36 points
6 comments, 6 min read, LW link

Humans are not automatically strategic

AnnaSalamon, 8 Sep 2010 7:02 UTC
295 points
274 comments, 4 min read, LW link

The Goodhart Game

John_Maxwell, 18 Nov 2019 23:22 UTC
13 points
5 comments, 5 min read, LW link

If I were a well-intentioned AI… III: Extremal Goodhart

Stuart_Armstrong, 28 Feb 2020 11:24 UTC
21 points
0 comments, 5 min read, LW link

nostalgebraist: Recursive Goodhart’s Law

Kaj_Sotala, 26 Aug 2020 11:07 UTC
52 points
27 comments, 1 min read, LW link
(nostalgebraist.tumblr.com)

Markets are Anti-Inductive

Eliezer Yudkowsky, 26 Feb 2009 0:55 UTC
64 points
62 comments, 4 min read, LW link

The Three Levels of Goodhart’s Curse

Scott Garrabrant, 30 Dec 2017 16:41 UTC
5 points
0 comments, 3 min read, LW link

How my school gamed the stats

Srdjan Miletic, 20 Feb 2021 19:23 UTC
75 points
26 comments, 4 min read, LW link

Bootstrapped Alignment

G Gordon Worley III, 27 Feb 2021 15:46 UTC
14 points
12 comments, 2 min read, LW link

Moral Mazes and Short Termism

Zvi, 2 Jun 2019 11:30 UTC
63 points
21 comments, 4 min read, LW link, 1 nomination
(thezvi.wordpress.com)

The new dot com bubble is here: it’s called online advertising

G Gordon Worley III, 18 Nov 2019 22:05 UTC
50 points
17 comments, 2 min read, LW link
(thecorrespondent.com)

How Doomed are Large Organizations?

Zvi, 21 Jan 2020 12:20 UTC
76 points
42 comments, 9 min read, LW link
(thezvi.wordpress.com)

When to use quantilization

RyanCarey, 5 Feb 2019 17:17 UTC
53 points
5 comments, 4 min read, LW link

Leto among the Machines

Virgil Kurkjian, 30 Sep 2018 21:17 UTC
43 points
19 comments, 13 min read, LW link

The Lesson To Unlearn

Ben Pace, 8 Dec 2019 0:50 UTC
37 points
11 comments, 1 min read, LW link
(paulgraham.com)

“Designing agent incentives to avoid reward tampering”, DeepMind

gwern, 14 Aug 2019 16:57 UTC
28 points
15 comments, 1 min read, LW link
(medium.com)

Lotuses and Loot Boxes

Davidmanheim, 17 May 2018 0:21 UTC
13 points
2 comments, 4 min read, LW link

Specification gaming examples in AI

Samuel Rødal, 10 Nov 2018 12:00 UTC
24 points
6 comments, 1 min read, LW link
(docs.google.com)

Superintelligence 12: Malignant failure modes

KatjaGrace, 2 Dec 2014 2:02 UTC
15 points
51 comments, 5 min read, LW link

When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors

Stuart_Armstrong, 19 Dec 2019 13:55 UTC
24 points
18 comments, 7 min read, LW link

The Ancient God Who Rules High School

lifelonglearner, 5 Apr 2017 18:55 UTC
13 points
113 comments, 1 min read, LW link
(medium.com)

Religion as Goodhart

shminux, 8 Jul 2019 0:38 UTC
21 points
6 comments, 2 min read, LW link