Goodhart’s Law

TagLast edit: 19 Mar 2023 21:29 UTC by Diabloto96

Goodhart’s Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy. One form of Goodhart is demonstrated by the Soviet story of a factory graded on how many shoes they produced (a good proxy for productivity) – they soon began producing a higher number of tiny shoes. Useless, but the numbers look good.

Goodhart’s Law is of particular relevance to AI Alignment. Suppose you have something which is generally a good proxy for “the stuff that humans care about”, it would be dangerous to have a powerful AI optimize for the proxy, in accordance with Goodhart’s law, the proxy will breakdown.

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:

Regressional Goodhart—When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
Causal Goodhart—When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
Extremal Goodhart—Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
Adversarial Goodhart—When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.

See Also

Goodhart Taxonomy

Scott Garrabrant30 Dec 2017 16:38 UTC

218 points

34 comments10 min readLW link

Classifying specification problems as variants of Goodhart’s Law

Vika19 Aug 2019 20:40 UTC

72 points

5 comments5 min readLW link 1 review

Specification gaming examples in AI

Vika3 Apr 2018 12:30 UTC

48 points

9 comments1 min readLW link 2 reviews

Everything I ever needed to know, I learned from World of Warcraft: Goodhart’s law

Said Achmiz3 May 2018 16:33 UTC

37 points

21 comments5 min readLW link 1 review

(blog.obormot.net)

Replacing Karma with Good Heart Tokens (Worth $1!)

Ben Pace and habryka

1 Apr 2022 9:31 UTC

226 points

173 comments4 min readLW link

Signaling isn’t about signaling, it’s about Goodhart

Valentine6 Jan 2022 18:49 UTC

59 points

31 comments9 min readLW link

Goodhart’s Law Causal Diagrams

JustinShovelain and Jeremy Gillen

11 Apr 2022 13:52 UTC

35 points

6 comments6 min readLW link

The Natural State is Goodhart

devansh20 Mar 2023 0:00 UTC

59 points

4 comments2 min readLW link

How much do you believe your results?

Eric Neyman6 May 2023 20:31 UTC

515 points

18 comments15 min readLW link 4 reviews

(ericneyman.wordpress.com)

When is Goodhart catastrophic?

Drake Thomas and Thomas Kwa

9 May 2023 3:59 UTC

180 points

30 comments8 min readLW link 1 review

Goodhart’s Curse and Limitations on AI Alignment

Gordon Seidoh Worley19 Aug 2019 7:57 UTC

25 points

18 comments10 min readLW link

The Importance of Goodhart’s Law

blogospheroid13 Mar 2010 8:19 UTC

117 points

123 comments3 min readLW link

[Question] How does Gradient Descent Interact with Goodhart?

Scott Garrabrant2 Feb 2019 0:14 UTC

68 points

19 comments4 min readLW link

Introduction to Reducing Goodhart

Charlie Steiner26 Aug 2021 18:38 UTC

48 points

10 comments4 min readLW link

Goodhart Taxonomy: Agreement

Ben Pace1 Jul 2018 3:50 UTC

44 points

4 comments7 min readLW link

Really radical empathy

MichaelStJules6 Jan 2025 17:46 UTC

19 points

0 comments10 min readLW link

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley26 Jan 2024 3:58 UTC

16 points

2 comments11 min readLW link

Utilitarianism and the replaceability of desires and attachments

MichaelStJules27 Jul 2024 1:57 UTC

5 points

2 comments12 min readLW link

Actualism, asymmetry and extinction

MichaelStJules7 Jan 2025 16:02 UTC

1 point

4 comments9 min readLW link

Some implications of radical empathy

MichaelStJules7 Jan 2025 16:10 UTC

3 points

0 comments7 min readLW link

The Mirror Trap

Cameron Berg6 Jun 2025 22:30 UTC

94 points

13 comments4 min readLW link

Does Bayes Beat Goodhart?

abramdemski3 Jun 2019 2:31 UTC

48 points

26 comments7 min readLW link

Catastrophic Regressional Goodhart: Appendix

Thomas Kwa and Drake Thomas

15 May 2023 0:10 UTC

25 points

1 comment9 min readLW link

Using expected utility for Good(hart)

Stuart_Armstrong27 Aug 2018 3:32 UTC

42 points

5 comments4 min readLW link

Defeating Goodhart and the “closest unblocked strategy” problem

Stuart_Armstrong3 Apr 2019 14:46 UTC

45 points

15 comments6 min readLW link

Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems

Alexandros10 May 2010 13:25 UTC

56 points

105 comments10 min readLW link

New Paper Expanding on the Goodhart Taxonomy

Scott Garrabrant14 Mar 2018 9:01 UTC

17 points

4 comments1 min readLW link

(arxiv.org)

Goodhart Typology via Structure, Function, and Randomness Distributions

JustinShovelain and Mateusz Bagiński

25 Mar 2025 16:01 UTC

35 points

1 comment15 min readLW link

Is Clickbait Destroying Our General Intelligence?

Eliezer Yudkowsky16 Nov 2018 23:06 UTC

207 points

68 comments5 min readLW link 2 reviews

Don’t design agents which exploit adversarial inputs

TurnTrout and Garrett Baker

18 Nov 2022 1:48 UTC

72 points

64 comments12 min readLW link

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)

RogerDearnaley25 May 2023 9:26 UTC

33 points

3 comments15 min readLW link

Proxy misspecification and the capabilities vs. value learning race

Sam Marks16 May 2022 18:58 UTC

23 points

3 comments4 min readLW link

How my school gamed the stats

Srdjan Miletic20 Feb 2021 19:23 UTC

83 points

26 comments4 min readLW link

Embedded Agency (full-text version)

Scott Garrabrant and abramdemski

15 Nov 2018 19:49 UTC

210 points

17 comments54 min readLW link

Optimized for Something other than Winning or: How Cricket Resists Moloch and Goodhart’s Law

A.H.5 Jul 2023 12:33 UTC

54 points

26 comments4 min readLW link

Optimization Amplifies

Scott Garrabrant27 Jun 2018 1:51 UTC

119 points

12 comments4 min readLW link

Specification gaming: the flip side of AI ingenuity

Vika, Vlad Mikulik, Matthew Rahtz, tom4everitt, Zac Kenton and janleike

6 May 2020 23:51 UTC

69 points

9 comments6 min readLW link

Goodhart Ethology

Charlie Steiner17 Sep 2021 17:31 UTC

20 points

4 comments14 min readLW link

The Three Levels of Goodhart’s Curse

Scott Garrabrant30 Dec 2017 16:41 UTC

7 points

2 comments3 min readLW link

Satisficers want to become maximisers

Stuart_Armstrong21 Oct 2011 16:27 UTC

38 points

70 comments1 min readLW link

If I were a well-intentioned AI… III: Extremal Goodhart

Stuart_Armstrong28 Feb 2020 11:24 UTC

22 points

0 comments5 min readLW link

Catastrophic Goodhart in RL with KL penalty

Thomas Kwa and Adrià Garriga-alonso

15 May 2024 0:58 UTC

62 points

10 comments7 min readLW link

Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?

Gordon Seidoh Worley26 Apr 2024 18:10 UTC

11 points

4 comments32 min readLW link

Constructing Goodhart

johnswentworth3 Feb 2019 21:59 UTC

29 points

10 comments3 min readLW link

Competent Preferences

Charlie Steiner2 Sep 2021 14:26 UTC

30 points

2 comments6 min readLW link

What does Optimization Mean, Again? (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 2)

Davidmanheim28 Jul 2019 9:30 UTC

26 points

7 comments4 min readLW link

All I know is Goodhart

Stuart_Armstrong21 Oct 2019 12:12 UTC

28 points

23 comments3 min readLW link

Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 1)

Davidmanheim2 Jul 2019 15:36 UTC

31 points

5 comments4 min readLW link

Principled Satisficing To Avoid Goodhart

JenniferRM16 Aug 2024 19:05 UTC

45 points

2 comments8 min readLW link

Goodhart’s Law Example: Training Verifiers to Solve Math Word Problems

Chris_Leong25 Nov 2023 0:53 UTC

27 points

2 comments1 min readLW link

(arxiv.org)

nostalgebraist: Recursive Goodhart’s Law

Kaj_Sotala26 Aug 2020 11:07 UTC

53 points

27 comments1 min readLW link

(nostalgebraist.tumblr.com)

[Intro to brain-like-AGI safety] 10. The alignment problem

Steven Byrnes30 Mar 2022 13:24 UTC

53 points

7 comments21 min readLW link

The Goodhart Game

John_Maxwell18 Nov 2019 23:22 UTC

13 points

5 comments5 min readLW link

Detect Goodhart and shut down

Jeremy Gillen22 Jan 2025 18:45 UTC

70 points

21 comments7 min readLW link

Noticing the Taste of Lotus

Valentine27 Apr 2018 20:05 UTC

240 points

81 comments3 min readLW link 3 reviews

Non-Adversarial Goodhart and AI Risks

Davidmanheim27 Mar 2018 1:39 UTC

22 points

11 comments6 min readLW link

Bootstrapped Alignment

Gordon Seidoh Worley27 Feb 2021 15:46 UTC

20 points

12 comments2 min readLW link

(Some?) Possible Multi-Agent Goodhart Interactions

Davidmanheim22 Sep 2018 17:48 UTC

20 points

2 comments5 min readLW link

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

DragonGod9 Feb 2023 13:36 UTC

22 points

42 comments2 min readLW link

Reducing Goodhart: Announcement, Executive Summary

Charlie Steiner20 Aug 2022 9:49 UTC

16 points

0 comments1 min readLW link

Reward hacking and Goodhart’s law by evolutionary algorithms

Jan_Kulveit30 Mar 2018 7:57 UTC

18 points

5 comments1 min readLW link

(arxiv.org)

Robust Delegation

abramdemski and Scott Garrabrant

4 Nov 2018 16:38 UTC

116 points

10 comments1 min readLW link

Bounding Goodhart’s Law

eric_langlois11 Jul 2018 0:46 UTC

43 points

2 comments5 min readLW link

Soft optimization makes the value target bigger

Jeremy Gillen2 Jan 2023 16:06 UTC

119 points

20 comments12 min readLW link

Don’t align agents to evaluations of plans

TurnTrout26 Nov 2022 21:16 UTC

48 points

49 comments18 min readLW link

Models Modeling Models

Charlie Steiner2 Nov 2021 7:08 UTC

23 points

5 comments10 min readLW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda25 Dec 2021 23:07 UTC

53 points

3 comments28 min readLW link

Honest science is spirituality

pchvykov1 Jul 2024 20:33 UTC

−1 points

10 comments4 min readLW link

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout29 Nov 2022 6:23 UTC

62 points

41 comments15 min readLW link

Guarding Slack vs Substance

Raemon13 Dec 2017 20:58 UTC

42 points

6 comments6 min readLW link

Humans are not automatically strategic

AnnaSalamon8 Sep 2010 7:02 UTC

628 points

278 comments4 min readLW link

The Dumbification of our smart screens

Itay Dreyfus4 Jul 2024 6:32 UTC

18 points

0 comments5 min readLW link

(productidentity.co)

Markets are Anti-Inductive

Eliezer Yudkowsky26 Feb 2009 0:55 UTC

98 points

62 comments4 min readLW link

Why Agent Foundations? An Overly Abstract Explanation

johnswentworth25 Mar 2022 23:17 UTC

312 points

60 comments8 min readLW link 1 review

Bayesianism versus conservatism versus Goodhart

Stuart_Armstrong16 Jul 2021 23:39 UTC

15 points

2 comments6 min readLW link

Extinction-level Goodhart’s Law as a Property of the Environment

VojtaKovarik and Ida Mattsson

21 Feb 2024 17:56 UTC

23 points

0 comments10 min readLW link

Outer alignment and imitative amplification

evhub10 Jan 2020 0:26 UTC

24 points

11 comments9 min readLW link

The Dark Miracle of Optics

Suspended Reason24 Jun 2020 3:09 UTC

27 points

5 comments8 min readLW link

“Designing agent incentives to avoid reward tampering”, DeepMind

gwern14 Aug 2019 16:57 UTC

28 points

15 comments1 min readLW link

(medium.com)

How Doomed are Large Organizations?

Zvi21 Jan 2020 12:20 UTC

81 points

42 comments9 min readLW link

(thezvi.wordpress.com)

Weak vs Quantitative Extinction-level Goodhart’s Law

VojtaKovarik and Ida Mattsson

21 Feb 2024 17:38 UTC

27 points

1 comment2 min readLW link

When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors

Stuart_Armstrong19 Dec 2019 13:55 UTC

24 points

18 comments7 min readLW link

Goodhart’s Law in Reinforcement Learning

jacek, Joar Skalse, OliverHayman, charlie_griffin and Xingjian Bai

16 Oct 2023 0:54 UTC

126 points

22 comments7 min readLW link

Ambition, Good and Bad: Green Growing Things and Forgeworthiness

Evenstar11 Aug 2025 15:20 UTC

10 points

0 comments5 min readLW link

Don’t want Goodhart? — Specify the variables more

YanLyutnev21 Nov 2024 22:43 UTC

2 points

2 comments5 min readLW link

Defining and Characterising Reward Hacking

Joar Skalse28 Feb 2025 19:25 UTC

15 points

0 comments4 min readLW link

Validator models: A simple approach to detecting goodharting

beren20 Feb 2023 21:32 UTC

14 points

1 comment4 min readLW link

The new dot com bubble is here: it’s called online advertising

Gordon Seidoh Worley18 Nov 2019 22:05 UTC

50 points

17 comments2 min readLW link

(thecorrespondent.com)

The Lesson To Unlearn

Ben Pace8 Dec 2019 0:50 UTC

38 points

11 comments1 min readLW link

(paulgraham.com)

The reverse Goodhart problem

Stuart_Armstrong8 Jun 2021 15:48 UTC

20 points

22 comments1 min readLW link

Thinking about maximization and corrigibility

James Payor21 Apr 2023 21:22 UTC

63 points

4 comments5 min readLW link

This Is Not Life

samhealy28 Jul 2025 8:43 UTC

55 points

2 comments23 min readLW link

The Gödelian Constraint on Epistemic Freedom (GCEF): A Topological Frame for Alignment, Collapse, and Simulation Drift

austin.miller14 Jul 2025 4:17 UTC

1 point

0 comments1 min readLW link

Dynamics Crucial to AI Risk Seem to Make for Complicated Models

VojtaKovarik and Ida Mattsson

21 Feb 2024 17:54 UTC

19 points

0 comments9 min readLW link

[Question] Beyond Benchmarks: A Psychometric Approach to AI Evaluation

Kareem Soliman27 Jul 2025 16:09 UTC

1 point

0 comments8 min readLW link

Other Papers About the Theory of Reward Learning

Joar Skalse28 Feb 2025 19:26 UTC

16 points

0 comments5 min readLW link

Religion as Goodhart

Shmi8 Jul 2019 0:38 UTC

21 points

6 comments2 min readLW link

The Ancient God Who Rules High School

lifelonglearner5 Apr 2017 18:55 UTC

12 points

113 comments1 min readLW link

(medium.com)

When to use quantilization

RyanCarey5 Feb 2019 17:17 UTC

65 points

5 comments4 min readLW link

Specification gaming examples in AI

Samuel Rødal10 Nov 2018 12:00 UTC

24 points

6 comments1 min readLW link

(docs.google.com)

Resolutions to the Challenge of Resolving Forecasts

Davidmanheim11 Mar 2021 19:08 UTC

58 points

13 comments6 min readLW link

Don’t want Goodhart? — Specify the damn variables

Yan Lyutnev21 Nov 2024 22:45 UTC

−3 points

2 comments5 min readLW link

Lotuses and Loot Boxes

Davidmanheim17 May 2018 0:21 UTC

14 points

2 comments4 min readLW link

Can “Reward Economics” solve AI Alignment?

Q Home7 Sep 2022 7:58 UTC

3 points

15 comments18 min readLW link

Atlas: Stress-Testing ASI Value Learning Through Grand Strategy Scenarios

NeilFox17 Feb 2025 23:55 UTC

1 point

0 comments2 min readLW link

The Paradox of Expert Opinion

Emrik26 Sep 2021 21:39 UTC

12 points

9 comments2 min readLW link

Goodhart’s Law and Emotions

Zero Contradictions7 Jul 2024 8:32 UTC

1 point

5 comments1 min readLW link

(expandingrationality.substack.com)

Superintelligence 12: Malignant failure modes

KatjaGrace2 Dec 2014 2:02 UTC

15 points

51 comments5 min readLW link

Emergence Spirals—what Yudkowsky gets wrong

James Stephen Brown8 Jun 2025 19:02 UTC

28 points

25 comments9 min readLW link

Practical everyday human strategizing

akaTrickster27 Mar 2022 14:20 UTC

6 points

0 comments3 min readLW link

Degamification

Nate Showell19 Feb 2023 5:35 UTC

23 points

3 comments2 min readLW link

Aldix and the Book of Life

ville1 Jan 2024 17:23 UTC

1 point

0 comments4 min readLW link

(medium.com)

Moral Mazes and Short Termism

Zvi2 Jun 2019 11:30 UTC

74 points

21 comments4 min readLW link

(thezvi.wordpress.com)

Oversight Leagues: The Training Game as a Feature

Paul Bricman9 Sep 2022 10:08 UTC

20 points

6 comments10 min readLW link

Leto among the Machines

Virgil Kurkjian30 Sep 2018 21:17 UTC

57 points

20 comments13 min readLW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and Open Challenges.

Roland Pihlakas12 Jan 2025 3:37 UTC

47 points

7 comments12 min readLW link

[Aspiration-based designs] A. Damages from misaligned optimization – two more models

Jobst Heitzig and Simon Dima

15 Jul 2024 14:08 UTC

6 points

0 comments9 min readLW link

Don’t you mean “the most conditionally forbidden technique?”

Knight Lee26 Apr 2025 3:45 UTC

19 points

0 comments3 min readLW link

Visual demonstration of Optimizer’s curse

Roman Malov30 Nov 2024 19:34 UTC

25 points

3 comments7 min readLW link

When Can Optimization Be Done Safely?

StrivingForLegibility30 Dec 2023 1:24 UTC

12 points

0 comments3 min readLW link

Scaling Laws for Reward Model Overoptimization

leogao, John Schulman and Jacob_Hilton

20 Oct 2022 0:20 UTC

103 points

13 comments1 min readLW link

(arxiv.org)

Extinction Risks from AI: Invisible to Science?

VojtaKovarik, Chris van Merwijk and Ida Mattsson

21 Feb 2024 18:07 UTC

24 points

7 comments1 min readLW link

(arxiv.org)

Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions

James Stephen Brown11 Sep 2024 9:53 UTC

5 points

0 comments8 min readLW link

(nonzerosum.games)

AISC team report: Soft-optimization, Bayes and Goodhart

Simon Fischer, benjaminko, jazcarretao, DFNaiff and Jeremy Gillen

27 Jun 2023 6:05 UTC

38 points

2 comments15 min readLW link

No comments.