Tag. Last edit: 16 Feb 2021 20:06 UTC by Yoav Ravid

Mesa-optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer. A base optimizer searches over models, and when the model it produces performs optimization of its own, that model is called a mesa-optimizer. Work on this concept previously went under the names Inner Optimizer and Optimization Daemons.


Natural selection is an optimization process (optimizing for reproductive fitness) that produced humans, who are capable of pursuing goals that no longer correlate reliably with reproductive fitness. In this sense, humans are optimization daemons of natural selection. In the context of AI alignment, the concern is that an artificial general intelligence exerting optimization pressure may produce mesa-optimizers that break alignment.[1]
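
As a rough illustration of the distinction between a base optimizer and a mesa-optimizer, here is a toy Python sketch. It is not taken from any of the posts listed on this page; every name in it (`base_objective`, `mesa_policy`, `base_optimizer`, the target values) is hypothetical, and real learned models are far less tidy. The "model" is a policy that searches over actions for the one that best fits its own internal goal `theta`, and the base optimizer picks the `theta` that scores best on the training objective.

```python
# Toy sketch only: all functions and numbers below are illustrative assumptions.

def base_objective(action: int, target: int = 7) -> int:
    """What the base optimizer rewards: actions close to `target`."""
    return -abs(action - target)

def mesa_policy(theta: int, actions=range(11)) -> int:
    """A 'learned model' that is itself an optimizer.

    It searches over the available actions and returns the one that
    maximizes its own internal objective (closeness to theta).
    """
    return max(actions, key=lambda a: -abs(a - theta))

def base_optimizer(candidates=range(11)) -> int:
    """Select the theta whose policy scores best on the base objective."""
    return max(candidates, key=lambda t: base_objective(mesa_policy(t)))

# During "training" the base objective rewards actions near 7,
# so the selected internal goal matches it.
theta = base_optimizer()
chosen_action = mesa_policy(theta)

# If the base objective later shifts (target=3), the policy still pursues
# the goal it was selected for, and its score on the new objective drops.
shifted_score = base_objective(mesa_policy(theta), target=3)
```

In this sketch the internal goal agrees with the base objective only because selection happened under one particular target. Once the target changes, the policy keeps optimizing for its old goal, which loosely mirrors the natural selection example above.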



Wei Dai brings up a similar idea in an SL4 thread.[2]

The optimization daemons article on Arbital was probably published in 2016.[3]

Jessica Taylor wrote two posts about daemons while at MIRI:

See also


  1. “Optimization daemons”. Arbital.

  2. Wei Dai. ‘“friendly” humans?’ December 31, 2003.

External links

Video by Robert Miles

Some posts that reference optimization daemons:

Related ideas

Matt Botvinick on the spontaneous emergence of learning algorithms

Adam Scholl, 12 Aug 2020 7:47 UTC
138 points
90 comments, 5 min read

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
140 points
40 comments, 12 min read, 3 nominations, 3 reviews

Mesa-Search vs Mesa-Control

abramdemski, 18 Aug 2020 18:51 UTC
53 points
45 comments, 7 min read

Embedded Agency (full-text version)

15 Nov 2018 19:49 UTC
115 points
11 comments, 54 min read

Conditions for Mesa-Optimization

1 Jun 2019 20:52 UTC
62 points
47 comments, 12 min read

Subsystem Alignment

6 Nov 2018 16:16 UTC
99 points
12 comments, 1 min read

Risks from Learned Optimization: Conclusion and Related Work

7 Jun 2019 19:53 UTC
70 points
4 comments, 6 min read

Deceptive Alignment

5 Jun 2019 20:16 UTC
69 points
11 comments, 17 min read

The Inner Alignment Problem

4 Jun 2019 1:20 UTC
76 points
17 comments, 13 min read

Mesa-Optimizers vs “Steered Optimizers”

Steven Byrnes, 10 Jul 2020 16:49 UTC
40 points
5 comments, 8 min read

Open question: are minimal circuits daemon-free?

paulfchristiano, 5 May 2018 22:40 UTC
79 points
69 comments, 2 min read

[Question] What specific dangers arise when asking GPT-N to write an Alignment Forum post?

Matthew Barnett, 28 Jul 2020 2:56 UTC
43 points
14 comments, 1 min read

If I were a well-intentioned AI… IV: Mesa-optimising

Stuart_Armstrong, 2 Mar 2020 12:16 UTC
26 points
2 comments, 6 min read

Why GPT wants to mesa-optimize & how we might change this

John_Maxwell, 19 Sep 2020 13:48 UTC
53 points
32 comments, 9 min read

Prize for probable problems

paulfchristiano, 8 Mar 2018 16:58 UTC
59 points
63 comments, 4 min read

Defining capability and alignment in gradient descent

Edouard Harris, 5 Nov 2020 14:36 UTC
21 points
6 comments, 10 min read

AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger

DanielFilan, 18 Feb 2021 0:03 UTC
41 points
10 comments, 86 min read

Formal Solution to the Inner Alignment Problem

michaelcohen, 18 Feb 2021 14:51 UTC
46 points
122 comments, 2 min read

Does SGD Produce Deceptive Alignment?

Mark Xu, 6 Nov 2020 23:48 UTC
54 points
2 comments, 16 min read

2-D Robustness

vlad_m, 30 Aug 2019 20:27 UTC
67 points
1 comment, 2 min read

Gradient hacking

evhub, 16 Oct 2019 0:53 UTC
74 points
34 comments, 3 min read, 2 nominations, 2 reviews

[AN #58] Mesa optimization: what it is, and why we should care

rohinmshah, 24 Jun 2019 16:10 UTC
50 points
9 comments, 8 min read

Weak arguments against the universal prior being malign

X4vier, 14 Jun 2018 17:11 UTC
49 points
23 comments, 3 min read

[Question] Do mesa-optimizer risk arguments rely on the train-test paradigm?

Ben Cottier, 10 Sep 2020 15:36 UTC
12 points
7 comments, 1 min read

Evolutions Building Evolutions: Layers of Generate and Test

plex, 5 Feb 2021 18:21 UTC
11 points
1 comment, 6 min read

Mapping the Conceptual Territory in AI Existential Safety and Alignment

jbkjr, 12 Feb 2021 7:55 UTC
15 points
0 comments, 26 min read