Below, several powerful optimizers are rated on four properties, as part of a brainstorm on whether there is a simple core of consequentialism that excludes corrigibility. I think AlphaZero is a moderately strong argument that there is a simple core of consequentialism which includes inner search.
Properties
Simple: can be written in less than 10 KB of code. If something is itself made of agents (markets and the US government), I marked it as N/A.
Coherent: approximately maximizing a utility function most of the time. Other definitions include:
  - Not being money-pumped
  - Nate Soares’s notion in the MIRI dialogues: having all your actions point towards a single goal
  - John Wentworth’s setup of Optimization at a Distance
Adversarially coherent: something like “appears coherent to weaker optimizers” or “robust to perturbations by weaker optimizers”. This implies that it’s incorrigible.
  - Sufficiently optimized agents appear coherent (Arbital)
  - Alternatively: achieves high utility even when “disrupted” by a somewhat less powerful optimizer
Search+WM: operates by explicitly ranking plans within a world-model. Evolution is a search process, but it doesn’t have a world-model: its only contact with the territory comes from directly interacting with the world, which may be why it is so slow.
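To make “Search+WM” and the “still wins against other strategies” sense of adversarial coherence concrete, here is a minimal sketch on a toy game of my own choosing (not from the discussion above). The assumed game: players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins. The agent never learns by playing. It ranks each candidate move by simulating the rest of the game inside its model of the rules, which is minimax search in the spirit of Stockfish, not AlphaZero’s actual algorithm. All names (`value`, `search_move`, `play`, `random_opponent`) are hypothetical.

```python
import random
from functools import lru_cache

# Toy world model (an assumption for illustration): a pile of stones,
# each move removes 1 to MAX_TAKE stones, whoever takes the last stone wins.
MAX_TAKE = 3

def legal_moves(pile: int) -> list[int]:
    """Moves the world model allows from a given pile size."""
    return [k for k in range(1, MAX_TAKE + 1) if k <= pile]

@lru_cache(maxsize=None)
def value(pile: int) -> int:
    """+1 if the player to move wins with perfect play, -1 otherwise.

    Computed purely by searching inside the model, never by acting in the world.
    """
    if pile == 0:
        return -1  # the previous player took the last stone, so the mover has lost
    # A move is good exactly when it leaves the opponent in a losing position.
    return max(-value(pile - k) for k in legal_moves(pile))

def search_move(pile: int) -> int:
    """Rank every legal move by its simulated outcome and pick the best one."""
    return max(legal_moves(pile), key=lambda k: -value(pile - k))

def random_opponent(pile: int, rng: random.Random) -> int:
    """A weaker optimizer: picks any legal move at random."""
    return rng.choice(legal_moves(pile))

def play(start: int, opponent, rng: random.Random) -> str:
    """The searcher moves first; returns whoever took the last stone."""
    pile, player = start, "searcher"
    while True:
        move = search_move(pile) if player == "searcher" else opponent(pile, rng)
        pile -= move
        if pile == 0:
            return player
        player = "opponent" if player == "searcher" else "searcher"

rng = random.Random(0)
results = [play(10, random_opponent, rng) for _ in range(1000)]
print(results.count("searcher"))  # 1000: from a winning start, it wins whatever the opponent does
```

Starting from 10 stones, the searcher always leaves a multiple of 4, so no opponent strategy can disrupt it. An evolution-style optimizer with no model would instead have to play many real games to discover the same policy.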
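Thing           | Simple? | Coherent? | Adv. coherent? | Search+WM?
----------------+---------+-----------+----------------+-----------
Humans          | N       | Y         | Sometimes      | Y
AIXI-tl         | Y       | Y         | N              | Y
Stockfish       | N       | Y         | Y              | Y
AlphaZero/OAI5  | Y       | Y         | Y              | Y
Markets         | N/A     | Y         | Y              | Y
US government   | N/A     | Y         | N              | Y
Evolution       | Y       | N         | N              | N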
Notes:
Humans are not adversarially coherent: their cognitive biases (such as those described by prospect theory) can be exploited, they can be indoctrinated, etc.
AIXI-tl is not adversarially coherent because, as an agent embedded in the world it acts on, it can be switched off, etc.
AlphaZero: when it plays chess, you can switch to another strategy and it still wins.
Markets are inexploitable, but they don’t do any search in a world-model beyond the search done by individual market participants.
The US government is not adversarially coherent in most circumstances, even if its subparts are coherent: lobbying can affect its policies, and it is meant to be corrigible by the voting population.
Evolution is not coherent: species often evolve to extinction; foxes and rabbits, etc.
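The “not being money-pumped” definition of coherence, and the contrast between exploitable humans and inexploitable markets, can be illustrated with a minimal sketch. The assumptions: three hypothetical items A, B, C, preferences given as a pairwise relation, and a hypothetical exploiter who charges a fixed fee for each trade the agent accepts. An agent with cyclic preferences (A over B, B over C, C over A) pays on every round forever. An agent whose preferences come from a utility function accepts at most a couple of trades and then refuses, because it can never be led in a circle. This is a toy illustration of the definition, not a model of prospect theory.

```python
from typing import Callable

# prefers(x, y) is True when the agent strictly prefers x to y (hypothetical relations).
Prefers = Callable[[str, str], bool]

CYCLE = {("A", "B"), ("B", "C"), ("C", "A")}  # intransitive: A > B > C > A

def cyclic_prefers(x: str, y: str) -> bool:
    return (x, y) in CYCLE

UTILITY = {"A": 3, "B": 2, "C": 1}  # a utility function induces transitive preferences

def utility_prefers(x: str, y: str) -> bool:
    return UTILITY[x] > UTILITY[y]

def money_pump(prefers: Prefers, holding: str, fee: float, rounds: int) -> float:
    """Each round, offer an item the agent prefers to what it holds, for a fee.

    The agent accepts any strictly preferred offer. Returns the total fees extracted.
    """
    extracted = 0.0
    for _ in range(rounds):
        offer = next((x for x in "ABC" if prefers(x, holding)), None)
        if offer is None:
            break  # nothing the agent would pay to switch to: the pump stops
        holding = offer
        extracted += fee
    return extracted

print(money_pump(cyclic_prefers, "A", fee=1.0, rounds=30))   # 30.0: pays every round
print(money_pump(utility_prefers, "C", fee=1.0, rounds=30))  # 1.0: trades up to A, then stops
```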