Thomas Kwa comments on Thomas Kwa’s Shortform

Thomas Kwa 10 Aug 2022 1:47 UTC
Below is a list of powerful optimizers rated on several properties, as part of a brainstorm on whether there’s a simple core of consequentialism that excludes corrigibility. I think that AlphaZero is a moderately strong argument that there is a simple core of consequentialism which includes inner search.
Properties
- Simple: takes less than 10 KB of code. If something is already made of agents (markets and the US government), I marked it as N/A.
- Coherent: approximately maximizing a utility function most of the time. There are other definitions:
  - Not being money-pumped
  - Nate Soares’s notion in the MIRI dialogues: having all your actions point towards a single goal
  - John Wentworth’s setup of Optimization at a Distance
- Adversarially coherent: something like “appears coherent to weaker optimizers” or “robust to perturbations by weaker optimizers”. This implies that it’s incorrigible.
  - Sufficiently optimized agents appear coherent—Arbital
  - Achieves high utility even when “disrupted” by a somewhat less powerful optimizer
- Search+WM: operates by explicitly ranking plans within a world-model. Evolution is a search process, but doesn’t have a world-model. Its only contact with the territory comes from directly interacting with the world, which is maybe why it’s so slow.
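To make the “not being money-pumped” definition of coherence concrete, here is a minimal sketch. The items, the fee, and the `run_money_pump` helper are all hypothetical and only illustrate the idea: an agent with cyclic preferences pays for every swap and ends up holding what it started with, while an agent with transitive preferences stops trading once it holds its favorite item.

```python
def run_money_pump(prefers, start_item, offers, fee=1):
    """Offer a sequence of swaps to an agent.

    `prefers` is a set of (better, worse) pairs. The agent accepts any swap
    to an item it strictly prefers over its current one, paying `fee` each
    time. Returns (final_item, total_paid).
    """
    item, paid = start_item, 0
    for offered in offers:
        if (offered, item) in prefers:  # agent prefers the offered item
            item = offered
            paid += fee
    return item, paid


# Hypothetical cyclic preferences: A > B, B > C, C > A
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
# Hypothetical transitive preferences: A > B > C
transitive = {("A", "B"), ("B", "C"), ("A", "C")}

# A weaker optimizer repeatedly offers the same three swaps.
offers = ["B", "A", "C"] * 3

cyclic_result = run_money_pump(cyclic, "C", offers)
transitive_result = run_money_pump(transitive, "C", offers)
print(cyclic_result)      # ends where it started, having paid for every swap
print(transitive_result)  # trades up to its favorite item, then stops
```

The cyclic agent accepts all nine offers and still holds C; the transitive agent pays twice to reach A and refuses everything after that.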
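| Thing | Simple? | Coherent? | Adv. coherent? | Search+WM? |
|---|---|---|---|---|
| Humans | N | Y | Sometimes | Y |
| AIXI-tl | Y | Y | N | Y |
| Stockfish | N | Y | Y | Y |
| AlphaZero/OAI5 | Y | Y | Y | Y |
| Markets | N/A | Y | Y | Y |
| US government | N/A | Y | N | Y |
| Evolution | Y | N | N | N |
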
Notes:
- Humans are not reliably adversarially coherent: prospect theory and other cognitive biases can be exploited, people can be indoctrinated, etc.
- AIXI-tl is not adversarially coherent because it is an embedded agent and can be switched off etc.
- AlphaZero is adversarially coherent within chess: whatever strategy you play against it, it still wins.
- Markets are inexploitable, but they don’t do search in a world-model other than the search done by individual market participants
- The US government is not adversarially coherent in most circumstances, even if its subparts are coherent; lobbying can affect the US government’s policies, and it is meant to be corrigible by the voting population.
- Evolution is not coherent: species often evolve to extinction (foxes and rabbits, etc.).
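As a quick check on the ratings, here is a small sketch that encodes the table as data and asks which optimizers are marked “Y” on a given set of properties. The `RATINGS` dictionary copies the table as written; the `with_all` helper is a hypothetical name introduced just for this illustration.

```python
PROPERTIES = ("Simple?", "Coherent?", "Adv. coherent?", "Search+WM?")

# Ratings copied from the table, in the order given by PROPERTIES.
RATINGS = {
    "Humans": ("N", "Y", "Sometimes", "Y"),
    "AIXI-tl": ("Y", "Y", "N", "Y"),
    "Stockfish": ("N", "Y", "Y", "Y"),
    "AlphaZero/OAI5": ("Y", "Y", "Y", "Y"),
    "Markets": ("N/A", "Y", "Y", "Y"),
    "US government": ("N/A", "Y", "N", "Y"),
    "Evolution": ("Y", "N", "N", "N"),
}


def with_all(*wanted):
    """Return the things rated exactly 'Y' on every property in `wanted`."""
    idx = [PROPERTIES.index(p) for p in wanted]
    return [name for name, row in RATINGS.items()
            if all(row[i] == "Y" for i in idx)]


all_four = with_all(*PROPERTIES)
simple_searchers = with_all("Simple?", "Search+WM?")
print(all_four)
print(simple_searchers)
```

Under these ratings, AlphaZero/OAI5 is the only entry marked “Y” on all four properties, which is the observation behind the claim that it is evidence for a simple core of consequentialism that includes inner search.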
