Selection Theorems

Last edit: Dec 25, 2022, 8:56 PM by DragonGod

A Selection Theorem tells us something about what agent type signatures will be selected for in some broad class of environments. Two important points:

  • The theorem need not directly talk about selection—e.g. it could state some general property of optima, of “broad” optima, of “most” optima, or of optima under a particular kind of selection pressure (like natural selection or financial profitability).

  • Any given theorem need not address every question about agent type signatures; it just needs to tell us something about agent type signatures.

For instance, the subagents argument says that, when our “agents” have internal state in a coherence-theorem-like setup, the “goals” will be Pareto optimality over multiple utilities, rather than optimality of a single utility function. This says very little about embeddedness or world models or internal architecture; it addresses only one narrow aspect of agent type signatures. And, like the coherence theorems, it doesn’t directly talk about selection; it just says that any strategy which doesn’t fit the Pareto-optimal form is strictly dominated by some other strategy (and therefore we’d expect that other strategy to be selected, all else equal).

From: Selection Theorems: A Program For Understanding Agents
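
To make the dominance condition concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (the strategy names, the utility values, and the helpers pareto_dominates and pareto_optimal are not from the post); it only shows the check the argument relies on: a strategy fits the Pareto-optimal form exactly when no other strategy scores at least as well on every subagent utility and strictly better on at least one.

    from typing import Dict, List, Tuple

    def pareto_dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
        """True if u is at least as good as v on every utility and strictly better on one."""
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

    def pareto_optimal(strategies: Dict[str, Tuple[float, ...]]) -> List[str]:
        """Strategies not strictly dominated by any other strategy."""
        return [
            s for s, u in strategies.items()
            if not any(pareto_dominates(v, u) for t, v in strategies.items() if t != s)
        ]

    # Hypothetical utility vectors: one coordinate per subagent utility.
    strategies = {
        "trade_early": (3.0, 1.0),
        "trade_late":  (1.0, 3.0),
        "never_trade": (1.0, 1.0),  # dominated by both alternatives
    }

    print(pareto_optimal(strategies))  # ['trade_early', 'trade_late']

Any strategy outside the returned set is strictly dominated by some alternative, which is exactly the condition under which the argument expects it to be selected against, all else equal.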

Selection Theorems: A Program For Understanding Agents

johnswentworth · Sep 28, 2021, 5:03 AM
128 points
28 comments · 6 min read · LW link · 2 reviews

Epistemic Strategies of Selection Theorems

adamShimi · Oct 18, 2021, 8:57 AM
33 points
1 comment · 12 min read · LW link

Some Existing Selection Theorems

johnswentworth · Sep 30, 2021, 4:13 PM
56 points
5 comments · 4 min read · LW link

Understanding Selection Theorems

adamk · May 28, 2022, 1:49 AM
41 points
3 comments · 7 min read · LW link

What Selection Theorems Do We Expect/Want?

johnswentworth · Oct 1, 2021, 4:03 PM
71 points
11 comments · 7 min read · LW link

Clarifying the Agent-Like Structure Problem

johnswentworth · Sep 29, 2022, 9:28 PM
63 points
17 comments · 6 min read · LW link

[Question] Why The Focus on Expected Utility Maximisers?

DragonGod · Dec 27, 2022, 3:49 PM
118 points
84 comments · 3 min read · LW link

Lessons from Convergent Evolution for AI Alignment

Mar 27, 2023, 4:25 PM
54 points
9 comments · 8 min read · LW link

Selection processes for subagents

Ryan Kidd · Jun 30, 2022, 11:57 PM
36 points
2 comments · 9 min read · LW link

An Illustrated Summary of “Robust Agents Learn Causal World Model”

Dalcy · Dec 14, 2024, 3:02 PM
67 points
2 comments · 10 min read · LW link

Ruling Out Lookup Tables

Alfred Harwood · Feb 4, 2025, 10:39 AM
22 points
11 comments · 7 min read · LW link

Select Agent Specifications as Natural Abstractions

lukemarks · Apr 7, 2023, 11:16 PM
19 points
3 comments · 5 min read · LW link

Proof Explained for “Robust Agents Learn Causal World Model”

Dalcy · Dec 22, 2024, 3:06 PM
25 points
0 comments · 15 min read · LW link

Distilling the Internal Model Principle part II

JoseFaustino · Apr 30, 2025, 5:56 PM
15 points
0 comments · 19 min read · LW link

How Do Selection Theorems Relate To Interpretability?

johnswentworth · Jun 9, 2022, 7:39 PM
60 points
14 comments · 3 min read · LW link

AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan · May 23, 2022, 5:40 AM
34 points
1 comment · 58 min read · LW link

Fixing The Good Regulator Theorem

johnswentworth · Feb 9, 2021, 8:30 PM
147 points
39 comments · 8 min read · LW link · 1 review

Riffing on the agent type

Quinn · Dec 8, 2022, 12:19 AM
21 points
3 comments · 4 min read · LW link

Distilling the Internal Model Principle

JoseFaustino · Feb 8, 2025, 2:59 PM
21 points
0 comments · 16 min read · LW link

Project Intro: Selection Theorems for Modularity

Apr 4, 2022, 12:59 PM
73 points
20 comments · 16 min read · LW link