Selection Theorems

Last edit: Dec 25, 2022, 8:56 PM by DragonGod

A Selection Theorem tells us something about what agent type signatures will be selected for in some broad class of environments. Two important points:

  • The theorem need not directly talk about selection—e.g. it could state some general property of optima, of “broad” optima, of “most” optima, or of optima under a particular kind of selection pressure (like natural selection or financial profitability).

  • Any given theorem need not address every question about agent type signatures; it just needs to tell us something about agent type signatures.

For instance, the subagents argument says that, when our “agents” have internal state in a coherence-theorem-like setup, the “goals” will be Pareto optimality over multiple utilities, rather than optimality of a single utility function. This says very little about embeddedness or world models or internal architecture; it addresses only one narrow aspect of agent type signatures. And, like the coherence theorems, it doesn’t directly talk about selection; it just says that any strategy which doesn’t fit the Pareto-optimal form is strictly dominated by some other strategy (and therefore we’d expect that other strategy to be selected, all else equal).

From: Selection Theorems: A Program For Understanding Agents
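
To make the dominance condition concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (the strategy names, the utility values, and the helpers pareto_dominates and pareto_optimal are not from the post); it only shows the check the argument relies on: a strategy fits the Pareto-optimal form exactly when no other strategy scores at least as well on every subagent utility and strictly better on at least one.

    from typing import Dict, List, Tuple

    def pareto_dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
        """True if u is at least as good as v on every utility and strictly better on one."""
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

    def pareto_optimal(strategies: Dict[str, Tuple[float, ...]]) -> List[str]:
        """Strategies not strictly dominated by any other strategy."""
        return [
            s for s, u in strategies.items()
            if not any(pareto_dominates(v, u) for t, v in strategies.items() if t != s)
        ]

    # Hypothetical utility vectors: one coordinate per subagent utility.
    strategies = {
        "trade_early": (3.0, 1.0),
        "trade_late":  (1.0, 3.0),
        "never_trade": (1.0, 1.0),  # dominated by both alternatives
    }

    print(pareto_optimal(strategies))  # ['trade_early', 'trade_late']

Any strategy outside the returned set is strictly dominated by some alternative, which is exactly the condition under which the argument expects it to be selected against, all else equal.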

Selection Theorems: A Program For Understanding Agents

johnswentworth · Sep 28, 2021, 5:03 AM
128 points
28 comments · 6 min read · LW link · 2 reviews

Epistemic Strategies of Selection Theorems

adamShimi · Oct 18, 2021, 8:57 AM
33 points
1 comment · 12 min read · LW link

Some Existing Selection Theorems

johnswentworth · Sep 30, 2021, 4:13 PM
56 points
5 comments · 4 min read · LW link

Understanding Selection Theorems

adamk · May 28, 2022, 1:49 AM
41 points
3 comments · 7 min read · LW link

What Selection Theorems Do We Expect/Want?

johnswentworth · Oct 1, 2021, 4:03 PM
71 points
11 comments · 7 min read · LW link

Clarifying the Agent-Like Structure Problem

johnswentworth · Sep 29, 2022, 9:28 PM
63 points
17 comments · 6 min read · LW link

[Question] Why The Focus on Expected Utility Maximisers?

DragonGod · Dec 27, 2022, 3:49 PM
118 points
84 comments · 3 min read · LW link

Lessons from Convergent Evolution for AI Alignment

Mar 27, 2023, 4:25 PM
54 points
9 comments · 8 min read · LW link

Selection processes for subagents

Ryan Kidd · Jun 30, 2022, 11:57 PM
36 points
2 comments · 9 min read · LW link

An Illustrated Summary of “Robust Agents Learn Causal World Model”

Dalcy · Dec 14, 2024, 3:02 PM
67 points
2 comments · 10 min read · LW link

Ruling Out Lookup Tables

Alfred Harwood · Feb 4, 2025, 10:39 AM
22 points
11 comments · 7 min read · LW link

Select Agent Specifications as Natural Abstractions

lukemarks · Apr 7, 2023, 11:16 PM
19 points
3 comments · 5 min read · LW link

Proof Explained for “Robust Agents Learn Causal World Model”

Dalcy · Dec 22, 2024, 3:06 PM
25 points
0 comments · 15 min read · LW link

Distilling the Internal Model Principle part II

JoseFaustino · Apr 30, 2025, 5:56 PM
15 points
0 comments · 19 min read · LW link

How Do Selection Theorems Relate To Interpretability?

johnswentworth · Jun 9, 2022, 7:39 PM
60 points
14 comments · 3 min read · LW link

AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan · May 23, 2022, 5:40 AM
34 points
1 comment · 58 min read · LW link

Fixing The Good Regulator Theorem

johnswentworth · Feb 9, 2021, 8:30 PM
147 points
39 comments · 8 min read · LW link · 1 review

Riffing on the agent type

Quinn · Dec 8, 2022, 12:19 AM
21 points
3 comments · 4 min read · LW link

Distilling the Internal Model Principle

JoseFaustino · Feb 8, 2025, 2:59 PM
21 points
0 comments · 16 min read · LW link

Project Intro: Selection Theorems for Modularity

Apr 4, 2022, 12:59 PM
73 points
20 comments · 16 min read · LW link