Selection Theorems

Last edit: 25 Dec 2022 20:56 UTC by DragonGod

A Selection Theorem tells us something about what agent type signatures will be selected for in some broad class of environments. Two important points:
The theorem need not directly talk about selection—e.g. it could state some general property of optima, of “broad” optima, of “most” optima, or of optima under a particular kind of selection pressure (like natural selection or financial profitability).
Any given theorem need not address every question about agent type signatures; it just needs to tell us something about agent type signatures.
For instance, the subagents argument says that, when our “agents” have internal state in a coherence-theorem-like setup, the “goals” will be Pareto optimality over multiple utilities, rather than optimality of a single utility function. This says very little about embeddedness or world models or internal architecture; it addresses only one narrow aspect of agent type signatures. And, like the coherence theorems, it doesn’t directly talk about selection; it just says that any strategy which doesn’t fit the Pareto-optimal form is strictly dominated by some other strategy (and therefore we’d expect that other strategy to be selected, all else equal).
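The dominance step in the subagents argument can be made concrete with a toy sketch. It assumes each candidate strategy is summarised by a fixed tuple of utilities, one per hypothetical subagent. The function names (`dominates`, `pareto_front`, `dominators`) and the scores are illustrative only and do not come from the source. It also ignores the internal-state part of the argument and shows only the final check: a strategy outside the Pareto front always has some strictly better alternative, which is the sense in which, all else equal, selection would be expected to favor that alternative.

```python
# Toy illustration of Pareto dominance over several utility functions.
# Assumption: each strategy is scored by one number per hypothetical
# subagent, and higher is better in every coordinate.
Outcome = tuple[float, ...]


def dominates(a: Outcome, b: Outcome) -> bool:
    """True if `a` strictly Pareto-dominates `b`: at least as good under
    every utility and strictly better under at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(outcomes: dict[str, Outcome]) -> set[str]:
    """Names of strategies that no other strategy strictly dominates."""
    return {
        name
        for name, score in outcomes.items()
        if not any(dominates(other, score) for other in outcomes.values())
    }


def dominators(outcomes: dict[str, Outcome], name: str) -> set[str]:
    """Strategies that strictly dominate `name` (empty if it is Pareto-optimal)."""
    target = outcomes[name]
    return {other for other, score in outcomes.items() if dominates(score, target)}


# Hypothetical scores under two utilities.
strategies = {"A": (3.0, 1.0), "B": (1.0, 3.0), "C": (2.0, 2.0), "D": (1.0, 1.0)}

print(sorted(pareto_front(strategies)))      # ['A', 'B', 'C']
print(sorted(dominators(strategies, "D")))   # ['A', 'B', 'C']
```

Here no single strategy is best under both utilities, so the front contains three incomparable strategies rather than one optimum, while `D` is strictly beaten by each of them.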

From: Selection Theorems: A Program For Understanding Agents

Select Agent Specifications as Natural Abstractions

lukemarks, 7 Apr 2023 23:16 UTC

19 points

3 comments, 5 min read

Lessons from Convergent Evolution for AI Alignment

Jan_Kulveit and rosehadshar

27 Mar 2023 16:25 UTC

53 points

9 comments, 8 min read

[Question] Why The Focus on Expected Utility Maximisers?

DragonGod, 27 Dec 2022 15:49 UTC

116 points

84 comments, 3 min read

Riffing on the agent type

Quinn, 8 Dec 2022 0:19 UTC

21 points

3 comments, 4 min read

Clarifying the Agent-Like Structure Problem

johnswentworth, 29 Sep 2022 21:28 UTC

58 points

15 comments, 6 min read

Selection processes for subagents

Ryan Kidd, 30 Jun 2022 23:57 UTC

36 points

2 comments, 9 min read

How Do Selection Theorems Relate To Interpretability?

johnswentworth, 9 Jun 2022 19:39 UTC

60 points

14 comments, 3 min read

Understanding Selection Theorems

adamk, 28 May 2022 1:49 UTC

41 points

3 comments, 7 min read

AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan, 23 May 2022 5:40 UTC

34 points

1 comment, 58 min read

Project Intro: Selection Theorems for Modularity

CallumMcDougall, Avery and Lucius Bushnaq

4 Apr 2022 12:59 UTC

71 points

20 comments, 16 min read

Epistemic Strategies of Selection Theorems

adamShimi, 18 Oct 2021 8:57 UTC

33 points

1 comment, 12 min read

What Selection Theorems Do We Expect/Want?

johnswentworth, 1 Oct 2021 16:03 UTC

65 points

11 comments, 7 min read

Some Existing Selection Theorems

johnswentworth, 30 Sep 2021 16:13 UTC

54 points

2 comments, 4 min read

Selection Theorems: A Program For Understanding Agents

johnswentworth, 28 Sep 2021 5:03 UTC

123 points

28 comments, 6 min read, 2 reviews

DragonGod 1 Jan 2023 14:17 UTC
4 points
I am quite enamoured with John Wentworth’s selection theorems, but find myself somewhat dissatisfied with them. As Wentworth framed them, I think they are a bit off.
I think selection theorems should be conceived of as theorems about artifacts (the products of constructive optimisation processes) and their constructors (the optimisation processes that created such artifacts).
That is, I am quite unconvinced that “agent” is the “true name” of such artifacts. There are powerful artifacts that do not match the agent archetype as traditionally conceived, and I am not confident that the artifacts that ultimately matter will conform to it.
Agent selection theorems are IMO ultimately too restrictive; the selection theorem agenda should be about optimisation processes and the kinds of constructs they select for.
