AGI Limits of Engineerable Control & Safety Impossibility Theorems

Tag. Last edit: 11 Nov 2022 21:25 UTC by Remmelt

The research field AGI Limits of Engineerable Control & Safety Impossibility Theorems (AGILECSIT) aims to verify the following, checking both the empirical soundness of the premises and the validity of the formal reasoning:

  1. Theoretical limits to controlling any AGI using any method of causation.

  2. Threat models of AGI convergent dynamics that are impossible to control (by 1.).

  3. Impossibility theorems, derived by showing that ‘long-term AGI safety’ contradicts the convergence result (2.).

~ ~ ~

Definitions and Distinctions

‘AGI convergent dynamic that is impossible to control’:

Iterated interactions of AGI internals (with their connected surroundings in the environment) that converge on (unsafe) conditions, where the space of interactions falls outside at least one theoretical limit of control.


‘Long term’:

‘AGI safety’:

Ambient conditions/contexts around planet Earth, as changed by the operation of AGI, fall within the environmental range that humans need to survive (a minimum-threshold definition).
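
As a minimal sketch of what a minimum-threshold definition amounts to, the check below treats safety as a conjunction over ranges: every tracked condition must fall inside its survivable interval. The function name `is_minimally_safe`, the condition names, and any numeric ranges passed to it are hypothetical placeholders for illustration, not measured thresholds and not part of the source definition. Treating an unmeasured condition as failing the check is also an assumption of this sketch.

```python
from typing import Mapping, Tuple

# (lowest survivable value, highest survivable value); placeholder units
Range = Tuple[float, float]


def is_minimally_safe(
    conditions: Mapping[str, float],
    survivable_ranges: Mapping[str, Range],
) -> bool:
    """Return True only if every tracked condition lies in its survivable range.

    Assumption of this sketch: a condition with no observed value counts as
    failing the check, since it cannot be confirmed to be within range.
    """
    for name, (low, high) in survivable_ranges.items():
        value = conditions.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True
```

Because the check is a conjunction, a single condition leaving its range is enough to fail it, which is what makes this a minimum-threshold notion rather than a measure of how well humans are doing.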


That the notion of ‘artificial intelligence’ (AI) can be either “narrow” or “general”:

That the notion of ‘narrow AI’ specifically implies:

  1. a single domain of sense and action.

  2. no possibility for self base-code modification.

  3. a single well-defined meta-algorithm.

  4. that all aspects of its own self agency/intention are fully defined by its builders/developers/creators.

That the notion of ‘general AI’ specifically implies:

  1. multiple domains of sense/action;

  2. intrinsic non-reducible possibility for self-modification;

  3. and therefore, that the meta-algorithm is effectively arbitrary; hence

  4. that it is inherently undecidable as to whether all aspects of its own self agency/intention are fully defined by only its builders/developers/creators.

[Source: https://​​​​ai_alignment_1/​​si_safety_qanda_out.html#p3]

Limits to the Controllability of AGI

20 Nov 2022 19:18 UTC
11 points
2 comments, 9 min read

Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt, 19 Dec 2022 12:02 UTC
0 points
8 comments, 31 min read

The limited upside of interpretability

Peter S. Park, 15 Nov 2022 18:46 UTC
13 points
11 comments, 1 min read

List #3: Why not to assume on prior that AGI-alignment workarounds are available

Remmelt, 24 Dec 2022 9:54 UTC
4 points
1 comment, 3 min read

How ‘Human-Human’ dynamics give way to ‘Human-AI’ and then ‘AI-AI’ dynamics

27 Dec 2022 3:16 UTC
−2 points
5 comments, 2 min read

Presumptive Listening: sticking to familiar concepts and missing the outer reasoning paths

Remmelt, 27 Dec 2022 15:40 UTC
−14 points
8 comments, 2 min read

Challenge to the notion that anything is (maybe) possible with AGI

1 Jan 2023 3:57 UTC
−25 points
4 comments, 1 min read