Fundamental Controllability Limits

Tag
Last edit: 15 Dec 2023 12:26 UTC by Remmelt

The research field of Fundamental Controllability Limits aims to verify the following, checking both the empirical soundness of the premises and the validity of the formal reasoning:

  1. Theoretical limits to controlling any AGI using any method of causation.

  2. Threat models of AGI convergent dynamics that are impossible to control (by 1.).

  3. Impossibility theorems, derived by showing that ‘long-term AGI safety’ contradicts the convergence result (2.).

~ ~ ~

Definitions and Distinctions

‘AGI convergent dynamic that is impossible to control’:

Iterated interactions of AGI internals (together with the connected surroundings of their environment) that converge on unsafe conditions, where the space of interactions falls outside at least one theoretical limit of control.


‘Long term’:

‘AGI safety’:

The ambient conditions/contexts around planet Earth, as changed by the operation of AGI, fall within the environmental range that humans need to survive (a minimum-threshold definition).


The notion of ‘artificial intelligence’ (AI) can be either “narrow” or “general”:

The notion of ‘narrow AI’ specifically implies:

  1. a single domain of sense and action.

  2. no possibility for self base-code modification.

  3. a single well-defined meta-algorithm.

  4. that all aspects of its own self agency/intention are fully defined by its builders/developers/creators.

The notion of ‘general AI’ specifically implies:

  1. multiple domains of sense/action;

  2. an intrinsic, non-reducible possibility for self-modification;

  3. and therefore a meta-algorithm that is effectively arbitrary; hence

  4. that it is inherently undecidable whether all aspects of its own self agency/intention are fully defined by only its builders/developers/creators.

[Source: https://​​​​ai_alignment_1/​​si_safety_qanda_out.html#p3]

The Control Problem: Unsolved or Unsolvable?

Remmelt, 2 Jun 2023 15:42 UTC
47 points
46 comments, 14 min read

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors, 27 Sep 2023 21:27 UTC
22 points
12 comments, 4 min read

Limits to the Controllability of AGI

20 Nov 2022 19:18 UTC
11 points
2 comments, 9 min read

On the possibility of impossibility of AGI Long-Term Safety

Roman Yen, 13 May 2023 18:38 UTC
6 points
1 comment, 9 min read

Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt, 19 Dec 2022 12:02 UTC
−3 points
9 comments, 31 min read

The limited upside of interpretability

Peter S. Park, 15 Nov 2022 18:46 UTC
13 points
11 comments, 1 min read

List #3: Why not to assume on prior that AGI-alignment workarounds are available

Remmelt, 24 Dec 2022 9:54 UTC
4 points
1 comment, 3 min read

How ‘Human-Human’ dynamics give way to ‘Human-AI’ and then ‘AI-AI’ dynamics

27 Dec 2022 3:16 UTC
−2 points
5 comments, 2 min read

Challenge to the notion that anything is (maybe) possible with AGI

1 Jan 2023 3:57 UTC
−27 points
4 comments, 1 min read

[Question] Help me solve this problem: The basilisk isn’t real, but people are

canary_itm, 26 Nov 2023 17:44 UTC
−19 points
4 comments, 1 min read