The research field AGI Limits of Engineerable Control & Safety Impossibility Theorems (AGILECSIT) aims to verify the following, checking both the empirical soundness of the premises and the validity of the formal reasoning:
1. Theoretical limits to controlling any AGI using any method of causation.
2. Threat models of AGI convergent dynamics that are impossible to control (by 1.).
3. Impossibility theorems, by contradiction of ‘long-term AGI safety’ with the convergence result (2.).
~ ~ ~
Definitions and Distinctions
‘AGI convergent dynamic that is impossible to control’:
Iterated interactions of AGI internals (with connected surroundings of environment) that converge on (unsafe) conditions, where the space of interactions falls outside even one theoretical limit of control.
‘Control’:
In theory, control of system A over system B means that A can influence B to achieve A’s desired subset of state space [Source: https://arxiv.org/pdf/2109.00484.pdf].
In practice, engineering control of AGI requires simulating or detecting any unsafe effects internally, and then preventing or correcting those effects externally.
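One possible way to write the theoretical definition of control formally is sketched below. All notation is assumed for illustration and does not come from the cited source: S_B is the state space of B, D is A's desired subset of it, u_t is the influence A exerts at step t, w_t is any influence on B that A does not choose, f is B's transition map, and T is the time horizon over which control must hold. This is one reading of the definition (keeping B inside D), not the only one.

```latex
% Assumed dynamics of system B under A's influence u_t and other influences w_t.
\[
  s_{t+1} = f(s_t, u_t, w_t), \qquad s_t \in S_B, \qquad D \subseteq S_B .
\]
% One reading of "A controls B up to horizon T": A has some policy \pi,
% with u_t = \pi(s_0, \dots, s_t), such that the resulting trajectory
% s^{\pi, w} stays inside D whatever the other influences w turn out to be.
\[
  \exists\, \pi \;\; \forall\, (w_t)_{0 \le t < T} \;\; \forall\, t \in \{1, \dots, T\} :
  \quad s^{\pi, w}_t \in D .
\]
```

Under this reading, a theoretical limit of control is any condition under which no such policy exists for the horizon in question.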
‘Long term’:
In theory: into perpetuity.
In practice: over a thousand years.
‘AGI safety’:
Ambient conditions/contexts around planet Earth, as changed by the operation of AGI, fall within the environmental range that humans need to survive (a minimum-threshold definition).
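The minimum-threshold definition can be read as a range check applied to every point in time over the long-term horizon. The sketch below illustrates that reading only. The function names, the condition names, and the numeric ranges are hypothetical placeholders, not measured limits of human survival, and a condition that is not observed at all is treated as unverified and therefore unsafe (an assumption of this sketch).

```python
from typing import Dict, Iterable, Tuple

Range = Tuple[float, float]  # (lower bound, upper bound), inclusive


def within_survivable_range(
    conditions: Dict[str, float], survivable: Dict[str, Range]
) -> bool:
    """Return True only if every tracked condition is present and in range."""
    return all(
        name in conditions and low <= conditions[name] <= high
        for name, (low, high) in survivable.items()
    )


def safe_over_horizon(
    trajectory: Iterable[Dict[str, float]], survivable: Dict[str, Range]
) -> bool:
    """Return True only if every snapshot in the trajectory is in range."""
    return all(within_survivable_range(c, survivable) for c in trajectory)


# Placeholder ranges: hypothetical, unitless, for illustration only.
SURVIVABLE = {"condition_a": (0.0, 1.0), "condition_b": (10.0, 20.0)}

trajectory = [
    {"condition_a": 0.4, "condition_b": 15.0},
    {"condition_a": 0.9, "condition_b": 19.5},
    {"condition_a": 1.2, "condition_b": 15.0},  # condition_a leaves its range
]

print(safe_over_horizon(trajectory[:2], SURVIVABLE))  # True
print(safe_over_horizon(trajectory, SURVIVABLE))      # False
```

Because the check is a conjunction over all conditions and all time steps, a single excursion outside a single range is enough to make the whole trajectory count as unsafe, which is what makes this a minimum-threshold definition.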
That the notion of ‘artificial intelligence’ (AI) can be either “narrow” or “general”:
That the notion of ‘narrow AI’ specifically implies:
a single domain of sense and action.
no possibility of modifying its own base code.
a single well-defined meta-algorithm.
that all aspects of its own self agency/intention are fully defined by its builders/developers/creators.
That the notion of ‘general AI’ specifically implies:
multiple domains of sense and action.
an intrinsic, non-reducible possibility for self-modification.
and therefore, that the meta-algorithm is effectively arbitrary.
hence, that it is inherently undecidable whether all aspects of its own self agency/intention are fully defined by only its builders/developers/creators.