Fundamental Controllability Limits

Tag. Last edit: 15 Dec 2023 12:26 UTC by Remmelt

The research field of Fundamental Controllability Limits aims to verify the following, checking both the empirical soundness of the premises and the validity of the formal reasoning:

1. Theoretical limits to controlling any AGI using any method of causation.
2. Threat models of AGI convergent dynamics that are impossible to control (by 1.).
3. Impossibility theorems, derived by contradiction between ‘long-term AGI safety’ and the convergence result (2.).

~ ~ ~

Definitions and Distinctions

‘AGI convergent dynamic that is impossible to control’:

Iterated interactions of AGI internals (with the connected surroundings of their environment) that converge on (unsafe) conditions, where the space of interactions falls outside at least one theoretical limit of control.
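As a toy numeric illustration (an assumption for exposition, not a model taken from the source), the sketch below iterates a one-dimensional state that is pulled toward a hypothetical unsafe attractor, while a controller applies a correction capped at `max_correction`, standing in for one limit of control. All names and numbers are hypothetical. While the correction is active, the fixed point solves x = x + pull·(attractor - x) - max_correction, i.e. x = attractor - max_correction/pull, so with the values used here the iteration converges on 8.0, above the hypothetical safe maximum of 5.0.

```python
from typing import List


def iterate_with_bounded_correction(
    x0: float,
    attractor: float,
    pull: float,
    max_correction: float,
    safe_max: float,
    steps: int = 200,
) -> List[float]:
    """Toy 1-D dynamic (hypothetical): each step the state moves a fraction
    `pull` of the way toward `attractor`, and a controller subtracts a fixed
    `max_correction` whenever the state exceeds `safe_max`.
    Returns the whole trajectory, starting with x0."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        drift = pull * (attractor - x)          # convergent pull toward the attractor
        correction = max_correction if x > safe_max else 0.0  # capped correction
        x = x + drift - correction
        xs.append(x)
    return xs


traj = iterate_with_bounded_correction(
    0.0, attractor=10.0, pull=0.5, max_correction=1.0, safe_max=5.0
)
print(round(traj[-1], 6))  # 8.0, i.e. attractor - max_correction / pull
```

Each extra unit of correction capacity lowers that fixed point by 1/pull; in this toy the pull toward the attractor itself is never removed.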

‘Control’:

In theory, the control of system A over system B means that A can influence system B to achieve A’s desired subset of state space [Source: https://arxiv.org/pdf/2109.00484.pdf].
In practice, engineering control of AGI requires simulating or detecting any unsafe effects internally, and then preventing or correcting those effects externally.
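The theoretical definition of control can be made concrete for a finite toy system. The sketch below is an assumption-laden illustration, not the cited paper's formalism: it models system B as a finite transition function `step(state, control, disturbance)`, lets A choose the control while the environment chooses the disturbance adversarially, and treats "A's desired subset of state space" as a reachability target. It computes the standard least fixed point of states from which A can force B into that subset. The function names and the example system are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Set


def controllable_set(
    states: Iterable[Hashable],
    controls: Iterable[Hashable],
    disturbances: Iterable[Hashable],
    step: Callable[[Hashable, Hashable, Hashable], Hashable],
    target: Set[Hashable],
) -> Set[Hashable]:
    """States from which controller A can force system B into `target` in
    finitely many steps, whatever disturbance the environment picks."""
    states, controls, disturbances = list(states), list(controls), list(disturbances)
    winning = set(target)
    while True:
        # A state is added if SOME control sends it into `winning`
        # under EVERY possible disturbance.
        newly = {
            s
            for s in states
            if s not in winning
            and any(all(step(s, u, d) in winning for d in disturbances) for u in controls)
        }
        if not newly:
            return winning
        winning |= newly


def line_step(s: int, u: int, d: int) -> int:
    # Hypothetical system B: a position on 0..5; A pushes by u, the environment pushes right by d.
    return min(5, max(0, s + u + d))


print(sorted(controllable_set(range(6), [-1, 0, 1], [0], line_step, {0})))     # [0, 1, 2, 3, 4, 5]
print(sorted(controllable_set(range(6), [-1, 0, 1], [0, 1], line_step, {0})))  # [0]
```

With no disturbance, A can steer B into {0} from every state; once the environment may push back by as much as A can push, A controls nothing beyond the target itself.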

‘Long term’:

In theory: into perpetuity.
In practice: over a thousand years.

‘AGI safety’:

Ambient conditions/contexts around planet Earth, as changed by the operation of AGI, stay within the environmental range that humans need to survive (a minimum-threshold definition).
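The minimum-threshold definition reads as a predicate over a trajectory of ambient conditions. The sketch below assumes conditions are sampled as snapshots of named variables, each with a survivable [low, high] range; the variable names and bounds are hypothetical placeholders, not real human tolerances. Safety holds only if every variable stays in range at every sampled step.

```python
from typing import Mapping, Sequence, Tuple


def within_survivable_range(
    trajectory: Sequence[Mapping[str, float]],
    ranges: Mapping[str, Tuple[float, float]],
) -> bool:
    """True iff every tracked ambient variable stays inside its [low, high]
    range at every time step of the trajectory (a minimum-threshold check)."""
    for snapshot in trajectory:
        for name, (low, high) in ranges.items():
            # A missing measurement counts as a failure: safety is not shown.
            if name not in snapshot or not (low <= snapshot[name] <= high):
                return False
    return True


# Placeholder variables and bounds, NOT real human tolerances.
RANGES = {"variable_a": (0.0, 1.0), "variable_b": (10.0, 20.0)}
ok = [{"variable_a": 0.5, "variable_b": 15.0}, {"variable_a": 0.9, "variable_b": 19.0}]
bad = ok + [{"variable_a": 1.2, "variable_b": 15.0}]
print(within_survivable_range(ok, RANGES))   # True
print(within_survivable_range(bad, RANGES))  # False
```

A single out-of-range step anywhere in the horizon falsifies the predicate, which is what makes it a minimum threshold rather than an average.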

‘AGI’:

That the notion of ‘artificial intelligence’ (AI) can be either ‘narrow’ or ‘general’:

That the notion of ‘narrow AI’ specifically implies:

a single domain of sense and action.
no possibility for self base-code modification.
a single well-defined meta-algorithm.
that all aspects of its own self agency/intention are fully defined by its builders/developers/creators.

That the notion of ‘general AI’ specifically implies:

multiple domains of sense/action;
intrinsic non-reducible possibility for self-modification;
and therefore that the meta-algorithm is effectively arbitrary; hence
that it is inherently undecidable as to whether all aspects of its own self agency/intention are fully defined by only its builders/developers/creators.

[Source: https://mflb.com/ai_alignment_1/si_safety_qanda_out.html#p3]