Catching Treacherous Turn: A Model of the Multilevel AI Boxing
Multilevel defense in AI boxing could have a significant probability of success if AI is used a limited number of times and with limited level of intelligence.
AI boxing could consist of 4 main levels of defense, the same way as a nuclear plant: passive safety by design, active monitoring of the chain reaction, escape barriers and remote mitigation measures.
The main instruments of the AI boxing are catching the moment of the “treacherous turn”, limiting AI’s capabilities, and preventing of the AI’s self-improvement.
The treacherous turn could be visible for a brief period of time as a plain non-encrypted “thought”.
Not all the ways of self-improvement are available for the boxed AI if it is not yet superintelligent and wants to hide the self-improvement from the outside observers.
Catching Treacherous Turn: A Model of the Multilevel AI Boxing
Multilevel defense in AI boxing could have a significant probability of success if AI is used a limited number of times and with limited level of intelligence.
AI boxing could consist of 4 main levels of defense, the same way as a nuclear plant: passive safety by design, active monitoring of the chain reaction, escape barriers and remote mitigation measures.
The main instruments of the AI boxing are catching the moment of the “treacherous turn”, limiting AI’s capabilities, and preventing of the AI’s self-improvement.
The treacherous turn could be visible for a brief period of time as a plain non-encrypted “thought”.
Not all the ways of self-improvement are available for the boxed AI if it is not yet superintelligent and wants to hide the self-improvement from the outside observers.
https://philpapers.org/rec/TURCTT