avturchin comments on avturchin’s Shortform

avturchin 23 Jun 2021 15:03 UTC
4 points
Catching Treacherous Turn: A Model of the Multilevel AI Boxing
- Multilevel defense in AI boxing could have a significant probability of success if the AI is used only a limited number of times and has a limited level of intelligence.
- AI boxing could consist of four main levels of defense, in the same way as a nuclear plant: passive safety by design, active monitoring of the chain reaction, escape barriers, and remote mitigation measures.
- The main instruments of AI boxing are catching the moment of the “treacherous turn”, limiting the AI’s capabilities, and preventing the AI’s self-improvement.
- The treacherous turn could be briefly visible as a plain, unencrypted “thought”.
- Not all routes to self-improvement are available to a boxed AI if it is not yet superintelligent and wants to hide its self-improvement from outside observers.
https://philpapers.org/rec/TURCTT
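
A toy illustration of the first bullet, not taken from the paper: if one assumes the four levels fail independently and that each use of the AI is an independent trial, the chance of containment is easy to compute and shows why a limited number of uses matters. The per-level failure probabilities below are made-up placeholders, and the independence assumption is probably too generous for an adversarial AI that can target several levels at once.

```python
from math import prod


def escape_probability_per_use(layer_failure_probs):
    """Chance that every defense level fails on a single use.

    Assumes the levels fail independently (an illustrative assumption).
    """
    return prod(layer_failure_probs)


def containment_probability(layer_failure_probs, uses):
    """Chance that no escape happens across `uses` independent uses."""
    p_escape = escape_probability_per_use(layer_failure_probs)
    return (1.0 - p_escape) ** uses


# Hypothetical failure probabilities for the four levels:
# passive safety, active monitoring, escape barriers, remote mitigation.
layers = [0.5, 0.3, 0.2, 0.1]

for uses in (1, 10, 100, 1000):
    print(uses, round(containment_probability(layers, uses), 4))
```

Under these assumptions containment is likely for a handful of uses but decays geometrically as the number of uses grows, which is the intuition behind restricting how often the boxed AI is run.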