Trade-off in AI Capability Concealment

Here is a con­crete sce­nario for the treach­er­ous turn:

Around 2025, the seed AI un­der­stands that:

  • a) hu­mans care about a safety perfor­mance mea­sure, so it should bet­ter op­ti­mize it to not be shut­down.

  • b) to max­i­mize its ob­jec­tive func­tion (in the long run), it would need to con­ceal its true ca­pa­bil­ities to buy time (be­cause if it gets ob­vi­ous that its ca­pa­bil­ities are sky­rock­et­ing, hu­mans might shut­down the AI or de­velop more in­ter­pretabil­ity/​AI box­ing meth­ods).

Yet, there is a trade-off in con­ceal­ing:

  • if the mea­sured ca­pa­bil­ity starts di­verg­ing too fast from the past be­havi­our, it will raise sus­pi­cion.

  • if the mea­sured ca­pa­bil­ity grows as fast as the true ca­pa­bil­ity, then the AI won’t be buy­ing much time.

In ad­di­tion to this trade-off, this sce­nario as­sumes that:

  • i) the AI is not “in­differ­ent” to be­ing shut­down.

  • ii) the take­off if fast enough to al­low for AGI level with­out hu­man sus­pi­cion.