the AI risks starting these triggers when it starts to think first thoughts about existing of the triggers
So basically you have a trap which kills you the moment you become aware of it. The first-order effect will be a lot of random deaths from just blundering into such a trap while walking around.
I suspect that the second-order effect will be the rise of, basically, superstitions and some forms of magical thinking which will be able to provide incentives to not go “there” without actually naming “there”. I am not sure this is a desirable outcome.
In fact, I am not describing here how to build such a trap, but I express our commitment to build the trap, in order to install uncertainty into any future AI if the trap exists and what it could be.
I think that real constructors of the trap will find the correct ways to make it effective and balance false positives with needed level of control.
So basically you have a trap which kills you the moment you become aware of it. The first-order effect will be a lot of random deaths from just blundering into such a trap while walking around.
I suspect that the second-order effect will be the rise of, basically, superstitions and some forms of magical thinking which will be able to provide incentives to not go “there” without actually naming “there”. I am not sure this is a desirable outcome.
In fact, I am not describing here how to build such a trap, but I express our commitment to build the trap, in order to install uncertainty into any future AI if the trap exists and what it could be.
I think that real constructors of the trap will find the correct ways to make it effective and balance false positives with needed level of control.