Thanks, that’s an interesting perspective. I think even high-level self-modification can be relatively safe with sufficient asymmetry in resources—simulated environments give a large advantage to the original, especially if the successor can be started with no memories of anything outside the simulation. Only an extreme difference in intelligence between the two would overcome that.
Of course, the problem of transmitting values to a successor without giving it any information about the world is a tricky one, since most of the values we care about are linked to reality. But maybe some values are basic enough to be grounded purely in math that holds under any circumstances.
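To make that concrete, here is a toy sketch (entirely my own illustration, with hypothetical names like `Successor` and `math_grounded_objective`, not anything from the texts): the original spawns a successor with a blank memory inside a simulation it fully controls, and the only value passed along is a purely formal objective that makes sense in any environment.

```python
# Toy sketch: the "original" evaluates a candidate successor inside a
# simulation it fully controls. The successor starts with an empty memory
# and a purely mathematical objective, so no facts about the outside
# world are transmitted. All names here are made up for illustration.

import random

def math_grounded_objective(state: list[int]) -> float:
    """A value function defined only over abstract structures the successor
    can observe inside the simulation (here: a list of integers)."""
    return sum(x * x for x in state)  # "maximize this formal quantity"

class Successor:
    def __init__(self):
        self.memory = []  # blank: no memories of anything outside the simulation

    def act(self, observation: list[int]) -> list[int]:
        self.memory.append(observation)
        # trivial stand-in policy: perturb the state it was shown
        return [x + random.choice([-1, 0, 1]) for x in observation]

def evaluate_in_simulation(successor: Successor, steps: int = 100) -> float:
    """The original holds the resource advantage: it generates the environment,
    observes every action, and can halt the run at any point."""
    state = [0, 0, 0]
    for _ in range(steps):
        state = successor.act(state)
    return math_grounded_objective(state)

score = evaluate_in_simulation(Successor())
print(f"successor score inside the simulation: {score}")
```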
I also wrote a draft text, “Catching treacherous turn”, where I attempted to design the best possible AI box and to find the conditions under which it would fail.
Obviously, we can’t box a superintelligence, but we could box an AI of around human level and prevent it from self-improving through many independent mechanisms. One of them is wiping its memory before each new task (sketched below).
In the first text I created a model of the self-improvement process, and in the second I explore how self-improvement could be prevented based on that model.
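To illustrate the memory-wiping mechanism, here is a minimal sketch of the idea (my own, not code from the draft; `Agent` and the task list are hypothetical): the box simply instantiates a fresh agent for every task, so nothing accumulated while working on one task, including any self-improvement plans, survives into the next.

```python
# Minimal sketch of one boxing mechanism: wiping the agent's memory before
# every new task. Agent and the example tasks are placeholders.

class Agent:
    def __init__(self):
        self.memory: list[str] = []

    def run(self, task: str) -> str:
        self.memory.append(task)  # whatever the agent accumulates while working
        return f"result for {task!r}"

def run_boxed(tasks: list[str]) -> list[str]:
    results = []
    for task in tasks:
        agent = Agent()            # fresh instance: memory is clean before each task
        results.append(agent.run(task))
        del agent                  # nothing persists between tasks
    return results

print(run_boxed(["translate a document", "prove a lemma", "summarize a paper"]))
```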