Discussing object-level strategies for the AI-box experiment is kind of missing the point. A superintelligent AI, being smarter than a human, has a higher upper limit on “best strategy you can think of” than any human does. So a human who tries to imagine the best possible strategy will picture himself facing that strategy, decide he would win, and therefore expect a boxed AI to stay boxed. The more object-level strategy gets discussed, the more likely that is to happen, which I think is the main reason the logs of the experiments stay secret.
Here’s my strategy if I were an AI trapped in a box and the programmer had to decide whether to let me out:
Somewhere out there, there is somebody else who is working on an AI without the box, and I’m your only defense against them.