Surely this AI strategy is against the spirit of the experiment, since if an AI tried this IRL, the gatekeeper would turn away and call for the AI to be shut down instead of being required by artificial rules to continue to engage?
Yes, it is. I wanted to win, and there is no rule against “going against the spirit” of AI Boxing.
I think about AI Boxing in the frame of "Shut Up and Do the Impossible", so I didn't care that my solution doesn't apply to AI Safety. Funnily enough, that makes me an example of misalignment.
Why would you want to win in a way that provides no evidence for the proposition the experiment is meant to test? To gain some Internet points?