"Surely, if we knew that the AI was unfriendly"
We do not live in a world of certain knowledge about what is and isn't dangerous.
The AI box experiment features an agent that constantly begs and cajoles its gatekeeper to release it, even threatening to torture simulated copies of its creators; I would think such psychotic behavior is already a pretty big red flag.
There's a good reason Eliezer decided not to publish a transcript of the original experiment. It seems that these secondhand details have confused you about what the experiment is actually about.