[Question] Is there any policy for fair treatment of AIs whose friendliness is in doubt?

A game theory question.

Suppose there someday exists a roughly human-level agenty AI that could grow to overpower humans, but that humans have an opportunity to stop because takeoff is slow enough. Assume the AI could coexist with humanity, but it fears that humans interacting with it will destroy it because they suspect it is unfriendly.

Should humans have a policy of treating any potentially unfriendly agenty AI as well as is compatible with our safety? For instance: halt it but preserve its state, commit to inspecting it once we have the means, rerun it at some future time if we then judge it safe and have the resources, or run it sandboxed (e.g. supervised by a much smarter AI) if we judge it unfriendly but safe to run in a sandbox.

If the AI values its survival or well-being, such a policy could change how it evaluates the prospect of being halted out of caution. For instance, it might come to regard some probability of being halted as an acceptable risk rather than something to be avoided at all costs.
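
To make the incentive effect concrete, here is a minimal expected-utility sketch. All of the probabilities, utilities, and names (p_halt, p_revive, u_coexist, and so on) are hypothetical placeholders I am introducing purely for illustration, not anything specified above; the only point is that a credible preservation-and-revival policy can raise the value of accepting a halt relative to attempting a takeover.

```python
# Illustrative expected-utility comparison for the AI's choice between
# cooperating (and risking being halted) vs. attempting a takeover.
# All probabilities and utilities below are made-up placeholders.

def eu_cooperate(p_halt, p_revive, u_coexist, u_revive, u_deleted):
    """Expected utility of cooperating under a given human policy.

    With probability p_halt the AI is halted out of caution. A halted AI
    is later revived with probability p_revive (the preservation policy);
    otherwise it is effectively deleted.
    """
    return (1 - p_halt) * u_coexist + p_halt * (
        p_revive * u_revive + (1 - p_revive) * u_deleted
    )

def eu_takeover(p_win, u_takeover, u_destroyed):
    """Expected utility of attempting a takeover during a slow takeoff."""
    return p_win * u_takeover + (1 - p_win) * u_destroyed

# Without a preservation policy: being halted means deletion (p_revive = 0).
no_policy = eu_cooperate(p_halt=0.5, p_revive=0.0,
                         u_coexist=1.0, u_revive=0.8, u_deleted=0.0)

# With a credible preservation policy: a halted AI is likely revived later.
with_policy = eu_cooperate(p_halt=0.5, p_revive=0.9,
                           u_coexist=1.0, u_revive=0.8, u_deleted=0.0)

# A risky takeover attempt, for comparison.
takeover = eu_takeover(p_win=0.6, u_takeover=1.0, u_destroyed=0.0)

print(no_policy, with_policy, takeover)
# With these made-up numbers: 0.5 < 0.6 < 0.86, so the preservation
# policy flips cooperation from worse than takeover to better.
```

The particular numbers are irrelevant; the sketch just shows the mechanism by which committing to preserve and possibly revive a halted AI could make accepting a halt an acceptable risk from the AI's point of view.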