The adversarial machine would be destroyed, either by the AGI whose outputs it's testing or by us, if it doesn't protect us (or unless it trades/coordinates with the first AGI), so it seems motivated to protect us.