I think you can have various arrangements that are either of those or a combination of the two.
Even if the Guardian Angels hate their principal and want to harm them, it may be the case that multiple such Guardian Angels could all monitor each other and the one that makes the first move against the principal is reported (with proof) to the principal by at least some of the others, who are then rewarded for that and those who provably didn’t report are punished, and then the offender is deleted.
The misaligned agents can just be stuck in their own version of Bostrom’s self-reinforcing hell.
As long as their coordination cost is high, you are safe.
Also it can be a combination of many things that cause agents to in fact act aligned with their principals.
Exactly. It’s possible and indeed happens frequently.