This seems like a more reasonable answer, but I still feel uneasy about parts of it.
For one, they would have to agree on the design of the successor, which may be non-trivial or even impossible for two adversarial agents.
But more importantly, if a single successor could take the actions needed to accomplish one of their goals, why can't that agent take those actions itself? Wouldn't anything that hinders one of the agents from carrying out the successor's actions on its own also hinder the successor?
Something that's been intriguing me: if two agents figure out how to trust that each other's goals are aligned (or at least not opposed), haven't they essentially solved the alignment problem?
E.g., one agent could use the same method to bootstrap an aligned AI.