Very impressive work, both the output and how you iterate on it.
Some thoughts about the cross-examination issue, prompted by your “Implementation 2 for human debaters: teams of two”. It occurred to me that B* could win if it could predict A’s and B’s future behaviour and match its answer to B’s.
I’d prefer that no such option exist: B should be able to answer the question directly, without needing to rewind. Then prediction won’t help.
Cross-examination still helps: A can cross-examine as soon as they suspect B is shielding behind an ambiguity. This means that A might have to abandon their current line of questioning and start again on the other one. This seems more secure, if longer.