Stuart_Armstrong comments on Writeup: Progress on AI Safety via Debate

Stuart_Armstrong 3 Mar 2020 14:26 UTC
LW: 4 AF: 2
Very impressive work, both the output and how you iterate on it.

Some thoughts about the cross-examination issue, prompted by your “Implementation 2 for human debaters: teams of two”. It occurred to me that B* could win if it could predict A’s and B’s future behaviour and match its answer to B’s.

I’d prefer that no such option exist: B should answer the question directly, with no need to rewind. Then prediction won’t help.

Cross-examination still helps: A can cross-examine as soon as they suspect B is sheltering behind an ambiguity. This means that A might have to abandon their current line of questioning and start again on the other one. This seems more secure (if longer).
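
To make the "more secure (if longer)" trade-off concrete, here is a toy sketch of my own, not anything from the writeup. It assumes B's claim has exactly two readings, that A has a refutation for each, and that each refutation takes a fixed number of rounds; the names `ROUNDS_TO_REFUTE` and `rounds_until_refuted` and all the numbers are made up for illustration. B answers the cross-examination directly (no rewinding), so B is committed to one reading from then on.

```python
# Toy model, all names and numbers hypothetical.
# Assumed cost, in rounds, of A's refutation under each reading of B's claim.
ROUNDS_TO_REFUTE = {"narrow": 3, "broad": 2}


def rounds_until_refuted(a_guess, b_commit, rounds_before_suspicion):
    """Rounds A needs to refute B when B must answer directly.

    a_guess: the reading A starts attacking.
    b_commit: the reading B commits to when cross-examined.
    rounds_before_suspicion: rounds A spends before cross-examining.
    """
    cross_exam = 1  # the cross-examination itself costs one round
    if b_commit == a_guess:
        # A's earlier rounds still count; only the question is extra.
        return cross_exam + ROUNDS_TO_REFUTE[a_guess]
    # A abandons the current line: earlier rounds are wasted,
    # then A starts again on the reading B committed to.
    wasted = min(rounds_before_suspicion, ROUNDS_TO_REFUTE[a_guess])
    return wasted + cross_exam + ROUNDS_TO_REFUTE[b_commit]


# A guessed right: only the cross-examination round is added.
same = rounds_until_refuted("narrow", "narrow", 2)
# A guessed wrong and cross-examined late: two rounds are wasted.
late = rounds_until_refuted("narrow", "broad", 2)
# A guessed wrong but cross-examined immediately: nothing is wasted.
early = rounds_until_refuted("narrow", "broad", 0)
```

In this toy setting B can lengthen the debate by committing to the reading A was not attacking, but cannot escape refutation, and the earlier A cross-examines the less is wasted.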