I think you should mentally model the use case for AI safety debate protocols as high-stakes settings, and expect that, after tons of optimisation pressure, AI debates will look very different from human debates in the limit. SO debate protocols in particular try to take advantage of self-play, which you can’t do well with humans, so introducing asymmetry through additional roles and rules may make the protocol hard to reason about theoretically (and possibly weaken the benefit of self-play). So any such additions would need to be pretty well motivated.
I take the point that in the self-play context this could drift off-course! I suppose (linking this back to the MATS research) I’m suggesting it would be good to measure that beside a more naïve protocol.
I also think MUN is bad.