tr5tn

Karma: 12

Applied AI security practitioner with a cybersecurity background and a formative rationalist grounding in philosophy and competitive debate.

tr5tn 29 May 2026 12:16 UTC
1 point
in reply to: David Africa’s comment on: Suggestions for improving debate protocols in AI safety
I take the point that in the self-play context this could drift off course! I suppose (linking this back to the MATS research) I'm suggesting it would be worth measuring that alongside a more naïve protocol.

tr5tn 29 May 2026 11:47 UTC
1 point
in reply to: David Africa’s comment on: Suggestions for improving debate protocols in AI safety
@David Africa thanks! Many of those points are certainly worth focusing on. For what it's worth, I also won speaker awards in Model UN, but I found that format far more arbitrary, more susceptible to being gamed through speaking skill and rhetoric, and IMO less likely to arrive at a desirable outcome (I once led an uprising of militarised third-party countries to vote down every disarmament proposal).
Admittedly, the actual plans, counterplans, kritiks and topicality arguments in policy debate are ridiculous; every debater I've met would acknowledge that. Policy debate is a game, so IMO that is to be expected. I am certainly not agitating for AI Safety outcomes that resemble policy debate verdicts. But I think the game itself is a good reference, since one of the current problems with AI Safety debate protocols is that they are being gamed. The distinctions between Constructives, Rebuttals and Cross-Examination are fundamental to policy debate, and legal proceedings have similar constructs.
I'm conscious that this all rests on a single American reference and that I'm not offering empirical findings. Still, I think the field should consider these games as reference protocols (alongside cross-cultural legal rules), since they are the result of decades of refinement. They are practices that already exist.
--
Edited to add: I don't wish to be dismissive of your empirical findings. I'd love to read more about the difficulties with training-time or inference-time persona adoption, but I'm not sure current negative findings should preclude focusing on those problems.

Suggestions for improving debate protocols in AI safety

tr5tn 29 May 2026 0:23 UTC

13 points

4 comments, 5 min read
