Rohin Shah comments on AI Unsafety via Non-Zero-Sum Debate

Rohin Shah 13 Jul 2020 22:11 UTC
LW: 7 AF: 5
Planned summary for the Alignment Newsletter:
This post points out that <@debate@>(@AI safety via debate@) relies crucially on creating a zero-sum game in order to ensure that the debaters point out flaws in each other’s arguments. For example, if you modified debate so that both agents are penalized for an inconclusive debate, then an agent may decide not to point out a flaw in an argument if it believes that doing so has some chance of confusing the judge.
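
To make the incentive concrete, here is a rough toy calculation. It is not from the post; the payoff values (+1 for a win, -1 for a loss), the probabilities, and the function names are all hypothetical assumptions chosen for illustration. It also assumes that a debate in which the flaw is never raised always ends conclusively.

```python
# Toy model (all numbers and names are illustrative assumptions, not from the post).
# An agent has spotted a flaw in its opponent's argument and must choose whether
# to point it out. Payoffs to this agent: win = +1, loss = -1.


def expected_payoffs(p_confuse, p_win_silent, inconclusive_payoff):
    """Return (point_out, stay_silent) expected payoffs for the agent.

    p_confuse: assumed chance that raising the flaw confuses the judge,
        making the debate inconclusive; otherwise the agent wins.
    p_win_silent: assumed chance the agent wins anyway if it stays silent.
    inconclusive_payoff: what the agent gets from an inconclusive debate
        (0.0 keeps the game zero-sum; a negative value penalizes both agents).
    """
    point_out = (1 - p_confuse) * 1.0 + p_confuse * inconclusive_payoff
    stay_silent = p_win_silent * 1.0 + (1 - p_win_silent) * -1.0
    return point_out, stay_silent


def prefers_to_point_out(p_confuse, p_win_silent, inconclusive_payoff):
    """True if raising the flaw has strictly higher expected payoff."""
    point_out, stay_silent = expected_payoffs(
        p_confuse, p_win_silent, inconclusive_payoff
    )
    return point_out > stay_silent


if __name__ == "__main__":
    p_confuse, p_win_silent = 0.4, 0.7  # hypothetical values

    # Zero-sum variant: an inconclusive debate is worth 0 to both agents.
    print("zero-sum:", prefers_to_point_out(p_confuse, p_win_silent, 0.0))

    # Modified variant: both agents are penalized for an inconclusive debate.
    print("penalized:", prefers_to_point_out(p_confuse, p_win_silent, -1.0))
```

With these particular numbers, the agent prefers to point out the flaw in the zero-sum variant but prefers to stay silent once an inconclusive debate is penalized. Different assumed probabilities can of course flip the result; the sketch only shows that the penalty can remove the incentive to raise flaws.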