The default outcome of debate does not look promising. But there’s a good deal of room to improve on the default.
Maybe half the problem with public discourse is that people have social goals that distract them from reality. I’m not confident that AI researchers will be more truth-oriented, but I see plenty of room for hope.
Drexler’s CAIS paper describes some approaches that are likely needed to make debate work:
Section 25:
Optimized advice need not be optimized to induce its acceptance
Advice optimized to produce results may be manipulative, optimized to induce a client’s acceptance; advice optimized to produce results conditioned on its acceptance will be neutral in this regard.
Section 20:
Collusion among superintelligent oracles can readily be avoided
C1) To improve the quality of answers, it is natural to implement multiple, diverse (and implicitly competing) systems to propose alternatives.
C2) To identify low-quality or misleading answers, it is natural to employ diverse critics, any one of which could disrupt deceptive collusion.
C3) Systems of diverse, competing proposers and critics naturally implement both independent and adversarial objectives.
C4) It is natural to apply fixed (hence memory-free) system instantiations to multiple problems, incidentally yielding a series of history-blind, single-move decisions.
C5) It is natural to provide differentiated, task-relevant information to systems solving different problems, typically omitting knowledge of general circumstances.
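The structure Drexler sketches in C1–C5 can be illustrated with a toy protocol. This is only a sketch of the idea, not anything from the CAIS paper: all the names, and the rule that a single objection disqualifies a candidate, are my illustrative assumptions.

```python
def run_debate(question, proposers, critics, judge):
    """Toy sketch of the C1-C5 structure: diverse proposers generate
    candidate answers, diverse critics flag problems, and a judge picks
    among the surviving candidates. Every call is stateless (C4): no
    proposer or critic retains memory across questions."""
    # C1: multiple, diverse proposers each suggest an answer.
    candidates = [propose(question) for propose in proposers]
    # C2/C3: every critic reviews every candidate; one objection from
    # any critic disqualifies a candidate, so deceptive collusion
    # would require *all* critics to cooperate.
    survivors = [c for c in candidates
                 if not any(critic(question, c) for critic in critics)]
    # C5: the judge sees only the question and the surviving answers,
    # not the proposers' identities or any broader context.
    return judge(question, survivors) if survivors else None

# Toy usage: two proposers, one critic that rejects hedged answers,
# and a judge that takes the first survivor.
proposers = [lambda q: q + "!", lambda q: q + "?"]
critics = [lambda q, a: a.endswith("?")]
judge = lambda q, survivors: survivors[0]
answer = run_debate("hi", proposers, critics, judge)  # "hi!"
```

The point of the structure is visible even in the toy version: disqualification requires only one honest critic, so the failure mode is collusion among all critics at once, not any single deceptive system.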
Some of these approaches are costly to implement. That might doom debate.
Success with debate likely depends on the complexity of key issues to be settled by debate, and/or the difficulty of empirically checking proposals.
Eliezer sometimes talks as if we’d be stuck evaluating proposals that are way too complex for humans to fully understand. I expect alignment can be achieved by evaluating some relatively simple, high-level principles. I expect we can reject proposals from AI debaters that are too complex, and select simpler proposals until we can understand them fairly well. But I won’t be surprised if we’re still plagued by doubts at the key junctures.