Presumably the term “recursive oversight” also includes oversight schemes which leverage assistance from AIs of similar strength (rather than weaker AIs) to oversee some AI? (E.g., debate, recursive reward modeling.)
Note that I was pointing to a somewhat broader category than this which includes stuff like “training your human overseers more effectively” or “giving your human overseers better software (non-AI) tools”. But point taken.
Yeah, maybe I should have defined “recursive oversight” as “techniques that attempt to bootstrap from weak oversight to stronger oversight.” This would include IDA and task decomposition approaches (e.g., RRM). It wouldn’t seem to include debate, and that seems fine from my perspective. (And I do find it plausible that debate-shaped approaches could in fact scale arbitrarily, though I don’t think that existing debate schemes are likely to work without substantial new ideas.)