ryan_greenblatt comments on ryan_greenblatt’s Shortform

ryan_greenblatt 19 Jun 2025 14:09 UTC
Presumably the term “recursive oversight” also includes oversight schemes which leverage assistance from AIs of similar strengths (rather than weaker AIs) to oversee some AI? (E.g., debate, recursive reward modeling.)

Note that I was pointing to a somewhat broader category than this which includes stuff like “training your human overseers more effectively” or “giving your human overseers better software (non-AI) tools”. But point taken.
- Sam Marks 19 Jun 2025 21:06 UTC
  Yeah, maybe I should have defined “recursive oversight” as “techniques that attempt to bootstrap from weak oversight to stronger oversight.” This would include IDA and task decomposition approaches (e.g. RRM). It wouldn’t seem to include debate, and that seems fine from my perspective. (And I indeed find it plausible that debate-shaped approaches could in fact scale arbitrarily, though I don’t think that existing debate schemes are likely to work without substantial new ideas.)