On terminology, I prefer to use “recursive oversight” to refer to methods that leverage assistance from weaker AIs to oversee stronger AIs. IDA is a central example here. Like you, I’m skeptical of recursive oversight schemes scaling to arbitrarily powerful models.
However, I think it’s plausible that other oversight strategies (e.g. ELK-style strategies that attempt to elicit and leverage the strong learner’s own knowledge) could succeed at scaling to arbitrarily powerful models, or at least to substantially superhuman models. This is the regime that I typically think about and target with my work, and I think it’s reasonable for others to do so as well.
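To gesture at the kind of ELK-style strategy I have in mind, here’s a toy sketch of eliciting a model’s latent knowledge with a linear probe. Everything here (the synthetic “activations”, the truth direction, the least-squares reporter) is a made-up stand-in to illustrate the shape of the idea, not any real ELK proposal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: pretend these are hidden activations from a strong model
# on questions where we happen to know the ground truth.
n_train, n_dim = 1000, 64
activations = rng.normal(size=(n_train, n_dim))
truth_direction = rng.normal(size=n_dim)  # how "truth" is encoded; unknown to the overseer
labels = (activations @ truth_direction > 0).astype(float)

# Fit a linear "reporter" by least squares: read the answer off the model's
# internals directly, rather than trusting the model's stated output.
w, *_ = np.linalg.lstsq(activations, labels - 0.5, rcond=None)

def elicit(acts: np.ndarray) -> np.ndarray:
    """Decode the model's latent 'belief' from its activations."""
    return (acts @ w > 0).astype(float)

# Held-out check: the reporter generalizes to new activations.
test_acts = rng.normal(size=(200, n_dim))
test_labels = (test_acts @ truth_direction > 0).astype(float)
print("reporter accuracy:", (elicit(test_acts) == test_labels).mean())
```

The point is just that the oversight signal comes from the strong learner’s own representations rather than from a weaker overseer’s judgment.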
I agree with preferring “recursive oversight”.
Presumably the term “recursive oversight” also covers oversight schemes that leverage assistance from AIs of similar strength (rather than weaker AIs) to oversee some AI? (E.g., debate, recursive reward modeling.)
Note that I was pointing to a somewhat broader category than this, one which also includes things like “training your human overseers more effectively” or “giving your human overseers better (non-AI) software tools”. But point taken.
Yeah, maybe I should have defined “recursive oversight” as “techniques that attempt to bootstrap from weak oversight to stronger oversight.” This would include IDA and task-decomposition approaches (e.g. RRM). It wouldn’t seem to include debate, and that seems fine from my perspective. (And I do find it plausible that debate-shaped approaches could scale arbitrarily, though I don’t think that existing debate schemes are likely to work without substantial new ideas.)
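To make the “bootstrap from weak oversight to stronger oversight” shape concrete, here’s a minimal schematic of the amplify-distill loop (in the style of IDA). The names and signatures are placeholders of my own, and `distill` is a no-op stand-in for an actual training step:

```python
from typing import Callable, List

# In this sketch, an "agent" is just a question-answering function.
Agent = Callable[[str], str]
Decompose = Callable[[str], List[str]]
Recombine = Callable[[str, List[str]], str]

def amplify(overseer: Agent, decompose: Decompose, recombine: Recombine) -> Agent:
    """Weak overseer + task decomposition => a stronger (but slower) overseer."""
    def amplified(question: str) -> str:
        subquestions = decompose(question)
        sub_answers = [overseer(q) for q in subquestions]
        return recombine(question, sub_answers)
    return amplified

def distill(slow_agent: Agent) -> Agent:
    """Stand-in for training a fast model to imitate the slow amplified agent."""
    return slow_agent  # a real implementation would return a newly trained model

def bootstrap(initial_overseer: Agent, decompose: Decompose,
              recombine: Recombine, rounds: int) -> Agent:
    """Each round's distilled agent serves as the next round's overseer."""
    agent = initial_overseer
    for _ in range(rounds):
        agent = distill(amplify(agent, decompose, recombine))
    return agent
```

The recursion is in `bootstrap`: each round’s output becomes the overseer for the next round, which is exactly the structure debate lacks and why it falls outside this definition.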