johnswentworth comments on Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

johnswentworth 26 Jul 2023 21:32 UTC
> I’m not really sure what you mean by “oversight, but add an epicycle” or how to determine if this is a good summary.
Something like: the OP is proposing oversight of the overseer, and it seems like the obvious next move would be to add an overseer of the oversight-overseer. And then an overseer of the oversight-oversight-overseer. Etc.
And the implicit criticism is something like: sure, this would probably marginally improve oversight, but it’s a kind of marginal improvement which does not really move us closer in idea space to whatever the next better paradigm will be which replaces oversight (and is therefore not really marginal progress in the sense which matters more). It’s like adding epicycles to a model of circular orbits: that does make the model match real orbits marginally better (for a little while, in a way which does not generalize to longer timespans), but doesn’t really move closer in idea space to the better model which eventually replaces circular orbits (so the epicycles are not really marginal progress in the sense which matters more either).
- Buck 27 Jul 2023 15:51 UTC
  > the OP is proposing oversight of the overseer,
  I don’t think this is right, at least in the way I usually use the terms. We’re proposing a strategy for conservatively estimating the quality of an “overseer” (i.e., a system which is responsible for estimating the goodness of model actions). I think that you aren’t summarizing the basic point if you try to use the word “oversight” for both of those.
  - johnswentworth 27 Jul 2023 16:30 UTC
    That’s useful, thanks.