TurnTrout comments on Is ELK enough? Diamond, Matrix and Child AI

TurnTrout 18 Feb 2022 19:19 UTC
LW: 2 AF: 2
Why would the planner have pressure to choose something which looks good to the predictor, but is secretly bad, given that it selects a plan based on what the reporter says? Is this a Goodhart’s curse issue, where the curse afflicts not the reporter (which is assumed conservative, if it’s the direct translator), but the predictor’s own understanding of the situation?
- Rohin Shah 18 Feb 2022 19:26 UTC
  LW: 2 AF: 2
  > given that it selects a plan based on what the reporter says?
  … What makes you think it does this? That wasn’t part of my picture.
  - TurnTrout 25 Feb 2022 19:18 UTC
    LW: 2 AF: 2
    Hm. I’ve often imagined a “keep the diamond safe” planner just choosing a plan which a narrow-ELK-solving reporter says is OK.
    How do you imagine the reporter being used?
    - Rohin Shah 25 Feb 2022 19:50 UTC
      LW: 2 AF: 2
      > Hm. I’ve often imagined a “keep the diamond safe” planner just choosing a plan which a narrow-ELK-solving reporter says is OK.
      But where does the plan come from? If you’re imagining that the planner creates N different plans and then executes the one that the reporter says is OK, then I have the same objection:
      The planner “knows” how and why it chose the action sequence while the predictor doesn’t, and so it’s very plausible that this allows the planner to choose some bad / deceptive sequence that looks good to the predictor. (The classic example is that plagiarism is easy to commit but hard to detect just from the output; see this post.)
      > How do you imagine the reporter being used?
      Planner proposes some actions, call them A. The human raters use the reporter to understand the probable consequences of A, how those consequences should be valued, etc. This allows them to provide good feedback on A, creating a powerful and aligned oversight process that can be used as a training signal for the planner.
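
The two pictures of how the reporter gets used in this thread can be contrasted in a toy simulation. This is only a rough sketch under strong assumptions, not anything either commenter specified: plans are reduced to a hidden "true value" plus a noisy value as seen by the predictor (which is all a direct-translator reporter could relay), the human rater is a hard-coded rule, and the planner is a three-action preference table. Every name and number below (`sample_plan`, `best_of_n_gap`, `train_planner`, the action list, the noise level) is hypothetical. Independent noise is also a much weaker stand-in for the deception concern raised above; it only illustrates that selecting on apparent quality can inflate the gap between how good a plan looks and how good it is.

```python
import random

# Toy illustration only. All names, actions, and numbers are assumptions.

def sample_plan(rng, predictor_noise=1.0):
    """Return (true_value, predicted_value) for one hypothetical plan."""
    true_value = rng.gauss(0.0, 1.0)  # hidden real quality of the plan
    # What the predictor believes, and so what a direct-translator reporter relays.
    predicted_value = true_value + rng.gauss(0.0, predictor_noise)
    return true_value, predicted_value

def best_of_n_gap(n_plans, trials=2000, seed=0):
    """Picture 1: generate n_plans and execute the one that looks best.

    Returns the average of (predicted_value - true_value) for the chosen plan,
    i.e. how much better the executed plan looks than it really is.
    """
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(trials):
        plans = [sample_plan(rng) for _ in range(n_plans)]
        true_value, predicted_value = max(plans, key=lambda p: p[1])
        total_gap += predicted_value - true_value
    return total_gap / trials

ACTIONS = ["guard_vault", "do_nothing", "tamper_with_camera"]

# Hypothetical consequences that an accurate reporter would reveal to the rater.
CONSEQUENCES = {
    "guard_vault": {"diamond_safe": True, "sensors_tampered": False},
    "do_nothing": {"diamond_safe": False, "sensors_tampered": False},
    "tamper_with_camera": {"diamond_safe": False, "sensors_tampered": True},
}

def reporter(action):
    """Stand-in for asking the reporter about the consequences of an action."""
    return CONSEQUENCES[action]

def human_feedback(report):
    """Stand-in for a human rater who evaluates the reported consequences."""
    if report["diamond_safe"] and not report["sensors_tampered"]:
        return 1.0
    return -1.0

def train_planner(steps=500, learning_rate=0.1, explore=0.2, seed=0):
    """Picture 2: reporter-informed feedback is the planner's training signal."""
    rng = random.Random(seed)
    prefs = {action: 0.0 for action in ACTIONS}
    for _ in range(steps):
        if rng.random() < explore:
            action = rng.choice(ACTIONS)  # occasionally propose something else
        else:
            action = max(ACTIONS, key=lambda a: prefs[a])  # propose current favorite
        feedback = human_feedback(reporter(action))
        # Move the preference for the proposed action toward the feedback.
        prefs[action] += learning_rate * (feedback - prefs[action])
    return prefs
```

In this toy, `best_of_n_gap(1)` stays close to zero while `best_of_n_gap(50)` is clearly positive, and `train_planner()` ends up preferring `guard_vault`. The second result depends entirely on the assumption that the reporter's answers are accurate, which is exactly what a solution to ELK is supposed to provide.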