Now suppose this curve represents the human ratings of different courses of action, and you choose the action that your model says will have the highest rating. You’re going to predictably mess up again, because of the optimizer’s curse (or regressional Goodhart on the correlation between modeled rating and actual rating).
It’s not obvious to me how the optimizer’s curse fits in here (if at all). If each of the evaluations has the same noise, then picking the action that the model says will have the highest rating is the right thing to do. The optimizer’s curse says that the model is likely to overestimate how good this “best” action is, but so what? “Mess up” conventionally means “the AI picked the wrong action”, and the optimizer’s curse is not related to that (unless there’s variable noise across different choices and the AI didn’t correct for that). Sorry if I’m misunderstanding.
Yeah, this is right. The variable uncertainty comes in for free when doing curve fitting: close to the data points your models tend to agree; far away, they can shoot off in different directions. So if you have a probability distribution over different models, applying the correction for the optimizer’s curse has the very sensible effect of telling you to stick close to the training data.
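A quick simulation (my own sketch, not from the thread; all parameter values are made up) shows the distinction both comments are drawing: with equal noise on every evaluation, picking the argmax of the estimates is still the right selection rule, but the estimate of the chosen action predictably overshoots its true value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_actions, noise = 10_000, 20, 1.0

# True ratings of each action, and noisy model estimates of them
# (same noise level for every action, as in the comment above).
true = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
est = true + rng.normal(0.0, noise, size=(n_trials, n_actions))

# Pick the action the model says is best.
best = est.argmax(axis=1)
rows = np.arange(n_trials)

est_of_chosen = est[rows, best].mean()
true_of_chosen = true[rows, best].mean()

# Optimizer's curse: the estimate of the chosen action systematically
# exceeds its true rating, even though the choice itself is sensible.
print(f"mean estimated rating of chosen action: {est_of_chosen:.2f}")
print(f"mean true rating of chosen action:      {true_of_chosen:.2f}")
```

The chosen action still has a much higher true rating than a random one (so no "mess up" in the sense of picking a bad action), but the gap between the estimated and true rating of the winner is the curse. The case that matters for the correction is when the noise varies across actions, e.g. larger far from the training data, where uncorrected argmax does start favoring the wrong, high-noise actions.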
Oh, yup, that makes sense, thanks
np, I’m just glad someone is reading/commenting :)