Maybe I’m not fully understanding, but one issue I see is that without requiring “perfect prediction”, one could potentially Goodhart on the proposal. I could imagine something like:
In training GPT-5, add a term that upweights very basic bigram statistics. In “evaluation”, use your bigram statistics table to “predict” most top-k outputs just well enough to pass.
This would probably have a negative impact on performance, but it could possibly be tuned to be just barely sufficient to pass. Alternatively, one could train an easy-to-understand toy model on the side and regularise the predictions toward that instead of toward exact bigram statistics, just enough to pass the test while still only understanding the toy model.
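To make the first avenue concrete, here is a minimal sketch of the kind of auxiliary term I have in mind. Everything here is an illustrative assumption on my part (the names `bigram_table`, `lambda_bigram`, the toy vocabulary size, and the KL form of the penalty are not part of the proposal):

```python
import torch
import torch.nn.functional as F

vocab_size = 1_000        # toy vocabulary so the example runs quickly (assumption)
lambda_bigram = 0.1       # hypothetical weight, tuned just high enough to pass the test

# Placeholder bigram table: row i is P(next token | previous token = i).
# In the scheme above this would be estimated from the training corpus.
bigram_table = torch.full((vocab_size, vocab_size), 1.0 / vocab_size)

def training_loss(logits, targets, prev_tokens):
    """logits: (batch, vocab); targets, prev_tokens: (batch,) long tensors."""
    # Ordinary language-modelling objective.
    ce = F.cross_entropy(logits, targets)
    # Extra term penalising divergence from the bigram predictor, so that a
    # plain bigram table ends up "predicting" the model's top-k outputs.
    log_probs = F.log_softmax(logits, dim=-1)
    kl = F.kl_div(log_probs, bigram_table[prev_tokens], reduction="batchmean")
    return ce + lambda_bigram * kl

# Example usage with random tensors.
logits = torch.randn(8, vocab_size)
targets = torch.randint(0, vocab_size, (8,))
prev_tokens = torch.randint(0, vocab_size, (8,))
print(training_loss(logits, targets, prev_tokens))
```

The toy-model variant would be the same shape of objective, with `bigram_table[prev_tokens]` replaced by the toy model’s predicted distribution.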
I’m worried about Goodharting on the proposal, but don’t feel concerned by the specific avenue you propose. I think the bigram term would really dent performance, as you say.