You know, I feel like trying to avoid Goodhart divergences may be neglecting the underlying principal/agent alignment problem in pursuit of better results on one specific metric.