Good point: policies with upward errors will still be preferentially selected for (a little). However, with this approach, the amount of Goodharting should stay constant as the proxy quality (and hence optimization power) scales up.
I agree with your second point, though I think there's a slight benefit over the original quantilizers: q is set theoretically rather than arbitrarily by hand, which hopefully makes it less tempting to mess with.
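For reference, here's a minimal sketch of what an original (hand-set-q) quantilizer does over a finite action set, since that's the baseline being compared against. The function names and the finite-action setting are illustrative assumptions, and q here is the hand-picked knob the comment refers to, not the theoretically derived one:

```python
import random

def quantilize(actions, proxy_utility, base_dist, q):
    """Sample from the base distribution conditioned on landing in the
    top q-fraction (by base-distribution mass) of proxy utility,
    instead of argmaxing the proxy.

    actions: list of candidate actions (illustrative finite setting)
    proxy_utility: action -> float (the imperfect proxy)
    base_dist: action -> float (probability under a trusted base policy)
    q: quantile in (0, 1]; hand-picked in the original quantilizer
    """
    # Rank actions by proxy value, best first.
    ranked = sorted(actions, key=proxy_utility, reverse=True)
    # Accumulate base-distribution mass until the top q is covered.
    top, mass = [], 0.0
    for a in ranked:
        top.append(a)
        mass += base_dist(a)
        if mass >= q:
            break
    # Sample from the truncated distribution (renormalized by weights).
    weights = [base_dist(a) for a in top]
    return random.choices(top, weights=weights, k=1)[0]
```

Because the output is a sample from the (truncated) base distribution rather than an argmax, sharpening the proxy doesn't let the agent push arbitrarily far into the proxy's error tail, which is the bounded-Goodharting property at issue above.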