This is correct: there is a proof that as long as the errors are Gaussian (or more generally subgaussian), then no matter how the true value is distributed, optimizing the proxy does not blow up the error term. Similarly, there is a proof that as long as the tails of the value distribution are heavier than the tails of the error distribution, catastrophic Goodhart cannot occur.
The key caveat is that both results assume the error is independent of the value, so they only protect against regressional Goodhart, and the independence assumption is unrealistic in many settings:
https://www.lesswrong.com/posts/fuSaKr6t6Zuh6GKaQ/when-is-goodhart-catastrophic
https://www.lesswrong.com/posts/GdkixRevWpEanYgou/catastrophic-regressional-goodhart-appendix
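A quick Monte Carlo sketch of the two regimes, under the independence assumption. The distribution choices here are illustrative (Student-t with 3 degrees of freedom for the heavy tail, standard normal for the light one), not the ones from the linked posts:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000_000

def mean_value_given_high_proxy(value, error, q=0.9999):
    """Estimate E[V | V + X >= t], where t is the q-quantile of the proxy V + X."""
    proxy = value + error
    t = np.quantile(proxy, q)
    return value[proxy >= t].mean()

# Regime 1: value has heavier tails than the error (t-distributed value,
# Gaussian error). Selecting hard on the proxy still selects for high value.
v_heavy = rng.standard_t(df=3, size=n)
x_light = rng.normal(size=n)
print("heavy-tailed value, Gaussian error:",
      mean_value_given_high_proxy(v_heavy, x_light))

# Regime 2: error has heavier tails than the value. Extreme proxy draws are
# mostly error, so the conditional mean of V stays near E[V] = 0.
v_light = rng.normal(size=n)
x_heavy = rng.standard_t(df=3, size=n)
print("Gaussian value, heavy-tailed error:",
      mean_value_given_high_proxy(v_light, x_heavy))
```

In the first regime the printed conditional mean grows with the selection threshold; in the second it hovers near zero, which is the catastrophic-regressional-Goodhart failure mode.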
In more realistic settings, the most likely way to prevent Goodhart is probably either to make reward functions bounded or to use techniques like quantilizers.
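A minimal sketch of the quantilizer idea: instead of taking the argmax of the proxy, sample uniformly from the top q fraction of candidates by proxy score, which limits how hard the proxy (and hence its error term) gets optimized. The setup below is hypothetical, just to show the mechanic:

```python
import numpy as np

rng = np.random.default_rng(1)

def quantilize(actions, proxy_scores, q=0.1):
    """Sample uniformly from the top-q fraction of actions by proxy score,
    rather than maximizing the proxy outright."""
    cutoff = np.quantile(proxy_scores, 1 - q)
    top = actions[proxy_scores >= cutoff]
    return top[rng.integers(len(top))]

# Usage: 1000 candidate actions with noisy proxy scores.
actions = np.arange(1000)
scores = rng.normal(size=1000)
print(quantilize(actions, scores, q=0.1))
```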