This is correct: there is a proof that as long as the errors are Gaussian (or more generally subgaussian), then no matter how the true value is distributed, optimizing the proxy does not blow up the error term. Similarly, there is a proof that as long as the tails of the value distribution are heavier than the tails of the error distribution, catastrophic Goodhart cannot occur.
The key caveat is that both results assume the error is independent of the value, so they only protect against regressional Goodhart, and the independence assumption is unrealistic in many settings:
https://www.lesswrong.com/posts/fuSaKr6t6Zuh6GKaQ/when-is-goodhart-catastrophic
https://www.lesswrong.com/posts/GdkixRevWpEanYgou/catastrophic-regressional-goodhart-appendix
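A quick Monte Carlo sketch of the two regimes, under the independence assumption. The distribution choices here are illustrative (Student-t with 3 degrees of freedom for the heavy tail, standard normal for the light one), not the ones from the linked posts:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000_000

def mean_value_given_high_proxy(value, error, q=0.9999):
    """Estimate E[V | V + X >= t], where t is the q-quantile of the proxy V + X."""
    proxy = value + error
    t = np.quantile(proxy, q)
    return value[proxy >= t].mean()

# Regime 1: value has heavier tails than the error (t-distributed value,
# Gaussian error). Selecting hard on the proxy still selects for high value.
v_heavy = rng.standard_t(df=3, size=n)
x_light = rng.normal(size=n)
print("heavy-tailed value, Gaussian error:",
      mean_value_given_high_proxy(v_heavy, x_light))

# Regime 2: error has heavier tails than the value. Extreme proxy draws are
# mostly error, so the conditional mean of V stays near E[V] = 0.
v_light = rng.normal(size=n)
x_heavy = rng.standard_t(df=3, size=n)
print("Gaussian value, heavy-tailed error:",
      mean_value_given_high_proxy(v_light, x_heavy))
```

In the first regime the printed conditional mean grows with the selection threshold; in the second it hovers near zero, which is the catastrophic-regressional-Goodhart failure mode.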
In more realistic settings, the most likely way to prevent Goodhart is probably either to make reward functions bounded or to use techniques like quantilizers.
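A minimal sketch of the quantilizer idea: instead of taking the argmax of the proxy, sample uniformly from the top q fraction of candidates by proxy score, which limits how hard the proxy (and hence its error term) gets optimized. The setup below is hypothetical, just to show the mechanic:

```python
import numpy as np

rng = np.random.default_rng(1)

def quantilize(actions, proxy_scores, q=0.1):
    """Sample uniformly from the top-q fraction of actions by proxy score,
    rather than maximizing the proxy outright."""
    cutoff = np.quantile(proxy_scores, 1 - q)
    top = actions[proxy_scores >= cutoff]
    return top[rng.integers(len(top))]

# Usage: 1000 candidate actions with noisy proxy scores.
actions = np.arange(1000)
scores = rng.normal(size=1000)
print(quantilize(actions, scores, q=0.1))
```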