alkexr comments on The reverse Goodhart problem

alkexr 9 Jun 2021 12:53 UTC
2 points
I ended up using mathematical language because I found it really difficult to articulate my intuitions. My intuition told me that something like this had to be true mathematically, but the fact that you don’t seem to know about it makes me consider this significantly less likely.
If we have a collection of variables $\{v\}$, and $V = \max(v)$, then $V$ is positively correlated in practice with most $U$ expressed simply in terms of the variables.
Yes, but $V$ also happens to be very strongly correlated with most $U$ that are equal to $V$. That’s where you do the cheating. Goodhart’s law, as I understand it, isn’t a claim about any single proxy-goal pair. That would be equivalent to claiming that “there are no statistical regularities, period”. Rather, it’s a claim about the nature of the set of all potential proxies.
In Bayesian language, Goodhart’s law sets the prior probability that any seemingly good proxy is actually a good proxy, and that prior is virtually 0. If you have additional evidence, like knowing that your proxy can be expressed in a simple way using your goal, then obviously the probabilities are going to shift.
And that’s how your $V$ and $V'$ are different. In the case of $V$, the selection of $U$ is arbitrary. In the case of $V'$, the selection of $U$ isn’t arbitrary, because it was already fixed when you selected $V'$. But again, if you select a seemingly good proxy $U'$ at random, it won’t be an actually good proxy.
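To make the Bayesian framing above a bit more concrete, here is a minimal sketch of the update in odds form. All the numbers are hypothetical and chosen only for illustration: the prior of 0.001 stands in for "virtually 0", and the likelihood ratio of 1000 stands in for strong evidence such as knowing the proxy is a simple function of the goal. The function name `posterior` is likewise just an illustrative helper.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical prior that a randomly chosen, seemingly good proxy is actually good.
prior_good_proxy = 0.001

# "It looks good" is already conditioned on in the prior, so it carries no update.
no_extra_evidence = 1.0

# Hypothetical strength of evidence from knowing how the proxy was constructed.
structural_evidence = 1000.0

p_random = posterior(prior_good_proxy, no_extra_evidence)
p_informed = posterior(prior_good_proxy, structural_evidence)

print(f"random seemingly good proxy: {p_random:.4f}")
print(f"proxy with structural evidence: {p_informed:.4f}")
```

With these assumed numbers, the randomly selected proxy stays at its near-zero prior, while the proxy backed by structural evidence ends up at roughly even odds. The specific values don't matter; the point is only that the shift comes entirely from the additional evidence, not from the proxy merely seeming good.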