Hmm… I find the scaling aspect a bit fishy (maybe an ordinal vs cardinal utility issue?). The goodness of a proxy should be measured by the actions it guides, and a V-maximizer, a log(V)-maximizer, and an e^V-maximizer will all take the same actions (barring uncertain outcomes), since a strictly increasing transform of a utility function doesn't change which action ranks highest.
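A minimal sketch of that invariance, using a hypothetical set of deterministic action outcomes (the `outcomes` dict is my own toy example, not from the discussion):

```python
import math

# Hypothetical deterministic setup: each action yields a known value V(a).
outcomes = {"a": 1.0, "b": 3.0, "c": 2.5}

def best_action(utility):
    """Return the action an agent maximizing utility(V) would pick."""
    return max(outcomes, key=lambda a: utility(outcomes[a]))

# A V-maximizer, a log(V)-maximizer, and an e^V-maximizer all choose the
# same action, because strictly increasing transforms preserve the argmax.
picks = {best_action(lambda v: v), best_action(math.log), best_action(math.exp)}
print(picks)  # prints {'b'}
```

Under uncertainty this breaks down, of course: expected log(V) and expected e^V rank gambles differently, which is exactly the risk-attitude distinction.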
That said, reverse Goodhart remains possible. I’d characterize it as a matter of being below a proxy’s range of validity, whereas the more familiar Goodhart problem involves ending up above it. E.g. if V = X² + Y, then U = X is a reverse-Goodhart proxy for V—the higher X gets, the less you’ll lose (relatively) by neglecting Y. (Though we’d have to specify some assumptions about the available actions to make that a theorem.)
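To make the "relative loss shrinks" claim concrete, here's a toy calculation under my own added assumption that Y is bounded in [0, y_max]; the worst-case relative shortfall in V from ignoring Y entirely goes to zero as X grows:

```python
# Toy model: V = X**2 + Y, with Y assumed bounded in [0, y_max].
# An agent maximizing the proxy U = X neglects Y; compute the worst-case
# *relative* loss in V from that neglect, at increasing levels of X.
y_max = 10.0

def worst_relative_loss(x):
    best_v = x**2 + y_max   # V if Y were also handled optimally
    worst_v = x**2          # V with Y fully neglected
    return (best_v - worst_v) / best_v

for x in (1, 10, 100):
    print(x, worst_relative_loss(x))  # the loss fraction shrinks as x grows
```

So within this bounded-Y regime, pushing the proxy harder makes it a *better* guide to V, the opposite of the usual Goodhart failure mode.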
An intuitive example might be a game with an expert strategy and a beginner strategy—‘skill at the expert strategy’ being a reverse-Goodhart proxy for skill at the game.