People who think that Goodharting is inevitable should read Stuart’s posts on the topic. He provides an example of a utility function and optimisation system for which Goodharting is not a big issue. He also notes that the fact that we fear Goodharting is a useful signal to an AI about the structure of our preferences.
P.S. John, how valuable do you think it would be for someone to do an “abstraction newsletter” covering both classic and new posts on the topic, like with the alignment newsletter?
Um. Said utility function requires that you already know the true underlying value function[1].
If you already know the true underlying value function, Goodhart’s law doesn’t apply anyway. The tricky bit with Goodhart’s law is trying to find said true underlying value function in the first place—close is not good enough.
[1] Well, strictly speaking it needs to know both the proxy and the difference between the proxy and the true underlying value function, which is sufficient to recreate the true underlying value function.
I could imagine that being quite valuable, though I am admittedly the most biased person one could possibly ask. Certainly there is a lot of material which would benefit from distillation.