Curated. Goodhart’s Law is an old core concept for LessWrong, and I love it when people come along and add more resolution and rigor to our understanding, all the more so when they start pointing to the practical implications. It would be very cool if this led to an articulation of disagreements between people that allows for progress in the discussion there, e.g. John vs Paul, Jan, etc.
And extra bonus points for the exercises at the end too. All in all, good stuff, looking forward to seeing more, especially the results as you vary more of the assumptions (e.g. independence) to line up better with the scenarios we anticipate, e.g. in Alignment.