Hmm. Nobody’s ever asked me to try to teach them that before, but here’s my advice:
Think about what dimensions or components success at the task will include. E.g., if you’re trying to play a song on the guitar, you might decide that a well-played song will have the correct chords played with the correct fingering and the correct rhythm.
Think about what steps are involved in each of the components of success, with an eye toward ordering those steps in terms of which steps are easiest to learn and which steps are logical prerequisites for the others. E.g., in order to learn how to play a rhythm, you first need an understanding of rhythmic concepts like beats and meters. Then, once you have a language that you can use to describe a rhythm, you need some concrete examples of rhythms, e.g., a half note followed by two quarter notes. Then you need to translate that into the physical motions taken on the guitar, e.g., downstrokes and upstrokes with greater or lesser emphasis. Those are two different steps: first you teach the difference between a downstroke and an upstroke, and then you teach the difference between a stressed beat and an unstressed beat. You might change the order of those steps if you are working with a student who’s more comfortable with physical techniques than with language, e.g., demonstrate some rhythms first, and only afterward explain what they mean in words. In general, most values will have a vocabulary that lets you describe them, a series of examples that help you understand them, and a set of elements that constitute them; using each new word in the vocabulary, recognizing each type of example, recognizing each element, and using each element are all separate steps in learning the technique.
Leave some room at the end for integration, e.g., if you’ve learned rhythm and fingering and chords, you still need some time to practice using all three of those correctly at once. This may include learning how to make trade-offs among the various components, e.g., if you’ve got some very tricky fingering in one measure, maybe you simplify the chord to make that easier.
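If it helps to see the ordering idea written down, here is a rough sketch of the guitar example as a prerequisite graph, sorted so that every step comes after the steps it depends on. The step names and the exact dependencies are my own illustrative assumptions (for instance, I treat chords and fingering as having no prerequisites), and it only handles the "logical prerequisite" part of the advice, not the "easiest first" part or reordering for a particular student.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical lesson plan: each step maps to the set of steps that must
# be learned before it. Names and dependencies are illustrative only.
PREREQUISITES = {
    "beats and meters": set(),
    "example rhythms": {"beats and meters"},
    "downstroke vs. upstroke": {"example rhythms"},
    "stressed vs. unstressed beats": {"downstroke vs. upstroke"},
    "chords": set(),
    "fingering": set(),
    # Integration comes last: it needs all three components of success.
    "integration practice": {
        "stressed vs. unstressed beats",
        "chords",
        "fingering",
    },
}


def lesson_order(prerequisites):
    """Return one ordering of the steps that respects every prerequisite."""
    return list(TopologicalSorter(prerequisites).static_order())


if __name__ == "__main__":
    for number, step in enumerate(lesson_order(PREREQUISITES), start=1):
        print(number, step)
```

Several valid orderings usually exist, which is where the teacher's judgment about what is easiest for a given student comes in.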
I appreciate how much detail you’ve used to lay out why you think a lack of human agency is a problem—compared to our earlier conversations, I now have a better sense of what concrete problem you’re trying to solve and why that problem might be important. I can imagine that, e.g., it’s quite difficult to tell how well you’ve fit a curve if the context in which you’re supposed to fit that curve is vulnerable to being changed in ways whose goodness or badness is difficult to specify. I look forward to reading the later posts in this sequence so that I can get a sense of exactly what technical problems are arising and how serious they are.
That said, until I see a specific technical problem that seems really threatening, I’m sticking by my opinion that it’s OK that human preferences vary with human environments, so long as (a) we have a coherent set of preferences for each individual environment, and (b) we have a coherent set of preferences about which environments we would like to be in. Right, like, in the ancestral environment I prefer to eat apples, in the modern environment I prefer to eat Doritos, and in the transhuman environment I prefer to eat simulated wafers that trigger artificial bliss. That’s fine; just make sure to check what environment I’m in before feeding me, and then select the correct food based on my environment. What do you do if you have control over my environment? No big deal, just put me in my preferred environment, which is the transhuman environment.
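To make conditions (a) and (b) concrete, here is a toy sketch. The table and function names are hypothetical, and the foods and environments are just the ones from my example above; the point is only that "check the environment, then pick the food" is a well-defined procedure once both kinds of preference are written down.

```python
# (a) A coherent preference for each individual environment.
FOOD_PREFERENCE = {
    "ancestral": "apples",
    "modern": "Doritos",
    "transhuman": "simulated bliss wafers",
}

# (b) A coherent preference about which environment to be in.
PREFERRED_ENVIRONMENT = "transhuman"


def choose_food(environment):
    """Select the food preferred in the given environment."""
    return FOOD_PREFERENCE[environment]


def serve(current_environment, can_change_environment):
    """Return (environment, food) after optionally moving me first."""
    if can_change_environment:
        environment = PREFERRED_ENVIRONMENT
    else:
        environment = current_environment
    return environment, choose_food(environment)


if __name__ == "__main__":
    print(serve("ancestral", can_change_environment=False))
    print(serve("modern", can_change_environment=True))
```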
What happens if my preferred environment depends on the environment I’m currently inhabiting, e.g., modern me wants to migrate to the transhuman environment, but ancestral me thinks you’re scary and just wants you to go away and leave me alone? Well, that’s an inconsistency in my preferences, but it’s no more or less problematic than any other inconsistency. If I prefer oranges when I’m holding an apple, but I prefer apples when I’m holding an orange, that’s just as annoying as the environment problem. We do need a technique for resolving problems of utility that are sensitive to initial conditions when those initial conditions appear arbitrary, but we need that technique anyway. It’s not some special feature of humans that makes that technique necessary; any beings with any type of varying preferences would need that technique in order to have their utility fully optimized.
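Here is a small sketch of what "sensitive to initial conditions" looks like in both examples. Everything in it is an illustrative assumption: the function names are made up, and I am assuming that transhuman me would want to stay transhuman, which I did not actually say above. It detects the inconsistency; it does not resolve it.

```python
def settle(start, preferred_from):
    """Follow 'what I would prefer from here' until it stops changing.

    Returns the stable endpoint, or None if the preferences cycle forever.
    """
    seen = [start]
    current = start
    while True:
        wanted = preferred_from[current]
        if wanted == current:
            return current  # stable: I want to stay where I am
        if wanted in seen:
            return None  # cycle: no stable endpoint exists
        seen.append(wanted)
        current = wanted


def endpoints(preferred_from):
    """Map every possible starting point to where it settles."""
    return {start: settle(start, preferred_from) for start in preferred_from}


# Environment example: ancestral me wants to be left alone,
# modern me wants to migrate (the transhuman entry is an assumption).
ENVIRONMENTS = {
    "ancestral": "ancestral",
    "modern": "transhuman",
    "transhuman": "transhuman",
}

# Fruit example: I always want the one I am not holding.
FRUIT = {"apple": "orange", "orange": "apple"}

if __name__ == "__main__":
    print(endpoints(ENVIRONMENTS))  # different starts, different endpoints
    print(endpoints(FRUIT))  # no start ever settles
```

The same check applies to both tables, which is the sense in which the environment problem is no more special than the fruit problem.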
It’s certainly worth noting that standard solutions to Goodhart’s law won’t work without modification, because human preferences vary with their environments, but at the moment such modifications seem extremely feasible to me. I don’t understand why your objections are meant to be fatal to the utility of the overall framework of Goodhart’s law, and I hope you’ll explain that in the next post.