When you say the human decision procedure causes human values, what I hear is that the human decision procedure (and its surrounding way of describing the world) is more ontologically basic than human values (and their surrounding way of describing the world).
Our decision procedure is “the reason for our values” in the same way that the motion of electric charge in your computer is the reason it plays videogames (even though “the electric charge is moving” and “it’s playing a game” might describe the same physical event). The arrow between them isn’t the typical causal arrow between two peers within a single way of describing the world; it’s an arrow of reduction/emergence, between things at different levels of abstraction.
I basically agree with this and think it’s right. In some ways you might say that focusing too much on “values” acts as a barrier to deeper investigation of the mechanisms at work here, and I think looking deeper is necessary because I expect that optimizing against the value abstraction layer alone will result in Goodharting.