Speaking as a character, I too think the player can just go jump in a lake.
My response to this post is to think about something else instead, so if you’ll excuse me getting on a hobby horse...
I agree that when we look at someone making bizarre rationalizations, “their values” are not represented consciously, and we have to jump to a different level to find human values. But I think that conscious->unconscious is the wrong level jump to make.
Instead, the jump I’ve been thinking about recently is to our own model of their behavior. In this case, our explanation of their behavior relies on the unconscious mind, but in other cases, I predict that we’ll identify values with conscious desires when that is a more parsimonious explanation of behavior. An AI learning human values would then not merely be modeling humans, but modeling humans’ models of humans. But I think it might be okay if it makes those models out of completely alien concepts (at least outside of deliberately self-referential special cases—there might be an analogy here to the recursive modeling of Gricean communication).
Speaking as a character, I too think the player can just go jump in a lake.
My response to this post is to think about something else instead, so if you’ll excuse me getting on a hobby horse...
I agree that when we look at someone making bizarre rationalizations, “their values” are not represented consciously, and we have to jump to a different level to find human values. But I think that conscious->unconscious is the wrong level jump to make.
Instead, the jump I’ve been thinking about recently is to our own model of their behavior. In this case, our explanation of their behavior relies on the unconscious mind, but in other cases, I predict that we’ll identify values with conscious desires when that is a more parsimonious explanation of behavior. An AI learning human values would then not merely be modeling humans, but modeling humans’ models of humans. But I think it might be okay if it makes those models out of completely alien concepts (at least outside of deliberately self-referential special cases—there might be an analogy here to the recursive modeling of Gricean communication).