Some values don’t change. Maybe sometimes that’s because a system isn’t “goal-seeking.” For example, AlphaZero doesn’t change its value of “board-state = win.” (Thankfully! Because if that changed to “board-state = not lose,” then a reasonable instrumental goal might be to just kill its opponent.)
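To make that concrete, here’s a toy sketch (my own illustrative code, not AlphaZero’s actual implementation, which learns a value network over self-play) of a fixed terminal evaluation versus a hypothetical “not lose” variant, where a draw suddenly scores as well as a win and loss-avoiding strategies become just as instrumentally optimal:

```python
# Toy illustration (not AlphaZero's real code): the terminal value of a
# finished game is fixed by the game rules and is never updated by training.

def terminal_value_win(outcome: str) -> float:
    """AlphaZero-style terminal value: only winning scores +1."""
    return {"win": 1.0, "draw": 0.0, "loss": -1.0}[outcome]

def terminal_value_not_lose(outcome: str) -> float:
    """Hypothetical changed value: a draw counts the same as a win,
    so any strategy that merely avoids losing becomes optimal."""
    return {"win": 1.0, "draw": 1.0, "loss": -1.0}[outcome]

for outcome in ("win", "draw", "loss"):
    print(outcome, terminal_value_win(outcome), terminal_value_not_lose(outcome))
```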
But I’m a goal-seeking system. Shard theory seems to posit terminal values that constantly pop up and vie for position in humans like me. But certain values of mine seem impossible to change. Like, I can’t decide to desire my own misery/pain/suffering.
So if terminal values aren’t static, what about the values that are? Are these even more terminal? Or is it something else?
There have been many posts trying to answer this question. One candidate is the master-slave model, in which the master sets the slave’s shorter-term, allegedly terminal values so that the slave takes actions satisfying the master’s longer-term values: approval from a circle of people, health, sex, power, and proxies like sweet food.
That being said, “approval from a circle of people” is itself hard to define. For example, it could change when the circle changes. Or the role of the circle could be played by media that shape a person’s opinions on some subjects with no feedback from that person. An additional value could be consistency of one’s worldview with lived experience...
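A toy sketch of how that two-level structure might be formalized (my own illustrative code, not anything from Wei Dai’s post; the class names, situations, and value weights are all hypothetical): the master’s values stay fixed, while the slave’s “terminal” values are simply whatever the master most recently installed, and the slave optimizes them as if they were final.

```python
from dataclasses import dataclass, field

@dataclass
class Master:
    """Holds fixed long-term values; rewrites the slave's goals to serve them."""
    # Hypothetical long-term values and weights.
    values: dict = field(default_factory=lambda: {
        "approval": 1.0, "health": 0.8, "power": 0.6})

    def install_goals(self, situation: str) -> dict:
        """Choose short-term goals for the slave that serve the master's values."""
        if situation == "social_gathering":
            return {"impress_peers": 1.0}   # serves "approval"
        return {"eat_sweet_food": 1.0}      # proxy loosely tied to "health"

@dataclass
class Slave:
    """Treats whatever goals are currently installed as terminal."""
    terminal_values: dict = field(default_factory=dict)

    def act(self, options: list) -> str:
        """Pick the option scoring highest under the currently installed values."""
        return max(options, key=lambda o: self.terminal_values.get(o, 0.0))

master, slave = Master(), Slave()
slave.terminal_values = master.install_goals("social_gathering")
print(slave.act(["impress_peers", "eat_sweet_food"]))  # impress_peers

slave.terminal_values = master.install_goals("alone_at_home")
print(slave.act(["impress_peers", "eat_sweet_food"]))  # eat_sweet_food
```

The point of the sketch: from the slave’s side the installed values feel terminal, yet they are changeable by the master.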
Thanks for the links! In addition to Shard Theory, I have seen Steven’s work and it is helpful. Both approaches seem to suggest human terminal values change...I don’t know what they’d say about the idea that some (human) terminal values are unchanging.
If Evolution is the master and humans are the slaves in Wei Dai’s model, that seems to suggest that we don’t have unchangeable terminal values. But while the concept makes sense at the evolutionary scale, it doesn’t seem to me that it implies within-lifespan terminal value changeability (or really the changeability of any values... if I want pizza for dinner, evolution can’t suddenly make me want burgers). What do you think?