If my sole terminal value is “I want to go on a rollercoaster”, then an agent who is aligned to me would have the value “I want Tamsin Leake to go on a rollercoaster”, not “I want to go on a rollercoaster myself”. The former necessarily has the same ordering over worlds; the latter doesn’t.
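A minimal sketch of that ordering claim, using made-up toy world states (the dictionaries and function names here are illustrative assumptions, not anything from the comment): the “aligned” utility ranks worlds exactly as mine does, while the “first-person” one doesn’t.

```python
# Toy worlds: who ends up riding the rollercoaster.
worlds = [
    {"tamsin_rides": True,  "agent_rides": False},
    {"tamsin_rides": False, "agent_rides": True},
    {"tamsin_rides": False, "agent_rides": False},
]

def my_utility(w):
    # My sole terminal value: I (Tamsin) go on a rollercoaster.
    return 1 if w["tamsin_rides"] else 0

def aligned_agent_utility(w):
    # "I want Tamsin Leake to go on a rollercoaster."
    return 1 if w["tamsin_rides"] else 0

def first_person_agent_utility(w):
    # "I want to go on a rollercoaster myself."
    return 1 if w["agent_rides"] else 0

def same_ordering(u, v, worlds):
    # Same ordering iff every pair of worlds is ranked the same way.
    return all(
        (u(a) > u(b)) == (v(a) > v(b)) and (u(a) == u(b)) == (v(a) == v(b))
        for a in worlds for b in worlds
    )

print(same_ordering(my_utility, aligned_agent_utility, worlds))       # True
print(same_ordering(my_utility, first_person_agent_utility, worlds))  # False
```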
Quite. We don’t hear enough about individuality and competitive/personal drives when talking about alignment. I worry a lot that the abstraction and aggregation of “human” values completely misses the point of what most humans actually do.