This is interesting. I wonder what a CEV-implementing AI would do with such cases. At some point the regress seems bound to bottom out. And in a way this is also a self-fulfilling prophecy: once you start identifying with this new image/goal of yours, it becomes your terminal value. Maybe you’d have to evaluate the preferences of each agent-moment separately and then formalise a distinction between “changing your view based on valid input” and “changing your view because of a failure of goal-preservation”. I’m not entirely sure such a distinction would hold up in the end.
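For concreteness, here is a minimal sketch of what that separation might look like. Everything here, including the idea of tagging which preference pairs were touched by new input, is my own toy invention, not anything from the CEV literature:

```python
# Toy sketch: each agent-moment is a snapshot of pairwise preferences.
# A changed preference counts as a "valid update" only if we can point to
# new input that touched that pair; otherwise we flag it as possible goal
# drift. The evidence_pairs marker is a placeholder assumption, not a
# worked-out criterion.

def classify_changes(old_prefs, new_prefs, evidence_pairs):
    """Label each pairwise preference that changed between two agent-moments."""
    labels = {}
    for pair, old_value in old_prefs.items():
        if new_prefs.get(pair) != old_value:
            labels[pair] = ("changed view based on valid input"
                            if pair in evidence_pairs
                            else "failure of goal-preservation")
    return labels

# Agent-moment t0 prefers A to B and B to C; by t1 the B-vs-C preference
# has flipped, but only the A-vs-B pair was touched by new information.
t0 = {("A", "B"): True, ("B", "C"): True}
t1 = {("A", "B"): True, ("B", "C"): False}
print(classify_changes(t0, t1, evidence_pairs={("A", "B")}))
# {('B', 'C'): 'failure of goal-preservation'}
```

The hard part, of course, is exactly the bit this sketch assumes away: deciding what counts as “valid input” for a given preference in the first place.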
I wonder what a CEV-implementing AI would do with such cases.
Even if it does turn out that my current conception of personal identity isn’t the same as my old one, but is rather a similar concept I adopted after realizing my values were incoherent, the AI might still find that the CEVs of my past and present selves concur. This is because, if I truly did adopt a new concept of identity because of its similarity to my old one, that suggests I possess some sort of meta-value: taking my incoherent values and replacing them with coherent ones that are as similar as possible to the originals. If this is the case, the AI would extrapolate that meta-value and give me a nice new coherent sense of personal identity, like the one I currently possess.
Of course, if I am right and my current conception of personal identity is based on my simply figuring out what I meant all along by “identity,” then the AI would just extrapolate that.
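A toy way to picture that meta-value (my own illustration, not anything from the CEV proposal itself): treat the incoherent values as pairwise preferences and search for the coherent, i.e. transitive, ordering that disagrees with them in as few places as possible.

```python
from itertools import permutations

def disagreements(order, prefs):
    """Count pairs where a total ordering contradicts the raw preferences."""
    rank = {x: i for i, x in enumerate(order)}
    return sum((rank[a] < rank[b]) != a_over_b
               for (a, b), a_over_b in prefs.items())

# Hypothetical incoherent preferences over four options: everything fits
# A > B > C > D except one stray judgement, D > A, which makes the whole
# set intransitive.
prefs = {("A", "B"): True, ("B", "C"): True, ("C", "D"): True,
         ("A", "C"): True, ("B", "D"): True, ("D", "A"): True}

best = min(permutations(["A", "B", "C", "D"]),
           key=lambda o: disagreements(o, prefs))
print(best, disagreements(best, prefs))  # ('A', 'B', 'C', 'D') 1
```

In this example the search happens to return a single nearest ordering, which is the happy case.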
This is because, if I truly did adopt a new concept of identity because of its similarity to my old one, that suggests I possess some sort of meta-value: taking my incoherent values and replacing them with coherent ones that are as similar as possible to the originals. If this is the case, the AI would extrapolate that meta-value and give me a nice new coherent sense of personal identity, like the one I currently possess.
Maybe, but I doubt whether “as similar as possible” is (or can be made) uniquely denoting in all specific cases. This might sink it.
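That worry is easy to exhibit in the same toy setting as the sketch above (again, purely my own illustration): for a bare preference cycle, several coherent orderings are exactly equally near, so “as similar as possible” fails to pick one out.

```python
from itertools import permutations

def disagreements(order, prefs):
    """Count pairs where a total ordering contradicts the raw preferences."""
    rank = {x: i for i, x in enumerate(order)}
    return sum((rank[a] < rank[b]) != a_over_b
               for (a, b), a_over_b in prefs.items())

# A bare cycle: A over B, B over C, C over A.
prefs = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}
scores = {o: disagreements(o, prefs) for o in permutations(["A", "B", "C"])}
best = min(scores.values())
print([o for o, s in scores.items() if s == best])
# [('A', 'B', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B')] -- a three-way tie
```

And any tie-breaking rule the AI bolts on is itself a further value judgement, which seems to be exactly the problem.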