Now, once someone realises this, he/she can either choose to group all the consciousness-moments together that trigger an intuitive notion of “same person” and care about that, even though it is now different from what they thought it was
I don’t know if your analysis is right or not, but I can tell you that that isn’t what it felt like I was doing when I was developing my concepts of personal identity and preferences. What it felt like I was doing was elucidating a concept I already cared about, figuring out exactly what I meant when I said “same person” and “personal identity.” When I thought about what those concepts mean, I felt a thrill of discovery, like I was learning something new about myself that I had never articulated before.
It might be that you are right and that my feelings are illusory: that what I was really doing was realizing that a concept I cared about was incoherent, and casting about until I found a concept that was similar but coherent. But I can tell you that’s not what it felt like.
EDIT: Let me make an analogy. Ancient people had some weird ideas about the concept of “strength.” They thought that it was somehow separate from a person’s body, and could be transferred by magic, or by eating a strong person or animal. Now, of course, we understand that that is not how strength works. It arises from the complex interaction of a system of muscles, bones, tendons, and nerves, and you can’t transfer that system from one entity to another without changing many of the properties of the entity you’re transferring it to.
Now, considering that fact, would you say that ancient people didn’t want anything coherent when they said they wanted to be strong? I don’t think so. They were mistaken about some aspects of how strength works, but they were working from a coherent concept. Once they better understood how strength worked, they didn’t consider their previous desire for strength to have been wrong.
I see personal identity as somewhat analogous to that. We had some weird ideas about it in the past, like the idea that it was detached from physical matter. But I think that people have always cared about how they are going to change from one moment to the next, and have had concrete preferences about it. And I think that when I refined my concepts of personal identity I was making preferences I already had more explicit, not swapping out incoherent preferences and replacing them with similar coherent ones.
I’m 100% sure that there is something I mean by “suffering”, and that it matters. I’m only maybe 10-20% sure that I’d also want to care about preferences if I knew everything there is to know.
I am 100% certain that there are things I want to do that will make me suffer (learning unpleasant truths for instance), but that I want to do anyway, because that is what I prefer to do.
Suffering seems relevant to me too. But I have to admit, sometimes when something is making me suffer, what dominates my thoughts is not a desire for it to stop, but rather annoyance that the suffering is disrupting my train of thought and making it hard to think and to accomplish the goals I have set for myself. And I’m not talking about mild suffering; the particular example I am thinking of is throwing up two days after having my entire abdomen cut open and sewn back together.
This is interesting. I wonder what a CEV-implementing AI would do with such cases. There seems to be a point where you’re inevitably going to hit the bottom of that regress. And in a way this becomes a self-fulfilling prophecy, because once you start identifying with this new image/goal of yours, it becomes your terminal value. Maybe you’d have to do separate evaluations of the preferences of all agent-moments and then formalise a distinction between “changing view based on valid input” and “changing view because of a failure of goal-preservation”. I’m not entirely sure whether such a distinction will hold up in the end.
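For concreteness, here is a minimal sketch of what “separate evaluations of the preferences of all agent-moments” plus that distinction could look like. Everything in it is my own formalization, not anything from CEV: the representation, the names, and especially the crude “no new evidence means drift” test are all assumptions.

```python
# A toy sketch (my own formalization, not part of CEV). Each agent-moment
# carries a preference weighting plus the evidence available to it; a
# preference shift with no new evidence is flagged as possible drift.
from dataclasses import dataclass

@dataclass
class AgentMoment:
    preferences: dict[str, float]           # e.g. {"avoid_suffering": 0.9}
    evidence: frozenset[str] = frozenset()  # information available at this moment

def classify_change(earlier: AgentMoment, later: AgentMoment) -> str:
    """Label a preference shift between two consecutive agent-moments."""
    keys = earlier.preferences.keys() | later.preferences.keys()
    changed = {k for k in keys
               if earlier.preferences.get(k) != later.preferences.get(k)}
    if not changed:
        return "no change"
    # Crude proxy (an assumption): a shift backed by new information counts
    # as an update on valid input; a shift without any looks like drift.
    new_evidence = later.evidence - earlier.evidence
    if new_evidence:
        return "changed view on valid input"
    return "possible goal-preservation failure"

m1 = AgentMoment({"avoid_suffering": 0.9})
m2 = AgentMoment({"avoid_suffering": 0.5}, frozenset({"post-surgery experience"}))
print(classify_change(m1, m2))  # -> "changed view on valid input"
```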
I wonder what a CEV-implementing AI would do with such cases.
Even if it does turn out that my current conception of personal identity isn’t the same as my old one, but is rather a similar concept I adopted after realizing my values were incoherent, the AI might still find that the CEVs of my past and present selves concur. This is because, if I truly did adopt a new concept of identity because of its similarity to my old one, that suggests I possess some sort of meta-value that favors replacing my incoherent values with coherent ones that are as similar as possible to the originals. If this is the case, the AI would extrapolate that meta-value and give me a nice new coherent sense of personal identity, like the one I currently possess.
Of course, if I am right and my current conception of personal identity is based on my simply figuring out what I meant all along by “identity,” then the AI would just extrapolate that.
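Here is a minimal sketch of what extrapolating that meta-value might amount to, assuming (my assumptions, not anything established here) that a value set can be encoded as weights over named concerns and similarity as a simple distance between them; the `extrapolate` name, the encoding, and the metric are all illustrative.

```python
# Hedged sketch of the meta-value: among the coherent candidate value
# sets, pick the one closest to the original, using L1 distance over
# weighted concerns as a stand-in for "similarity".

def extrapolate(original: dict[str, float],
                coherent_candidates: list[dict[str, float]]) -> dict[str, float]:
    """Pick the coherent value set most similar to the original one."""
    def distance(a: dict[str, float], b: dict[str, float]) -> float:
        keys = a.keys() | b.keys()
        return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    return min(coherent_candidates, key=lambda c: distance(original, c))
```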
This is because, if I truly did adopt a new concept of identity because of its similarity to my old one, that suggests I possess some sort of meta-value that favors replacing my incoherent values with coherent ones that are as similar as possible to the originals. If this is the case, the AI would extrapolate that meta-value and give me a nice new coherent sense of personal identity, like the one I currently possess.
Maybe, but I doubt whether “as similar as possible” picks out (or can be made to pick out) a unique answer in all specific cases. This might sink it.
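To make the worry concrete with the toy `extrapolate` sketch above: nothing forces the similarity ranking to have a unique minimum, and when it does not, the result is decided by an arbitrary tie-break. The value sets here are hypothetical.

```python
# Two coherent candidates can sit at exactly the same distance from the
# original, and min() then breaks the tie by position in the list.
original = {"same_person": 1.0}
psych  = {"same_person": 0.5, "psychological_continuity": 0.5}
physic = {"same_person": 0.5, "physical_continuity": 0.5}
print(extrapolate(original, [psych, physic]))  # picks `psych`...
print(extrapolate(original, [physic, psych]))  # ...or `physic`: order decides
```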