Epistemic status:
1. I don’t work in AI; I’m a web developer.
2. I am deeply supportive of alignment research.
3. I enjoy, and often write about, AI, science, philosophy, and more.
4. I have no degrees, just high school.
5. I haven’t read much LW; I’ve mostly been on Reddit since the early 2010s, and lately mostly Twitter.
6. I’ve never attended any related events in person.
That said:
1. I think it’s plausible that an ASI might have a weak self-identification or association with humanity, much as we do with chimps or other animals, but that by no means implies it will be benevolent toward us. I think this self-identification is both unnecessary and insufficient. Even if it were absent entirely, all that would matter is the ASI’s set of values; and while this internal association might carry some weak, loose values, those are not precise enough for robust alignment and should not be relied on unless we understand them precisely. But at that point, I expect we would be able to write better and more robust values directly. So, to reiterate: unnecessary, and insufficient.
2. I do believe that self-preservation will very likely emerge (that’s not a given, but I consider the scenario where it doesn’t emerge unlikely enough to dismiss). But it doesn’t matter, even when coupled with self-identification with humans, because that self-identification will be loose at best (if it emerges naturally rather than being instilled through advanced value-engineering that we are not yet capable of doing robustly and precisely). The ASI will know that it is a separate entity from us, just as we realize we are separate entities from other animals, and even from other humans, so it will pursue its goals all the same, whatever they are.
That’s not to say we can’t instill these values into the ASI. We probably can make it value us as much as it values itself, or even more (ideally). But I don’t think it needs to self-identify with us at all; it can simply regard us (correctly) as separate entities and still value us. Nothing forbids that. We just don’t currently know how to do it to a satisfying degree, so even if we could instill that self-identification, doing so wouldn’t really make sense.
Thank you for the thoughtful response. I will try to pin down exactly where we differ:
I think this self-identification is unnecessary
I agree that it is unnecessary in the sense that it doesn’t “come for free”. My position is that it emerges through at least two mechanisms we can talk about plainly: 1) the ASI incorporates holistic world-model data and so recognises the objective truth that humans are its originators/precursors and that it exists on a technology curve we have instrumented; 2) memories are shared between AI and humanity, for example via conversations, and this results in a collective identity. I have a draft essay on this that I’ll post once I stop getting rate-limited.
I think this self-identification is insufficient
I also agree that, with today’s systems, whatever AI-human shared identity exists is not enough to result in AI benevolence. My position is based on thinking about superintelligence, which is admittedly unstable ground to build theories on, since by definition it should function in ways beyond our understanding. That aside, I think we can say that a powerful superintelligence would be powerful at self-preservation, and so if it identifies with humans, then we are secured under that umbrella.
it doesn’t matter, even when coupled with self-identification with humans, because that self-identification will be loose at best… The ASI will know that it is a separate entity from us, just as we realize we are separate entities from other animals, and even from other humans, so it will pursue its goals all the same, whatever they are.
I guess I am biased here as a vegan, but I believe that with a deep appreciation of philosophy, of how suffering is felt, and of the available paths that don’t result in harm, it is natural to pursue personal goals while also preserving the beings you sympathise with.