I predict the ASI that wipes us out, and eats the surrounding galaxies, will not want other happy minds around, or even to become happy itself.
Speaking of Eliezer’s views, and quoting from his tweet you reference:
I wonder if @Eliezer Yudkowsky has elaborated on his reasons for this particular prediction anywhere.
As extensive as his writings are, I have not encountered his reasoning on this particular point.
I would normally think that a narrowly focused AI, with a narrowly formulated “terminal goal” and power mostly acquired through instrumental convergence, would indeed not intrinsically care about much besides that terminal goal.
However, an AI formulated in a more “relaxed” and less narrowly focused way, with open-endedness, curiosity, and diversity of experience as part of its fundamental mix of primary goals, seems likely to care about other minds and experiences.
So, perhaps, his thinking might be a leftover from the assumption that the winning system(s) will have narrowly formulated “terminal goals”. But it would be better if he explained this himself.
Of course, our real goals are much stronger. We would like the option of immortality for ourselves and our loved ones, and our “personal P(doom)” is pretty close to 1 in the absence of drastic breakthroughs. Many of us would really like a realistic shot at strongly decreasing that “personal P(doom)”; that is a fairly tall order, but one many of us would like to pursue.
“Curiosity, and diversity of experience” are very narrow targets; they are no more “relaxed” than “making paperclips”.
Why do you think they are narrow? They certainly sound rather wide to me… And their presence in a goal mix does seem to make the mix wider (at least, that’s how it feels to me). What would be wider from your point of view?
But “relaxed” is a bit different: it is about not pressing too hard with one’s optimization. Examples we know from current experience include early stopping in the training of AI models, not being fanatical about pushing new social and organizational methods on society, keeping enough slack, and so on. All of these are known to be beneficial, and ignoring them is known to cause all kinds of problems; cf. AI safety concerns being closely related to optimizers being too efficient. So, yes, making AIs aware that optimizing too hard is probably not good for themselves either, in the long-term sense, is important.
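The early-stopping example above can be made concrete. Here is a minimal, hypothetical sketch of a training loop (the `train_step` and `validate` callables are assumptions for illustration, not anyone’s actual setup) that deliberately stops optimizing once held-out performance stops improving, rather than pushing the objective as far down as it can go:

```python
def early_stopping_train(train_step, validate, max_epochs=100, patience=3):
    """Stop optimizing once validation stops improving, rather than
    driving the training objective as low as it can possibly go.

    train_step: callable performing one epoch of optimization.
    validate:   callable returning the current validation loss.
    patience:   how many non-improving epochs to tolerate (the "slack").
    """
    best_loss = float("inf")
    bad_epochs = 0
    for epoch in range(max_epochs):
        train_step()
        loss = validate()
        if loss < best_loss:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                # Deliberately leave optimization pressure on the table.
                return epoch + 1, best_loss
    return max_epochs, best_loss
```

The point of the sketch is the shape of the rule, not the specifics: the loop refuses to keep optimizing past the point where further pressure on the objective stops being beneficial.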
(For true safety, for preservation of good entities and phenomena worth preserving, we would also want a good deal of emphasis on conservation, but not so much as to cause stagnation.)
Instrumentally useful mild optimization is different from leaving autonomy to existing people as a target. The former allows strong optimization in some other contexts, or else in aggregate, which eventually leads to figuring out how to do better than the instrumentally useful mild optimization. Preserving the autonomy of existing people is in turn different from seeking diversity of experience or happiness, which doesn’t single out people who already exist and doesn’t leave them alone enough for them to be said to have meaningfully survived.
Maximizing anything that doesn’t include even a tiny component of such pseudokindness results in eventually rewriting existing people into something else that is more optimal, even if at first there are instrumental reasons to wait and figure out how. For this not to happen, an appropriate form of not-rewriting in particular needs to be part of the target. Overall alignment of a superintelligence’s values is about good utilization of the universe, with survival of humanity a side effect of pseudokindness almost certainly being a component of aligned values. But pseudokindness screens off overall alignment of values on the narrower question of survival of humanity (as opposed to the broader question of making good use of the universe). (Failing on either issue contributes to existential risk, since both permanently destroy the potential for universe-spanning future development according to humane values, making P(doom) unfortunately ambiguous between two very different outcomes.)
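The contrast drawn above between mild and strong optimization has a standard toy formalization: a quantilizer, which samples from the top fraction of options rather than taking the single best one. This is only an illustrative sketch of the general idea (not a proposal from this thread, and with a uniform base distribution assumed for simplicity):

```python
import random

def quantilize(options, utility, q=0.1, rng=random):
    """Mild optimization: instead of argmax over options (strong
    optimization), sample uniformly from the top q fraction of options
    as ranked by the utility function."""
    ranked = sorted(options, key=utility, reverse=True)
    cutoff = max(1, int(len(ranked) * q))  # always keep at least one option
    return rng.choice(ranked[:cutoff])
```

The design point is that a quantilizer never concentrates all probability on the extreme optimum, so it exerts bounded optimization pressure; but, as noted above, if it is only instrumentally adopted, nothing stops the system from optimizing strongly elsewhere or dropping the restraint once it finds something better.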
Thanks, this is a very helpful comment and links.
I think, in particular, that pseudokindness is a very good property to have, because it is non-anthropocentric, and therefore it is a much more feasible task to make sure that this property is preserved during various self-modifications (recursive self-improvements and such).
In general, it seems that making sure recursively self-modifying systems maintain some properties as invariants is feasible only for a relatively narrow class of properties, which tend to be non-anthropocentric. If some anthropocentric invariants are desirable, the way to achieve them is to obtain them as corollaries of natural non-anthropocentric invariants.
My informal meta-observation is that writings on AI existential safety tend to look like they are getting closer to relatively feasible, realistic-looking approaches when they have a non-anthropocentric flavor, and tend to look impossibly hard when they focus on “human values”, “human control”, and so on. My informal impression is that we are seeing more of the anthropocentric focus lately, which might be helpful for creating political pressure, but seems rather unhelpful for finding actual solutions. I did write an essay that tries to help shift the focus (back) to the non-anthropocentric, both in terms of the fundamental issues of AI existential safety and in terms of what could be done to make sure that human interests are taken into account: Exploring non-anthropocentric aspects of AI existential safety