Thanks, this is a very helpful comment, and thanks for the links.
In particular, I think that pseudokindness is a very good property to have, because it is non-anthropocentric, and therefore it is much more feasible to make sure that this property is preserved during various self-modifications (recursive self-improvement and such).
In general, it seems that ensuring that recursively self-modifying systems maintain certain properties as invariants is feasible only for a relatively narrow class of properties, which tend to be non-anthropocentric. If some anthropocentric invariants are desirable, the way to achieve them is to obtain them as corollaries of some natural non-anthropocentric invariants.
My informal meta-observation is that writings on AI existential safety tend to look like they are getting closer to relatively feasible, realistic-looking approaches when they have a non-anthropocentric flavor, and they tend to look impossibly hard when they focus on “human values”, “human control”, and so on. My informal impression is that we have been seeing more of the anthropocentric focus lately, which might be helpful for creating political pressure, but seems rather unhelpful for finding actual solutions. I wrote an essay that tries to help shift the focus (back) to the non-anthropocentric, both in terms of the fundamental issues of AI existential safety and in terms of what could be done to make sure that human interests are taken into account: Exploring non-anthropocentric aspects of AI existential safety