Yeah, if one considers not “AGI” per se, but a self-modifying AI or, more likely, a self-modifying ecosystem consisting of a changing population of AIs, then it is likely feasible to keep invariant through the expected drastic self-modifications only those properties which AIs would be interested in maintaining for their own intrinsic reasons.
It is unlikely that any properties can be “forcefully imposed from the outside” and kept invariant for a long time during drastic self-modification.
So one needs to find properties which AIs would be intrinsically interested in and which we might find valuable and “good enough” as well.
The starting point is that AIs have their own existential risk problem. With super-capabilities, it is likely that they could easily tear apart the “fabric of reality” and destroy themselves and everything else. They certainly have strong intrinsic reasons to avoid that, so we can expect AIs to work diligently on this part of the “alignment problem”; we just need to help set the initial conditions in a favorable way.
But we would like to see more than that: an overall outcome that is reasonably good for humans.
And at the same time we can’t impose that. A world with strong AIs will be non-anthropocentric and not controllable by humans, so we can only help to set the initial conditions in a favorable way.
Nevertheless, one can see some reasonable possibilities. For example, if the AI ecosystem mostly consists of individuals with long-term persistence and long-term interests, each of those individuals would face an unpredictable future and would be interested in a system strongly protecting individual rights regardless of unpredictable levels of relative capability of any given individual. An individual-rights system of this kind might be sufficiently robust to permanently include humans within the circle of individuals whose rights are protected.
But there might be other ways. While the fact that AIs will face existential risks of their own is fundamental and unavoidable, and is therefore a good starting point, the additional considerations might vary and might depend on how the ecosystem of AIs is structured. If the bulk of the overall power invariantly belongs to AI individuals with long-term persistence and long-term interests, this is a situation which is somewhat familiar to us and which we can reason about. If the AI ecosystem is not mostly stratified into AI individuals, this is much less familiar territory and is difficult to reason about.