Let’s suppose that existing AIs really are already intent-aligned. What does this mean? It means that they genuinely have value systems which could be those of a good person.
Note that this does not really happen by default. AIs may learn what better human values are, as one part of learning everything about the world during pre-training on the human textual corpus. But that doesn't automatically make them agents who act in service of those values. For that, they also need to be given a persona. And in practice, frontier AI values are further shaped by the process of user feedback and by the other modifications the companies perform.
But OK, let's suppose that current frontier AIs really are as ethical as a good human being. Here's the remaining issue: the intelligence, and therefore the power, of AI will continue to increase. Eventually AIs will be deciding the fate of the world.
Under those circumstances, trust really is not enough, whether it's humans or AIs achieving ultimate power. To be sure, having basically well-intentioned entities in charge is certainly better than being subjected to something with an alien value system. But entities with good intentions can still make mistakes, or they can succumb to temptation and let a selfish desire override their morality.
If you’re going to have an all-powerful agent, you really want it to be an ideal moral agent, or at least as close to ideal as you can get. This is what CEV and its successors are aiming at.