From the evolutionary point of view on morality, human values are convergent instrumental values, similar to the basic AI drives described by Omohundro. Humans' basic drives are to survive, to replicate, and to dominate in a social group, and we more or less understand how these arose through natural selection. An AI will develop similar convergent instrumental goals regardless of its final goals.
As a result, an AI could have approximately the same set of “goals” as humans, if both can be viewed as products of similar evolutionary pressures. This produces a kind of counterfactual “alignment”: humans want to survive, and the AI wants to survive; humans want money, and the AI wants money; humans mildly wirehead their utility functions through simple and safe pleasures, and the AI will search for the shortest path to maximum utility.
Such counterfactual alignment is by itself useless and even dangerous: although the AI has the same kinds of goals as humans, those goals have a different subject, so it would play a zero-sum game against humans. However, an external game structure could give an AI an incentive to collaborate with humans, such as earning money together: for example, a (weak) paperclipper has to sell something useful on the market to earn money for buying paperclips. This solution is obviously unstable: once the paperclipper gains more power than humans, it can quit the game and exterminate humans. It might be made more stable by running several different AIs with different final goals, or by human enhancement.
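To make the instability concrete, here is a minimal toy model in Python. The payoff numbers and functional forms are illustrative assumptions, not anything derived from the argument above: the paperclipper keeps trading while trade yields more paperclips than defection, and quits the game once its relative power crosses a threshold.

```python
"""Toy model of the 'external game': a weak paperclipper that must
trade with humans to fund paperclips, and that defects once its
relative power makes defection pay more.

All numbers and functional forms are illustrative assumptions.
"""

def paperclips_from_trade(power: float) -> float:
    # Assumption: trade income grows roughly linearly with capability.
    return 10.0 * power

def paperclips_from_defection(power: float) -> float:
    # Assumption: seizing resources by force pays nothing while the AI
    # is weak (humans win the conflict), but scales superlinearly once
    # the AI is strong enough to win.
    return 0.0 if power < 1.0 else 100.0 * power ** 2

def ai_cooperates(power: float) -> bool:
    """The AI stays in the game only while trading beats defecting."""
    return paperclips_from_trade(power) >= paperclips_from_defection(power)

if __name__ == "__main__":
    for power in [0.2, 0.5, 0.9, 1.0, 1.5, 3.0]:
        action = "cooperate" if ai_cooperates(power) else "defect"
        print(f"power={power:4.1f} -> {action}")
```

In this framing, the stabilizing measures mentioned above amount to moving the defection threshold: several AIs with different final goals, or enhanced humans, would raise the cost of defection for any single agent, so cooperation remains the better strategy over a wider range of power.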