A quick thought about AI alignment
Epistemic status: I am new to AI alignment and still learning the literature; please forgive me if this is obvious or well-trodden ground, but I have not yet come across this point.
“Before the law, there was no sin” (attr. Romans 5:13)
In nature, animals often do harm and cause suffering to animals of other species in the course of meeting their own needs. Many also do harm to (some) other members of the same species to advance their own (or kin) fitness. This is unsurprising from an evolutionary standpoint, and is not subject to ethical judgement.
Humans, uniquely, arrived at ethical principles to prohibit harming other humans for their own gain. Humans, uniquely, arrived at the idea that we should prohibit or limit causing harm or suffering to animals of other species for our own gain. These advances seem attributable to humans’ exceptional intelligence and the emergence of conceptual thought and rationality. It remains an uphill battle for such principles to become accepted and enacted by humans. This difficulty seems to stem from two limitations: insufficient application of rationality, and the fact that human brains have not diverged very far from those of our pre-human ancestors. We still experience contrary motivations and impulses baked in by millennia of selection pressure to optimally exploit all resources, including other animals and conspecifics.
So the thought occurred to me: if intelligence and rationality are what led humans to at least try to override exploitative impulses in favor of honoring rights and extending compassion to other creatures, then by extrapolation, if AGI were to become even more intelligent than humans, wouldn’t it, if anything, trend even further in that direction? And an intelligence with a non-biological physical substrate might not have (or might more easily escape) the conflicting instincts embedded in our animal brains. In short, maybe the default expectation should be that AGIs would be more aligned with (rational) human values than humans are.
This is by no means a strong argument, and I’m not at all inclined to defend this thesis. I just thought the idea was interesting to consider. I will be happy to receive pointers to where this may have been previously articulated or debated.
Yes, it is very well trodden, and the orthogonality thesis (https://www.alignmentforum.org/w/orthogonality-thesis) is the standard counterargument. This is still heavily debated and controversial. As you say, if you take moral realism seriously and build a very superhuman AI, you would expect it to be more moral than us, just as it is more intelligent.
Thanks for the term for this, and for the link.
Maybe this is true and bad
useful, thanks