A few thoughts that occurred while reading:

Intelligence and speed might need to be considered separately. If an AI is only as smart as a human but can run much faster, then “one AI” might be more closely analogous to one human civilization than to one human.
Another line of evidence: for the things I have seen AI learn so far, the learned version is intuitively close to the real thing. If an AI learns my values as well as it learns what faces look like, it seems plausible that it would carry them out better than I do.
Note that “what the AI was trained for” and “what we wanted the AI to do” are not necessarily the same. For example, maybe we want an AI that can answer questions and write essays, but we actually train an AI to do token prediction instead, because that’s easier to train. We end up with an AI that is better than humans at token prediction but still worse than humans at the things we actually wanted.
If you’re saying “wow, it learned token prediction really well!”, that’s misleading, because token prediction was selected precisely for being unusually easy to teach. It’s not necessarily representative of how well “teaching stuff” goes in general.
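To make that gap concrete, here’s a minimal sketch of what the optimized objective actually is, written against PyTorch with random tensors standing in for a real model and real training text (an assumed toy setup, not anyone’s actual training code). Nothing in the loss mentions questions, essays, or values; it only scores how probable the next token was:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: random "model outputs" and random "training text".
vocab_size, seq_len, batch = 100, 16, 4
tokens = torch.randint(vocab_size, (batch, seq_len + 1))  # training text
logits = torch.randn(batch, seq_len, vocab_size)          # model's predictions

# Next-token prediction: predict token t+1 from everything up to t.
# This cross-entropy scalar is the only signal gradient descent sees;
# "answer questions well" appears nowhere in it.
loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                       tokens[:, 1:].reshape(-1))
print(loss.item())
```

Whether whatever the model internalizes to drive that number down also makes it good at the things we actually wanted is exactly the open question.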
More generally, the set of things we have already taught is always going to be heavily biased towards things that are easy to teach.
As minor additional evidence here: I don’t know how to describe any slight difference in utility functions that would be catastrophic.
Not sure if this is useful, but I was reminded of a recent scene in Project Lawful. A visitor from dath ilan is stuck in D&D land, where they learn that the head goddess Pharasma flags people as “evil” if they buy souls, and so the evil country Cheliax has deployed a currency backed by souls in order to more efficiently damn all its citizens to hell. The dath ilani visitor speculates that maybe Pharasma was created by some advanced civilization (which she later ate, because she wasn’t perfectly aligned with it) in which buying souls was approximately always bad, and/or which had strict no-exceptions laws against buying souls, so Pharasma absorbed that rule, and now won’t change it even when someone starts systematically exploiting an edge case for something that would probably have horrified the original civilization.
(Note that this was an exercise of the form “what conditions could have led to a world like the Pathfinder campaign setting existing?” rather than “what is something that is likely to go wrong with AI?”—i.e. it’s inferring the initial conditions from the end condition, rather than the other way around.)