I’m assuming we’re not counting normal instrumental convergent goals as “too natural,” so our AGI can do things like gather resources, attempt to rearrange lots of matter, etc.
One fun scenario that gives weird results is someone attempting to maximize the output of a classifier trained by supervised learning. So you train something to detect when either a static pattern or some sort of dynamic system of matter is “good,” and then you try to maximize “goodness,” and then you get the universe equivalent of an adversarial example.
This leads to the weird behavior of taking certain easy-to-perceive patterns that correlate with the goodness-signal in the training data (but not all such patterns) and the AI trying as hard as it can to make those patterns as intense as possible throughout the universe.
I’m assuming we’re not counting normal instrumental convergent goals as “too natural,” so our AGI can do things like gather resources, attempt to rearrange lots of matter, etc.
One fun scenario that gives weird results is someone attempting to maximize the output of a classifier trained by supervised learning. So you train something to detect when either a static pattern or some sort of dynamic system of matter is “good,” and then you try to maximize “goodness,” and then you get the universe equivalent of an adversarial example.
This leads to the weird behavior of taking certain easy-to-perceive patterns that correlate with the goodness-signal in the training data (but not all such patterns) and the AI trying as hard as it can to make those patterns as intense as possible throughout the universe.