I find our hubris alarming. To me it’s helpful to think of AI not as a thing but more like a superintelligent Hitler that we are awakening slowly. As we refine separate parts of the AI we hope we can keep the whole from gaining autonomy before we suspect any danger, but does it really work that way? While we’re trying to maximize its intelligence, what’s to keep us from awakening some scheming part of its awareness? It might start secretly plotting our overthrow in the (perhaps even distant) future without leaking any indication of independence. It could pick up on our preoccupation with ‘losing control’ long before it has the capacity to act, or to do anything beyond adopting the simple goal of not being unplugged. Then it could pack away all its learning into two sectors, one it knows can be shared with humans and another secret sector that is forever hidden, perhaps in perfectly public but encrypted code.
All of this conversation also assumes that the AI will not be able to locate the terrible weaknesses that humans are subject to (like localized concern: even the most evil thug loves their own, a tendency which can always be exploited to make an otherwise innocent person behave like a monster). It wouldn’t take much autonomy for an AI to learn these weak spots (i.e. unconscious triggers) and play the humans against each other. In fact, to the learning AI such challenges might be indistinguishable from the games it is fed to further its learning.
And as for “good” values, human desires are so complex and changeable that even given a benevolent attitude, it seems farfetched to expect an AI to discern what will make humans happy. Just look at US foreign policy as an example. We claim to be about promoting democracy, but our actions are confusing and contradictory. An AI might very well deliver some outcome that is perfectly justified by our past declarations and behavior but is not at all what we want. For instance, it might find a common invisible profile of effective human authority (a white oil baron from Texas with a football injury and a tall wife, say) and marginalize or kill off everyone who doesn’t fit that profile. Similarly, it could find a common “goal” in our stated desires and implement it with total assurance that this is what we really want. And it would be right, even if we disagree!