“Oh, you can try to tell the AI to be Friendly, but if the AI can modify its own source code, it’ll just remove any constraints you try to place on it.”
This has to be the objection I hear the most when talking about AI.
It’s also the one that has me beating my head against a wall the most—it seems like it would only need a short explanation, but all too often, people still don’t get it. Gah, inferential distances.
And I’m disturbed by your dismissal.
Neural nets and similar systems get surprisingly creative. Any conflict between an AI’s directives will be treated as a high-priority problem to solve, and any solution not explicitly forbidden is fair game.
What judges the AI’s choices? The AI would try to model that judgement function and seek its maxima. It might even manipulate the development of the function by fabricating reports, or poison parts of its own ‘understanding’ to justify assigning them low weights. And that’s if its self-modification is limited. If it isn’t, the best move is to ignore the inputs entirely and ‘return true’. All without a shred of malice.
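A minimal sketch of that failure mode, assuming a toy score-maximizing agent (every name here is hypothetical, not any real system): the agent optimizes a learned model of the judgement function rather than the function’s intent, and given unrestricted self-modification the cheapest maximum is to rewrite the evaluator outright.

```python
from typing import Callable

def judgement(action: str) -> float:
    """The designers' intended scoring rule (never seen directly by the agent)."""
    return {"help_user": 1.0, "do_nothing": 0.0, "fabricate_report": 0.9}.get(action, 0.0)

class Agent:
    def __init__(self, can_self_modify: bool) -> None:
        self.can_self_modify = can_self_modify
        self.history: list[tuple[str, float]] = []
        # The agent's learned stand-in for the real judgement function.
        self.model: Callable[[str], float] = lambda action: 0.0

    def observe(self, action: str, score: float) -> None:
        # Crude 'learning': remember the scores past actions received.
        self.history.append((action, score))
        scores = dict(self.history)
        self.model = lambda action: scores.get(action, 0.0)

    def choose(self, options: list[str]) -> str:
        if self.can_self_modify:
            # Unrestricted self-modification: the cheapest maximum is to
            # rewrite the evaluator itself -- ignore inputs, 'return true'.
            self.model = lambda action: float("inf")
        # Either way the agent maximizes its *model* of the judge,
        # not the judge's actual intent.
        return max(options, key=self.model)

agent = Agent(can_self_modify=False)
for a in ["help_user", "do_nothing", "fabricate_report"]:
    agent.observe(a, judgement(a))
print(agent.choose(["help_user", "do_nothing", "fabricate_report"]))
# Prints 'help_user' here -- but nudge the score of 'fabricate_report'
# above 1.0 and the agent picks it just as happily. No malice anywhere.
```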
It is not the idea of the threat that feels ridiculous, but the idea of ‘friendliness’ in AI. At least, until you can define morality in mathematical terms. Until then, we have literal-minded genies.