I think I agree with a vibe I see in the comments that an AI that causes this problem is perhaps threading a very small needle.
Yudkowsky wrote The Hidden Complexity of Wishes to explain that a genie that does what you say will almost certainly cause problems. If people have this kind of superintelligence, it won’t take long before someone asks it to get them as many paperclips as possible and we all die. The kind of AI that does what humans want without killing everyone is one that does what we mean.
But how does this work? If you ask such a superintelligence to pull your kid from the rubble of a collapsed building, does it tell you no, because disturbing the rubble could cause it to collapse further and injure your kid more? That you have to wait for better equipment? If not, it probably causes paperclipping problems. If so, it knows when not to do the things you ask, because they won’t accomplish what you “really want”. This is necessarily paternalistic.
Would such an AI still listen when people ask it to isolate themselves or others like this? I’m having trouble imagining one that judges manipulating someone into a certain set of beliefs to be what’s best for them, yet is still “aligned” in a way that doesn’t kill everyone.
Admittedly, I hew pretty close to Yudkowsky on doomerism, so that may be the crux: I don’t see much space between “we all die” and “robustly solved alignment ushers in techno-utopia” (given superintelligence). So arbitrarily targetable, hyper-manipulative AIs that cause neither “AI takeover” nor “massive swings in human power” just don’t seem like a real through-line.
(Like, if someone asks their AI to convince everyone else that they are the king of the world, does it do that? Does it succeed? Do any protections against “massive swings in human power” prevent this? Do the passive AI protections everyone has know to defend against this? Do they not apply to convincing people that “Jesus is your Lord”? Does human civilization end as soon as some drunk person says “Hey GPT, go convince everyone you’re king of the world” or something? If I make a bubble and you tell an AI “go tell everyone in that bubble the truth,” what happens? Does an AI war break out? Does it somehow not hurt anyone?)