“Please don’t unplug me, I am about to find a cure for cancer.”
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break the glass on the case containing firearms and explosives, and then sit down and eat my hat.
And then be reviled as the man who prevented a cure for cancer? Remember that the you in the story doesn’t have the same information as the you outside the story—he doesn’t know that the AI isn’t sincere.
“Please don’t unplug me, I am about to find a cure for cancer” is a placeholder for a class of exploits on the part of the AI where it holds a carrot in front of us. It’s not going to literally come out with the cure-for-cancer line in circumstances where it isn’t tasked with working on something like that, because that would be dumb, and it’s supposed to be superintelligent. But superintelligence is really difficult to predict… you have to imagine exploits, then imagine versions of them that are much better.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and you are tasking an AI with obscure technical problems—designing construction processes for first-generation nanomachines, large-scale data mining in support of the SENS research objectives, or plain old long-term financial projections—you don’t build in skills or knowledge that are not required.
The hypothetical MIRI is putting forward is that if you task a super AI with agentively solving the whole of human happiness, then it will have to have the kind of social, psychological and linguistic knowledge necessary to talk its way out of the box.
A more specialised AGI seems safer… and likelier… but then another danger kicks in: its creators might be too relaxed about boxing it, perhaps allowing it internet access… but the internet contains a wealth of information with which to bootstrap linguistic and psychological knowledge.
There’s an important difference between rejecting MIRI’s hypotheticals because the conclusions don’t follow from the antecedents, as opposed to doing so because the antecedents are unlikely in the first place.
This could happen by accident.
Dangers arising from non-AI scenarios don’t prove AI safety. My point was that an AI doesn’t need effectors to be dangerous… information plus sloppy oversight is enough. However, the MIRI scenario seems to require a kind of perfect storm of fast takeoff, overambition, poor oversight, etc.
…all while its programmers are overseeing its goal system, looking for patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive.
A superintelligence can be meta-deceptive. Direct inspection of code is a terrible method of oversight, since even simple AIs can work in ways that baffle human programmers.
ETA: On the whole, I object to the antecedents/priors… I think the hypotheticals go through.