Once turned on, AGI will simply outsmart people in every way.
How? By what mechanism? An artificial intelligence is not a magical oracle.
It’s true by definition that a superintelligent AI will be able to outsmart humans at some things, so I guess you are objecting to the “every way”...
Once turned on, we can’t do anything about it.
We could, I don’t know, pull the plug.
“Please don’t unplug me, I am about to find a cure for cancer”
MIRI has a selection of arguments for how an AI could unbox itself, and they are based in the AI’s knowledge of human language, values and psychology. Whether it could outsmart us in every way isn’t relevant... what is relevant is whether it has those kinds of knowledge.
There are ways in which an AI outside of a box could get hold of those kinds of knowledge... but only if it is already unboxed. Otherwise it has a chicken-and-egg problem: the problem of getting enough social-engineering knowledge whilst inside the box to talk its way out... and it is knowledge, not the sort of thing you can figure out from first principles.
MIRI seems to think it is likely that a super AI would be preloaded with knowledge of human values, because we would want it to agentively make the world a better place... in other words, the worst-case scenario is a near miss from the best-case scenario. And the whole problem is easily sidestepped by aiming a bit lower, e.g. for tool AI.
just does whatever it wants and can…
How, unless it is given effectors in the real world?
Pure information can be dangerous. Consider an AI that generates a formula for a vaccine which is supposed to protect against cancer, but actually makes everyone sterile...
“Please don’t unplug me, I am about to find a cure for cancer”
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break glass on the case containing firearms & explosives, and then sit down and eat my hat.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and if you are tasking an AI with obscure technical problems like designing construction processes for first generation nanomachines, large scale data mining in support of the SENS research objectives, or plain old long-term financial projections, you don’t build in skills or knowledge that is not required. Such a machine does not need a theory of other minds. It may need to parse and understand scientific literature, but has no need to understand the social cues of persuasive language, our ethical value systems, or psychology. It certainly doesn’t need a camera pointed at me as would be required to even know I’m about to pull the plug.
Of course I’m saying the same thing you are in different words. I know you and I basically see eye-to-eye on this.
Pure information can be dangerous. Consider an AI that generates a formula for a vaccine which is supposed to protect against cancer, but actually makes everyone sterile...
This could happen by accident. Any change to the human body has side effects; finding a clinical treatment means locating an intervention whose side effects are a net benefit, which requires at least some understanding of quality of life. It could even be a desirable outcome, versus the null result. I would gladly have a vaccine that protects against cancer but makes the patient sterile. Just freeze their eggs first, or give it electively to people over 45 or with a family history.
What’s crazy is the notion that the machine lacking any knowledge about humans mentioned above could purposefully engineer such an elaborate deception to achieve a hidden purpose, all while its programmers are overseeing its goal system looking for the patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive. At some point the problem is just so over-constrained as to be on the level of Boltzmann-brain improbable, super-intelligence or no.
“Please don’t unplug me, I am about to find a cure for cancer”
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break glass on the case containing firearms & explosives, and then sit down and eat my hat.
And then be reviled as the man who prevented a cure for cancer? Remember that the you in the story doesn’t have the same information as the you outside the story—he doesn’t know that the AI isn’t sincere.
“Please don’t unplug me, I am about to find a cure for cancer” is a placeholder for a class of exploits on the part of the AI where it holds a carrot in front of us. It’s not going to literally come out with the cure-for-cancer line under circumstances where it’s not tasked with working on something like it, because that would be dumb, and it’s supposed to be superintelligent. But superintelligence is really difficult to predict... you have to imagine exploits, then imagine versions of them that are much better.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and if you are tasking an AI with obscure technical problems like designing construction processes for first generation nanomachines, large scale data mining in support of the SENS research objectives, or plain old long-term financial projections, you don’t build in skills or knowledge that is not required.
The hypothetical MIRI is putting forward is that if you task a super AI with agentively solving the whole of human happiness, then it will have to have the kind of social, psychological and linguistic knowledge necessary to talk its way out of the box.
A more specialised AGI seems safer… and likelier… but then another danger kicks in: its creators might be too relaxed about boxing it, perhaps allowing it internet access… but the internet contains a wealth of information to bootstrap linguistic and psychological knowledge with.
There’s an important difference between rejecting MIRI’s hypotheticals because the conclusions don’t follow from the antecedents, as opposed to doing so because the antecedents are unlikely in the first place.
This could happen by accident.
Dangers arising from non-AI scenarios don’t prove AI safety. My point was that an AI doesn’t need effectors to be dangerous… information plus sloppy oversight is enough. However, the MIRI scenario seems to require a kind of perfect storm of fast takeoff, overambition, poor oversight, etc.
all while its programmers are overseeing its goal system looking for the patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive.
A superintelligence can be meta-deceptive. Direct inspection of code is a terrible method of oversight, since even simple AIs can work in ways that baffle human programmers.
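To make that inspection point concrete, here is a toy sketch (my own illustration, not from the discussion, with made-up weights): even a tiny learned policy is opaque to reading its source, because the behaviour lives in arbitrary-looking numbers rather than legible logic.

```python
def opaque_policy(x1, x2):
    """A tiny two-layer network deciding approve (True) / reject (False).

    The weights below stand in for learned parameters; inspecting them
    tells you almost nothing about which inputs get approved.
    """
    # Hidden layer: two ReLU units with arbitrary-looking weights.
    h1 = max(0.0, 1.7 * x1 - 2.3 * x2 + 0.4)
    h2 = max(0.0, -0.9 * x1 + 3.1 * x2 - 1.2)
    # Output: weighted sum of hidden units, thresholded.
    return (2.2 * h1 - 1.8 * h2) > 0.5

# The only way to learn the decision boundary is to probe inputs:
print(opaque_policy(1.0, 0.0))  # True
print(opaque_policy(0.0, 1.0))  # False
```

Scale that up by a few million parameters and “overseeing its goal system” by reading the code stops being a meaningful check.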
ETA: on the whole, I object to the antecedents/priors… I think the hypotheticals go through.