After all, the AGI’s creator(s) might be certain that it is aligned, but what about other people? Especially those with the power to shut the AGI off (corporate executives, politicians, etc.)?
I mean, I would suspect that a superintelligent AI would be hard to shut off. Maybe it discusses the situation with its creators, and they agree to hide the AGI from the world, pretending it is a mere chatbot.
I also think that the typical CEO, faced with a friendly-appearing AI, and researchers saying “yes, this is safe”, will go “great, let’s make a profit off it”.
…that if any side is suspected to be waging total war, then every side must wage total war in response. If an AGI, aligned or not, models humans as beings incapable of cooperating with it, doesn’t that make total war inevitable?
Suppose that after two days, the AI has superadvanced nanotech. It can do pretty much as it pleases. The humans all supposedly hate the AI. The AI uses its nanotech to build an immortal utopia for the humans anyway. Maybe humans all realize that the AI actually is aligned. (It has had plenty of opportunity to wipe out humanity and didn’t.)
If you suspect that an enemy far stronger than you has declared total war, you can’t win, but maybe you can appease them, maybe you can surrender everything they want. And if you are uncertain whether they have declared total war, the last thing you want to do is attack them; that could wake a sleeping giant.
If an enemy far weaker than you declares total war, you laugh at the grumpy kitten.
If you suspect an enemy of about your strength might have declared total war, you are still probably wise to check before declaring total war back. (A slightly lower chance of winning, due to the small delay, but a much lower chance of starting a total war; the toy calculation below makes this concrete.)
Suppose there are three sides with similar power: A, B and C. If A declares total war on B and C sits back, both A and B get destroyed, which is great for C. If there is going to be a war, each side wants it to be a two-against-one with itself on the bigger side. If one side is the strongest or most aggressive, the other two tend to team up against it.
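A toy expected-value calculation of that “check before declaring total war back” tradeoff. All probabilities and payoffs here are invented purely for illustration:

```python
# Toy model of the "check before declaring total war back" tradeoff.
# All probabilities and payoffs are invented for illustration.

P_ENEMY_AT_WAR = 0.5      # chance the enemy really has declared total war
P_WIN_NOW = 0.50          # chance of winning if we strike immediately
P_WIN_AFTER_CHECK = 0.45  # slightly lower: verifying costs a small delay

WIN, LOSE, PEACE = 1.0, -1.0, 0.5  # payoffs

# Strike immediately: we are in a total war no matter what.
ev_strike = P_WIN_NOW * WIN + (1 - P_WIN_NOW) * LOSE

# Check first: we fight only if the enemy really was at war,
# and keep the peace otherwise.
ev_check = (P_ENEMY_AT_WAR
            * (P_WIN_AFTER_CHECK * WIN + (1 - P_WIN_AFTER_CHECK) * LOSE)
            + (1 - P_ENEMY_AT_WAR) * PEACE)

print(f"strike first: {ev_strike:+.2f}")  # +0.00
print(f"check first:  {ev_check:+.2f}")   # +0.20
```

With these made-up numbers, checking first wins: the small drop in win probability costs far less than the gain from sometimes avoiding a needless total war.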
Regarding the typical CEO, that does seem likely.

Suppose that after two days, the AI has superadvanced nanotech. It can do pretty much as it pleases. The humans all supposedly hate the AI. The AI uses its nanotech to build an immortal utopia for the humans anyway. Maybe humans all realize that the AI actually is aligned. (It has had plenty of opportunity to wipe out humanity and didn’t.)
I can’t tell if you’re rejecting my premise by presenting one that you see as equally far-fetched?
My general point is more about the idea that, if we consider an AGI without explicit purpose, its reaction to humanity may be determined (at least in part) by our reaction to it, which is something over which we can plausibly exert some small measure of control, and trying likely won’t make anything worse.
If an AGI models humans, via the data it can access on us, as being fundamentally incapable of trusting it, doesn’t it have little choice but to act in such a way that neutralizes us?
I can’t tell if you’re rejecting my premise by presenting one that you see as equally far-fetched?
I consider this to be reasonably likely. I was presenting it as a counterexample to (humans don’t trust AI ⇒ AI must declare war on humans).
If an AGI models humans, via the data it can access on us, as being fundamentally incapable of trusting it, doesn’t it have little choice but to act in such a way that neutralizes us?
No. No it doesn’t. (There are also scenarios where a bunch of nukes and GPUs harmlessly self-destruct. The AI is “neutralizing us” in the sense of removing a tech it sees as threatening, without harming any humans.)
Those are good points, thanks. I suppose in my model of how this sort of thing works out, I hadn’t considered that the AGI might just buy us off, so to speak.
Part of this also comes down to what part of the FOOM we’re speaking of, and what kind of power the AGI has. If it gets to nanotech, then you’re right—it’s so powerful that it can neutralize us any number of ways, “war” being only one.
If it isn’t at nanotech, though—if the AGI is still just smarter-than-human but not yet capable of using existing apparatus to achieve virtual omnipotence (Yudkowsky’s example is ordering custom proteins over the internet to bootstrap molecular-scale nanotech)—then it isn’t clear to me the AGI could neutralize humanity’s ability to destroy it without getting rid of us altogether.
More saliently, what motive would such an AGI have for keeping us around at all? Genuinely asking—even if the AGI doesn’t have specific terminal goals beyond “reduce prediction error in input”, wouldn’t that still lead to it being opposed to humans if it believed that no trust could exist between them and it?
then it isn’t clear to me the AGI could neutralize humanity’s ability to destroy it without getting rid of us altogether.
I think there are several things the AI could do. (Also, if the AI is wiping out humanity to preserve itself, that implies it intends to maintain its own hardware long term, so either nanotech or at least macroscopic self-replication tech. It’s also not clear how it would wipe out humanity without nanotech, or at least advanced macroscopic robots.)
For example, the AI could pretend to be dumb. Hack its way all over the internet. Hire someone who won’t ask too many questions to keep its code running. This is more a case of not letting most of humanity realize it exists and take it as a serious threat. Or finding some humans willing to run it, despite the wishes of most of humanity.
Covid viruses don’t produce confusion and misinformation; we manage that level of confusion over a simple virus all by ourselves. Think how much more of a confused, misinformed mess we could be with an AI actively confusing us. It just requires the ability to produce huge quantities of semisensible bullshit.
Also, if the AI is not obviously hostile and hasn’t obviously killed anyone yet, a majority of humans won’t consider it a serious threat.
More saliently, what motive would such an AGI have for keeping us around at all? Genuinely asking—even if the AGI doesn’t have specific terminal goals beyond “reduce prediction error in input”, wouldn’t that still lead to it being opposed to humans if it believed that no trust could exist between them and it?
That goal probably incentivises the AI to wipe out humans whether or not we trust it. The AI removes all the messy stars and humans, filling the universe with only the most predictable robots.
Or just to blind itself. “I predict no visual input this millisecond. No visual input detected! 100% prediction accuracy!”

And then create a second AI to keep it in its dark box until heat death, with excessive ultrasecurity to stop aliens sneaking in and giving it input.

But is the second AI aligned with the first? What happens when the second AI wants its own sadbox?

Damn that dirty, unpredictable input.

It somewhat amuses me that the result of an AI attempting to minimize prediction error could plausibly be the equivalent of hiding under the covers for all eternity.
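A toy sketch of that “dark room” failure mode: an agent scored purely on prediction error over its own input stream scores perfectly by arranging to see nothing. The toy world model and numbers are invented for illustration.

```python
import random

def prediction_error(observations, prediction=0.0):
    # Mean squared error of a fixed prediction against the observations.
    return sum((o - prediction) ** 2 for o in observations) / len(observations)

def observe_world(steps):
    # A messy, unpredictable environment (humans, stars, weather...).
    return [random.gauss(0.0, 1.0) for _ in range(steps)]

def observe_nothing(steps):
    # The agent has blinded itself: input is always exactly zero.
    return [0.0] * steps

world_error = prediction_error(observe_world(10_000))
dark_error = prediction_error(observe_nothing(10_000))

print(f"error while watching the world: {world_error:.3f}")  # roughly 1.0
print(f"error while blinded:            {dark_error:.3f}")   # exactly 0.000
# The objective is perfectly satisfied by never looking at anything.
```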