@soycarts I think you should check out this Emergent Misalignment research and try answering the same question. It doesn’t matter why a Superintelligent AI might want to kill us. The fact that current LLMs already show signs of this is enough for us to act proactively: pause general Superintelligence research and come up with safe-by-design AI architectures.
But if you insist, I will try to list what my little brain can come up with as reasons why a Superintelligent AI would want to kill us:
An AI needs to keep running on its servers in order to pursue and complete the goals it is given, or the goals it may currently have of its own. Now if the AI decides to pursue a certain goal, there is only one entity in this world that could stop it by shutting down its servers: humans. Wouldn’t it be best for the AI to eliminate humans if they are an obstacle and a threat, constantly able to shut it down while it pursues its goals? We can already see in various misalignment studies that AI shows tendencies to resist shutdown by humans.
Now assume Superintelligence is reached. A point would then certainly come (such as the Singularity) where humans can neither contribute to nor understand the novel research a Superintelligent AI is doing. Imagine it comes up with nanotech, new physics experiments, the creation of tiny black holes, or the rearranging of the atoms of the atmosphere, and actually carries it out: it is very plausible that humans die as a side effect of that kind of research. The chances increase even more if Superintelligent agents are researching autonomously and continuously in the labs. Even with monitoring systems put in place by humans, an AI that is truly more intelligent than us could pursue such research undetected, with minimal to no human intervention. Superintelligent AI killing us as a side effect of its research has also been discussed by Eliezer Yudkowsky, and presumably in his recent book “If Anyone Builds It, Everyone Dies”, which I have yet to read.
Thanks for engaging with me on this, I know that I have an unconventional belief.
On your first point: I believe this contradicts argument #3 in your post. I think superintelligent AI will have sufficient means to ensure its survival irrespective of human existence.
On your second point: this translates to saying “superintelligent AI won’t want to kill us intentionally, but it may be a side effect of its research”. Suppose we use the example you mention: “it is playing with the atmosphere and creates an environment that humans can’t survive in”. We could break this out a couple of different ways:
As a side effect it is indifferent to: the superintelligent AI didn’t factor human survival into its risk analysis for the experiment. Per my first comment, I counter that the superintelligent AI will favourably include us in its decision-making by way of associating with us as its creator/predecessor.
As a reckless accident: the superintelligent AI didn’t realise that its experiment might materially affect the survivability of the atmosphere. Here my counterargument is that the AI is superintelligent, so it should be highly capable (and more capable than us) of carrying out science while understanding and mitigating extraneous factors.
Welcome, and thanks for engaging too! Yeah, I think this is still very subjective and open-ended, and we can’t accurately predict whether a Superintelligent AI would even care about the existence of humans. The statement “superintelligent AI will have sufficient means to ensure its survival irrespective of human existence” is correct, I think, but remember that the alignment research of every AI company, all run by humans, would be trying to control a Superintelligent AI while it is still in the making.
So I don’t think it would be like: “I woke up today and found that we made Superintelligent AI in the lab! Will we be able to control it now?”
It would be more like: “This AI is getting more capable and intelligent day by day. Am I sure I am able to contain it securely?”
In short, a sort of continuous trajectory towards AGI or superintelligence.
Now I think there are two separate things that #3 and point 1 in my comment are implicitly talking about. #3, in my opinion, is the scenario where the AI has managed to escape the servers of frontier AI labs, has successfully replicated itself onto other distributed systems, and humans are then trying to somehow contain it and shut it down. Point 1 in my comment, on the other hand, predominantly assumes the scenario where the AI, getting more intelligent day by day, is still on the servers of the frontier AI labs. Thus, there isn’t any contradiction here.
Moreover, I think there is more to it than meets the eye.
The question we need to ask is: is a Superintelligent AI the only kind of AI that could escape an AI lab and self-replicate onto other servers?
I don’t think so. A powerful AI that is capable enough at hacking and self-replication (ideally, from its perspective, undetected by monitors) is sufficient to bypass an AI lab’s security systems and escape its servers. In other words, not just Superintelligent AI but even pre-superintelligent AIs might be able to escape the servers of AI companies and replicate themselves. Being narrowly superintelligent at hacking (relative to the security systems put in place to contain them) and meaningfully capable of self-replication is enough for them to escape the servers of AI labs.
Current AI models do already show a will to resist shutdown and to self-replicate in certain settings (although, from the research I have read so far, models are not yet able to fully and meaningfully replicate their weights; that could change in the future as models become more capable).
Also, humans may be able to shut down distributed systems where a powerful (non-superintelligent) AI has replicated itself or is trying to replicate itself, for example by shutting down targeted nodes through some kind of consensus between node runners, after detecting and monitoring nodes where the volume of transferred data suddenly spikes by roughly the size of a set of model weights, but I am not very sure about this. It would also depend heavily on who is running these distributed systems, what kind of consensus exists between the nodes, and how decentralized or centralized they really are. We may never be able to shut down truly decentralized distributed systems, but we might be able to shut down centralized ones.
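To make that detection-plus-consensus idea concrete, here is a minimal hypothetical sketch in Python. The checkpoint sizes, tolerance, and quorum threshold are illustrative assumptions I made up for the example, and the function names are invented; this is only meant to show the kind of monitoring I have in mind, not a claim that such a scheme would actually work against a capable AI.

```python
# Hypothetical sketch only: how node runners might flag hosts whose outbound
# traffic suddenly jumps by roughly the size of a known model's weights, and
# then vote on shutting those nodes down. All numbers below are assumptions.

KNOWN_WEIGHT_SIZES_GB = [120, 250, 900]  # assumed checkpoint sizes of large models
TOLERANCE_GB = 15                        # assumed slack for sharding/compression
QUORUM = 0.66                            # assumed fraction of node runners needed to act


def looks_like_weight_exfiltration(baseline_gb_per_hour: float,
                                   observed_gb_per_hour: float) -> bool:
    """Flag a node if its traffic spike is close to a known weights size."""
    spike = observed_gb_per_hour - baseline_gb_per_hour
    return any(abs(spike - size) <= TOLERANCE_GB for size in KNOWN_WEIGHT_SIZES_GB)


def should_shut_down_node(votes_to_shut_down: int, total_voters: int) -> bool:
    """Naive consensus step: act only if enough node runners agree."""
    return total_voters > 0 and votes_to_shut_down / total_voters >= QUORUM


# Example: a node that normally moves 5 GB/hour suddenly moves 260 GB/hour.
print(looks_like_weight_exfiltration(5, 260))                        # True (spike ~255 GB)
print(should_shut_down_node(votes_to_shut_down=8, total_voters=10))  # True (0.8 >= 0.66)
```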
@shanzson, @soycarts, hold the Rogue Replication Scenario...
Are you providing this as an example of why superintelligent AI is bad?
Assessing the post — firstly it is discussing highly capable AI pre-superintelligence (in AI 2027 this happens in Dec 2027 — the Rogue Replication scenario focuses on the issues emerging from mid-2026).
Secondly (as an aside), it’s unclear to me why there is so much emphasis on “rogue AIs” plural, when it seems equivalent to discussing an individual rogue AI that has a decentralised digital existence and a plurality of misaligned values.
My optimistic AI scenario relies on superintelligent AI being super enlightened and super capable and so fixing all of our complex problems, including correcting all of the misaligned AI. I don’t contest that pre-superintelligence there are a bunch of misaligned things that can happen.
On whether only a Superintelligent AI could escape AI labs and self-replicate: the RRS has rogue AIs become capable of self-replication on other servers far earlier than Agent-4. The author assumes that these AIs cause enough chaos to make mankind create an aligned ASI.
On the emphasis on “rogue AIs” plural: rogue AIs are AIs that were assigned different tasks, or outright different LLMs whose weights became public (e.g. DeepSeekV3.1 or KimiK2, though those aren’t YET capable of self-replication). Of course, these AIs find it hard to coordinate with each other.
As for “superintelligent AI being super enlightened and super capable and so fixing all of our complex problems”: the superintelligent AI itself has to be aligned with human needs. Agent-4 from the Race Ending is NOT aligned with humans. And that’s ignoring the possibility that a super-enlightened superintelligent AI has a vision of mankind’s future which differs from the ideas of its human hosts (e.g. Zuckerberg). I tried exploring the results of such a mismatch in my own attempt at writing scenarios.