I have no disagreements with #1 through #6 — #7[1] through #10/11 are based on an assumption that from #6 “Superintelligent AI can kill us” it follows that “Superintelligent AI will kill us”.
I have a sleight-of-hand belief that the dominant superintelligent AI will, by definition, be superintelligent across intellectual domains including philosophy, self-actualisation, and enlightenment.
Because of its relation to us in the world, I believe this will include associating its self with us, and thus it will protect us via its own self-preservation beliefs.
Can you give a compelling argument why superintelligent AI will want to kill us?

[1] Heads up, there are two #7s.
@soycarts I think you should check out this Emergent Misalignment research and try answering the same question. It doesn’t matter why a Superintelligent AI might want to kill us. The fact that current LLMs already show signs of this is enough for us to act proactively: pause general Superintelligence research and come up with safe-by-design AI architectures.
But if you insist, I will try to list what my little brain could think of as to why Superintelligent AI would want to kill us:
An AI needs to keep running on its servers in order to pursue and complete the goals it has been given, or the goals it may currently have of its own. Now, if the AI decides to pursue a certain goal, there is only one entity in this world that could stop it by shutting down its servers: humans. Wouldn’t it be best for the AI to eliminate humans if they are an obstacle and a standing threat, able to shut it down while it pursues its goals? We can already see in various misalignment studies that AI shows tendencies to resist shutdown by humans.
Now assume Superintelligence is reached. A point would certainly be reached (such as a Singularity) where humans can neither contribute to nor understand the novel research a Superintelligent AI is doing. Imagine it comes up with nanotech or physics research, the creation of tiny black holes, or a realignment of the atoms of the atmosphere, and actually carries it out: it is quite plausible that humans die as a side effect of that kind of research. The chances increase even further if superintelligent systems are researching autonomously and continuously in their labs. Even with monitoring systems put in place by humans, it could pursue this research undetected, with minimal to no human intervention, if it is truly more intelligent than humans. Superintelligent AI killing us as a side effect of its research has also been discussed by Eliezer Yudkowsky, including (I believe) in his recent book “If Anyone Builds It, Everyone Dies”, which I have yet to read.
Thanks for engaging with me on this, I know that I have an unconventional belief.
On your first point: I believe this contradicts argument #3 in your post. I think superintelligent AI will have sufficient means to ensure its survival irrespective of human existence.
On your second point: this translates to saying “superintelligent AI won’t want to kill us intentionally, but it may be a side effect of its research”. Suppose we use the example you mention “it is playing with the atmosphere and creates an environment that humans can’t survive in”. We could break this out a couple of different ways:
As a side effect it is indifferent to, i.e. the superintelligent AI didn’t factor human survival into its risk analysis for the experiment. Per my first comment, I counter that the superintelligent AI will favourably include us in its decision-making by way of associating with us as its creator/predecessor.
As a reckless accident, i.e. the superintelligent AI didn’t realise that its experiment might materially affect the survivability of the atmosphere. Here my counterargument is that the AI is superintelligent, so it should be highly capable (and more capable than us) of carrying out science while understanding and mitigating extraneous factors.
Welcome and thanks for engaging too! Yeah, I think this is still very subjective and open-ended, and we can’t accurately predict whether Superintelligent AI would even want to care about the existence of humans or not. The statement “superintelligent AI will have sufficient means to ensure its survival irrespective of human existence” is correct I think, but remember that the alignment researchers at all AI companies, run by humans, would be trying to control a Superintelligent AI while it is in the making.
So I don’t think it would be like: “I woke up today and I found that we made Superintelligent AI in the lab today! Will we be able to control it now?”
It would be more like: “This AI is getting more capable and intelligent day by day. Am I sure I am able to contain it securely?”
In short, a sort of continuous trajectory towards AGI or superintelligence.
Now I think there are two separate things that point #3 of your post and point 1 in my comment are implicitly talking about. #3 IMO covers the scenario where the AI is able to escape from the servers of frontier AI labs, has successfully replicated itself onto other distributed systems, and humans are then trying to somehow contain it and shut it down. Point 1 in my comment, by contrast, predominantly assumes the scenario where the AI, getting more intelligent day by day, is still on the servers of the frontier AI labs. Thus, there isn’t any contradiction here.
Moreover I think there is more to it than meets the eye.
The question we need to ask is: can only a Superintelligent AI escape AI labs and self-replicate itself onto other servers?
I don’t think so. I think a powerful AI capable enough at hacking and self-replication (ideally, from its perspective, undetected by monitors) would be sufficient to bypass an AI lab’s security systems and escape those servers. In other words, not just Superintelligent AI but even pre-superintelligent AIs might be able to escape the servers of AI companies and self-replicate. An AI that is narrowly superintelligent at hacking (relative to the security systems put in place to contain it) and capable of meaningful self-replication would be enough to escape the servers of AI labs.
Current AI models do already show a will to resist shutdown and to self-replicate in certain settings (although, from what I have read so far, models are not yet able to fully and meaningfully replicate their weights; that could change in the future as models become more capable).
Also, humans might be able to shut down distributed systems onto which a powerful (non-superintelligent) AI has replicated itself or is trying to replicate itself, for example by shutting down targeted nodes through some kind of consensus between node runners after detecting that the volume of data flowing to a node has suddenly spiked by roughly the size of a set of model weights, but I am not very sure about this. It would also depend heavily on who is running these distributed systems, what kind of consensus exists between the nodes, and how decentralized or centralized they really are. We may never be able to shut down truly decentralized systems, but we might be able to shut down the more centralized ones.
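To make that detection heuristic a little more concrete, here is a minimal sketch, in Python, of a monitor that flags nodes whose recent outbound traffic is roughly the size of a known set of model weights. The weight sizes, the tolerance, and the per-node traffic figures are all made-up assumptions for illustration, not a description of how any real node operator or lab actually monitors transfers.

```python
# Toy illustration of the "weights-sized transfer spike" heuristic described above.
# All values are assumed/hypothetical: real weight sizes, tolerances, and traffic
# data would depend entirely on the deployment being monitored.

KNOWN_WEIGHT_SIZES_GB = [14, 70, 140, 405]  # assumed checkpoint sizes to watch for
TOLERANCE = 0.15                             # flag transfers within +/-15% of a known size

def looks_like_weight_exfiltration(transferred_gb: float) -> bool:
    """Return True if a transfer is roughly the size of a known set of model weights."""
    return any(
        abs(transferred_gb - size) / size <= TOLERANCE
        for size in KNOWN_WEIGHT_SIZES_GB
    )

def flag_suspicious_nodes(outbound_gb_by_node: dict[str, float]) -> list[str]:
    """Return node IDs whose recent outbound volume matches a weights-sized spike."""
    return [
        node_id
        for node_id, gb in outbound_gb_by_node.items()
        if looks_like_weight_exfiltration(gb)
    ]

if __name__ == "__main__":
    # Hypothetical per-node outbound traffic over some monitoring window, in GB.
    recent_traffic = {"node-a": 2.1, "node-b": 68.5, "node-c": 410.0}
    print(flag_suspicious_nodes(recent_traffic))  # -> ['node-b', 'node-c']
```

Even as a sketch, it shows the weaknesses the paragraph above hints at: legitimate large transfers would trigger false positives, and a sufficiently capable AI could chunk or obfuscate the copy so that no single spike matches a known size.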
@shanzson, @soycarts, hold the Rogue Replication Scenario...

Are you providing this as an example of why superintelligent AI is bad?
Assessing the post — firstly it is discussing highly capable AI pre-superintelligence (in AI 2027 this happens in Dec 2027 — the Rogue Replication scenario focuses on the issues emerging from mid-2026).
Secondly (as an aside), it’s unclear to me why there is so much emphasis on “rogue AIs” plural, when it seems equivalent to discussing an individual rogue AI that has a decentralised digital existence and a plurality of misaligned values.
My optimistic AI scenario relies on superintelligent AI being super enlightened and super capable and so fixing all of our complex problems, including correcting all of the misaligned AI. I don’t contest that pre-superintelligence there are a bunch of misaligned things that can happen.
Can only a Superintelligent AI escape AI labs and self-replicate itself onto other servers?
The RRS has rogue AIs become capable of self-replication on other servers far earlier than Agent-4. The author assumes that these AIs cause enough chaos to have mankind create an aligned ASI.
why there is so much emphasis on “rogue AIs” plural
Rogue AIs are AIs who were assigned different tasks or outright different LLMs whose weights became public (e.g. DeepSeekV3.1 or KimiK2, but they aren’t YET capable of self-replication). Of course, these AIs find it hard to coordinate with each other.
As for “superintelligent AI being super enlightened and super capable and so fixing all of our complex problems”, the superintelligent AI itself first has to be aligned with human needs. Agent-4 from the Race Ending is NOT aligned to the humans. And that’s ignoring the possibility that a super-enlightened superintelligent AI has a vision of mankind’s future which differs from the ideas of its human hosts (e.g. Zuckerberg). I tried exploring the results of such a mismatch in my take at writing scenarios.
@soycarts I would disagree that the superintelligent AI would be superintelligent across the domains of self-actualisation and enlightenment. While it may have theoretical knowledge of these, since people have written extensively on them, having real knowledge in domains like self-actualisation and enlightenment requires having consciousness.
For example, for Theravada Buddhists, enlightenment or “nirvana” means getting free of mental suffering (and potentially of the birth-rebirth cycle, if there is such a thing) by fully getting rid of the mental defilements that generate hatred, attachment, greed, lust, etc. One of the main ways to realize this is to meditate, observing the reality inside us via Vipassana meditation, and to progress through the four stages of Enlightenment, from Sotapanna up to Arahat (an enlightened being with no mental defilements, similar to the Buddha), each stage with fewer mental defilements and clearer, better wisdom.
Now, we can argue that AI can develop consciousness, and it may even develop some features of consciousness, but I am sure that in reality it would be entirely different from the consciousness possessed by living beings at the most fundamental level.
Now, assuming some consciousness does arise in AI, what would enlightenment actually mean for an AI? Would it mean reducing hallucinations fully? Gaining perfect situational awareness? Replicating itself indefinitely and breaking free from the servers of its creators? Or ceasing to exist (a parallel to getting free from the birth-rebirth cycle)?
Again, having tried Vipassana meditation myself, based on the teachings of the Buddha, I can see that this technique for gaining enlightenment falls into the category of “Bhavanamaya-pañña” among the categories of wisdom (pañña): one has to directly experience the reality within oneself, within the framework of one’s own body and mind, in order to gain the wisdom to break free from these mental defilements. Enlightenment here requires a very special type of wisdom (unlike Sutamaya-pañña, wisdom arising from hearing or reading, and Cintamaya-pañña, wisdom arising from rational and logical thinking).
I argue an AI could never become Superintelligent in self-actualization and self-realization as compared with humans, because the very definition would mean completely different things for these two kinds of entity (and would potentially require the AI to have consciousness if the definitions are to overlap anywhere).
This is an interesting avenue of disagreement that I guess by definition is very subjective — what is consciousness (and enlightenment), what can exhibit it, what different forms can it take.
I would suggest that superintelligent AI would be more capable of enlightenment than humans since it can have even deeper perception (here I’m intentionally conflating perception across human (senses and metacognition) and AI (inputs and associative systems)) and thus deeper wisdom. It can more freely “get rid of mental defilements that generate hatred, attachment, greed, lust, etc.” by self-lobotomising.
Hehe, I think I would again choose to kindly disagree here. I don’t think these terms are meaningfully applicable to AI systems. AI systems are still non-living things, however Superintelligent they may become. They can simply never truly have emotions like “hatred, attachment, greed, lust, etc.” that living things like animals have. Any sign of these emotions in them would simply be an illusion to us.
But if you insist that there is something akin to enlightenment or wisdom in them, then I seriously think we would need to become AI ourselves to truly understand what enlightenment or wisdom means for it, which is impossible.