You present certain arguments according to which having AGI around is inherently intolerable or cursed. But they seem to be getting very general: so general that they could be reasons why a child must not have parents, or a country must not have a government; as if there just cannot be a power over you that diverges from you even a little. Could you clarify what is wrong with “rule by AGI” that doesn’t apply to “rule by parents” or “rule by the state”?
Humans have empathy, a mutual need of each other to survive, and can keep each other in check. None of us is intelligent enough to act unilaterally against the wishes of everyone else. None of us would want to unless we were insane, and even then we couldn’t, because we would lack the intelligence for it.
Parents have biological imperatives. If you want to recreate such an imperative in something else, you would have to make that something else actually human; if you make it anything else, the empathy simply doesn’t fit (not that you would ever know how to recreate it to start with). The short answer here is that we don’t know how this empathy works, and, not knowing, we can’t assume we will somehow know in time. It is my opinion that this is unknowable for the purpose of transferring it into an AI. These drives aren’t conscious, per se; they have been encoded in us over enormous aeons. We do not in fact understand them in a scientific sense (...to the extent you would need to, obviously), we merely understand them experientially, which can cloud one’s judgement about our ability to manipulate them in the ways you would seek to with an AI.
The state has similar drives. It has a certain sort of empathy, however strange that might sound. But more relevantly in its case, it has to obey the simple laws of politics, which make it very hard for it to set out to attack those it is expected to lead and protect. At some point in trying to do so, it would run into the simple problem that it is made of people who wouldn’t see it in their interest to carry out its orders. We are all the state, in some sense.
Also, as humans, we have a track record that matters a great deal when contemplating whether we might do anything too destructive to each other: a long history of managing to coexist.
The problem is not that I am too general. It is that you anthropomorphise incredibly easily, and it doesn’t make any sense. An AGI would obviously not care about our ability to overpower it, because past a point this ability wouldn’t exist. It wouldn’t care about us at an emotional level, because it lacks our particular ability to feel such emotions, and it lacks the history that blindly created them. Emotions are barely things we can reason about clearly. We have no clue what they are; we know only our feeling of them. You can trivialise them as simply markers of what is or isn’t conducive to your goals, but that seems far too reductive. Not because they might not be just that, but because such explanations as we have, including this one, are very likely deeply deceptive as to their underlying complexity.
There are many ways to talk about this. But ultimately, you lack an answer to the question: what exactly do you do about something you can’t predict changing the environment in ways you can’t react to after the fact, if the change is bad? And why would it ever not be bad? Why would you ever assume that things going badly isn’t the rule, absent human direction?
I think you’ve grown incredibly complacent about our apparent comfort and safety as a civilisation, and you take this feeling wholesale and project it optimistically onto the future, no matter how extreme the circumstance. The truth is that we are not as evolved as we think we are to begin with; we are incredibly fragile, and if we lose control over our ability to shape our environment, there is no reason to think that will go in any way except completely badly. It always does. Anything we do that is not deliberate ends in disaster. The universe is not an inherent joyride. Everything we’ve done to make it resemble one has been deliberate, not guaranteed.
Our ability to control our environment is key. Our ability to understand our systems in detail is key. Without this kind of awareness, we are completely defenceless and have no reason to expect anything good. This should be entirely obvious.
Also, we have no choice when it comes to ourselves. To ask “how can you trust humanity” is akin to asking “how can you trust yourself?”. There is no negative answer possible here. We are forced to operate in this way.
If you permit, I’ll summarize a lot of that as follows: the reason that “rule by AGI” is different is that it is so alien, and we don’t know how to make it significantly less alien.
The argument from alienness still makes sense, but its strength has eroded somewhat in the era of conversational AI. It turned out not only that directing powerful general pattern-matchers at the human textual corpus gave them the ability to talk like a human being, but that it induced in them an internal conceptual structure that humans are capable of interpreting.
An optimist might say: maybe we can use these techniques to create a first approximation to an anthropomorphically benevolent being, then ask it to devise superior techniques sufficient to create the real thing, trusting that enough concepts have been correctly inferred for it to figure out what is wrong or missing in our specifications.
This kind of optimism is based on the hope that anthropomorphic benevolence, as a target in the space of possible minds, is surrounded by a “basin of attraction”. All we have to do is land in that basin, i.e. we only need to specify the goal up to a certain degree of accuracy, and provide the task to a mind which is sufficiently close to anthropomorphic, and any details that were wrong or missing will be corrected and filled in, by the intrinsic logic of the problem.
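The “basin of attraction” picture can be illustrated with a toy from dynamical systems. This is purely a mathematical analogy, not a claim about minds: for the iterated map x → cos(x), every real starting point lies in the basin of a single fixed point, so an imprecise initial “specification” still converges to the same target.

```python
# Toy illustration of a basin of attraction: iterating x -> cos(x)
# drags every starting point to the same fixed point (~0.739),
# so errors in the starting "specification" get corrected by the
# dynamics themselves. A mathematical analogy only.
import math

def iterate_to_fixed_point(x0, steps=200):
    x = x0
    for _ in range(steps):
        x = math.cos(x)
    return x

# Very different starting points all land on the same value.
starts = [-2.0, 0.1, 0.739, 3.0]
results = [iterate_to_fixed_point(x0) for x0 in starts]
assert max(results) - min(results) < 1e-9
```

The optimist’s hope is that benevolence behaves like the fixed point here; the pessimist’s worry is that nothing guarantees the target has a basin at all, or that we start inside it.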
Regarding empathy in particular, I think any mystery pertaining to it is largely because it involves consciousness, and consciousness remains a fundamental problem rather than just a technical problem, for scientific understanding… I spent quite a few years “studying consciousness”, from the perspective of wanting to understand its nature and how it relates to everything else, and this is an area in which I believe in the possibility of a conceptual breakthrough. That is, if the right connections are made, the rift between subjective experience and the naturalistic worldview could be closed completely, and many mysteries would fall into place.
Now, I’m going to cut myself short here, even though there is much more to discuss in your (very helpful) critiques. Hopefully I can return to them. But I just want to say a few things about where I’m coming from.
I have presented these sketchy solutions, or reasons for hope, to a handful of your objections, not because I am decidedly optimistic, but just to indicate where the counterargument lies. We simply do not know whether there are forms of artificial superintelligence which would naturally coexist well with humanity, or whether it’s a tightrope walk to coexistence, no matter what design you use. That uncertainty alone should be reason enough to stop what we’re doing, but that’s not how our elites see things. To those who want to stop the juggernaut, good luck. But as a theory-minded person, I intend to work on the theory of how to steer the juggernaut so it doesn’t crush us. One reason for this focus is that there truly may be very little time.
I think you understand the problem a little better now, but you still think alignment is possible. It’s not, and it’s a complete waste of your time to try to solve it.
What you still don’t grasp is that the human is itself the perfect alignment target, and something infinitely hard to discover ex nihilo. Anything short of human is apocalyptic when it comes to powers greater than us. You still think otherwise.
There’s a terrible flaw in your logic. If an AI could solve its own alignment, it would already have to be aligned. That is because alignment is a continuous process, for the reason I explained: our first value is the ability to choose our preferences, and our preferences shift continuously and inevitably.
I don’t see that consciousness matters to this topic at all.
Optimism is bad: it deliberately seeks out delusions. Pessimism is concerned with predicting the worst in order to be prepared for it. Once in place, it has no need of the human to reason about it and correct it. It is the perfect strategy. If it fails, everything fails.
Let me put it in video game terms. Imagine you’re playing a competitive video game. Being an optimist about how easy the game is means seeking out similarly skilled players so that you can “learn better” and at your own pace. Being a pessimist means that, if offered the chance to play against nothing but the strongest players, you take it.
Why?
Because any single move you hone against these players will be far more successful against lesser players.
Why?
Because you are inherently training to play the game in the best possible way. You are forced to do this. Any move the opponent makes that succeeds teaches you which of your strategies has no business working against anyone (namely the strategy you were employing at the time). You could have used that same strategy in a game against lesser opponents and won. But now you have no reason to: you know that winning like that is improbable, provided the playerbase as a whole keeps learning. It is a strategy that works only temporarily and against worse opponents. It will sometimes work against better opponents, but very rarely, and one day it will work against no one (at least asymptotically). You are learning which strategies will not work over time. This is why you play against better opponents, and this is why you are a pessimist: because the universe works exactly the same way. Being an optimist means training in an environment that is unlikely to produce successful results, and that even if it does, will not do so for long. The best possible thing to do is to train with perfection in mind. And for that you need a better opponent: a stronger orientation towards what can really go wrong.
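The pruning logic above can be put in a toy model. The numbers and the win rule are invented purely for illustration: each strategy has a hidden “quality”, and it survives a match only if that quality beats the opponent’s strength, so stronger opponents falsify fragile strategies faster.

```python
# Toy model: strategies with hidden quality in [0, 1]; a strategy
# survives only if its quality exceeds the opponent's strength.
# Stronger opponents prune fragile strategies immediately, which is
# the pessimist's training advantage. All numbers are illustrative.
def surviving_strategies(strategies, opponent_strength):
    # Keep only strategies that still win at this opponent level.
    return [q for q in strategies if q > opponent_strength]

strategies = [0.2, 0.4, 0.6, 0.8, 0.95]

# Weak opposition leaves most strategies standing, including ones
# that only work temporarily; strong opposition leaves only what
# cannot fail.
assert surviving_strategies(strategies, opponent_strength=0.3) == [0.4, 0.6, 0.8, 0.95]
assert surviving_strategies(strategies, opponent_strength=0.9) == [0.95]
```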
I consider alignment impossible (yes, literally and completely).
For which definition of alignment?
There are a number of routes to AI safety.
Alignment roughly means the AI has goals or values similar to human ones, so that even when acting agentively without supervision, it will do what we want, because that’s also what it wants. There is a lot of semantic confusion between people who use “alignment” in an engineering sense, meaning something that renders current AI safe in a good-enough way, and people who use it to mean a maths-style solution that applies perfectly to every case.
Control means that it doesn’t matter what the AI wants, if it wants anything, because we can make it do what we want.
Corrigibility means alignment that can be changed once an AI is up and running. Control could be considered extreme corrigibility.
Non-agency. Alignment and control are both responses to agency. A third approach is non-agentic “tool AI”, which responds to a specific instruction or request. Current (2025) AIs are fairly tool-like.
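The distinction between these routes can be made schematic. Every name below is hypothetical, invented for illustration rather than taken from any real system:

```python
# Hypothetical sketch of the three safety routes. Nothing here is a
# real API; the point is only where the safety property lives.

def agent_propose(goal):
    # An agentic AI proposes actions in pursuit of its own goal.
    # Alignment would mean engineering this goal to match ours.
    return f"act:{goal}"

def controlled(proposal, allowed):
    # Control: the proposal takes effect only if the overseer permits
    # it, regardless of what the agent "wants".
    return proposal if proposal in allowed else "overridden"

def tool(instruction):
    # Non-agency: a tool AI does nothing except answer the request.
    return f"answer:{instruction}"

assert controlled(agent_propose("paperclips"), allowed={"act:chores"}) == "overridden"
assert controlled(agent_propose("chores"), allowed={"act:chores"}) == "act:chores"
assert tool("summarise") == "answer:summarise"
```

The sketch also shows why corrigibility sits between the first two: it is alignment whose target (the `allowed` set, in this caricature) can still be edited after the system is running.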
It is our values that are infinitely complex and can’t be encoded into AI.
That’s not literally true, since there are a finite number of people, each with a finite number of neurons.
Our values are simply our desired state for the world. This state changes automatically and constantly. An AGI aligned to our values is impossible to create precisely because the most fundamental value we have is the ability to spontaneously decide what our preferred future state is.
You wouldn’t be able to create a sovereign AI that is given a fixed set of values matching human values, but there are other things you can do to achieve safety, like solving control as opposed to alignment.
Humans are not artificially aligned. They are aligned via billions of years of evolution.
Humans aren’t aligned in the sense of having identical values; there are constant disagreements about basic values. That makes safety via alignment impossible, but it doesn’t make safety impossible.
Anything short of human is apocalyptic
Why? I have seen that asserted, but I have never seen a valid argument for it.
I think that the only way to stop AGI is to convince as many people that it should be stopped as it would take to actually stop it.
We non-doomers are not convinced by the arguments we have seen, where they exist at all. Therefore, doomers need better arguments, not more conversations.
You don’t know for a fact that there are finitely many people with finitely many neurons, but leaving that aside and accepting it, what’s infinitely complex about that set of people and neurons is the information contained within them. That’s because you can’t keep up with two people’s thoughts at the same time, let alone with eight billion of them. The infinity is in the fact that you can’t completely experience what anyone else does. What you get is emergent alignment (understood as similarity, not identicality), but one that is not a choice but a given, and an old one at that. So you don’t get alignment this way with AI. In its case, it’s neither a given nor old. That makes it inherently dangerous, no matter what you tell yourself about its structure as you perceive it.
If you have issues with my usage of infinity above, I advise you to understand that words are intrinsically polysemous, and we use them as carriers of meaning, not the other way around. I am trying to explain things to you, not have a debate about definitions (though we can if you like; except that if you do it by simply asserting that I should only use words in the way you prefer, I will simply ask you why, and we can go from there). If you didn’t understand what I said, feel free to tell me and I’ll explain it further.
You can’t make a system smarter than you corrigible, because to make something corrigible you have to understand it, but if it’s smarter than you, you don’t in fact understand it. This would be a good place to ask you what you understand by intelligence, since it’s the reason you believe otherwise. Specifically, tell me what sort of system you imagine an AGI to be such that it is both smarter than us and corrigible. Tell me what that means, however you please.
Let’s go through everything you believe “we can do to achieve safety”. Nota bene: when I speak about alignment, I employ whichever of the definitions you would normally interpret according to context (as it applies to the question of an AGI being safe). Specifically, tell me what it is that you think makes it possible for us to make systems more complex than ourselves safe. Or, if your claim is that safety doesn’t have to be demonstrated because it’s a law of physics, then we can debate what you understand by safety. Kindly tell me what makes it so that everything is safe and, as such, nothing has to be defended as having that trait. Alternatively, if you admit that you don’t know that AGI can be safe, I would ask why you want to create it to start with. Is there something else you consider more dangerous than AGI, which you want AGI to protect you from? What is it? Assuming we get into that conversation and there is such a thing, you should next tell me what reasons you have for finding it more dangerous than AGI.
Note, these are mere suggestions; feel free to reply however you want, of course. I will try to align with your understanding so as to advance my arguments regardless, and I’ll gladly update my beliefs by adopting yours if they prove more coherent than mine.
Anything short of human is apocalyptic because humans are the only entities we’re aware of that actively seek to protect humanity as a whole. The rest of the cosmos is actively trying to kill us. This is universally true. We have no reason at all to think it possible for anything else not to want to kill us, or not to succeed in doing so, if we gave it control over our environment. Things observe their own laws of action, including things that resemble us but are not us. Those laws are not identical with our laws and are, as such, at odds with them. Given too much discretion, they will erode our experience to the point of deletion.
I would tell you that one of us has a more correct view on this than the other. It is in both our interest that we discover who does. Because regardless of who it is, both parties are interested in either of the two making better decisions. You agree with this, yes?
I’ll be back when I can.