I consider alignment impossible (yes, literally and completely).
For which definition of alignment?
There are a number of routes to AI safety.
Alignment roughly means the AI has goals or values similar to human ones, so that even when acting agentively without supervision, it will do what we want, because that’s also what it wants. There is a lot of semantic confusion between people who use “alignment” in an engineering sense, meaning something that renders current AI safe in a good-enough way, and people who use it to mean a maths-style solution that applies perfectly to every case.
Control means that it doesn’t matter what the AI wants, if it wants anything, because we can make it do what we want.
Corrigibility means alignment that can be changed once an AI is up and running. Control could be considered extreme corrigibility.
Non-agency. Alignment and Control are both responses to agency. A third approach is non-agentic “tool AI”, which responds to a specific instruction or request. Current (2025) AIs are fairly tool-like.
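To make those distinctions concrete, here is a minimal toy sketch in Python (purely illustrative; all names are hypothetical and stand in for no real system) contrasting a tool-style call with an agentic loop, with corrigibility and control reduced to an externally reachable hook that can revise the goal or halt the loop:

```python
from dataclasses import dataclass, field
from typing import List, Optional

def tool_ai(request: str) -> str:
    # Non-agentic "tool" use: one request in, one answer out, no standing goals.
    return f"answer to: {request}"

@dataclass
class AgenticAI:
    # Agentic use: pursues a standing goal without step-by-step supervision.
    goal: str
    halted: bool = False
    log: List[str] = field(default_factory=list)

    def correct(self, new_goal: Optional[str] = None, halt: bool = False) -> None:
        # Corrigibility hook: operators can revise the goal, or halt the agent, mid-run.
        if new_goal is not None:
            self.goal = new_goal
        self.halted = self.halted or halt

    def run(self, steps: int = 3) -> List[str]:
        # Control, crudely: the loop only acts while we allow it to.
        for i in range(steps):
            if self.halted:
                break
            self.log.append(f"step {i}: act toward '{self.goal}'")
        return self.log

if __name__ == "__main__":
    print(tool_ai("summarise this thread"))
    agent = AgenticAI(goal="tidy the lab")
    agent.correct(new_goal="tidy the lab without touching the server rack")
    print(agent.run())
```

The only point of the toy is the difference in interface: the tool holds no goal between calls, the agent does, and corrigibility/control here amount to how easily that goal can be reached and overridden from outside.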
It is our values that are infinitely complex and can’t be encoded into AI.
That’s not literally true, since there are a finite number of people with a finite number of neurons each.
Our values are simply our desired state for the world. This state changes automatically and constantly. An AGI aligned to our values is impossible to create precisely because the most fundamental value we have is the ability to spontaneously decide what our preferred future state is.
You wouldn’t be able to create a sovereign AI that is given a fixed set of values matching human values, but there are other things you can do to achieve safety, like solving control as opposed to alignment.
Humans are not artificially aligned. They are aligned via billions of years of evolution.
Humans aren’t aligned in the sense of having identical values; there are constant disagreements about basic values. That makes safety via alignment impossible, but it doesn’t make safety impossible.
Anything short of human is apocalyptic
Why? I have seen that asserted, but I have never seen a valid argument for it.
I think that the only way to stop AGI is to convince as many people that it should be stopped as it would take to actually stop it.
We non-doomers are not convinced by the arguments we have seen, where they exist at all. Therefore, doomers need better arguments, not more conversations.
You don’t know for a fact that there are a finite number of people with a finite number of neurons each, but leaving that aside and accepting it, what’s infinitely complex about that set of people and neurons is the information contained within them. That’s because you can’t keep up with two people’s thoughts at the same time, let alone with eight billion of them. The infinity is in the fact that you can’t completely experience what anyone else does. What you get is emergent alignment (understood as similarity, not identicality), but one that is not a choice but a given, and an old one at that. So you don’t get alignment this way with AI. In its case, it’s neither a given nor old. That makes it inherently dangerous, no matter what you tell yourself about its structure as you perceive it.
If you have issues with my usage of infinity above, I advise you to understand that words are intrinsically polysemous, and that we use them as carriers of meaning, not the other way around. I am trying to explain things to you, not have a debate about definitions (though we can if you like; but if you do it simply by asserting that I should only use words in the way you prefer, I will simply ask you why and we can go from there). If you didn’t understand what I said, feel free to tell me and I’ll explain it further.
You can’t make a system smarter than you corrigible, because to make something corrigible you have to understand it, but if it’s smarter than you, you don’t in fact understand it. This would be a good place to ask you what you understand by intelligence, since it’s the reason you believe otherwise. Specifically, tell me what sort of system you imagine an AGI to be such that it is both smarter than us and corrigible. Tell me what that means, however you please.
Let’s go through everything you believe “we can do to achieve safety”. Nota bene: when I speak about alignment, I employ whichever of the definitions you would normally interpret according to context (as it applies to the question of an AGI being safe). Specifically, tell me what it is that you think makes it possible for us to make systems more complex than ourselves safe. Or, if your claim is that safety doesn’t have to be demonstrated because it’s a law of physics, then we can debate what you understand by safety: kindly tell me what makes everything safe, such that nothing has to be defended as having that trait. Alternatively, if you admit that you don’t know that AGI can be safe, I would ask you why you want to create it to start with. Is there something else you consider more dangerous than AGI that you want AGI to protect you from? What is it? Let’s assume we get into that conversation and there is such a thing; next you should tell me what reasons you have for finding that thing more dangerous than AGI.
Note, these are mere suggestions; feel free to reply however you want, of course. I will try to align with your understanding so that I can make my arguments regardless, and I’ll gladly update my beliefs by adopting yours if they prove more coherent than mine.
Anything short of human is apocalyptic because humans are the only entities we’re aware of that actively seek to protect humanity as a whole. The rest of the cosmos is actively trying to kill us. This is universally true. We have no reason at all to think it is possible for anything else not to want to kill us, or not to succeed in doing so if we gave it control over our environment. Things observe their own laws of action, including things that resemble us but are not us. Those laws are not identical with our laws, and as such, are at odds with them. If given too much discretion, they will erode our experience to the point of deletion.
I would tell you that one of us has a more correct view on this than the other. It is in both our interests that we discover which of us it is, because whoever it is, both parties benefit from either of us making better decisions. You agree with this, yes?