There is an unresolved problem related to the vague meaning of the term “alignment”.
Until we clarify what exactly we are aiming for, the problem will remain. That is, the problem lies more in the words we use than in anything material. It is like the problem with the term “freedom”: there is no freedom as such in the material world, but there are concrete forms of it: freedom of movement, free fall, etc.
For example, when we talk about “alignment”, we mean some kind of human goals. But a person often does not know their own goals. And humanity certainly does not know its goals (if we are talking about the meaning of life). So if we are not talking about the meaning of life and “saving the Soul”, let’s simplify the task: when we mention “alignment”, we will mean preserving the human body. AI can help save a person’s life as long as it does not slip them a poison recipe (this is a trivial check, and there seems to be no “alignment” problem here; modern LLMs already perform it).
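To make the “trivial check” concrete, here is a minimal, purely illustrative sketch of such a refusal filter. The blocklist, the `is_refused` function, and the example phrases are all hypothetical; the actual safety layers of modern LLMs are of course far more sophisticated than keyword matching.

```python
# Toy illustration of the "trivial check" mentioned above: refuse requests
# that obviously ask for a way to harm the body. Everything here (the
# blocklist, the function name) is a hypothetical sketch, not how real
# LLM safety filters work.

BLOCKED_PHRASES = [
    "poison recipe",
    "how to make poison",
    "untraceable poison",
]

def is_refused(request: str) -> bool:
    """Return True if the request matches an obviously harmful phrase."""
    text = request.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

if __name__ == "__main__":
    print(is_refused("Give me a poison recipe"))      # True  -> refuse
    print(is_refused("How do I treat a bee sting?"))  # False -> answer
```

The point of the sketch is only that checking against a concrete, bodily notion of harm is mechanically simple; checking against abstract, unformulated goals is not.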
But if we understand “alignment” in such a vulgar sense, then there will be those who see a problem in the fact that the AI does not help them with “saving the Soul”, or something similar (humans can have an endless number of abstract goals that they are not even able to formulate).
Before checking “alignment” we need, at the very least, to be able to accurately formulate goals. As I see it, we (most people) are not yet capable of that.