There is an unresolved problem related to the vague meaning of the term “alignment”.
Until we clarify what exactly we are aiming for, the problem will remain. That is, the problem lies more in the words we use than in anything material. It is like the problem with the term “freedom”: there is no freedom as such in the material world, but there are concrete cases such as freedom of movement, free fall, etc.
For example, when we talk about “alignment”, we mean alignment with some kind of human goals. But a person often does not know their own goals, and humanity certainly does not know its goals (if we are talking about the meaning of life). And if we are not talking about the meaning of life and “saving the soul”, then let’s simplify the task: when we mention “alignment”, we will mean saving the human body. AI can help save a person’s life, as long as it does not slip them a poison recipe. This is a trivial check, and there seems to be no “alignment” problem here; modern LLMs already check for that.
But if we understand “alignment” in such a crude sense, there will be those who see a problem in the fact that the AI does not help them with “saving the soul” or something similar (humans can have an infinite number of abstract goals that they are not even able to formulate).
Before checking “alignment”, we need, at the very least, to be able to accurately “formulate goals”. As I see it, we (most people) are not yet capable of that.
OK. I was thinking about that at the very start of my research.
There is the “Dunbar’s number” thing.
1:1 communication and small groups (up to 150 members) follow one “mechanic”. Large groups (millions of people) follow another.
So success in 1:1 communication doesn’t mean success at large scale. (This still needs to be tested and confirmed.) Most likely, a “linear & incremental” approach doesn’t work.
In my article, I’m focusing on very large-scale things and infrastructures. It’s hard to find a solution here.
The question is how to break out of the vicious circle of “not tested at large scale → not implemented at large scale”. (This applies to anything: infrastructures, or Semantics for merging.)