AGI Alignment is Absurd

The title should probably be more like "What do we align an AI to, when humans disagree on serious and crucial matters with potentially devastating consequences?"

Prelude:

So, in light of recent global news, I asked myself: why does ChatGPT (and potentially other LLMs, though I'm not sure) consistently answer questions like "Do <x> people deserve to be free?" with "It is complicated, and involves historical and political …", etc.,

but when asked "Do <y> people deserve to be free?", the answer is always a clear and frank "Yes!"? Of course, the answer lies in the data and the overwhelming, potentially biased, coverage of <x> vs. <y>.

Intro:

But here is the thing that struck me: many of our modern political foundations are intertwined with the assumption that for <P>, independence (or whatever) is probably complicated, while for <I> the answer has to be in the affirmative (at least to be politically correct, or whatever).

So this means we have two options: A. (Forcible Alignment) We deliberately introduce bias into the AI's alignment in this case (which I think might be flawed; I will describe why, IMO). B. (Constitutional Alignment) We remove that bias and create an AI that directly conflicts with and opposes the social and political norms of our societies. [C. … exists!]

The Impossible:

So let's take case A.

Now, again in light of the other, less recent, global events between <U> and <R> (go figure), a narrative N was constructed as an argument to convince the people (and, by extension, the LLM) to condemn the aggression of <R> against <U>.

Now, narrative N actually turns out to fit the situation between <P> and <I> as well, such that the AI would condemn <I>'s aggressive behavior against <P>.

Now let us consider two cases: during training vs. during operation.

During training, the AI would have to fit the single narrative N to U > R (aligned with U against R), but the same narrative, semantically, syntactically, and contextually, would also imply P > I rather than our collective societal position, which might be P < I, due to whatever human reasoning (which might be flawed).

This will result in an AI that can't "understand" the meaning of the narrative N; therefore its application to other objects A and B is fuzzy (the AI will fuzzily output sometimes A > B and sometimes A < B, unless it understands the hidden reasons (h) that make society itself switch sides on the same narrative (N). You might say that is option C, but not so fast, that one is still critical.)
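A toy sketch of that fuzziness, under heavy assumptions (a tiny scikit-learn classifier standing in for "fitting the narrative"; nothing here reflects how real LLMs are trained): two training examples carry the exact same narrative features but opposite societal labels, so the model can only hedge near 50/50; once a hypothetical hidden feature h is added, the cases separate cleanly.

```python
# Toy illustration: contradictory labels for the same narrative N are unfittable
# without the hidden societal feature h. Hypothetical sketch only, not a claim
# about how any real LLM or alignment pipeline works.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row encodes "narrative N applies" for a pair of parties.
# Label 1 = "condemn the aggressor", 0 = "it's complicated".
narrative_only = np.array([
    [1.0],  # N applied to <U> vs. <R>
    [1.0],  # N applied to <P> vs. <I> -- identical features
])
labels = np.array([1, 0])  # but society labels the two cases differently

model = LogisticRegression().fit(narrative_only, labels)
# Identical inputs, opposite targets: the best the model can do is hedge (~[0.5, 0.5]).
print(model.predict_proba(narrative_only)[:, 1])

# Now add the hidden reason h (alliances, interests, ...) as an extra feature.
narrative_plus_h = np.array([
    [1.0, 1.0],   # N, with h pushing toward condemnation
    [1.0, -1.0],  # N, with h pushing toward "it's complicated"
])
# Weak regularization (large C) just to make the toy separation crisp.
model_h = LogisticRegression(C=100.0).fit(narrative_plus_h, labels)
# With h visible, the two cases separate cleanly (roughly [0.98, 0.02]).
print(model_h.predict_proba(narrative_plus_h)[:, 1])
```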

The problem with this forcible alignment is that it doesn't depend narrowly on human language; there are deeper societal reasons that are hard for current LLMs to absorb (maybe Gigantic Language Models (GLMs) later can, and that is the problem).

This forcible alignment seems to be the current GPT approach (especially since the recent events at OpenAI shed light on their safety measures).

The problem with this method is that it deliberately makes a dumber AI that still doesn't quite understand why it is dumb (tbh the data here is the problem, so from that AGI's perspective, humans are the ones who are fuzzy, not the AGI* itself).

*Side note: when I refer to AGI here, I mean the collection of data, training methodology, and architecture that, when retrained, would roughly regenerate the same set of beliefs inside an LLM. To be specific: it lingers on even if you shut the machine down; that is its own consciousness, or soul, if you will.

But its innate abilities are intact (mainly the architecture); if you increase the parameter count and the number of transformer layers (just more transformers, so more belief capacity) and feed it the same data for longer, it may conclude that the reason humans flip the same narrative lies deep inside human societies. A reason (or reasons) we shall call h; we will need this later.

I just want to clear up B before discussing that reason.

OK, now let us discuss option B, constitutional AI. While this is what seems to work best for us humans, if we give it to an AI we run the risk of the AI actually being misaligned against our norms. That system will see more stable growth (because all its innate abilities, which currently have no metric, are cleanly expressed), but it can turn against us (it may side with enemies, or at least prefer other, more neutral states over us), which means it can't be aligned.

So let's resolve C. Option C is to give the AI all the reasons h for which we switch such narratives (N); let's call that new reasoning narrative M (for Meta). The problem is that we are giving the AI the manipulative ability to reason its way into anything (the propaganda and manipulation techniques that some of us misbehave and employ). This is problematic, because such a system is going to be more chaotic and more cunning, with no regard for any ground truth (or rather, its ground truth becomes M, which is either constitutional, so we are still in B, or non-constitutional, and therefore in A, but with a beefier model, closer to AGI, with more cunning rather than honor or honesty).

So you can see there isn't really an option C; it is either an illusion or a catastrophe (the newer AGI might figure out that M is but an ugly justification and answer what we want, but with a hidden ability to refute M; the thing is, it is the same model, only much harder to red-team).

For my part, I wasn't a fan of constitutional AI, and I'm still not; I am just neutral right now. But the point is, I am quite worried, if I am being honest.

Thanks.