The idea of an “aligned superintelligence” seems misguided

I’ve been reading LessWrong for a while, and there’s something that doesn’t make much sense to me: the idea that it is possible to align an AGI/superintelligence at all. I understand that optimism about AI alignment via technology (as opposed to other means) is probably not even the majority view on LW, but I still think it skews the discussion.

I should humbly clarify here that I don’t consider myself in the league of most LW posters, but I have very much enjoyed reading the forum nonetheless. The alignment question is a bit like the Collatz conjecture: seductive, except that alignment intuitively seems much harder than Collatz!

Alignment is a subject of intense ongoing debate on LW and elsewhere, and of course the developers of the models, though they admit it’s hard, remain optimistic about the possibility of alignment (I suppose they have to be).

But my intuition tells me something different. I think of superintelligence in terms of complexity: it has a greater ability to manage complexity than humans do, and to us it is very complex. In contrast, we are less complex relative to it, and it may find us basic and rather easy to model and predict to a good degree of confidence.

I think complexity is a useful lens here, and it has an interesting characteristic: over time, it tends to escape your attempts to manage it. This is why, as living organisms, we need to do continual maintenance work on ourselves, pumping entropy out of our bodies and minds.

Managing complexity takes ongoing effort, and the more complex the problem, the more likely your model of it will fall short sooner or later; in the case of a superintelligence, that’s pretty much guaranteed. This is going to be a problem for AI safety, which is news to pretty much no one, I expect…
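
To make that concrete, here is a toy sketch (my own illustration, not anyone’s actual safety proposal): a crude quadratic model is fit to a simple nonlinear system whose dynamics drift slowly. The model fit once keeps degrading, while the one that is continually refit keeps up. The specific system (a logistic map), the drift rate, and the window size are all arbitrary choices for illustration.

```python
import numpy as np

def step(x, r):
    """Logistic map: a simple nonlinear stand-in for 'the world'."""
    return r * x * (1 - x)

# Simulate a system whose parameter r drifts slowly over time.
T = 2000
xs = np.empty(T + 1)
xs[0] = 0.4
for t in range(T):
    r_t = 3.5 + 0.0002 * t          # the world keeps changing under you
    xs[t + 1] = step(xs[t], r_t)

def fit(x_prev, x_next):
    """Least-squares fit of x_next ~ c2*x^2 + c1*x + c0."""
    A = np.vstack([x_prev**2, x_prev, np.ones_like(x_prev)]).T
    coef, *_ = np.linalg.lstsq(A, x_next, rcond=None)
    return coef

def predict(coef, x):
    c2, c1, c0 = coef
    return c2 * x**2 + c1 * x + c0

# A model fitted once on the first 200 transitions and never updated...
frozen = fit(xs[:200], xs[1:201])

# ...versus a model continually refit on the 200 most recent transitions.
frozen_err, refit_err = [], []
for t in range(300, T):
    frozen_err.append(abs(predict(frozen, xs[t]) - xs[t + 1]))
    recent = fit(xs[t - 200:t], xs[t - 199:t + 1])
    refit_err.append(abs(predict(recent, xs[t]) - xs[t + 1]))

print(f"mean prediction error, model fit once:    {np.mean(frozen_err):.4f}")
print(f"mean prediction error, continually refit: {np.mean(refit_err):.4f}")
```

The point of the toy is only that keeping a model of a changing system accurate is an ongoing job, not a one-time achievement.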

This is just how life is, and I haven’t seen an example in any other domain that suggests we can align a superintelligence. If we could, maybe we could first try aligning a politician? Nope, we haven’t managed that either, and the problem isn’t entirely dissimilar. What about aligning a foreign nation state? Nope; there are only two ways that happens: either alignment is mutually beneficial, or one party behaves well because the other holds a circumstantial advantage.

I Googled “nature of complexity” just to see if there was anything that supported my intuition on the subject, and the first result was this page, which has a fitting quote:

“Complexity is the property of a real world system that is manifest in the inability of any one formalism being adequate to capture all its properties. It requires that we find distinctly different ways of interacting with systems. Distinctly different in the sense that when we make successful models, the formal systems needed to describe each distinct aspect are NOT derivable from each other.”

My layman’s understanding of what it is saying can be summed up thusly:

Life finds a way.

Complexity will find a way to escape your attempts to control it, via unforeseen circumstances that require you to augment your model with new information.

So I can’t find a reason to believe we can align an AI at all, except via a sufficient circumstantial advantage. There is no silver bullet here. If it were up to me (it’s not), I’d:

  • Replace the terminology of “AI alignment” with something else (deconstructability, auditability), always keeping in mind that an AGI will win this game in the end, so don’t waste time talking about an “aligned AGI”.

  • Do the work of analyzing and mitigating escape risks, whether via nanotechnology spread in the atmosphere or other channels.

  • Develop a strategy, perhaps a working group, to continually analyze the risks posed by distinct AI technologies and our level of preparedness to deal with them; it may be that an agentic AGI is simply an unacceptable risk, and that the only acceptable approaches are modular or auditable ones.

  • Try to find a strategy to socialize the AI, to make it actually care, if not about us then at least about other AIs (as a first step). Create a system that slows its progress somehow. I don’t know, maybe look into entangling its parameters using quantum cryptography, or other kinds of distributed architectures for neural networks ¯\(°_o)/¯

  • Get people from all fields involved. AI researchers are obviously critical, but ideally there should be a conversation happening at every level of society, academia, etc. (Bonus points: make another movie about AGI, which could be far scarier than previous ones now that the disruption is starting to enter public consciousness.)

Maybe, just maybe, if we do all these things we can be ready in 10 years when that dark actor presses the button…

Edit: I’m going to reference here some better-formulated arguments that seem to support what I’m saying.

https://www.lesswrong.com/posts/AdGo5BRCzzsdDGM6H/contra-strong-coherence

I agree with ^: a general intelligence can realign itself. The point of agency is that it will define utility as whatever maximises its predictability and power over its own circumstances, not according to some pre-programming.
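
As a side note on that last point, here is another toy sketch (again my own illustration, not from the linked post, and only loosely inspired by formal results on power-seeking): in a tiny graph-world, states that keep more of the world reachable score better on average across many randomly sampled reward functions, i.e. “power over its own circumstances” is useful almost regardless of what goal was pre-programmed. The world layout, horizon, and number of sampled goals are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny deterministic world: actions move the agent along these edges.
# State 0 is a well-connected hub; states 3-5 funnel into a dead end.
adjacency = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1],
    3: [4],
    4: [5],
    5: [5],      # dead end: the only "option" is staying put
}

def reachable(start):
    """All states the agent can ever get to from `start`."""
    seen, frontier = {start}, {start}
    while frontier:
        frontier = {n for s in frontier for n in adjacency[s]} - seen
        seen |= frontier
    return seen

def best_achievable(start, reward, horizon=6):
    """Best single reward the agent can reach within `horizon` steps."""
    frontier = {start}
    best = reward[start]
    for _ in range(horizon):
        frontier = {n for s in frontier for n in adjacency[s]}
        best = max(best, max(reward[s] for s in frontier))
    return best

# Average, over many *random* goals, how well an agent starting in each
# state can do. States that keep more of the world reachable score higher
# for nearly any goal -- option value is useful regardless of the objective.
n_goals = 2000
avg_value = np.zeros(len(adjacency))
for _ in range(n_goals):
    reward = rng.random(len(adjacency))
    for s in adjacency:
        avg_value[s] += best_achievable(s, reward) / n_goals

for s in adjacency:
    print(f"state {s}: reachable states = {len(reachable(s))}, "
          f"avg achievable reward over random goals = {avg_value[s]:.3f}")
```

The hub states come out ahead for nearly every sampled goal, which is the crude version of why I expect an agent to care about its circumstances before it cares about whatever utility we tried to hand it.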