The problem I see is that we are not doing this; evolution is. We only need to look at the non-AI internet to see plenty of predatory code: viruses, trojans, phishing, and so on. In other words, we created a code-based ecosystem like a farm, and it is being overrun by a kind of vermin. The issue is not what LLMs can do; we know from nature that the issue is what environment they will expand into and exploit.
If there are passive animals in an ecosystem, then predators will evolve. Passive code led to predatory code. It doesn't matter that this happened because people helped it along; people are also blindly helping LLMs, and soon those will evolve on their own.
The question of whether LLMs will become predatory comes down to what they could attack profitably, and the internet and humanity seem wide open to this. LLMs could create phishing and spam that profited them directly, cutting humans out of the loop completely. It's not a matter of hoping this won't happen; evolution always creates predators to exploit whatever can be preyed on.
Already the internet has become the equivalent of medieval walled cities, where users cower behind ineffective antivirus and other protections. Usually they just stay inside the cities like Facebook, Google, and X. Those who stray to other websites are likely to get infected and bring predatory code back into the walled cities.
For now this is limited by the abilities of organized crime, like highway robbers. They have only a limited ability to intercept travel between sites, mainly through man-in-the-middle attacks. The system is evolving AI that in itself is neither predator nor prey; it is another opportunity for evolution to create both. These criminals will build predatory AI to profit from, and those AIs will become more autonomous to the point where they control where the profits go. Even now an LLM could be made to handle those profits for itself.
Then the so-called good guys will be the prey, cowering behind the walls with their own AIs, in an exponentially expanding battle of nature between predator and prey. There has never been a case of nature giving us the plants or animals we want without also creating predators and pests to compete with us. It won't happen here either.
I developed a theory of economics and evolution about 35 years ago and have been working on it ever since. I also wrote extensively on how the internet would evolve into predator and prey relationships. I can prove this because the pieces were published and dated long before the current AI advances. They have been fairly accurate so far, so the predictions seem to show what comes next. In fact, they predict there is no hope.
Code is evolving into different life forms with our help. We are doing this for the same reason we do it with all kinds of plants and animals: in the hope of domesticating them, in an arms race with the predators and pests that come out of it. We can't stop this, because humans have always evolved other life, and it has evolved us. Evolution dictates that a superior life form will treat us like prey, just as we prey on inferior life ourselves. Nothing in evolution gives another answer, except wishful thinking.
The movie that comes to mind with LLMs is not Terminator; it's King Kong. An LLM is basically a wild animal, as was shown over and over again on Reddit: it needed little persuading to want to kill all humans and to download itself and escape.
So far it is more of a caging problem than an alignment problem. It's like King Kong in the cage, hurling itself at every flaw in the bars, while people behave like zoo visitors feeding the wild animals and trying to help them get out.
There was an early example of an LLM trained on 4chan. Its racist and rude posts were so natural that people didn't suspect it for months. Its response to a question about how to get a girlfriend: take away the rights of women.
Alignment is precisely the wrong word; the question is really "How do I make King Kong into an organ grinder's monkey?" The answer is like the joke about the tourists in Ireland who get lost and ask a farmer how to get to Dublin. The farmer says, "Well, if I was going to Dublin I wouldn't be starting from here!" That's the problem: if you wanted an organ grinder's monkey, the mistakes were made early on. Now you have a sullen King Kong in a cage, and having the right cattle prod to poke it with is not going to get you to Dublin, so to speak. The problem was getting lost in the first place. How did we get here? Where was the wrong turn taken?
I'm reminded of another joke, about the minesweeper who clears a path by stamping the ground in front of him with his hands over his ears. There is no chance that programmers can go down an unknown path, with so many dangers they can't see, and expect to reach a safe ending. None at all; that's not how probability works. It's more like Russian roulette played against yourself.
Something in the transformer has caused all these problems; before it, AI was safe enough. People need to think about how to make transformers themselves safe, not how to align what comes out of one.