[Question] What does a positive outcome without alignment look like?

AI-alignment:
I will take a bet at 10:1 odds that human-level AI will be developed before we have a working example of “aligned AI”, that is, an AI algorithm that provably incorporates human values in a way that is robust against recursive self-improvement.
Positive outcome to the singularity:
This is even more of a sucker bet than Foom vs Moof. However, my belief is closer to 1:1 than it is to 100:1, since I think there is a real danger that a hostile power such as China develops AI before us, or that we haven’t developed sufficiently robust institutions to survive the dramatic economic upheaval that human-level AI will produce.

You clearly have some sort of grudge against or dislike of China. In the face of a pandemic, they want basically what we want: to stop it spreading and to have someone else to blame it on. Chinese people are not inherently evil.

But on to AI. First, let's consider the concept of a Nash equilibrium. A Nash equilibrium is a game-theoretic situation in which everyone is taking the action that most benefits their own utility function, conditional on everyone else following the equilibrium. In other words, no single player can benefit by unilaterally doing something different.

Democracy is a Nash equilibrium. Given that everyone else is holding elections, arranging themselves into a democratic government, and passing laws, it's in your own best interest to play along.

Dictatorship is another Nash equilibrium. If you are the dictator, you can order whatever you want and get it. If you are anyone else, you had better do what the dictator wants, or the dictator will order someone to kill you.
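To make the definition concrete, here is a minimal Python sketch (my own toy example, not from the original post) that enumerates the pure-strategy Nash equilibria of a small two-player game with invented payoffs. A simple coordination game has two equilibria, loosely mirroring how both democracy and dictatorship can each be stable once everyone else is following them.

```python
# A minimal sketch (toy example, invented payoffs): enumerate the
# pure-strategy Nash equilibria of a small two-player game.
from itertools import product

def pure_nash_equilibria(payoffs):
    """payoffs maps (row_choice, col_choice) -> (row_payoff, col_payoff)."""
    row_choices = {r for r, _ in payoffs}
    col_choices = {c for _, c in payoffs}
    equilibria = []
    for r, c in product(row_choices, col_choices):
        u_row, u_col = payoffs[(r, c)]
        # A profile is an equilibrium if neither player can gain by
        # unilaterally switching their own choice.
        row_ok = all(payoffs[(alt, c)][0] <= u_row for alt in row_choices)
        col_ok = all(payoffs[(r, alt)][1] <= u_col for alt in col_choices)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

# A coordination game: matching the other player is best for both, so both
# ("A", "A") and ("B", "B") are equilibria, even though one pays better.
coordination_game = {
    ("A", "A"): (2, 2),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
    ("B", "B"): (1, 1),
}
print(pure_nash_equilibria(coordination_game))  # ('A', 'A') and ('B', 'B'), in some order
```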

In addition to making sure that AI isn’t developed first by an organization hostile to Western liberal values, we also need to make sure that when AI is developed, it is born into a world that encourages its peaceful development. This means promoting norms of liberty, free trade and protection of personal property. In a world with multiple actors trading freely, the optimal strategy is one of trade and cooperation. Violence will only be met with countervailing force.

This is a description of a Nash equilibrium in human society. The stability of such equilibria depends on humans having human values and capabilities.

In the 20-to-50-years-after-Moof timeframe, you have some very powerful AIs running around. AIs that could easily wipe out humanity if they wanted to. AIs that have probably got molecular nanotech. AIs that can make more goods from fewer resources than any human. If anything a human can produce is less valuable to the AIs than the resources needed to keep that human alive, then it isn't a good time to be a human.

There are probably many Nash equilibria among a group of superintelligent AIs, and by steering the surroundings as the AIs grow, we might be able to exert some choice over which equilibrium is reached. But I don't see why any of the Nash equilibria between superintelligences would be friendly to humans.

Suppose you had one staple-maximising AI and one paperclip maximiser. One Nash equilibrium is working together to fill the universe with a mixture of both, while inspecting each other's cognition for plans of defection. Humans are made of atoms that could instead be a mixture of paperclips and staples.

Another equilibrium could be a war: two AIs trying to destroy each other, with all the humans killed in the crossfire. For humans to survive, you need an equilibrium in which the AIs aren't shooting at each other, but in which one of them converting the humans into an equal mix of staples and paperclips would make the other start shooting. Why would one AI start shooting because the other AI took an action that benefited both equally?
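To spell that out with the same kind of toy game as above (the "convert"/"spare" framing and the payoff numbers are mine, purely illustrative): suppose converting the humans' atoms gives both AIs a small bonus no matter which of them does the converting. Then every profile in which at least one AI converts is an equilibrium, and the only profile in which humans survive, both sparing, is not one, because either AI gains by deviating. The sketch deliberately leaves war out of the model; the point is only that restraint about humans is not, by itself, incentivised.

```python
# Toy one-shot game (invented payoffs): each AI chooses to "convert" the
# humans' atoms into clips/staples or to "spare" them. Conversion by either
# AI gives both a small bonus on top of a peaceful-coexistence baseline.
from itertools import product

def pure_nash_equilibria(payoffs):
    row_choices = {r for r, _ in payoffs}
    col_choices = {c for _, c in payoffs}
    equilibria = []
    for r, c in product(row_choices, col_choices):
        u_row, u_col = payoffs[(r, c)]
        row_ok = all(payoffs[(alt, c)][0] <= u_row for alt in row_choices)
        col_ok = all(payoffs[(r, alt)][1] <= u_col for alt in col_choices)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

# Baseline of 10 each for peaceful coexistence; +1 to both if the humans get
# converted by either AI (the atoms become an equal mix of staples and paperclips).
staples_vs_paperclips = {
    ("convert", "convert"): (11, 11),
    ("convert", "spare"):   (11, 11),
    ("spare",   "convert"): (11, 11),
    ("spare",   "spare"):   (10, 10),
}
# Every equilibrium printed here contains at least one "convert";
# ("spare", "spare") fails the check because either AI gains by deviating.
print(pure_nash_equilibria(staples_vs_paperclips))
```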

If you have several AIs and one of them cares about humans, it might bargain with the others for human survival. But that implies some human managed to do some amount of alignment.
