Why the AI Alignment Problem Might Be Unsolvable

Author’s note 1:

The following is a chapter from the story I’ve been writing, and it contains what I think is probably a proof that the value alignment problem is unsolvable. I know it sounds crazy, but as far as I can tell the proof seems to be correct. There are further supporting details which I can explain if anyone asks, but I didn’t want to overload you guys with too much information at once, since a lot of those additional details would require articles of their own to explain.

One of my friends, whom I shall not name, came up with what we think is also a proof, but it’s longer and more detailed, and he hasn’t decided whether to post it.

I haven’t yet had time to extract my own, less detailed version of the proof from the narrative dialogue of my story, but I thought it was important to share it here as soon as possible: if I’m right, then the more time is wasted on AI research, the less time we have to come up with strategies and solutions that could more effectively prevent x-risk in the long term.

Author’s note 2:

This post was originally more strongly worded, but I edited it to tone it down a little. Those who have read Inadequate Equilibria might consider that to be “epistemic humility” and therefore dark side epistemology, but I’m worried not enough people on here will have read that book. Furthermore, the human brain, particularly System 1, evolved to win political arguments in the ancestral environment, and I’m not sure System 1 is biologically capable of understanding that epistemic humility is bad epistemology. The contents of this post are also likely to provoke strong emotional reactions, as it postulates that a particular belief is false, a belief which rationalists at large have invested a LOT of energy, resources and reputation into. I feel more certain that the contents of this post are correct than is wise to express in a context likely to trigger strong emotions. Please keep this in mind. I’m being upfront with you about exactly what I’m doing and why.

Author’s note 3:

Also, HEAVY SPOILERS for the story I’ve been writing, Earthlings: People of the Dawn. This chapter is literally the last chapter of part 5, after which the remaining parts are basically extended epilogues. You have been warned. I have also edited the chapter in response to the comments to make things clearer.

-------

Unsolvable

There were guards standing outside the entrance to the Rationality Institute. They saluted Bertie as he approached. Bertie nodded to them as he walked past. He reached the front doors and turned the handle, then pulled the door open.

He stepped inside. There was no one at the front desk. All the lights were on, but he didn’t hear anyone in the rooms he passed as he walked down the hallway, approaching the door at the end.

He finally stood before it. It was the door to Thato’s office.

Bertie knocked.

“Come in,” he heard Thato say from the other side.

Bertie turned the knob with a sweaty hand and pushed inwards. He stepped inside, hoping that whatever Thato wanted to talk to him about, it wasn’t an imminent existential threat.

“Hello Bertie,” said Thato, somberly. He looked sweaty and tired, with bags under his puffy red eyes. Had he been crying?

“Hi Thato,” said Bertie, gently shutting the door behind him. He pulled up a chair across from Thato’s desk. “What did you want to talk to me about?”

“We finished analyzing the research notes on the chip you gave us two years ago,” said Thato, dully.

“And?” asked Bertie. “What did you find?”

“It was complicated; it took us a long time to understand,” said Thato. “But there was a proof in there that the value alignment problem is unsolvable.”

There was a pause, as Bertie’s brain tried not to process what it had just heard. Then…

“WHAT!?” Bertie shouted.

“We should have realized it earlier,” said Thato. Then in an accusatory tone, “In fact, I think you should have realized it earlier.”

“What!?” demanded Bertie. “How? Explain!”

“The research notes contained a reference to a children’s story you wrote: A Tale of Four Moralities,” Thato continued, his voice rising. “It explained what you clearly already knew when you wrote it, that there are actually FOUR types of morality, each of which has a different game-theoretic function in human society: Eye for an Eye, the Golden Rule, Maximize Flourishing and Minimize Suffering.”

“Yes,” said Bertie. “And how does one go from that to ‘the Value Alignment problem is unsolvable’?”

“Do you not see it!?” Thato demanded.

Bertie shook his head.

Thato stared at Bertie, dumbfounded. Then he spoke slowly, as if to an idiot.

“Game theory describes how agents with competing goals or values interact with each other. If morality is game-theoretic by nature, that means it is inherently designed for conflict resolution, and for maintaining or achieving the universal conditions that facilitate conflict resolution for all agents. In other words, the whole purpose of morality is to make it so that agents with competing goals or values can coexist peacefully! It is somewhat more complicated than that, but that is the gist.”

“I see,” said Bertie, his brows furrowed in thought. “Which means that human values, or at least the individual non-morality-based values, don’t converge, and so you can’t design an artificial superintelligence that contains a term for all human values, only the moral values.”

Then Bertie had a sinking, horrified feeling accompanied by a frightening intuition. He didn’t want to believe it.

“Not quite,” said Thato cuttingly. “Have you still not realized? Do you need me to spell it out?”

“Hold on a moment,” said Bertie, trying to calm his racing anxiety.

What is true is already so, Bertie thought.

Owning up to it doesn’t make it worse.

Not being open about it doesn’t make it go away.

And because it’s true, it is what is there to be interacted with.

People can stand what is true, for they are already enduring it.

Bertie took a deep breath as he continued to recite in his mind…

If something is true, then I want to believe it is true.

If something is not true, then I want not to believe it is true.

Let me not become attached to beliefs I may not want.

Bertie exhaled, still overwhelmingly anxious. But he knew that putting off the revelations any longer would make it even harder to have them. He knew the thought he could not think would control him more than the thought he could. And so he turned his mind in the direction it was afraid to look.

And the epiphanies came pouring out. It was a stream of consciousness, no—a waterfall of consciousness that wouldn’t stop. Bertie went from one logical step to the next, a nearly perfect dance of rigorously trained self-honesty and common sense—imperfect only in that he had waited so long to start it, to notice.

“So you can’t program an intelligence to be compatible with all human values, only human moral values,” Bertie said in a rush. “Except even if you programmed it to be compatible only with human moral values, there are four types of morality, so you’d have four separate and competing utility functions to program into it. And if you did that, the intelligence would self-edit to resolve the inconsistencies between its goals, and that would cause it to optimize for conflict resolution, and then it would just tile the universe with tiny artificial conflicts between artificial agents for it to resolve as quickly and efficiently as possible without letting those agents do anything themselves.”

“Right in one,” said Thato with a grimace. “And as I am sure you already know, turning a human into a superintelligence would not work either. Human values are not sufficiently stable. Yuuto deduced in his research that human values are instrumental all the way down, never terminal. Some values are merely more or less instrumental than others. That is why human values are over patterns of experiences, which are four-dimensional processes, rather than over individual destinations, which are three-dimensional end states. This is a natural implication of the fact that humans are adaptation executors rather than fitness maximizers. If you program a superintelligence to protect humans from death, grievous injury or other forms of extreme suffering without infringing on their self-determination, that superintelligence would by definition have to stay out of human affairs under most circumstances, only intervening to prevent atrocities like murder, torture or rape, or to deal with the occasional existential threat and so on. If the superintelligence were a modified human, it would eventually go mad with boredom and loneliness, and it would snap.”

Thato continued. “On the other hand, if a superintelligence were artificially designed, it could not be programmed to do that either. Intelligences are by their very nature optimization processes. Humans typically do not realize that because we each have many optimization criteria which often conflict with each other. You cannot program a general intelligence with a fundamental drive to ‘not intervene in human affairs except when things are about to go drastically wrong otherwise, where drastically wrong is defined as either rape, torture, involuntary death, extreme debility, poverty or existential threats’ because that is not an optimization function.”

“So, to summarize,” Bertie began, slowly. “The very concept of an omnibenevolent god is a contradiction in terms. It doesn’t correspond to anything that could exist in any self-consistent universe. It is logically impossible.”

“Hindsight is twenty-twenty, is it not?” asked Thato rhetorically.

Silence.

“So what now?” asked Bertie.

“What now?” repeated Thato. “Why, now I am going to spend all of my money on frivolous things, consume copious amounts of alcohol, say anything I like to anyone without regard for their feelings or even safety or common sense, and wait for the end. Eventually, likely soon, some twit is going to build a God, or blow up the world in any number of other ways. That is all. It is over. We lost.”

Bertie stared at Thato. Then in a quiet, dangerous voice he asked, “Is that all? Is that why you sent me a message saying that you urgently wanted to meet with me in private?”

“Surely you see the benefit of doing so?” asked Thato. “Now you will waste no more time on this fruitless endeavor. You too may relax, drink, be merry and wait for the end.”

At this point Bertie was seething. In a deceptively mild tone he asked, “Thato?”

“Yes?” asked Thato.

“May I have permission to slap you?”

“Go ahead,” said Thato. “It does not matter anymore. Nothing does.”

Bertie leaned over the desk and slapped Thato across the face, hard.

Thato seized Bertie’s wrist and twisted it painfully.

“That bloody hurt, you git!”

“I thought you said nothing matters!?” Bertie demanded. “Yet it clearly matters to you whether you’re slapped.”

Thato released Bertie’s wrist and looked away. Bertie massaged his wrist, trying to make the lingering sting go away.

“Are you done being an idiot?” asked Bertie.

“Define ‘idiot’,” said Thato scathingly, still not looking at him.

“You know perfectly well what I mean,” said Bertie.

Thato ignored him.

Silence.

Bertie clenched his fists.

“In the letter Yuuto gave me before he died, he told me that the knowledge contained in that chip could spell humanity’s victory or its defeat,” he said angrily, eyes blazing with determination. “Do you get it? Yuuto thought his research could either destroy or save humankind. He wouldn’t have given it to me if he didn’t think it could help. So I suggest you and your staff get back to analyzing it. We can figure this out, and we will.”

Bertie turned around and stormed out of the office.

He did not look back.