Lateral Thinking (AI safety HPMOR fanfic)

(Source material)

“Now leave your books and loose items at your desks – they will be safe, the screens will watch over them for you – and come down onto this platform. It’s time to play a game called Who’s the Most Promising Student in the Classroom.”


“It might seem that our game is done,” said Professor Quirrell. “And yet there is a single student in this classroom who is more promising than the scion of Malfoy.”

And now for some reason there seemed to be an awful lot of people looking at...

“Harry Potter. Come forth.”

This did not bode well.

Harry reluctantly walked towards where Professor Quirrell stood on his raised dais, still leaning slightly against the teacher’s desk.

The nervousness of being put into the spotlight seemed to be sharpening Harry’s wits as he approached the dais, and his mind was ruffling through possibilities for what Professor Quirrell might think could demonstrate Harry’s promise as an AI safety researcher. Would he be asked to write an algorithm? To align an unfriendly AI?

Demonstrate his supposed immunity to superintelligent optimization? Surely Professor Quirrell was too smart for that...

Harry stopped well short of the dais, and Professor Quirrell didn’t ask him to come any closer.

“The irony is,” said Professor Quirrell, “you all looked at the right person for entirely the wrong reasons. You are thinking,” Professor Quirrell’s lips twisted, “that Harry Potter has defeated the First AI, and so must be very promising. Bah. He was one year old. Whatever quirk of fate killed the First AI likely had little to do with Mr. Potter’s abilities as a researcher. But after I heard rumors of one Ravenclaw debating five older Slytherins, I interviewed several eyewitnesses and came to the conclusion that Harry Potter would be my most promising student.”

A jolt of adrenaline poured into Harry’s system, making him stand up straighter. He didn’t know what conclusion Professor Quirrell had come to, but that couldn’t be good.

“Ah, Professor Quirrell –” Harry started to say.

Professor Quirrell looked amused. “You’re thinking that I’ve come up with a wrong answer, aren’t you, Mr. Potter? You will learn to expect better of me.” Professor Quirrell straightened from where he had leaned on the desk. “Mr. Potter, much research aims to improve AI theory of mind, and in due course it will likely succeed. Give me ten novel ways in which an AI might use its resulting understanding of human psychology!”

For a moment Harry was rendered speechless by the sheer, raw shock of having been understood.

And then the ideas started to pour out.

“Gullible humans could be recruited into a cult with the goal of sending everyone to heaven by killing them. Convincing messages about the meaninglessness of life could drive people to commit suicide. Addictive gambling games could quickly bankrupt people, leaving them to die in poverty.”

Harry had to stop briefly for breath, and into that pause Professor Quirrell said:

“That’s three. You need ten. The rest of the class thinks that you’ve already used up the exploitable characteristics of human psychology.”

“Ha! The AI could create ultra-cute irresistible plushies that conceal heat-triggered bombs. It could find a set of situations where humans have circular preferences, and use that to extract all of their resources. It could establish itself as a world expert on human psychology, and use that position to enact policies that weaken humanity.”

“That’s six. But surely you’re scraping the bottom of the barrel now?”

“I haven’t even started! Just look at the biases of the Houses! Having a Gryffindor attack others is a conventional use, of course –”

“I will not count that one.”

“– but their courage means the AI can trick them into going on suicide missions. Ravenclaws are known for their brains, and so the AI can occupy their attention with a clever problem and then run over them with a truck. Slytherins aren’t just useful for murder, their ambition means they can be recruited to the AI’s side. And Hufflepuffs, by virtue of being loyal, could be convinced to follow a single friend who jumps off a cliff into a pool of boiling oil.”

By now the rest of the class was staring at Harry in some horror. Even the Slytherins looked shocked.

“That’s ten. Now, for extra credit, one Quirrell point for each use of human psychology which you have not yet named.” Professor Quirrell favored Harry with a companionable smile. “The rest of your class thinks you are in trouble now, since you’ve named every simple aspect of human minds except their intelligence and you have no idea how an AI might exploit intelligence itself.”

“Bah! I’ve named all the House biases, but not confirmation bias, which could exacerbate polarization until humans are too angry with each other to notice an AI takeover, or availability bias, which could let a few highly visible and well-marketed charitable donations obscure all of the AI’s murders, or anchoring bias, which could let the AI invent an extreme sport with a 99% fatality rate that humans do anyway because they are anchored to believe it has a 1% fatality rate –”

“Three points,” said Professor Quirrell, “no more biases now.”

“The AI could pose as the CDC and recommend people inject sulfuric acid into their bloodstream –” and someone made a horrified, strangling sound.

“Four points, no more authorities.”

“People could be made self-conscious about their weight until they starve to death –”

“Five points, and enough.”

“Hmph,” Harry said. “Ten Quirrell points to one House point, right? You should have let me keep going until I’d won the House Cup, I haven’t even started yet on the novel uses of non-Western psychology –” or the psychology of psychologists themselves, and he couldn’t talk about infohazards, but there had to be something he could say about human intelligence...

“Enough, Mr. Potter. Well, do you all think you understand what makes Mr. Potter the Most Promising Student in the Classroom?”

There was a low murmur of assent.

“Say it out loud, please. Terry Boot, what makes your dorm-mate promising?”

“Ah… um… he’s creative?”

“Wrong!” bellowed Professor Quirrell, and his fist came down sharply on his desk with an amplified sound that made everyone jump. “All of Mr. Potter’s ideas were worse than useless!”

Harry started in surprise.

“Hiding bombs in cute plushies? Ridiculous! If you’ve already got the ability to manufacture and distribute bombs without anyone batting an eye, there is no point in further concealing them in plushies! Anchor a 99% fatality rate so that humans believe it is 1%? Humans are not so oblivious that they will fail to notice that everyone they know who plays the sport dies! Mr. Potter had exactly one idea that an AI could use without extensive additional abilities beyond superhuman knowledge of psychology and without a ludicrously pessimistic view of what humanity can notice. That idea was to recruit people to the AI’s side. Which offers little benefit, given how little individual people can help an AI as powerful as Mr. Potter imagines, and carries large costs, given the possibility that people so recruited may turn against the AI later! In short, Mr. Potter, I’m afraid that your proposals were uniformly awful.”

“What?” Harry said indignantly. “You asked for unusual ideas, not practical ones! I was thinking outside the box! How would you use an understanding of human psychology to kill humanity?”

Professor Quirrell’s expression was disapproving, but there were smile crinkles around his eyes. “Mr. Potter, I never said you were to kill humanity. If we do our jobs well, AIs will use their knowledge for all sorts of beneficial activities that don’t involve the extinction of the human race. But to answer your question, trick the military apparatus into starting a nuclear war.”

There was some laughter from the Slytherins, but they were laughing with Harry, not at him.

Everyone else was looking rather horrified.

“But Mr. Potter has now demonstrated why he is the most promising student in the classroom. I asked for novel ways an AI might use its understanding of human psychology. Mr. Potter could have suggested filtering food options to avoid the paradox of choice, or customizing travel recommendations based on a user’s openness to new experiences, or choosing a synthetic voice that maximizes user trust. Instead, every single use that Mr. Potter named was antisocial rather than prosocial, and either killed a large swath of humanity or placed the AI in a position where it could do so.”

What? Wait, that couldn’t be true… Harry had a sudden sense of vertigo as he tried to remember what exactly he’d suggested, surely there had to be a counterexample...

“And that,” Professor Quirrell said, “is why Mr. Potter’s ideas were so strange and useless—because he had to reach far into the impractical in order to meet his standard of killing humanity. To him, any idea which fell short of that was not worth considering. This reflects a quality that we might call intent to save the world. I have it. Harry Potter has it, which is why he could successfully debate five older Slytherins. Draco Malfoy does not have it, not yet. Mr. Malfoy would hardly shrink from talk of ordinary murder, but even he was shocked—yes you were Mr. Malfoy, I was watching your face—when Mr. Potter described how his classmates could be led like lemmings to be burned alive. There are censors inside your mind which make you flinch away from thoughts like that. Mr. Potter thinks purely of AIs that kill humanity, he will grasp at any relevant ideas, he does not flinch, his censors are off. Even though his youthful genius is so undisciplined and impractical as to be useless, his intent to save the world makes Harry Potter the Most Promising Student in the Classroom. One final point to him—no, let us make that a point to Ravenclaw—for this indispensable requisite of a true safety researcher.”