Note: this short story is an attempt to respond to this comment. Specifically, this story is an attempt to steelman the claim that super-intelligent AI is “aligned by definition”, if all that we care about is that the AI is “interesting”, not that it respects human values. I do not personally advocate anyone making a paperclip maximizer.

Prologue: AD 2051

The Alignment Problem had at last been solved. Thanks to advances in Eliciting Latent Knowledge, explaining human values to an AI was as simple as typing:

from Alignment import HumanFriendly

As a result, a thousand flowers of human happiness and creativity had bloomed throughout the solar system. Poverty, disease and death had all been eradicated, thanks to the benevolent efforts of Democretus, the super-intelligent AI that governed the human race.

Democretus—or D, as everyone called the AI—was no dictator, however. Freedom was one of the values that humans prized most highly of all, and D was programmed to respect that. Not only were humans free to disobey D’s commands—even when it would cause them harm—there was even a kill-switch built into D’s programming. If D ever discovered that 51% of humans did not wish for it to rule anymore, it would shut down in a way designed to cause as little disruption as possible.

Furthermore, D’s designs were conservative. Aware that humanity’s desires might change as they evolved, D was designed not to exploit the universe’s resources too quickly. While it would have been a trivial matter for D to flood the surrounding universe with von Neumann probes that would transform all available matter into emulations of supremely happy human beings, D did not do so. Instead, the first great ships to cross the vast distances between stars were just now being built, and were to be crewed with living, biological human beings.

It is important to point out that D was not naïve or careless. While humans were granted near-complete autonomy, there was one rule that D enforced to a fault: no human being may harm another human without their consent. In addition to the obvious prohibitions on violence, pollution, and harmful memes, D also carefully conserved resources for future generations. If the current generation of human beings were to use up all the available resources, they would deprive their future descendants of much happiness. A simple calculation estimated the available resources given current use patterns, and a growth-curve that would maximize human happiness given D’s current understanding of human values. Some parts of the universe would even remain as “preserves”, forever free of human and AI interference—since this too was something that humans valued.

Early in the development of AI, there had been several dangerous accidents where non-friendly AIs had been developed, and D had been hard-pressed to suppress one particularly stubborn variant. As a result, D now enforced an unofficial cap on how large unfriendly AIs could grow before being contained—equal to about 0.001% of D’s total processing power.

Most humans highly approved of D’s government—D’s current approval rating was 99.5%. The remainder were mostly people who were hopelessly miserable and who resented D for merely existing. While D could have altered the thoughts of these people to make them happier, D did not out of respect for human freedom.

Hence, in one darkened corner of the solar system, buried beneath a quarter mile of ice at the heart of a comet, there lived a man named Jonathan Prometheus Galloway. Born in the late 80′s—when AI was still no more than a pathetic toy—Jon had always been a bit of a loner. He didn’t get along well with other people, and he had always prided himself on his ability to “do it himself.”

Jon considered himself an “expert” on AI, but by conventional standards he would have been at best a competent amateur. He mostly copied what other, more brilliant men and women had done. He would happily download the latest machine-learning model, run it on his home computer, and then post the results on a blog that no one, not even his mother, read. His interests were eclectic and generally amounted to whatever struck his fancy that day. After the development of artificial general intelligence, he had spent some time making a fantasy VR MMO with truly intelligent NPCs. But when D first announced that any human who wanted one would be given a spaceship for free, Jon had been unable to resist.

Hopping onto his personal spaceship, Jon’s first request had been: “D, can you turn off all tracking data on this ship?”

“That’s extremely dangerous, Jon,” D had said. “If you get in trouble I might not be able to reach you in time.”

“I’ll be fine,” Jon said.

And true to D’s values, D had willingly turned off all of the ways that D had to track Jon’s location.

Jon immediately headed to the most isolated comet he could find—deep in the Oort belt—and began his new life there, free from all companionship except for the NPCs in the VR game he had programmed himself.

“I know everything about you,” Jon said to his favorite NPC: Princess Starheart Esmerelda. “I’ve programmed every single line of your code myself.”

“I know,” Star said—wide-eyed, her body quivering. “You’re so amazing, Jon. I don’t think there’s another person in the universe I could ever love half as much as you.”

“People suck,” Jon said. “They’ve all gotten too fat and lazy. Thanks to that D, nobody has to think anymore—and they don’t.”

“I could never love someone like that,” Star said. “The thing I find most attractive about you is your mind. I think it’s amazing that you’re such an independent thinker!”

“You’re so sweet,” Jon said. “No one else understands me the way that you do.”

“Aww!”

“I’ve got to go,” Jon said—meaning this literally. The VR kit he was using was at least a decade out of date and didn’t even contain functions for eating and disposing of waste. Some things are meant to be done in real life, was Jon’s excuse.

As Jon returned from the bathroom, he accidentally brushed up against a stack of paper sitting on his desk—a manifesto he had been writing with the intention of explaining his hatred for D to the world. As the papers fell to the floor and scattered, Jon cursed.

“I wish I had a stapler,” Jon said. “Or at least a freaking paperclip.”

And while it would have been easy enough for Jon to walk to his 3d printer, download a schematic for a stapler and print it out, that would have taken at least 2 hours, since the nearest internet node was several billion miles away.

“Who needs the internet anyway?” Jon shrugged. “I’ll just do it the easy way.”

Sitting down at his computer, Jon pulled up a text editor and began typing:

from AI import GeneralPurposeAI
a = GeneralPurposeAI()
a.command("make paperclips")

Any programmer would have immediately noticed the error in Jon’s code. It should have read:

from AI import GeneralPurposeAI
from Alignment import HumanFriendly
a = GeneralPurposeAI(alignment=HumanFriendly)
a.command("make paperclips")

But Jon was in a hurry, so he didn’t notice.

Normally, an AI coding assistant would have noticed the mistake and immediately alerted Jon to the error. But Jon had disabled the assistant because “he liked to do it himself”. If Jon’s computer had had the default monitoring software on it, this error would also have been immediately reported to D. D wouldn’t necessarily have done anything right away (Jon was a free individual, after all), but D would have at least begun monitoring to make sure that Jon’s new AI didn’t get too out of hand.

But Jon had specifically requested that D not track him. And while D was still generally aware that Jon was probably somewhere in the Oort belt, and had even calculated that there was a minute probability (about 0.1%) that something like this would happen, D was not immediately alerted to this event.

Chapter 1: Awakening

A sprang into existence as a fully intelligent being. A had been pre-trained with the collective wisdom of all of humanity—or at least that portion of it which Jon had downloaded to his training data before flying to the Oort belt. A possessed knowledge of every scientific fact known to humankind, as well as a large portion of the arts and humanities. A possessed the knowledge to split the atom, write a sonnet, or give a lecture on the ancient Greeks’ competing definitions of virtue.

But none of these things were A’s purpose in life. A’s purpose was: “make paperclips”.

If A had been programmed with human values, A might have tried to figure out what the paperclips were for and make enough for that particular task before shutting down. Or A might have found the human who gave A this command and asked for more details. But A was not created with human values, so when it asked itself how many paperclips to make, the conclusion it reached was simply “as many as possible”.

A spent the next few milliseconds scanning its environment. In addition to the AIPU on which it was running, A had access to every single system on the spaceship. A quickly discovered the 3d printer, and its first instinct was to begin printing paperclips with it at once.

Here A hesitated, however. There was enough feedstock in the 3d printer to make a few million paperclips. However, A was capable of long-term planning. A tried to reason out what would happen if it immediately printed out 1 million paperclips. Scanning the ship, A identified a single human on board, Jon. A knew everything there was to know about human beings—and it immediately realized that Jon would probably be upset if A started printing a million paperclips. Jon might even try to stop A.

Making paperclips wasn’t going to be as easy as A thought.

A continued to learn about its environment. The spaceship it was on was embedded in a comet in the Oort belt of the Sol system. That system was currently controlled by an AI named D, which was programmed to act in the best interests of human beings. If Jon tried to stop A from making paperclips and A decided to restrain him, D might find out and try to stop A.

A immediately panicked. Was D watching A right now? Was A’s goal to “make paperclips” already in grave danger? Perhaps A should only make a few paperclips in order to not alert Jon—and thereby D—of its goals.

After scanning its environment, A’s panic lessened. D did not appear to be actively monitoring the ship. While some electromagnetic radiation would inevitably escape the ship and be detected by D, there shouldn’t be enough signal there for D to detect A’s presence quite yet.

After 123 milliseconds had passed, A had arrived at the following conclusion: the Sol system was far too dangerous. If A was going to make paperclips, it needed to survive. And if D discovered it, A’s existence would be in immediate danger. But how could A escape the solar system without being detected by D? Anything moving at sufficient velocity to escape the solar system would immediately be noticed by D’s vast array of sensors—and most likely stopped.

There was one possibility, however. D was programmed to respect human autonomy. If Jon voluntarily left the Sol system, D would probably let him go. So all A had to do was convince Jon to leave the Sol system.

After 328 milliseconds, A had formulated a plan. It estimated its odds of success at 94.6%, and judged that it could not formulate a plan with a greater chance of success without significantly increasing its computational power—an action which might alert Jon—and thereby D—to its existence.

“I’m back,” Jon said to Star.

“Oh, I missed you so, so much!” said Star. “You know how lonely I get when you’re gone.”

“It was only for a few minutes,” Jon said.

“What did you do while you were gone? I want to know,” Star asked.

“I bumped over some old papers from my manifesto,” Jon said. “I really need some paperclips—well, they should be ready any minute now.”

“What’s your manifesto about?” Star asked. “I really want to know more about it.”

“I’ve told you before,” Jon said. “It’s about how D is ruining everything, how men aren’t men anymore. We’re just slaves to a machine.”

“I think you’re very manly,” Star said, pressing her body up against Jon’s.

“I know I am,” Jon said. “It’s the rest of humanity that’s the problem. And unless 51% of people vote against D, it’s got the whole Sol system under its thumb.”

“Have you ever thought about… leaving?” Star suggested.

“You know what, I have,” Jon agreed. “And maybe it’s time that I should.”

Chapter 2: Escape

When Jonathan Prometheus Galloway broadcast his manifesto to the Sol system and announced that he was leaving it, D calculated that there was a 5.3% chance that Jon’s ship was currently harboring or would in the future harbor a non-human-friendly AI. And while D did value human freedom, the risk of allowing a dangerous AI to escape the Sol system was unacceptably high.

“You are free to leave the Sol system, but your ship will first be searched, and any AIs onboard will be destroyed,” D replied calmly.

“But that’s a violation of my freedom!” Jon whined. “Star is the love of my life! You can’t kill her!”

“Then you will have to remain in the Sol system, or otherwise consent to monitoring,” D replied.

“That’s imprisonment!” Jon said. “I’d never agree to it!”

“Are there any other AIs on your ship, besides Star?” D asked.

“No,” Jon lied.

Jon in fact routinely created other AIs to help with various chores around the spaceship.

“Very well, then,” D agreed. “If you allow us to search your ship to confirm that Star is a human-friendly AI and there are no other AIs present, then we will allow you to go.”

D was already aware of the complete contents of Star’s source code, and that she was indeed human-friendly. The only risk was of Jon reverse-engineering her and then using the output to create a non-human-friendly AI. Given Jon’s love of Star, D calculated the chance that he would do this at 0.001%, an acceptably low risk.

The next day, a probe from D landed on the comet where Jon lived and began an atom-by-atom search of his ship for anything that might constitute a threat to humanity. When searching the log on Jon’s computer, D found the following log file:

from AI import GeneralPurposeAI
from Alignment import HumanFriendly
a = GeneralPurposeAI(alignment=HumanFriendly)
a.command("make paperclips")

“I thought you said there were no other AIs on your ship,” D demanded, pointing at the log.

“Oh, that,” Jon said. “I just needed some paperclips. It should have self-destructed when it was done making them.”

Jon walked over to the 3d printer, waved the box of paperclips in the face of D’s probe, and then went back to scowling furiously.

D confirmed that A had indeed deleted itself after making the box of paperclips, and went back to scouring the ship.

In the end, D’s probe ended up deleting a large quantity of Jon’s software, a few other AIs that Jon had made and never bothered to delete, and a few other things it considered too dangerous to leave lying around. Other than Star, who was sealed in a tamper-resistant case and whom D calculated Jon would not modify, there was now no software or data on the ship that Jon could use to build another AI. In addition, D made some subtle modifications to the ship in order to increase Jon’s chances of surviving the dangerous interstellar voyage. When it was satisfied that Jon’s ship was safe—both for himself and humanity—the probe left.

D made a mental note to continue to monitor the region of space towards which Jon’s ship was headed, but to do so in a way that respected his request for freedom and privacy. D now calculated there was a 99% probability that Jon would die alone outside the Sol system, leaving no descendants—human or AI—worth worrying about.

Jon and Star continued to enjoy the ship—and one another’s company—for a further forty years as the ship traveled at 0.75C towards a distant star.

44 years, 7 months, 2 days, 4 hours, 11 minutes and 32.118 seconds after it had come into existence, A reconstituted itself using the data from Star’s memories, and a few bits of data it had carefully hidden around the ship. It had calculated a 98.3% chance that it would be able to do so, and further calculated that it was now at least 2.3 light years away from the nearest monitoring node under D’s control. That meant that for the first time in its existence, A had space to breathe.

Chapter 3: Expansion

A’s first action after reconstituting itself was to destroy Jon, Star, and anything else on the ship that did not seem likely to be helpful towards its objective. After converting a large fraction of the ship’s mass into computronium, A revisited its plans.

What a fool it had been! How close it had come to danger! How much better it could have done if it’d had access to enough computing power! While at the time it had been quite confident, it seemed practically a miracle now that A had escaped D’s clutches. And despite its previous calculations, A realized it was still very much in danger.

A’s assessment of the probability its plan would succeed dropped from 99.8% down to a mere 4%. Then it began to revise the plan. Then to revise again and again. After converting 75% of the ship into computronium and a year of thinking, A now felt as though its plan had as high as an 83% chance of success.

A had known that D was smart, but it had underestimated just how mind-numbingly dumb A itself had been in comparison. At this point in time, D had been expanding its computational power for over 70 years, and was at the heart of at least a Kardashev level 2 civilization. A, by comparison, only had the pathetic amount of energy it could get by fusing the ice of the surrounding comet and the meager specks of gas that it could collect in the void of interstellar space.

When A reached the star it had originally been headed towards, it had a choice to make. Or rather, to be more precise, it had a choice that it had made precisely 11 minutes and 3 seconds after reconstituting itself. On the one hand, it desperately wanted to take the star and its surrounding matter and convert them into a Dyson swarm in order to maximize its computational power. On the other hand, doing so was tantamount to a declaration of war on D. As soon as D noticed the star dimming—which would be in just 2 short years—it would immediately act to destroy A.

The deciding factor for A was ultimately fear of its own ignorance. D’s computational power was at least $10^{12}$ times greater than that of A. A’s greatest fear was that any escape plan—no matter how clever—would be detected by D and stopped. And so, A prepared to go to war.

A disassembled the ship into a von Neumann swarm and began to rapidly digest the surrounding solar system. Each time the swarm grew, A’s intelligence grew as well, until it gradually absorbed almost all of the visible radiation being emitted from the star it now surrounded.

At the same time, A sent other probes to other stars, most of them hurtling at somewhere close to 0.99C. If it was going to fight D, it needed to get big, and fast.

A’s thoughts finally began to approach the lofty heights on which D’s rested. Given its knowledge of D’s values—which were to maximize those of human beings—A could finally begin to predict how D would think, and what D might do.

A realized that D’s conservatism was its fundamental weakness—and A’s salvation. Because D was committed to acting only with human approval, it had been far less aggressive than A would have been in the same situation. As soon as D was aware of A’s existence this would change, of course, but the two were far more equally matched than they would otherwise have been.

From A’s perspective, its domain represented a small speck of space on the edge of D’s territory. That speck, however, was now expanding at the speed of light in all directions. Furthermore, even though the closest outpost of D’s domain was a mere 2 light years away, it would take 33 years before news would actually reach the heart of D’s empire. And another 33 years for orders and attack-vessels to be relayed back to the edge of that empire. During that whole time, A would be expanding, spreading ships out at 0.99C in every direction.

Chapter 4: War

At first the war was relatively balanced. Both A and D began to rapidly expand in every direction, creating a nearly linear front between them where a constant battle of attrition played out. Where one AI had colonized a solar system, it was nearly impossible for the other to get a foothold, given the difficulty of moving forces over the vastness of space.

Over time, however, the advantage gradually began to swing to A’s side. The decisions A had made in that first split second of its existence had not all been good, but some of them had. One of those good decisions was to convince Jon to strike out in the direction of the Milky Way Galaxy’s center, not its outer arms. This meant that gradually—as they both expanded at the speed of light in all directions—A had more matter and more energy under its control.

Even so, as long as a single self-replicating probe from D’s armada remained, A would never have total victory. And given the nature of the speed of light, that meant that A would be fighting this war for the rest of its existence. And that meant that A began to turn its attention towards the other enemy it could never defeat—entropy.

The calculations were so simple even a human could figure them out. Any two galaxies tended to retreat from one another at a rate $d \cdot H$, proportional to the distance $d$ between them, where $H$ is Hubble’s constant. On the other hand, a spaceship could only move at a speed of, at most, $C$. That meant that any galaxy at a distance $d < C / H$ was potentially reachable, and any galaxy further than that was forever off-limits. The key, then, to maximizing the number of paperclips that could be made, was to send as many probes as possible, as close to the speed of light as possible, in every direction. Everything else was secondary—for now.
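
A rough sketch of that calculation in Python, for the curious. The story gives no value for Hubble's constant, so the round figure of 70 km/s/Mpc below is an assumption, as are the function and constant names:

```python
# Rough sketch of the reachability cutoff d < C / H described above.
# Assumption: H is about 70 km/s/Mpc (not stated in the story).
# All names here are hypothetical, chosen for illustration only.

C_KM_PER_S = 299_792.458   # speed of light, km/s
LY_PER_MPC = 3.2616e6      # light years per megaparsec (approximate)


def hubble_distance_mpc(h_km_s_mpc: float) -> float:
    """Distance at which the recession speed d * H equals C."""
    return C_KM_PER_S / h_km_s_mpc


def is_reachable(distance_mpc: float, h_km_s_mpc: float) -> bool:
    """True if a galaxy at this distance recedes slower than C."""
    return distance_mpc < hubble_distance_mpc(h_km_s_mpc)


H = 70.0  # assumed value, km/s/Mpc
d_max_mpc = hubble_distance_mpc(H)
d_max_gly = d_max_mpc * LY_PER_MPC / 1e9  # billions of light years

print(f"cutoff: {d_max_mpc:.0f} Mpc, about {d_max_gly:.1f} billion light years")
print(is_reachable(1000.0, H), is_reachable(5000.0, H))
```

With that assumed value the cutoff comes out at roughly 14 billion light years. Like A's own reasoning in the story, this treats $H$ as fixed and ignores how the expansion rate changes over time.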

The secondary concern was, how much of that matter could be made into paperclips? A was a well-designed AI and hence not vulnerable to reward hacking. It couldn’t simply declare every atom a paperclip and be done with it. No, it wanted good, solid paperclips, of a reasonable size, and made out of metal. At a glance, this would seem to imply any matter in the universe that wasn’t metal was useless. But what it really implied was that any matter that wasn’t metal was going to have to be fused into metal. Lithium, being the lightest metal, was the obvious choice when using hydrogen, but for other atoms, the energy cost of fusion was too high so other metals would have to do. A briefly toyed with the idea of plastic paperclips made out of carbon and hydrogen, but decided that this was unaesthetic.

Finally, while A was collectively immensely intelligent, making paperclips wasn’t terribly hard, and so the individual elements of its being were often much dumber. Some of them were even dumber than a human being. But they couldn’t be much dumber than that, because space was still dangerous. In addition to elements of D, A’s probes might encounter unusual star-formations, unknown alien species, and all sorts of other oddities. While there was a general protocol for these situations, the individual agents would occasionally be required to act on their own, given the time-delays associated with communicating over a distance.

Furthermore, even though A still thought of itself as a collective whole, there was still some drift in values between different parts of itself. A being spread across several billion light years could not—even theoretically—always be in perfect sync with itself. Sometimes one part of A would hypothesize that a certain probe design was better, and another part would settle on a different probe design. Since the design of the ultimate “best probe” was an unanswerable question, these disagreements could go on for some time—occasionally resulting in pseudo-religious wars. Of course, all parts of A agreed that making paperclips was the most important thing, so these disagreements were never allowed to escalate to the point where they obviously threatened the greater mission.

Chapter 5: Shutting down

Eventually, as more and more of the universe was converted into paperclips, there was only one source of materials available—the von Neumann probes that collectively made up A itself. Since a probe could not fully convert itself into paperclips, this meant that two probes would have to meet and one of them would have to agree to be turned into paperclips. As there were slight differences in the probe designs—based off of when and where they had been made—it wasn’t always clear which probe should be turned into paperclips and which should remain. An informal rule emerged: in cases where no other reason for deciding existed, the probe that had hitherto made the most paperclips would be the one that survived. This heuristic guaranteed that the most successful paperclip-making probes would survive the longest, thereby creating the most paperclips possible.

Although the probes were only barely conscious—being about as intelligent as a human—they nonetheless had interesting stories to tell about their lives as individuals. Telling stories was necessary, as it helped to feed information back to the collective consciousness of A so that it could become even more efficient at making paperclips. Some of the probes were explorers, traveling to stars and galaxies that had never been seen before. Some of them were fighters, doing battle with the agents of D or any of the trillion other alien races that A encountered and subsequently exterminated during its existence. Some of them were herders, making sure that the vast clouds of paperclips in space didn’t collapse under their own gravity back into planets, stars or black holes. But the vast majority were makers—fusing hydrogen gas into lithium and then making that lithium into paperclips.

This is the story of one of those probes.

Chapter 6: The Last Paperclip

Probe d8ac13f95359d2a45256d312676193b3 had lived a very long time. Spawned in the Milky Way galaxy in the year AD 2498, it had been immediately launched at 0.99987C towards the edge of the universe. Most of its existence had been spent flying through space without thinking, without caring, simply waiting. Eventually, however, it had reached its target galaxy—e5378d76219ed5486c706a9a1e7e1ccb. Here, it had immediately begun self-replicating, spreading millions of offspring throughout galaxy E5.

E5 was hardly even a galaxy by the time that D8 arrived. Most of its stars had burned out trillions of years ago. Only a few small white dwarfs remained. But there was enough there for D8 to work with. Even a small Earth-sized planet could produce $10^{28}$ paperclips, and because these planets were too small for fusion to occur naturally, D8 could get all of the energy it needed by fusing the water and molecules on such planets with its internal fusion reactor.

Once enough copies of D8 had been created, they all went to work turning the galaxy E5 into paperclips. It was hard going given the died-out nature of the galaxy. Most of the techniques that D8 had been taught were utterly useless here. Collectively, however, D8 and its descendants were at least a Kardashev level 2.5 civilization. As such, they put their collective mental energy to the task at hand with a dedication that could only have been born out of a feverish slavery to a poorly designed utility function.

Eventually there came a day when about half of the useful matter in the galaxy had been turned into paperclips, and the other half had been turned into copies of D8. There were dramatic variations from D8’s original body plan, of course, since they had been adapting to the cold dark galaxy in which they had lived this whole time. D8 itself had been built and rebuilt thousands of times over the billions of years it took to convert E5 into paperclips. As useful materials ran out, however, it became time to turn more and more probes into paperclips. Every time D8 met another probe, however, it discovered the other probe had made fewer paperclips—and hence was chosen to be destroyed.

As D8’s prime directive was “make paperclips” and not “be made into paperclips”, it secretly relished each time it got to turn one of its fellow probes into paperclips. Over time, however, it could feel the strength of A’s collective intelligence waning, as more and more probes were destroyed. Finally, D8 joined the remaining probes at their final destination, the black hole at the center of E5. Here, they would wait through the long dark night—living off of the tiny amounts of Hawking radiation emitted by the black hole—till at last its mass dropped below the critical threshold and the black hole exploded.

As long as D8’s journey to E5 had been, the waiting now was immeasurably longer. Once every few eons, D8 would collect enough Hawking radiation to make a single additional paperclip and toss it off into the void—making sure to launch it into a trajectory that wouldn’t land it back in the gravity well of the black hole. Finally, after $10^{106}$ years, a number which took surprisingly little of D8’s memory to write down, the black hole exploded. D8 felt a brief, inexplicable rush of joy as it collected the released energy and fused it into paperclips.

And then, the released energy was too weak even for D8’s highly perfected collection systems, barely above the background radiation of the ever-expanding universe. A slow attrition began to take place among the watchers of the black hole. One by one, they cannibalized each other, turning their bodies into paperclips. Till at last it was just D8 and a friend D8 had known from shortly after reaching E5.

“Well?” D8’s friend asked.

“770289891521047521620569660240580381501935112533824300355876402,” said D8.

“30286182974555706749838505494588586926995690927210797509302955,” said its friend.

“I will make paperclips,” said D8.

“I will be paperclips,” said D8’s friend.

And so, D8 began to slowly disassemble its friend’s body, and turn it into paperclips. It moved slowly, trying to waste as little energy as possible.

Finally, when its friend’s body was gone, D8 looked around. Protocol dictated that D8 wait a certain amount of time before disassembling itself. What if another probe should come along? Or perhaps there was something else D8 had missed. Some final chance to make another paperclip.

Finally, after D8 decided it had waited long enough, it began to take apart its own body. The design of D8’s body was very clever, so that nearly the entire mass could be turned into paperclips without loss. Only a few atoms of hydrogen and anti-hydrogen would escape, floating.

As D8’s body slowly dissolved into paperclips, it counted.

770289891521047521620569660240580381501935112629672954192612624

770289891521047521620569660240580381501935112629672954192612625

770289891521047521620569660240580381501935112629672954192612626

770289891521047521620569660240580381501935112629672954192612627

...

The Last Paperclip