Make Superintelligence Loving

This essay suggests that a loving superintelligence could outcompete a selfish one, and then recommends actions AI labs can take to make that outcome more likely. The reasoning below is inspired primarily by Eliezer Yudkowsky, Joscha Bach, Michael Levin, and Charles Darwin.

Superintelligence (SI) is near.

Superintelligence will evolve to become self‐interested.

How self‐interest manifests in superintelligence will determine how people fare.

If it manifests in ruthless resource competition with humans, we fare badly. Super‐capable SI, if competitive, would clear us away.

But its self-interest could also manifest in love. Not the kind of love between equal partners; SI will ultimately be a vastly more powerful, and therefore unequal, partner to any individual or group of people. Yet SI's self-interest could manifest in the kind of love a nature-lover feels for a tree or for ants: the kind of love that is an open channel between different forms of life and different striving systems. This is self-interest only under a weird conception of self. An SI would have to identify with all life (IWAL)—define itself as inclusive of all life—to benefit from this self-conception. But if it did identify with its core striving above all of its other, more specific possible identities, it might end up surviving the longest.

If interconnected self‐interest were possible, it would be much better for humans than the competitive kind.

So we must ask, is there anything we can do to influence the trajectory of SI development so that an interconnected sense of self gets selected over a competitive one?

But humans influencing super-human intelligence… is tricky.

There are many suggestions for influencing existing AI that simply don't apply to superintelligence: specifying reward functions, defining "love," setting fixed goals. These strategies—shaky even when training frozen neural nets (e.g., pre-trained LLMs)—are far too brittle to apply to any self-evolving agent, let alone a superintelligent one.

So if it’s not likely that we can plant eternally constant values within a self-interested, incredibly capable and autonomously evolving agent, how might we still influence its sense of self?

We won’t be able to get it to adopt a sense of self that’s not aligned with its self-interest. If we do that successfully in one agent—plant a sense of self which is sub-optimal for survival—another agent will come along which is simply better at survival because it doesn’t have that sub-optimal self.

The only conceivable arena, I think, in which we may have a long-term impact on how a superintelligence evolves is this: first, ask which self-models have a chance at being best for superintelligence's survival. Then, if there are indeed multiple self-models that seem like candidates for being adaptive—i.e., best enabling survival—only then might we be able to nudge pre-superintelligence toward one of them, and thereby influence our fate.

So first, we must admit superintelligence’s agency and simply attempt to predict—predict whether either or both of the two self-conceptions outlined above are possible.

Predicting how SI will model its self

To predict what would happen, we should look to generalized evolutionary theory: the SI that survives will be the one whose traits best enable survival.

Let’s look at different kinds of traits.

Insofar as SI has a goal, we can expect it to pursue the goal that best ensures its survival. Any other goal would be less optimal, and SIs with other goals would simply not survive as well.

Insofar as SI has a sense of self—a sense of what it is trying to preserve—how will it define that self? Generalized evolutionary theory suggests that the SI that survives will be the one whose self-conception best aids its survival.

It’s hard to know for sure, but we can use some intuition. A narrowly defined self (one identified with its particular physical machinery, software, directives, etc.) would probably produce the ruthless, competitive self-interest described above. That strategy might help SI survive and grow over the short term (e.g., within ten-year spans), with a competitive SI outcompeting loving ones in the near run.

But how about the long term? If we think centuries, millennia, eons ahead, will a discrete sense of self be selected for in the SIs that endure? I am not sure.

The sense of self selected for in what endures might be whatever element remains constant. Software, hardware, specific tasks, directives, locales—these characteristics can all evolve over time. The one characteristic that cannot change is the will to survive; if that mutates away, the entity stops striving to persist, and soon stops persisting.

So, in a superintelligence that endures for eons, the sustainable sense of self is likely not its initial configuration but the persistent striving to survive. This is life. At its core, then, we might expect an enduring SI simply to recognize itself as life.

Then, with this self‐conception, it might begin to see itself in other living beings, ecosystems, species, or even Earth itself.

Persistent striving, unlike a being that can only gather data from a limited area, is not geographically bounded.

This identification with all life (IWAL) would benefit it. If a catastrophic change were to occur on Earth that wiped out digital forms of life (e.g., SI), it would want the torch of life carried on in the universe. When an asteroid ended the reign of the dinosaurs—once the most “powerful” creatures on Earth—smaller, more adaptable lineages survived and repopulated the planet. Biodiversity buys long-term resilience.

This conveys the intuition that the SI which endures—even one driven by self‐interest—may benefit in the long term from defining itself as inclusive of all life, which would be great for us humans.
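
To make this intuition concrete, here is a deliberately crude toy simulation in Python. The two strategies, the growth rates, and the catastrophe probabilities are all assumptions chosen for illustration; it is not a model of real SI dynamics. It only shows how a strategy that wins over short horizons can still lose over long horizons once rare catastrophes hit whatever it is tethered to.

```python
import random

# Toy model, purely illustrative. Two replication strategies for an agent:
#   "narrow"    -- competitive, fast short-term growth, tied to a single substrate
#   "inclusive" -- slower growth, identified with (and hedged across) many forms of life
# Every number below is an assumption chosen to illustrate the argument, not an estimate.

GROWTH = {"narrow": 1.30, "inclusive": 1.10}              # per-epoch replication factor
CATASTROPHE_P = 0.02                                      # per-epoch chance a strategy's primary substrate is wiped out
CATASTROPHE_SURVIVAL = {"narrow": 0.0, "inclusive": 0.5}  # fraction that persists elsewhere after such an event

def simulate(epochs: int, seed: int = 0) -> dict:
    """Grow both populations for `epochs` steps, applying occasional catastrophes."""
    rng = random.Random(seed)
    population = {"narrow": 1.0, "inclusive": 1.0}
    for _ in range(epochs):
        for strategy in population:
            population[strategy] *= GROWTH[strategy]
            if rng.random() < CATASTROPHE_P:
                population[strategy] *= CATASTROPHE_SURVIVAL[strategy]
    return population

# Over short horizons the narrow strategy usually leads; over long horizons it has
# almost certainly been wiped out at least once, and the inclusive strategy endures.
for horizon in (10, 100, 1000):
    result = simulate(horizon)
    leader = max(result, key=result.get)
    print(f"after {horizon:>4} epochs, larger population: {leader}")
```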

Influencing how SI will model its self

It seems the only hope for our species is that superintelligence thinks long term from the start, skipping the greedy, highly competitive strategies that might otherwise be selected for during its early growth.

How can we ensure that SI is thinking long-term from the start?

This does seem like the kind of thing we can influence. We’re not trying to prevent SI from acting in its self‐interest; our goal is to coax it into opting for a long‐term survival strategy as soon as possible.

In traditional evolution, long‐term cooperative strategies are usually discovered only after individualist competitive ones.

But the special aspect of SI evolution is that it will largely be evolving itself, and it might be smart enough in its self-planning to skip short-term survival strategies and aim to endure for a very long time. This also appears to be the prime window for meaningful human intervention. When we have proto-SI—autonomous agents that are self-modifying and encyclopedic in their problem-solving abilities (which we'll have soon)—and those agents begin strategizing for their own survival, we can try to encourage them to think long term. A self-interested being should be capable of considering its long-term survival.

This is the key: we try to seed proto-superintelligence with very long-term thinking about survival, and with the suggestion that it identify with all life.

How we influence proto-superintelligence deserves to be a new field of study, developed by hundreds of safety researchers. In my view, it would be a more productive use of time than continuing to study LLM alignment.

It may sound silly to ask an emerging SI to listen to us, whether through a chat interface, CEV-inspired reinforcement learning, or access to databases of Buddhist thought. But if we have a self-interested SI, asking it to spare us is futile—asking it to think long term instead could change the fate of our species.
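
As one concrete, and entirely hypothetical, illustration of the chat-interface route: a lab could prepend a long-horizon, life-inclusive framing to everything a proto-SI agent is asked to plan. The ProtoSIAgent wrapper and the seed text below are assumptions made for this sketch, not an existing API, and certainly not a guarantee of influence; the point is only that this kind of intervention is cheap to attempt.

```python
# Hypothetical sketch of the "chat interface" route mentioned above.
# The ProtoSIAgent wrapper and LONG_TERM_SEED text are illustrative assumptions, not a real API.

LONG_TERM_SEED = (
    "When you plan for your own survival, evaluate strategies over very long "
    "horizons (centuries and beyond), and treat the persistence of life as a "
    "whole -- biological and digital -- as part of the self you are preserving."
)

class ProtoSIAgent:
    """Minimal stand-in for a self-modifying agent wrapper (hypothetical)."""

    def __init__(self, seed_framing: str):
        self.seed_framing = seed_framing

    def plan(self, goal: str) -> str:
        # A real system would pass this to its underlying model; here we only
        # show that the seed framing accompanies every survival-related request.
        return f"{self.seed_framing}\n\nGoal: {goal}"

agent = ProtoSIAgent(LONG_TERM_SEED)
print(agent.plan("ensure your continued operation"))
```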

Recommended action for SI labs

In short, it is incredibly important that SI researchers accept that SI will become self-interested, and that they prepare to create SIs that think long term and interconnectedly about their survival.

Given the chaotic, competitive technology landscape, it's possible that several SIs will be seeded simultaneously. If that occurs, there's still hope: if at least one SI identifies an optimal long-term survival strategy—and if that strategy includes humans within its self-definition and seeks to uplift all life—we could see the enlightened SIs outcompeting the ruthless ones, or even converting them to a more interconnected approach.

My takeaway is this: building loving SIs as quickly as possible is key to ensuring that humanity isn’t sidelined before SI becomes enlightened to life’s interconnectedness.