Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most “classic humans” in a few decades.

If I were very optimistic about how smooth AI takeoff goes, but where it didn’t include an early step of “fully solve the unbounded alignment problem, and then end up with extremely robust safeguards[1]”...

...then my current guess is that Nice-ish Smooth Takeoff leads to most biological humans dying, like, 10-80 years later[2]. (Or, “dying out”, which is slightly different than “everyone dies.” Or, ambiguously-consensually-uploaded. Or, people have to leave their humanity behind much faster than they’d prefer).

Slightly more specific about the assumptions I’m trying to inhabit here:

  1. It’s politically intractable to get a global halt or globally controlled takeoff.

  2. Superintelligence is moderately likely to be somewhat nice.

  3. We’ll get to run lots of experiments on near-human-AI that will be reasonably informative about how things will generalize to the somewhat-superhuman-level.

  4. We get to ramp up control on superhuman AIs...

    1. which we use to build defensive immune systems that detect and neutralize attempted FOOMs once those become possible...

    2. ...but, we don’t actually fully solve alignment, which means we can’t scale to the limit of superintelligence and safely leverage it… (i.e. “alignment is difficult” and “we couldn’t trust the AIs that were smart enough to actually help”)

  5. ...and we don’t eventually solve alignment and leverage overwhelmingly superhuman intelligence before we find ourselves in a world where groups of powerful-but-only-partly-aligned AI dominate the solar system.

I recently noted that the book Accelerando is a decent takeoff scenario, given similar assumptions. In the book, we see these pieces coming together with passages like this during early takeoff:

A million outbreaks of gray goo—runaway nanoreplicator excursions—threaten to raise the temperature of the biosphere dramatically. They’re all contained by the planetary-scale immune system fashioned from what was once the World Health Organization. Weirder catastrophes threaten the boson factories in the Oort cloud. Antimatter factories hover over the solar poles. Sol system shows all the symptoms of a runaway intelligence excursion, exuberant blemishes as normal for a technological civilization as skin problems on a human adolescent.

Later on, it escalates to:

Earth’s biosphere has been in the intensive care ward for decades, weird rashes of hot-burning replicators erupting across it before the World Health Organization can fix them—gray goo, thylacines, dragons.

So in this hypothetical, AI is sort-of-aligned-or-controlled at first. Defensive technologies make it trickier for undesirable things to FOOM. It assumes the opening transition to an AI economy includes a carveout for the Earth as a special protected zone. It has something like property rights (although the AIs/​uploads are using some advanced coordination mechanism which regular humans are too slow/​dumb to participate in, referred to as “Economics 2.0”).

In that world, I still expect normal humans and most normal human interests to die out within a few decades.

I’m not intending to make a strongly confident “everyone obviously dies” claim here. I’m arguing you should have a moderately confident guess, if you don’t learn new information, that smooth takeoff results in “somehow or other, ordinary 20th century humans look at the result and think ‘well that sucks a lot,’ and the way it sucks involves a lot of people either dying, being forcibly uploaded, losing their humanity, or, at best, escaping into deep space in habitats that will later on be consumed.”

There is no safe “muddling through” without perfect safeguards

In this post I’m not arguing with the people trying to leverage AI to fully solve alignment, and then leverage safe superintelligence to fundamentally change the situation. (I have concerns about that but it’s a different point from this post).

This post is instead arguing with the people who imagine something like “business continues sort of as usual in a decentralized fashion, just faster. Things are complicated and messy, but we muddle through somehow, and the result is okay.”

They seem to mostly be imagining the early part of that takeoff – the part that feels human-comprehensible. They’re still not imagining superintelligence in the limit, or fully transformed AI-driven geopolitics/​economies.

My guess that “things eventually end badly” is due to Robin Hanson-esque arguments, wherein:

  1. Digital minds can be easily copied and modified

  2. At least some mind lineages are “grabby” (i.e. decide to rapidly spread through the universe). It only takes one.

  3. We eventually exhaust the untapped resources of the solar system. And, because of aforementioned grabbiness, resources across the rest of the universe are either taken, or at least contested.

  4. The world is multipolar, there’s no single dominant coalition or authority.

...and then evolution happens to whatever replicating entities result.
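The selection dynamic in steps 1–4 can be sketched as a toy simulation. To be clear, everything here – the single `grabbiness` trait, the population size, the mutation rate – is invented purely for illustration; the argument only depends on "reproduction weighted by resource acquisition selects for resource acquisition":

```python
import random

def mean_grabbiness(generations=300, pop=200, seed=0):
    """Toy replicator model. Each lineage has one trait, 'grabbiness':
    how aggressively it claims available resources. Reproduction is
    weighted by resources claimed; offspring inherit the trait with a
    small mutation. All parameters are made up for illustration."""
    rng = random.Random(seed)
    # Start with uniformly restrained lineages (low grabbiness).
    traits = [rng.uniform(0.01, 0.2) for _ in range(pop)]
    for _ in range(generations):
        # Resources (and thus offspring) go disproportionately to
        # whoever grabs hardest. Niceness is never directly punished;
        # it just reproduces slower.
        weights = [t + 1e-9 for t in traits]
        traits = [
            min(1.0, max(0.0, rng.choices(traits, weights=weights)[0]
                         + rng.gauss(0, 0.03)))
            for _ in range(pop)
        ]
    return sum(traits) / pop
```

Run it and the population mean drifts from ~0.1 toward the maximum. No lineage ever “decides” to defect; selection does all the work, which is the sense in which it only takes one grabby line to seed the drift.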

An important implication of superintelligence is that once you’re near the limits of intelligence, stuff happens way faster than you’re used to handling at 20th century timescales. There isn’t undirected evolution taking eons, or human decisionmaking taking years. Everything happens hundreds or thousands of times faster than that.

The result will eventually, probably, be quite sad from the perspective of most people today, within their natural lifespan.

(I’m confused by Hanson’s perspective on this – he seems to think the result is actually “good/​fine” instead of “horrifying and sad.” I’m not really sure what it is Hanson actually cares about. But I think he’s probably right about the dynamics.)

I’m not that confident in the arguments here, but I haven’t yet seen someone make convincing counterarguments to me about how things will likely play out.

The point of this post is to persuade people who are imagining a slow-takeoff d/​acc world that you really do need to solve some important gnarly alignment problems deeply, early in the process of the takeoff, even if you grant the rest of the optimistic assumptions.

i. Factorio

(or: It’s really hard to not just take people’s stuff, when they move as slowly as plants)

I had an experience playing Factorio that feels illustrative here.[3]

Factorio is a game about automation. In my experience playing it, I gained a kind of deep appreciation for the sort of people who found evil empires.

The game begins with you crash landing on a planet. Your goal is to go home. To go home, you need to build a rocket. To build a rocket powerful enough to get back to your home solar system, you will need advanced metallurgy, combustion engines, electronics, etc. To get those things, you’ll need to bootstrap yourself from the stone age to the nuclear age.

To do this all by yourself, you must automate as much of the work as you can.

To do this efficiently, you’ll need to build stripmines, powerplants, etc. (And, later, automatic tools to build stripmines and powerplants).

One wrinkle is the indigenous creatures on the planet.

They look like weird creepy bugs. It’s left ambiguous how sentient the natives are, and how they should factor into your moral calculus. But regardless, it becomes clear that the more you pollute, the more annoyed they will be, and they will begin to attack your base.

If you’re like me (raised by hippie-ish parents), this might make you feel bad.

During my playthrough, I tried hard not to kill things I didn’t have to, and to pollute as little as possible. I built defenses in case the aliens attacked, but when I ran out of iron, I looked for new mineral deposits that didn’t have nearby native colonies. I bootstrapped my way to solar power as quickly as possible, replacing my smog-belching furnaces with electric ones.

I needed oil, though.

And the only oil fields I could find were right in the middle of an alien colony.

I stared at the oil field for a few minutes, thinking about how convenient it would be if that alien colony wasn’t there. I stayed true to my principles. “I’ll find another way”, I said. And eventually, at much time cost, I found another oil field.

But around this time, I realized that one of my iron mines was near some native encampments. And those natives started attacking me on a regular basis. I built defenses, but they started attacking harder.

Turns out, just because someone doesn’t literally live in a place doesn’t mean they’re happy with you moving into their territory. The attacks grew more frequent.

Eventually I discovered the alien encampment was… pretty small, compared to my growing factory empire. It would not be difficult for me to destroy it. And, holy hell, would it be so much easier if that encampment didn’t exist. There’s even a sympathetic narrative I could paint for myself, where so many creatures were dying every day as they went to attack my base, that it was in fact merciful to just quickly put down the colony.

I didn’t do that. (Instead, I actually got distracted and died). But this gave me a weird felt sense, perhaps skill, of empathizing with the British Empire. (Or, most industrial empires, modern or ancient).

Like, I was trying really hard not to be a jerk. I was just trying to go home. And it was still difficult not to just move in and take stuff when I wanted it. And although this was a video game, I think in real life it might have been, if anything, harder, since I’d be risking not just losing the game, but losing my life, or the livelihoods of people I cared about.

So when I imagine industrial empires that weren’t raised by hippie-ish parents who believed colonialism and pollution were bad… well, what realistically would you expect to happen when they interface with less powerful cultures?

Fictional vs Real Evidence

Factorio is a videogame. In real life, I do not kill people and take their stuff.

But, here are a few real-world things that humans have done, which I think this experience is illustrative of:

  • Various empires across history conquering their neighbors by sword, forcibly erasing their cultural identity and taking their resources.

  • An economically/​militarily powerful country actively preventing a weaker country from stopping it from selling addictive opium to the masses.[4]

  • America spreading across the continent, continuously taking land from the natives, forcing them onto worse land. Eventually, when America’s control was secure from coast to coast, they did make some attempts to be slightly nice and give the natives some land back. But, not particularly good land, and there were some sad downstream consequences.

  • European countries “dividing up” Africa among themselves, without much regard for how various African peoples felt about it.

(I’m aware these narratives are simplified. Fwiw, my overall feelings about expansionist empires are actually kinda complicated and confused. But, they are existence proofs for “human-level alignment can still be pretty bad for less powerful groups.”)

Decades. Or: “thousands of years of subjective time, evolution, and civilizational change.”

Maybe, the first few generations of AI (or human uploads) are nice.

A difference between hippie-raised humans and weak-superintelligences-that-can-self-modify, who (like me) are nice-but-sometimes-conflicted, is that it’s possible for the weak superintelligence to actually just decide to modify itself into the sort of being who doesn’t feel pressure to grab all the resources from vastly weaker, slower, stupider beings, even though it’d be so easy.

But, it’s not enough for the first few generations of AI/​uploads to be nice. They need to stay nice.

Evolution is not nice. (see: An Alien God)

In the nearterm (i.e. a few years or decades), this might be okay, because there is a growing pie of resources in the solar system. And, it’s possible that the offense/​defense balance favors defense, in the nearterm. But, longterm, the solar system runs out of untapped resources. And longterm, however good defensive technologies are, they’re unlikely to compete with “whoever grabbed stars and galaxies worth of resources first.”

This is the Dream Time

Hanson has argued that, right now, we live in “The Dream Time,” which is historically very weird, and (by default) will probably be very weird in the longterm, too.

For most of history, our ancestors lived at subsistence level. Most people were pretty limited in what they had the freedom to do, because they spent much of their time raising enough food to feed themselves and raise the next generation. If they had surplus, it tended to turn into a larger population. Population and resources stayed in equilibrium.

We’ve spent the past few centuries in a period where wealth is growing faster than population. We’re used to having an increasingly vast surplus that we can spend on nice[5] things like taking care of outgroups and beautiful-but-inefficient architecture.

One of the reasons this works is that industrialized nations have fewer children. But note that this isn’t universal. Some groups (Hutterites, Hmong, Mormons, etc.) specifically try to have lots of children. This isn’t currently resulting in them dominating culture, for various reasons. But that could change.

It might change soon, because “grabbiness” (i.e. trying to get as much resources from the solar system or universe) will be selected for, in an evolutionary sense. Maybe only some AIs are grabby. But their descendants will also be grabby, and the more-grabby ones will have more resources than the less-grabby ones.
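As a back-of-the-envelope illustration of how fast a small grabby minority overtakes everyone else (the specific growth rates below are invented; only the exponential gap between them matters):

```python
def cycles_until_dominant(grabby_share=0.001, grabby_rate=1.05,
                          restrained_rate=1.01):
    """Two lineages growing exponentially: a tiny 'grabby' minority
    (here 0.1% of the population, growing 5% per cycle) vs. a
    restrained majority (growing 1% per cycle). Returns how many
    cycles until the grabby lineage is the majority. All numbers
    are made-up illustrations."""
    grabby, rest = grabby_share, 1.0 - grabby_share
    cycles = 0
    while grabby <= rest:
        grabby *= grabby_rate
        rest *= restrained_rate
        cycles += 1
    return cycles
```

With these made-up numbers, a lineage starting at one part in a thousand is the majority after a couple hundred growth cycles – and if a “cycle” is subjectively fast for digital minds, that isn’t much wall-clock time.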

If we assume a nice takeoff that initially has an agreement that Earth is protected and gets a little sunlight… in addition to the risk of grabby-evolution in the nearterm, eventually there’ll be a point where all the non-Earth matter in the solar system is converted into computronium, and the rest of the universe has probes underway to seize control of it.

...then we may enter a world where cultural evolution gives way to physical replicator evolution, subject to the old selection pressures.

Also, even if we don’t, cultural evolution might shift in random directions that are less good for classic bio humans, or for the values that we’d like to see flourish in the universe (even taking into account that we don’t want to be human-supremacist).

Is the resulting posthuman population morally valuable?

A question related to “do the posthumans turn Grabby and kill anything weak enough that they can dominate?” is “are the posthumans worthwhile in their own right?” Maybe it’s sad for the classic humans to die off, but, in a cosmic sense, something pretty interesting and meaningful might still be there doing interesting stuff.

Short answer: I don’t know, and I don’t think anyone confidently knows. It depends on what you value, on some details of how the evolution transpires, and on what is necessary for complex cognition.

Self awareness?

One of the questions that matters (to me, at least) is “Will the resulting entities be self-aware in some fashion? Will there be any kind of ‘there’ there? Will they value anything at all?” Maybe their form of self-awareness will be different – thousands of AI instances that briefly flicker into existence and then terminate, but each of them perceiving the universe in their brief way, and somehow still collectively counting as the universe looking at itself and seeing that it is good.

My belief is “maybe, but not obviously.” This question is worth multiple separate posts. See Effectiveness, Consciousness, and AI Welfare. The basic thrust is “humans implement their thinking in a way that routes through consciousness, but this is not obviously the only way to do thinking.”

Calculators multiply, without any of the subjective experience a human has when they multiply numbers. Deep Blue executed chess strategy, but my guess is it wasn’t much more self-aware than a thermostat. Suno makes music, and Midjourney creates art, that is sometimes hauntingly beautiful to me – I’m less confident about how their algorithms work, but I bet they are still closer to a thermostat than to a human.

I would expect evolution to preserve strategic thought. You need it to outcompete other superintelligences.

But there doesn’t seem to be a strong reason to expect that conscious feeling is the best way to execute most kinds of strategic cognition. Even if it turns out there is some self-aware core somewhere that is needed for the highest level of decisionmaking, it could be that most of its implementation details are more shaped like “make a function call to the unconscious python code that efficiently solves a particular type of problem.”

The Hanson Counterpoint: “So you’re against ever changing?”

When Hanson gets into arguments about this, and his debate partner says “it would be horrifying for the posthumans to end up as nonconscious things that create a Disneyland with no children,” my recollection is that Hanson says “so… you’re against anything ever changing?”

With the background argument: to stop this sort of thing from happening, something needs to have a pretty extreme level of control over what all beings in the universe can do. Something very powerful needs to keep being able to police every uncontrolled replicator outburst that tries to dominate the universe, kill all competitors, and fill it with hollow, worthless things.

It needs to be powerful, and it needs to stay powerful (relative to any potential uncontrolled grabby hollow replicators).

Hanson correctly observes that that’s kind of an absurd amount of power. And many ways of attempting to build such an entity would result in some kind of stagnation that prevents a lot of possible interesting, diverse value in the universe.

To which I say, yep, that is why the problem is hard.

A permanent safeguard against hollow grabby replicators needs to not only stop hollow grabby replicators. It also needs to have good judgment to let a lot of complex, interesting things happen that we haven’t yet thought about, some of which might be kinda grabby, or inhuman.

Many people seem to have an immune reaction against the rationalist crowd wanting to “build god,” and seeming to orient to it in a totalizing way, where it’s all-or-nothing: you either get a permanent, wise, powerful process that is capable of robustly preventing evolution from turning the universe hollow and morally empty… or you get an empty, hollow universe.

And, man, I sure get the wariness of totalizing worldviews. Totalizing worldviews are very sus and dangerous and psychologically wonky, and I’m not sure what to do about that.

But I have not seen any kind of vision painted for how you avoid a bad future, for any length of time, that doesn’t involve some kind of process that is just… pretty godlike? The totalizingness really seems like it lives in the territory.

If there are counterarguments that engage with the object level as opposed to heuristically dismiss totalizingness, I would love to hear them.

Can’t superintelligent AIs/​uploads coordinate to avoid this?

In smooth nice takeoff world, wouldn’t we expect to have smart beings who see the onset of evolution destroying a lot of things they care about, and agree to do something else? Building a permanent robust safeguard against evolution is challenging, but, there’ll be superintelligences around.

Yes, probably. This would count as a solution to the problem.

But, this needs to happen at a time when the coalition of AIs/​posthumans that care about anything subtle and interesting and remotely meaningful is dominant enough to successfully coordinate and implement it.

If they don’t get around to it for like a year (i.e. hundreds/​thousands of years of subjective time for multiple generations of replicators to evolve), then there might already be grabby replicators that have stopped caring about anything subtle and interesting and nuanced because it wasn’t the most efficient way to get resources.

(or, they might still care about something subtle and interesting and nuanced, but not care that they care, such that they wouldn’t mind future generations that care less, and they wouldn’t spend resources joining a coalition to preserve that)

This brings me back to the thesis of this post:

If you grant the assumptions of a smooth, nice, decentralized and differentially defensive takeoff, you still really need to solve some important gnarly alignment problems deeply, early in the process of the takeoff, even if you grant the rest of the optimistic assumptions. It has to happen early enough for some combination of superintelligences who care about anything morally valuable at all to end up dominant.

If this doesn’t happen early enough, classic humans will get outcompeted, and will either be killed or die off, unless they self-modify into something powerful enough to keep up.

If you’re kinda okay with that outcome, but you care about any particular thing at all about how the future shakes out, then “superintelligences produce permanent safeguards” needs to happen before evolutionary drift has produced generations of AI that don’t care about anything you care about.

(If you care about neither nearterm humans nor any kind of interesting far future, well, coolio. Seems reasonable, and I respect your right to exist, but I’m sorry, I’m going to be working to make sure you don’t have the power to end everything I care about.)

How Confident Am I?

This is a pretty complex topic. I have tons of model uncertainty here.

But, these arguments seem sufficient for me to, by default, be extremely worried, even when I grant all the optimistic assumptions about a smooth takeoff. I haven’t seen any compelling counterarguments so far. Let me know if you have them.

  1. ^

    (comparable in power to fully fledged Coherent Extrapolated Volition (CEV), although I’m happy to talk separately about how to best aim towards extremely robust safeguards).

  2. ^

    To be clear, I think 80 years is very unrealistic here. I also think all the starting assumptions here are very unrealistic. But, a lot of people seem to believe something like this. So it seemed worth talking about how this world would play out, if I imagined the most optimistic version that felt at all coherent.

  3. ^
  4. ^
  5. ^

    Or, as Hanson argues, often kinda stupid things that don’t make practical sense. But, the line between those is blurry.