Making Nanobots isn’t a one-shot process, even for an artificial superintelligance

Summary: Yudkowsky argues that an unaligned AI will figure out a way to create self-replicating nanobots, and merely having internet access is enough to bring them into existence. Because of this, it can very quickly replace all human dependencies for its existence and expansion, and thus pursue an unaligned goal, e.g. making paperclips, which will most likely end up in the extinction of humanity.

I however will write below why I think this description massively underestimates the difficulty in creating self-replicating nanobots (even assuming that they are physically possible), which requires focused research in the physical domain, and is not possible without involvement of top-tier human-run labs today.

Why it matters? Some of the assumptions of pessimistic AI alignment researchers, especially by Yudkowsky, rest fundamentally on the fact that the AI will find quick ways to replace humans required for the AI to exist and expand.

  • We have to get AI alignment right the first time we build a Super-AI, and there are no ways to make any corrections after we’ve built it

    • As long as the AI does not have a way to replace humans outright, even if its ultimate goal may be non-aligned, it can pursue proximate goals that are aligned and safe for it to do. Alignment research can continue and can attempt to make the AI fully aligned or shut it down before it can create nanobots.

  • The first time we build a Super-AI, we don’t just have to make sure it’s aligned, but we need it to perform a pivotal act like create nanobots to destroy all GPUs

    • I argue below that this framing may be bad because it means performing one of the most dangerous steps first — creating nanobots — which may be best performed by an AI that is much more aligned than a first attempt

What this post is not about: I make no argument about the feasibility of (non-biological) self-replicating nanobots. There may be fundamental reasons why they are impossible/​difficult (even for superintelligent AIs)/​will not outcompete biological life, an interesting question that is explored more by bhaut. I also don’t claim that AI alignment doesn’t matter. I think that it’s extremely important, but I think it’s unlikely that (1) a one-shot process will lead to it and also (2) that one-shot is necessary; I actually think that this kind of thinking increases risk.

Finally, I don’t claim that there aren’t easier ways to kill all, or almost all, humans, for example pandemics or causing nuclear wars. However, most of these scenarios do not leave any good paths for an AI to expand because there would be no way to get more of its substrate (e.g. GPUs) or power supplies.

Core argument

Building something like a completely new type of nanobot is a massive research undertaking. Even under the assumption that an AI is much more intelligent and can learn and infer from much less data, it cannot do so from no data.

Building a new type of nanobot (not based on biological life) requires not just the ability to design from existing capabilities, but actually doing completely new experiments on how the nanomachinery that is going to be used to do this interacts with itself and the external world. It isn’t possible to completely cut out all experiments from the design process, because at least some of the experiments will be about how the physical world works. If you don’t know anything about physics, you clearly can’t design any kind of machine; I am pretty certain that right now we do not know enough about nanomachines to design a new kind of non-biological self-replicating nanobot that immediately works out of the box.

To build it, you would need high quality labs to do very well specified experiments, build prototypes in later stages and report detailed information on how they failed, until you could arrive at a first sample of a self-replicating nanobot, at which point the AGI might be in a position to replace all humans.

Counterargument 1: We can build some complex machines from blueprints, and they work the first time. As an example, we can certainly design a complex electronics product, manufacture the PCB and add all the chips and other parts. If an experienced engineer does this, there is a good chance it will work the first time. However, new nanomachines would be different, because they cannot be assembled from parts that are already extremely well studied in isolation. We make chips such that when they are used in their specified way, their behaviour is extremely predictable, but no such components currently exist in the world of nanomachines.

Counterargument 2: The AI can simply simulate everything instead of performing any physical experiments. All of the required laws of physics are known: The standard model describes the microscopic world extremely well, and (microscopic) gravity is irrelevant for constructing nanobot, so no (currently unknown) physical theory unifying all laws would be required. While it is indeed possible or even likely that the standard model theoretically describes all details of a working nanobot with the required precision, the problem is that in practice it is impossible to simulate large physical systems using it. Many complex physical systems are still largely modelled empirically (ad-hoc models validated using experiments) rather than it being possible to derive them from first principles. While physicists sometimes claim to derive things from first principles, in practice these derivations often ignore a lot of details which still has to be justified using experiments. An AI can also make progress on better simulation, but simulating complex nanomachines outright is exceedingly unlikely.

Counterargument 3: Nanobots already exist, the AI will just use existing biology. Existing biology is indeed good for making self-replicating nanobots, but at least two difficult problems will remain: To make any kind of effective use of the network of nanobots, it will require creating a communication network using cells that allows them to come together to execute some more complex software to at least connect to the internet (and thus back to the AI). That’s still a monumental task to achieve using biological systems and would still require a lot of research.

Counterargument 4: The AI can do the experiments in secret, or hide the true nature of the experiments in things that seem aligned. This could certainly be relevant in the long run, especially if we want the AI to solve complex problems. But on shorter timescales, most of what the AI would need to learn is going to be extremely specific to nanomachines. You do not get data about this by making completely unrelated experiments that do not involve nanotechnology.

Significance for AI alignment

I don’t claim that this means we don’t need alignment, or that an AI won’t eventually be able to build nanobots (if it is feasible at all, of course) — just that it seems highly possible to delay this step by years, if it is the intention of the operator to do so (and it has some minimal cooperation on this from the rest of the world).

This means that it is possible to study AIs with capabilities potentially far exceeding human capabilities. Alignment is likely an iterative process and no one-shot solution exists, but that’s probably ok, because well-enough aligned AIs can coexist with humans, be studied, and be improved for the next iteration, without immediately seeing human bodies only as bags of atoms to be harvested to do other things.

Pivotal acts

I think pivotal acts may be a bad idea in general. The arguments for this have been spelled out before, for example by Andrew_Critch. However, even if one believes (a) in the feasibility of nanobots and (b) pivotal acts are necessary, then using nanobots to carry out a pivotal action might be a really bad idea.

If someone decides that a pivotal act should be carried out using nanobots (either on their own or by this being the suggested best option by an AI), they might be inclined to do anything to perform any physical acts necessary for the AI to achieve this, making the AI much more dangerous if it is not perfectly aligned (which in itself may be an impossible problem). Pivotal acts that do not require giving an AI full human-equivalent or better physical capabilities would be much safer (probably still a bad idea).

How could this argument change in the future

I think my argument that building nanobots without massive help from first-tier human labs is true now and for at least several more years. However, over several decades, some things might change substantially, for example:
1. Production processes could be much more automated than they are now. If factories exist that can make new, complex machines without major retooling, they could make it much simpler for an AI to perform completely new tasks in the physical world with minimal human interaction
2. Robotics can advance. Humanoid robots that can peform many physical human tasks may make it possible for the AI to build completely human-independent labs.
3. More research into building nanomachines that eliminates more of the unknowns.
4. More biotech research could also allow more control of the physical world, for example if cell networks can be built to perform some tasks.
5. It is maybe possible that quantum computers are powerful enough to simulate much more complex physical processes than is possible on classical computers, and thus an AI with access to a quantum computer may be able to massively reduce the number of experiments necessary to construct nanobots. (Feels unlikely to me but cannot a priori be excluded)

So whether an AI can achieve nanobots just via internet access will potentially have to be re-evaluated in the future when one or more of these are developed.

What this shouldn’t be taken as

I am not arguing alignment is not important, in fact I think it is very important.

1. Regardless of the feasibility of nanobots, I think there are probably vastly easier ways to kill all humans, however they would leave an AI without a practical way to continue existing or expanding.
2. It is also possible that many scenarios exist where an AI does (1) by accident.
3. AIs don’t need nanobots to take control of humans and human institutions. There are many other ways that involve using humans against each other and are probably exploitable by much less powerful AIs. (Crucially, however, they do depend on some humans and might require different tools to control AGI risk.)
4. I don’t think that this makes the AI alignment trivial to solve, or claim that this gives a recipe to solve it. I just think that it may be fruitful to look into research that starts from moderately aligned AIs and figures out how to get them more aligned rather than having to perform a very risky one-shot experiment.