autonomy: the missing AGI ingredient?

Epistemic status: trying to feel out the shape of a concept and give it an appropriate name. Trying to make explicit some things that I think exist implicitly in many people’s minds. This post makes truth claims, but its main goal is not to convince you that they are true.

Here are some things I would expect any AGI to be able to do:

  • Operate over long intervals of time relative to its sensory bandwidth (e.g. months or years of ~30 fps visual input).

  • Remember specific sensory experiences from long ago that are relevant to what’s happening to it now. (E.g. remember things it saw months or years ago.)

  • Retain or forget information and skills over long time scales, in a way that serves its goals. E.g. if it does forget some things, these should be things that are unusually unlikely to come in handy later.

  • Re-evaluate experiences that happened a long time ago (e.g. years ago) in light of newer evidence (observed in e.g. the last hour), and update its beliefs appropriately.

  • Continually adjust its world model in light of new information during operation.

    • E.g. upon learning that a particular war has ended, it should act as though the war is not happening, and do so in all contexts/modalities.

    • As with humans, this adaptation may take a nonzero amount of time, during which it might “forget” the new fact sometimes. However, adaptation should be rapid enough that it does not impede acting prudently on the most relevant implications of the new information.

    • This may require regular “downtime” to run offline training/finetuning (humans have to sleep, after all). But if so, it should require less than 1 second of downtime per second of uptime, ideally much less.

  • Perform adjustments to itself of the kind described above in a “stable” manner, with a negligibly low rate of large regressions in its knowledge or capabilities.

    • E.g. if it is updating itself by gradient descent, it should do so in a way that avoids (or renders harmless) the gradient spikes and other instabilities that cause frequent quality regressions in the middle of training for existing models, especially large ones. (A crude sketch of one such guard appears just after this list.)

  • Keep track of the broader world context while performing a given task.

    • E.g. an AGI playing a video game should not forget about its situation and goals in the world outside the game.

    • It might “get distracted” by the game (as humans do), but it should have some mechanism for stopping the game and switching to another task if/when its larger goals dictate that it should do so, at least some of the time.

  • Maintain stable high-level goals across contexts. E.g. if it is moved from one room to another, very different-looking room, it should not infer that it is now “doing a different task” and ignore all its previously held goals.
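
To make the “stability” bullet above a bit more concrete, here is a minimal PyTorch sketch of one crude guard an online learner might use: clip gradients, and skip any update whose gradient norm is a large outlier relative to a running average of recent norms. This is purely my own illustration (the function name `guarded_step` and its thresholds are invented for this post), not a description of how any existing system works.

```python
# A minimal sketch (my own illustration) of "don't break yourself while updating":
# clip the gradient, and skip the step entirely if the pre-clip gradient norm is a
# large outlier relative to a running average of recent norms.
import torch

def guarded_step(model, loss, optimizer, norm_ema, spike_factor=10.0, ema_decay=0.99):
    """Apply one online update unless the gradient looks like a spike.

    `norm_ema` is a running average of recent gradient norms (None on the first
    call). Returns the updated running average.
    """
    optimizer.zero_grad()
    loss.backward()
    # clip_grad_norm_ clips in place and returns the total norm measured before clipping.
    total_norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0))
    if norm_ema is not None and total_norm > spike_factor * norm_ema:
        # Outlier gradient: discard this update rather than risk a large regression.
        return norm_ema
    optimizer.step()
    if norm_ema is None:
        return total_norm
    return ema_decay * norm_ema + (1 - ema_decay) * total_norm
```

A real system would presumably need much more than this; the point is just that “update yourself without breaking yourself” is a concrete property that could be measured and trained for directly.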

I’m not sure how related these properties are, though they feel like a cluster in my mind. In any case, a unifying theme of this list is that current ML models generally do not do these things—and we do not ask them to do these things.

We don’t train models in a way that encourages these properties, and in some cases we design models whose structures rule them out. Benchmarks for these properties are either nonexistent, or much less mature than more familiar benchmarks.


Is there an existing name for this cluster? If there isn’t one, I propose the name “autonomy.” This may not be an ideal name, but it’s what I came up with.

I think this topic is worthy of more explicit discussion than it receives. In debates about the capabilities of modern ML, I usually see autonomy brought up in a tangential way, if at all.

ML detractors sometimes cite the lack of autonomy in current models as a flaw, but they rarely talk about the fact that ML models are not directly trained to do any of this stuff, and are indeed often deployed in a manner that renders this stuff impossible. (The lack of autonomy in modern ML is more like a category mistake than a flaw.)

ML enthusiasts sometimes refer to autonomy dismissively, as something that will be solved incidentally by scaling up current models—which seems like a very inefficient approach, compared to training for these properties directly, which is largely untried.

Alternatively, ML enthusiasts may cite the very impressive strides made by recent generative models as evidence of generally fast “ML progress,” and then gesture in the direction of RL when autonomy is brought up.

However, current generative models are much closer to human-level perception and synthesis (of text, pictures, etc.) than current RL models are to human-level autonomy.

State-of-the-art RL can achieve some of the properties above in toy worlds like video games; it can also perform at human level at some tasks orthogonal to autonomy, as when the (frozen) deployed AlphaZero plays a board game. Meanwhile, generative models are producing illustrative art at a professional level of technical skill across numerous styles—to pick one example. There’s a vast gap here.

Also, as touched upon below, even RL is usually framed in a way that rules out, or does not explicitly encourage, some parts of autonomy.


If not autonomy, what is the thing that current ML excels at? You might call it something like “modeling static distributions”:

  • The system is trying to reproduce a probability distribution with high fidelity. The distribution may be conditional or unconditional.

  • It learns about the distribution by seeing real-world artifacts (texts, pictures) that are (by hypothesis) samples from it.

  • The distribution is treated as fixed.

    • There may be patterns in it that correspond to variations across real-world time: GPT-3 probably knows that an article from the US dated 1956 will not make reference to “President Clinton.”

    • However, this fact about the distribution is isolated from any way in which the model itself may experience the passage of time during operation. The model is not encouraged to adapt quickly (whatever that would mean) to the type of fact that quickly goes out of date. (I’m ignoring models that do kNN lookup on databases, as these still depend on a human to make the right updates to the database.)

  • The model operates in two modes, “training” and “inference.”

    • During “training,” it learns a very large amount of information about the distribution, which takes many gigabytes to represent on a computer.

    • During “inference,” it either cannot learn new information, or can only do so within the limits of a “short-term memory” that holds much less information than the fixed data store produced during “training.”

    • If the “short-term memory” exists, it is not persisted into the “long-term memory” of the trained weights. Once something is removed from it, it’s gone.

    • There is a way to adapt the long-term memory to new information without doing training all over again (namely “finetuning”). But this is something done to the model from outside it by humans. The model cannot finetune itself during “inference.”

    • Another way to put this is that the model is basically stateless during operation. There’s a concept of a “request,” and it doesn’t remember things across requests. GPT-3 doesn’t remember your earlier interactions with it; DALLE-2 doesn’t remember things you previously asked it to draw. (See the sketch just after this list.)

  • The model does not have goals beyond representing the distribution faithfully. In principle, it could develop an inner optimizer with other goals, but it is never encouraged to have any other goals.
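
To make the two-mode picture above concrete, here is a minimal sketch in Python. Everything in it (`train`, `generate`, `CONTEXT_LIMIT`) is an illustrative stand-in rather than a real API; the point is only the shape of the setting: long-term knowledge is frozen at training time, and at inference time the prompt window is the only memory, which has to be managed from outside the model.

```python
# Minimal sketch of the "two modes, stateless inference" setting described above.
# `train`, `generate`, and CONTEXT_LIMIT are illustrative stand-ins, not a real API.

CONTEXT_LIMIT = 2048  # illustrative cap on the "short-term memory"

def train(corpus: list[str]) -> dict:
    """'Training' mode: everything the model will ever know long-term gets baked
    into the returned weights, which are frozen afterwards."""
    return {"n_docs_seen": len(corpus)}  # stand-in for gigabytes of parameters

def generate(weights: dict, prompt: str) -> str:
    """'Inference' mode: the prompt is the only memory available, and nothing
    from this call is ever written back into `weights`."""
    prompt = prompt[-CONTEXT_LIMIT:]  # anything older than the window is simply gone
    return f"[completion conditioned on {len(prompt)} chars and the frozen weights]"

weights = train(["doc 1", "doc 2"])
reply_1 = generate(weights, "Hello!")
# To make the model "remember" the first exchange, the caller must re-send it;
# the model itself retains nothing between requests.
reply_2 = generate(weights, "Hello!" + reply_1 + " What did I say before?")
```

Finetuning would change the weights, but as noted above, that is something humans do to the model from outside; the model cannot do it to itself mid-operation.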

This is a very different setting than the one an AGI would operate in. Asking whether a model of this kind displays autonomy doesn’t really make sense. At most, we can wonder whether it has an inner optimizer with autonomy. But that is an inefficient (and uncontrollable!) way to develop an ML model with autonomy.

What’s noteworthy to me is that we’ve done extremely well in this setting, at the goals laid out by this setting. Meanwhile, we have not really tried to define a setting that allows for, and encourages, autonomy. (I’m not sure what that would entail, but I know existing setups don’t do it.)

Even RL is usually not set up to allow and encourage autonomy, though it is closer than the generative or classification settings. There is still a distinction between “training” and “inference,” though we may care about learning speed in training in a way we don’t in other contexts. We generally only let the model learn long enough to reach its performance peak; we generally don’t ask the model to learn things in a temporal sequence, one after the other, while retaining the earlier ones—and certainly not while retaining the earlier ones insofar as this serves some longer-term goal. (The model is not encouraged to have any long-term goals.)
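
A schematic way to put the difference, with placeholder function names of my own rather than anything from an existing RL library:

```python
# Schematic contrast (placeholder names, not any real RL library) between the usual
# RL framing and one that would actually ask for the kind of retention described above.

def standard_rl_pipeline(make_task, train_until_peak, evaluate):
    """The usual framing: train on one task until performance peaks, then freeze."""
    task = make_task("task_A")
    policy = train_until_peak(task)   # all learning happens here...
    return evaluate(policy, task)     # ...and the deployed policy never learns again

def autonomy_flavored_pipeline(make_task, update_online, evaluate, task_names):
    """Closer to the list at the top of this post: one policy meets tasks in sequence,
    keeps updating as it goes, and is judged partly on how well it retains earlier tasks."""
    policy = None
    for name in task_names:
        policy = update_online(policy, make_task(name))  # learning never stops
    # Retention check: evaluate on every task seen so far, not just the latest one.
    return [evaluate(policy, make_task(name)) for name in task_names]
```

The second pipeline is only a sketch, but it highlights that retention across a sequence of tasks has to be asked for explicitly before a model can meaningfully succeed or fail at it.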

I realize this doesn’t describe the entirety of RL. But a version of the RL field that was focused on autonomy would look very different.


The above could affect AGI timelines in multiple ways, with divergent effects.

  • It could be the case that “cracking” autonomy requires very different methods, in such a way that further progress in the current paradigm doesn’t get us any closer to autonomy.

    • This would place AGI further off in time, especially relative to an estimate based on a generalized sense of “the speed of ML progress.”

  • It could be the case that autonomy is actually fairly easy to “crack,” requiring only some simple tricks.

    • The lack of researcher focus on autonomy makes it more plausible that there are low-hanging fruit no one has thought to try yet.

    • I’m reminded of the way that images and text were “cracked” suddenly by ConvNets and Transformers, respectively. Before these advances, these domains felt like deep problems and it was easy to speculate that it would take very complex methods to solve them. In fact, only simple “tricks” and scaling were needed. (But this may not be the right reference class.)

    • This would place AGI closer in time, relative to an estimate that assumes we will get autonomy in some more inefficient, less targeted, accidental manner, without directly encouraging models to develop it.