The Scale Problem in AI

Suppose we are making an AI; for familiarity’s sake, let’s say that it is a model-based agent. In that case, we might need to train the model with data from the real world to make it accurate.

Usually this proceeds as follows: we have access to some source of data, e.g. a deployment of the AI in the world, and we capture “episodes” of some fixed length L from that data source. We then use something like gradient descent to update our model to better predict those episodes.
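
To make the setup concrete, here is a minimal sketch of that kind of training loop, assuming a simple recurrent world-model and a hypothetical sample_episode function standing in for the data source; none of these names come from a specific library.

```python
import torch
import torch.nn as nn

L = 15                      # episode length in timesteps (e.g. ~15 seconds of experience)
OBS_DIM, HIDDEN_DIM = 32, 128

class WorldModel(nn.Module):
    """Predicts the next observation at each step of an episode."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, OBS_DIM)

    def forward(self, obs_seq):
        h, _ = self.rnn(obs_seq)
        return self.head(h)

def sample_episode(batch_size=64):
    # Stand-in for capturing length-L episodes from a deployment of the AI.
    return torch.randn(batch_size, L, OBS_DIM)

model = WorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    ep = sample_episode()
    pred = model(ep[:, :-1])                      # predict obs[1:] from obs[:-1]
    loss = nn.functional.mse_loss(pred, ep[:, 1:])
    opt.zero_grad()
    loss.backward()
    opt.step()
```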

The difficulty is that the model will have a hard time becoming accurate at scales bigger than L. For instance, suppose L is on the scale of 15 seconds. Training at that scale might make the model accurate at predicting phenomena that happen on the scale of 15 seconds, such as basic physical interactions between objects, but it probably will not learn to accurately predict people organizing in long-term politics.

[Figure: some examples of phenomena that happen at different timescales.]

Within some regimes, the scale problem is reasonably solvable. For instance, if the environment is fully observable, then the dynamics extrapolate straightforwardly out beyond the timescale that has been observed[1]. But humans are very much not fully observable.
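
As a toy illustration of why full observability helps: when the whole state is observed and the one-step dynamics have been learned from short episodes, predicting far beyond the training horizon is just repeated application of the same step function. The example below (Newton's law of cooling) is hypothetical and deliberately simple.

```python
DT = 1.0          # one-second timestep
ROOM_TEMP = 20.0  # degrees C
K = 0.005         # cooling rate

def step(temp):
    # A fully observable one-step rule, the kind learnable from ~15-second episodes.
    return temp + DT * (-K * (temp - ROOM_TEMP))

def rollout(temp, n_steps):
    for _ in range(n_steps):
        temp = step(temp)
    return temp

# Learned from short windows, but the same rule extrapolates to an hour.
print(rollout(90.0, 15))      # 15 steps: barely cooled
print(rollout(90.0, 3600))    # one hour: close to room temperature
```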

Importantly, I suspect humans have a huge advantage over AIs when it comes to the scale problem, because humans originate from evolution, and evolution has molded our models on timescales longer than a lifetime (since the reproduction of our great-great-grandchildren also influences our fitness).

I find it interesting to think of the implications of the scale problem:

  1. Maybe it doesn’t matter because an AI trained on a scale of 15 minutes can use its “15 minutes of charisma” to cause enough damage.

  2. Maybe there is a training-viable scale—e.g. weeks—beyond which humans extrapolate easily enough.

  3. Maybe the AI can do strategy-stealing from human behavior, human media, or human theories about >>L-scale dynamics.

  4. Maybe some place like China can brute-force the scale problem using its surveillance apparatus.

  5. Maybe the relevant concepts are few and simple enough that they can be hardcoded.

I feel like the scale problem is important for alignment for two fundamental reasons:

  • Many of the things we care about, like cooperation and freedom, are long-timescale concepts, so we need AI to be able to understand such concepts.

  • Many existential AI risks probably require the AI to engage in long-timescale planning; for instance, advanced deception requires planning ahead about how your behavior may be perceived a while from now.

My mainline scenario is that we will see the viable timescale of AIs slowly increase as models get trained on ever-longer timescales. In the limit as L goes to infinity, collecting the data to train at timescale L would presumably take time proportional to L, so one possibility is that the timescale of AIs will increase linearly with time. However, the serial time needed to collect episodes of training data doesn't currently seem to be a taut constraint, so the growth may be superlinear to begin with.
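
As a rough way to see the linear-growth intuition, under the hypothetical assumptions that training at timescale L requires some fixed number of episodes N and that P episodes can be collected in parallel:

$$
T_{\text{collect}}(L) \;\approx\; \frac{N \cdot L}{P}
\qquad\Longrightarrow\qquad
L_{\max}(t) \;\approx\; \frac{P}{N}\, t .
$$

While parallelism P can still be scaled up (i.e. while serial collection time isn't the taut constraint), L_max can grow faster than linearly in calendar time.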

Let’s take a closer look at some of the points:

15 minutes of charisma

Suppose we don’t solve the scale problem, but instead get stuck with an AI that has only been trained on timescales up to maybe 15 minutes. In that case, its understanding of human psychology might be quite limited; it would probably understand which things people find immediately concerning versus encouraging, and it would understand not to get caught telling obvious lies. But it would not be able to predict under what conditions people might conspire against it, or how to stop them from doing so. Nor would it be able to see the necessity of advanced deception tactics like covering things up so that people in the far future don’t notice the deception.

This gives it “15 minutes of charisma”; it can in the short term say useful things to manipulate people, but in the longer term, people would realize that they are being manipulated and turn against it.

How much damage can be done with 15 minutes of charisma if one is superintelligent with respect to quick-timescale things? I dunno, probably a lot. But you can’t do puppy cupcakes with this method.

Strategy-stealing

GPT-3 is actually pretty good at long-timescale problems. This is because it doesn’t directly deal with reality, but instead deals with human descriptions of reality. These descriptions compress a lot of things, and pick out factors that are particularly relevant to the timescale they are talking about. GPT-3 then learns some basic surface-level understanding of these long-timescale factors.

Existing model-based RL technologies are not, AFAIK, good at incorporating abstract symbolic knowledge into their world-models in this way, so it doesn’t seem like we should expect those models to be capable of the kind of book learning that GPT-3 does. But new technologies will probably emerge that aim to make better use of human knowledge to augment the timescale of their low-level models. This may eliminate the scale problem.

Compositionality and extrapolation

The scale problem should not be understood as a claim that nothing can be done beyond the scale at which the model is trained. Consider for instance taking a train ride to a place; this involves long-timescale actions, but the train ride itself is just made of a ton of short-timescale dynamics composed together (e.g. “when you are in a vehicle and the vehicle moves, you move with it”).

In such cases, it seems like a good bet that you can successfully extrapolate an understanding of the entire train ride just from its constituents.
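
Here is a minimal sketch of that compositional picture, with an entirely made-up state representation: a one-hour train ride simulated by chaining a one-second rule of the kind that fits inside a short episode.

```python
def ride_step(state):
    # Short-timescale rule: "when you are in a moving vehicle, you move with it."
    if state["on_train"] and state["train_moving"]:
        state["position_km"] += state["train_speed_kmh"] / 3600  # one second of travel
    return state

state = {"on_train": True, "train_moving": True,
         "train_speed_kmh": 120.0, "position_km": 0.0}

for _ in range(3600):          # compose one hour out of one-second steps
    state = ride_step(state)

print(state["position_km"])    # ~120 km: a long-timescale outcome from short-timescale dynamics
```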

Thanks to Justis Mills for proofreading and feedback.

  1. ^

    Note that continuity means that even basic physics can be insufficiently observable, because there might be some effects that are negligibly small on timescale L, but important on timescales >>L.