The authors have emphasized repeatedly that AI 2027 was and is faster than their mode scenario, which makes doing this kind of evaluation annoying.
We’ve said that it was faster than our median, not our mode. I think it was close to most of our modes at the time of publication; mostly we were at around 2027-2028.
But the evaluation itself seems useful either way, in terms of checking in on how things are going relative to the trajectory that was our best guess conditional on the AGI timelines depicted.
Small point of information—as I’ve heard Daniel (and maybe you?) explain, the ‘mode’ in your language means repeatedly thinking ‘what might happen next, what is most likely’ and sampling forward in that way.
But that’ll systematically underestimate the overall mode of a series of such nonnegative, right-skewed distributions (which would tend to look more like the median).
So I think it could be worth being a bit pedantic about how you describe this.
I was only referring to our AI timelines mode, which in this case is defined as the most likely year in which superhuman coder arrives.
In general, the concept of a mode for most of the scenario decisions seems not well defined, since e.g. for non-naturally-numeric choices it depends on how you define the categories and on what past events you condition (for the timelines mode we’re conditioning on the starting point, but in other cases one might condition on all events thus far).
I would personally describe our process as some mixture of sampling what intuitively feels most likely at each point (which might e.g. correspond to the mode of a natural categorical breakdown, or of a distribution conditional on all events thus far, though we mostly didn’t explicitly calculate this), while also optimizing for making things not too degenerate, so that the trajectory overall feels intuitively plausible (because by default taking the mode every time would look unlike what we actually expect, in the sense that in the real world there will be many surprises).
As an example of how much definitions matter here: if we just conditioned on the preceding events for each month and sampled which big algorithmic improvements might happen, treating this as a categorical variable enumerating many possible improvements, we might never end up with any specific algorithmic improvements, or end up with them quite late in the game. But if we instead start from the judgment that some will probably come before superhuman coder, and then pick the ones we think are most likely, even though any individual one may be <50% likely to arrive this quickly (though that’s not totally clear in this case) and <<50% likely in any individual month, then we end up with neuralese recurrence and shared memory bank right before SC.
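To make that month-by-month point concrete, here’s a toy calculation; the numbers are made up purely for illustration, not our actual estimates:

```python
# Toy illustration (made-up numbers, not from the AI 2027 timelines model):
# per month, each specific improvement is unlikely, so the month-by-month
# conditional mode is "nothing happens" -- yet over the whole horizon,
# "at least one improvement before SC" is by far the more likely outcome.
p_per_month = 0.03   # assumed chance a given improvement lands in a given month
n_improvements = 5   # assumed number of candidate improvements
n_months = 24        # assumed horizon before superhuman coder

p_none_in_a_month = (1 - p_per_month) ** n_improvements
p_none_ever = p_none_in_a_month ** n_months

print(f"P(no improvement in a given month) = {p_none_in_a_month:.2f}")   # ~0.86
print(f"P(no improvement over {n_months} months) = {p_none_ever:.2f}")   # ~0.03
print(f"P(at least one improvement before SC) = {1 - p_none_ever:.2f}")  # ~0.97
```

With numbers like these, “no improvement” is the modal outcome for every single month, but “at least one improvement before SC” is overwhelmingly likely overall.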
Perhaps a simpler example of how categorization matters: if we break down possible AIs’ goals very granularly, then the most probability goes to AIs being very well aligned, relative to any very specific misaligned goal. But we overall have more probability on misalignment in this scenario, so we first make that high-level choice, and then choose one of the most likely specific misaligned goals.
I want to say this more clearly and simply somehow. Something like ‘adding up a series of conditional modes does not give the overall mode’? (And for nonnegative, right-skewed things like timelines, it’ll systematically underestimate except maybe in the presence of very particular correlations?)
Here’s a try at phrasing it with less probability jargon:
The forecast contains a number of steps, each of which is assumed to take our best estimate of its most likely time. But in reality, unless we’re very lucky, some of those steps will be faster than predicted and some will be slower. The ones that are faster can only be so much faster (because they can’t take no time at all). The ones that are slower, on the other hand, can be much slower. So the net effect of this uncertainty probably adds up to a slowdown relative to the prediction. Does that seem like a fair summary?
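To make the effect concrete, here’s a minimal simulation sketch, assuming (arbitrarily) lognormal step durations rather than anything from the actual forecast:

```python
# Minimal sketch (assumed lognormal step durations, not the authors' model):
# summing each step's most likely duration understates how long the whole
# chain typically takes, because right-skewed steps can run long but not negative.
import numpy as np

rng = np.random.default_rng(0)
n_steps, mu, sigma = 10, 0.0, 0.75          # assumed parameters for illustration
step_mode = np.exp(mu - sigma**2)           # mode of a lognormal(mu, sigma)
sum_of_modes = n_steps * step_mode

totals = rng.lognormal(mu, sigma, size=(100_000, n_steps)).sum(axis=1)

print(f"sum of per-step modes:   {sum_of_modes:.1f}")
print(f"median total time:       {np.median(totals):.1f}")
print(f"P(total > sum of modes): {(totals > sum_of_modes).mean():.2f}")
```

With these made-up parameters the typical total comes out well above the sum of the per-step most-likely durations, and nearly every simulated run overshoots it.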
Generate an image randomly with each pixel black with 51% chance and white with 49% chance, independently. The most likely image? Totally black. But virtually all of the probability mass is on images that are ~49% white. Adding correlations between neighbouring pixels (or, in 1D, between successive events in a time series) doesn’t remove this problem, despite what you might assume.
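For concreteness, a quick simulation of that setup (the pixel count is arbitrary):

```python
# Sketch of the pixel example: the single most likely image is all-black,
# but sampled images essentially never look like it.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, p_black = 10_000, 0.51

samples = rng.random((1_000, n_pixels)) < p_black        # True = black pixel
frac_black = samples.mean(axis=1)

print(f"log10 P(most likely image, all black) = "
      f"{n_pixels * np.log10(p_black):.0f}")              # about -2924
print(f"sampled fraction black: {frac_black.mean():.3f} "
      f"+/- {frac_black.std():.3f}")                      # concentrates near 0.51
```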
The core problem is that the mode of a high-dimensional probability distribution is typically degenerate. (As an aside, this also causes problems for parameter estimation of unnormalized energy-based models, an extremely broad class, because you need to sample from them to normalize; maximum-probability estimates can be dangerous.)
Statistical mechanics points to the solution: knowing the most likely microstate of a box of particles doesn’t tell you anything; physicists care about macrostates, which are observables. You define a statistic (any function of the data that summarizes it) that you actually care about, and then take the mode of that. For example, the number of breakthrough discoveries by time t.
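A sketch of what that looks like for the discovery-count example; the per-month probability is an assumption picked purely for illustration:

```python
# Sketch: take the mode of a statistic you care about (the number of
# "breakthrough discoveries" by time t) rather than of the full trajectory.
from math import comb

p_month, n_months = 0.2, 36   # assumed per-month discovery probability and horizon

def binom_pmf(k: int) -> float:
    """P(exactly k discoveries across n_months independent months)."""
    return comb(n_months, k) * p_month**k * (1 - p_month)**(n_months - k)

modal_count = max(range(n_months + 1), key=binom_pmf)

print(f"most likely full trajectory: no discovery in any month "
      f"(P = {binom_pmf(0):.4f})")
print(f"mode of the count statistic: {modal_count} discoveries "
      f"(P = {binom_pmf(modal_count):.2f})")
```

The month-by-month modal trajectory has zero discoveries, while the mode of the count statistic sits near the expected number of discoveries, which is the kind of quantity you’d actually want a scenario to reflect.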
An intuition you might be able to invoke is that the procedure they describe is like greedy sampling from an LLM, which doesn’t get you the most probable completion.
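A toy version of that analogy, with a made-up two-token “model”:

```python
# Toy illustration of the greedy-decoding analogy: picking the most likely
# next token at each step need not give the most likely full completion.
# (Made-up two-token "language model"; probabilities are assumptions.)
first = {"A": 0.6, "B": 0.4}
second = {                        # P(second token | first token)
    "A": {"x": 0.5, "y": 0.5},
    "B": {"z": 1.0},
}

sequences = {
    (f, s): pf * ps
    for f, pf in first.items()
    for s, ps in second[f].items()
}

greedy_first = max(first, key=first.get)                        # "A"
greedy_seq = (greedy_first,
              max(second[greedy_first], key=second[greedy_first].get))

print("greedy sequence:", greedy_seq, sequences[greedy_seq])    # ('A', 'x') 0.3
print("most likely sequence:", max(sequences, key=sequences.get),
      max(sequences.values()))                                  # ('B', 'z') 0.4
```

Greedy selection commits to “A” because it wins the first step, but the single most probable completion starts with “B”.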