Great post!
My own timelines shortened with the maturation of inference scaling. Previously, we projected progress from training continuing to scale, plus uncertainty about whether, and how soon, we'd see other algorithmic or scaffolding improvements. Now here we are with clear impacts from inference scaling, and that's worth an update.
I agree with most of your analysis of deployment overhang, with one exception: eyeballing the relative scale of training compute to inference compute suggests that speed could still be a major factor if a leading lab switches its compute from development to inference, which could still leave us facing a large number of very fast AIs. Perhaps this now requires the kind of compute available to a lab or major government rather than to a rogue AI or a terrorist group, but the former is what I've been worried about for loss of control all along, so this doesn't change my view of the risk. Also, as you point out, per-token inference costs continue to fall rapidly.
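To make the eyeball estimate concrete, here is a minimal back-of-envelope sketch of that reallocation arithmetic. Every number in it is a hypothetical placeholder I chose for illustration, not an estimate from the post or from any paper; only the shape of the calculation matters.

```python
# Back-of-envelope deployment-overhang arithmetic.
# All constants below are hypothetical placeholders, not real estimates.

training_flop_per_s = 1e21   # assumed: compute a leading lab devotes to development
flop_per_token = 1e12        # assumed: inference cost per token for a frontier model
human_tokens_per_s = 10      # assumed: rough "human thinking speed" in tokens/second

# If the lab redirects its development compute to inference, the number of
# human-speed instances it could run in parallel is:
instances = training_flop_per_s / (flop_per_token * human_tokens_per_s)
print(f"~{instances:.0e} parallel human-speed instances")

# Equivalently, it could run fewer instances at far above human speed:
n_fast = 1e6                 # assumed: number of concurrent instances
speedup = training_flop_per_s / (n_fast * flop_per_token * human_tokens_per_s)
print(f"each of {n_fast:.0e} instances at ~{speedup:.0f}x human speed")
```

Under these made-up numbers, the same compute budget buys either ~1e8 human-speed copies or a million copies each running ~100x faster, which is the sense in which speed can remain a major factor even after the switch.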
This paper is an iteration on the version presented there. Key differences include:
The governance approach has shifted away from creating a highly centralized international authority (including centralized verification efforts) toward an international body that aids coordination between states but leaves the heavy lifting of verification to the key members (i.e., the US and China, perhaps others) and empowers their pre-existing intelligence-gathering capacities.
It’s an agreement, not a treaty. This is mostly rhetorical. Maybe the next version will be called a deal.
Our paper includes more appendices, which we think are quite valuable. In particular, we present a staged approach in which the agreement is merely the end step (or the penultimate step, just before capabilities progress resumes).
We introduced a whitelist to the restricted-research approach; it spells out what people are explicitly allowed to do and is updated over time.