wargames, expert feedback, experience at OpenAI, and previous forecasting successes
I strongly suspect that wargames were involved in a different part of the forecast: working out what would happen once the superhuman coders were invented and stolen. Both[1] sides of the race would then integrate the coders to make AI research as fast as possible. Next, the sides would race hard, ignoring the need to ensure that the AIs are actually aligned, and this negligence would lead to the AIs becoming adversarially misaligned. While the scenario assumes that adversarial misalignment gets discovered, it might instead go undiscovered, in which case the leading company would race all the way to a fiasco.
Expert feedback is the only potential source of estimates of takeoff speeds after superhuman coders. The timelines forecast, by contrast, was created by trend extrapolation, which is a little better grounded in reality than the takeoff forecast; but it contained mistakes, and the actual time-horizon trend is likely to experience a slowdown rather than be superexponential.
My understanding of timeline-related details
The superexponential trend was likely an illusion caused by the accelerated stretch between GPT-4o and o1 (see METR’s paper, page 17, figure 11). While o3, released on April 16, continued the trend, the same cannot be said of Grok 4 or GPT-5, released in July and August of 2025 respectively. In addition, the failure of Grok 4, unlike that of GPT-5, *could* have been explained away by xAI’s incompetence.
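A minimal sketch of how that illusion can arise, using invented numbers rather than METR’s actual measurements:

```python
import numpy as np

# Illustrative time-horizon series (minutes) -- invented numbers, NOT
# METR's data. Steady doublings, except for an accelerated stretch at
# the end (analogous to the GPT-4o -> o1 window).
years = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
horizons = np.array([1, 2, 4, 8, 16, 64, 256])

log_h = np.log2(horizons)

# Degree-1 fit: a plain exponential trend (straight line in log space).
slope, intercept = np.polyfit(years, log_h, 1)
# Degree-2 fit: a positive quadratic coefficient reads as "superexponential".
a, b, c = np.polyfit(years, log_h, 2)

print(f"doublings per year (linear fit): {slope:.2f}")
print(f"quadratic coefficient: {a:.2f}")  # positive, from the late jump alone
# One accelerated stretch near the end is enough to give the whole
# series positive curvature, i.e. an apparent superexponential trend.
```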
Kokotajlo already claims to have begun working on an AI-2032 branch where the timelines are pushed back, and that “we should have some credence on new breakthroughs e.g. neuralese, online learning, whatever. Maybe like 8%/yr?[2] Of a breakthrough that would lead to superhuman coders within a year or two, after being appropriately scaled up and tinkered with.”
Here I talk about two sides because OpenBrain and the other American companies have the option to unite their efforts, or the USG can unite the companies by force.
I don’t understand why Kokotajlo chose 8%/yr as the estimate. We don’t know how easy it will be to integrate neuralese into LLMs. In addition, there is Knight Lee’s proposal, my proposal to augment the models with a neuralese page selector (which keeps the model semi-interpretable, since the neuralese part guides the attention of the CoT-generating part to the important places), and approaches like diffusion models, which Google has actually tried.
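For concreteness, here is what the 8%/yr figure implies if one reads it as an independent per-year chance (my reading, not necessarily Kokotajlo’s):

```python
# Cumulative chance of at least one such breakthrough, treating 8%/yr
# as an independent per-year probability (my interpretation).
p_per_year = 0.08
for years in (1, 3, 5, 10):
    p_cum = 1 - (1 - p_per_year) ** years
    print(f"{years:>2} yr: {p_cum:.0%}")
# ->  1 yr: 8%,  3 yr: 22%,  5 yr: 34%, 10 yr: 57%
```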
I agree with where you believe the wargames were used.
I think trend extrapolation from previous progress is a very unreliable way to predict progress. I would put more stock in a compelling argument for why progress will be fast or slow, like the one I hope I have provided. But even this is pretty low-confidence compared to actual proof, which nobody has.
In this case, I don’t buy extrapolating from past LLM advances because my model is compatible with fast progress up to a point followed by a slowdown, and the competing model isn’t right just because it looks like a straight line when you plot it on a graph.
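A toy illustration of that last point (all parameters invented): a curve that grows exponentially and then saturates is nearly indistinguishable from a pure exponential over the early part of the data, so a straight line through past points does not discriminate between the two models.

```python
import numpy as np

# Toy comparison, all parameters invented: pure exponential growth vs.
# growth that is exponential early but saturates at a ceiling.
t = np.linspace(0, 6, 13)                          # years
exponential = 2.0 ** t                             # doubles every year, forever
ceiling = 2.0 ** 8                                 # saturating model's cap
logistic = ceiling / (1 + (ceiling - 1) * np.exp(-np.log(2) * t))

# Over the first three years the two differ by under 0.04 doublings...
early = t <= 3
print(np.max(np.abs(np.log2(exponential[early]) - np.log2(logistic[early]))))
# ...so past data that "looks like a straight line" fits both models.
```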