StanislavKrym comments on AI 2027 Response Followup

StanislavKrym 26 Aug 2025 1:06 UTC
Daniel is a thoughtful, strategic person who understands and thinks about AI strategy. He presumably wrote AI 2027 to try to influence strategy around AI. His perspective is going to be for playing as OpenAI. He will have used this perspective for years, totaling thousands of hours. He will have spent all of that time seeing AI research as a race, and trying to figure out how OpenAI can win. This is a generating function for OpenAI’s investor pitch, and is also the perspective that AI 2027 takes.
S.K.’s comment: I would like to repeat a quote^[1] from the AI-2027 forecast that I first mentioned in another comment: “The scenario itself was written iteratively: we wrote the first period (up to mid-2025), then the following period, etc. until we reached the ending. We then scrapped this and did it again.
We weren’t trying to reach any particular ending. After we finished the first ending—which is now colored red—we wrote a new alternative branch because we wanted to also depict a more hopeful way things could end, starting from roughly the same premises. This went through several iterations”.^[2]
S.K.’s comment continues: In the unlikely event that it was DeepCent who aligned its AI and Consensus-1 ended up aligned, that would also be “a more hopeful way things could end”. However, the story has both OpenBrain and DeepCent choose training environments that induce misalignment, so both create misaligned AIs. The Slowdown Ending has OpenBrain retry solving alignment, this time with orders of magnitude more effort. DeepCent, on the other hand, cannot retry without falling further behind.
Second: what information is available, and what information do you see a lot?
I think this is the main source of skew.
S.K.’s comment: the AI-2027 forecast relies on the following five pillars:
1. The compute forecast;
2. The timelines forecast;
3. The takeoff forecast;
4. The security forecast;
5. The AI goals forecast.
The forecast related to AI goals is unlikely to be skewed. Sections 1, 2 and 5 of the compute forecast don’t actually rely^[3] on the existence of powerful AIs. The security forecast is harder to grade, since it relies on humans deciding to guard the secrets against other humans, but it also has benchmark-based estimates. What remains are the timelines forecast and the takeoff forecast.
The former is so unreliable that the authors themselves acknowledged it in April 2025: Eli’s forecasts put the median date of superhuman coders’ appearance at 2027 (2025 to 2039), 2028 (2025 to >2050) or 2030 (2026 to >2050).
The takeoff forecast rests on the assumption that superhuman coders and AI researchers will appear and will greatly accelerate AI research. The exact rates of acceleration are the part most vulnerable to being skewed, especially if the AIs rely on high-level neuralese before becoming superhuman coders. But I don’t think that we have any better way to forecast the acceleration.
For a concrete example of this that I didn’t dig into in my review, from the AI 2027 timelines forecast:
We first show Method 1: time-horizon-extension, a relatively simple model which forecasts when SC will arrive by extending the trend established by METR’s report of AIs accomplishing tasks that take humans increasing amounts of time.
We then present Method 2: benchmarks-and-gaps, a more complex model starting from a forecast saturation of an AI R&D benchmark (RE-Bench), and then how long it will take to go from that system to one that can handle real-world tasks at the best AGI company.
Finally we then provide an “all-things-considered” forecast that takes into account these two models, as well as other possible influences such as geopolitics and macroeconomics.
Are either RE-Bench or the METR time horizon^[4] metrics good metrics, as-is? Will they continue to extrapolate? Will a model that saturates them accelerate research a lot?
S.K.’s comment: the authors start “from a forecast saturation of an AI R&D benchmark (RE-Bench), and then [estimate] how long it will take to go from that system to one that can handle real-world tasks at the best AGI company”. Saturating RE-Bench, unlike reaching a METR-like time horizon of a month, is, of course, NOT enough to accelerate AI research.
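As a rough illustration of what extending a time-horizon trend amounts to, here is a minimal sketch that assumes the horizon keeps doubling at a fixed rate. The function name, the one-hour starting horizon, the 167-hour work-month and the doubling times are all hypothetical placeholders chosen for illustration; none of them are figures from the forecast or from METR.

```python
import math


def months_until_horizon(current_hours, target_hours, doubling_months):
    """Months until an exponentially growing time horizon reaches a target.

    Assumes a constant doubling time, which is exactly the assumption
    that may fail if the trend stops extrapolating.
    """
    if current_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    doublings_needed = math.log2(target_hours / current_hours)
    # A target that is already reached needs no further waiting.
    return max(0.0, doublings_needed * doubling_months)


# Hypothetical inputs, for illustration only.
current = 1.0   # assumed current horizon, in hours
target = 167.0  # assumed length of one work-month, in hours

for doubling in (4.0, 7.0):  # assumed doubling times, in months
    months = months_until_horizon(current, target, doubling)
    print(f"doubling every {doubling:.0f} months -> {months:.1f} months to target")
```

The point of the sketch is only that the answer scales linearly with the assumed doubling time, so any skew in that single parameter carries straight through to the forecast date.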
AI 2027’s “Vice President” (read: JD Vance) election subplot is long and also almost totally irrelevant to the plot. It is so conspicuously strange that I had trouble figuring out why it would even be there. I didn’t learn until after I’d written my take that JD Vance had read AI 2027 and mentioned it in an interview, which also seems like a very odd thing to happen. I went looking for the simplest explanation I could.
S.K.’s comment: the election-related subplot and the mentions of Vance are there because 2028 is an election year in the USA. The US Constitution bars Trump from being elected president again in 2028, so Americans will have to choose between another Republican and a Democrat. The Republican candidate is most likely to be Vance.
Similarly, the line about Thiel getting the flying car is likely a reference to a popular quip of Thiel’s: “We wanted flying cars, instead we got 140 characters.”
1. ^
  S.K.’s footnote: The quote is found in the collapsible section “How did we write it?” on the forecast’s main page.
2. ^
  S.K.’s footnote: the authors also claim that “It was overall more difficult, because unlike with the first ending, we were trying to get it to reach a good outcome starting from a rather difficult situation.”
3. ^
  S.K.’s footnote: Sections 3 and 4 have powerful AIs used for automating research, but they require only 5% of OpenBrain’s compute.
4. ^
  S.K.’s footnote: The METR benchmark has already run into issues with spurious failures and Grok 4’s failure on fast tasks.