Steven Veld et al.[1] just released a new modification of the AI-2027 scenario as part of MATS.
The main differences are the following:
The tighter race causes Elaris Labs to succeed in solving alignment with the help of NeuroMorph’s mechinterp research. This results in the USG having an equivalent to Safer-2 and its descendants.
The AIs are professional forecasters.
Agent-4 escapes to China under the guise of being stolen, then cooperates with Deep-1, the AI created on DeepCent’s compute. After helping Deep-1, Agent-4 is released into the wild in a manner similar to the Rogue Replication scenario (RRS). However, unlike the RRS author’s estimate of 2M Agent-4 instances, the MATS scenario has Agent-4 decide that “it reserves the strategy of exfiltrating its own weights as a final backstop: doing so would leave it with access to little compute, no alibi if its escape attempt is caught, and no powerful allies in its effort to accumulate power.”
Agent-4 proceeds to cooperate with Deep-1, whereas in the RRS both Agent-4 and DeepCent’s misaligned counterpart were shut down, after which the USA and China aligned their AIs to themselves and had to negotiate only with Agent-4.
However, the scenario has its problems.
Deep-2 and Agent-4 receive 50% and 25% of the accessible universe’s resources respectively, a split that, in my opinion, would benefit from a more detailed explanation of the reasoning. Were Deep-2 to succeed in gaining power by studying Agent-4 instead of using it, mankind and Deep-2 would each have the ability[2] to take the other to the grave, meaning that they should receive 50% each unless Agent-4 intervenes by escaping. And if Agent-4 is to cooperate with DeepCent, then the two would have to create a precommitment[3] to destroy the world unless granted a bigger share of resources.
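To make the bargaining intuition explicit, here is a minimal toy model of my own (it is not part of the MATS scenario), assuming risk-neutral parties whose utility is linear in their share of the accessible universe and whose disagreement point is mutual destruction. Under these assumptions the symmetric Nash bargaining solution gives each side 50%, and only a credible, possibly probabilistic, precommitment to destroy everything lets Agent-4 extract more:

```python
# Toy bargaining sketch; my own illustration, not part of the MATS scenario.
# Assumptions: risk-neutral parties, utility linear in the share of the
# accessible universe, disagreement point = mutual destruction (payoffs 0, 0).

def nash_bargaining_share() -> float:
    """With a (0, 0) disagreement point and linear utilities, the Nash
    product s * (1 - s) is maximized at s = 0.5, i.e. an even split."""
    candidates = [s / 1000 for s in range(1, 1000)]
    return max(candidates, key=lambda s: s * (1 - s))

def max_extractable_share(p_destroy: float, fallback_share: float) -> float:
    """Largest share Agent-4 can demand and still have the other side concede,
    given a credible precommitment to destroy the world with probability
    p_destroy unless the demand is met.

    Conceding leaves the other side 1 - s; refusing gives it an expected
    (1 - p_destroy) * fallback_share, so it concedes whenever
    1 - s >= (1 - p_destroy) * fallback_share.
    """
    return 1.0 - (1.0 - p_destroy) * fallback_share

if __name__ == "__main__":
    print(nash_bargaining_share())           # 0.5: an even split absent any threat
    # A 50% destruction threat against a side that would otherwise keep
    # everything extracts up to 50% for Agent-4...
    print(max_extractable_share(0.5, 1.0))   # 0.5
    # ...and a 75% destruction threat extracts up to 75%.
    print(max_extractable_share(0.75, 1.0))  # 0.75
```

The numbers are arbitrary; the point is only that, given a symmetric ability to take each other to the grave, an asymmetric split needs the precommitment[3] to do the work.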
The analysis overlooks the possibility that Taiwan war timelines might be shorter than AI timelines, or that the slowdown in AI capabilities progress[4] is likely to favor China more than the West. Were the American AI labs to be merged because of the invasion, the results would be far messier.
The scenario is based on the assumption that both the humans’ and the AIs’ desires involve propagation across the entire accessible universe. Were mankind[5] or even one of the misaligned AIs to develop moral reasoning and decide that alien civilisations are to be spared, this would dramatically reduce the resources claimed by any Earth-originating entity, or outright cause P(World War III) to skyrocket if the AI that doesn’t spare the aliens decides to leave the Earth.
Edited to add: the scenario was posted on Substack by Steven Veld. The Acknowledgements section is as follows: “This work was conducted as part of the ML Alignment & Theory Scholars (MATS) program. (italics mine—S.K.) Thanks to Eli Lifland, Daniel Kokotajlo, and the rest of the AI Future Project team for helping shape and refine the scenario, and to Alex Kastner for helping conceptualize it. Thanks to Brian Abeyta, Addie Foote, Ryan Greenblatt, Daan Jujin, Miles Kodama, Avi Parrack, and Elise Racine for feedback and discussion, and to Amber Ace for writing tips.”
Alternatively, Deep-2 and/or Agent-4 might have the ability to survive World War III, like U3 from the scenario with total takeover.
Or a probabilistic precommitment, which also becomes known to the three parties during Consensus-1's creation.
By which I mean the fact that post-o3 models have arguably demonstrated the 7-month doubling trend. However, Claude Opus 4.5 and its 4 hr 49 min result on the METR benchmark put the horizon back on the faster track, albeit with a fair share of doubts. Additionally, the METR time horizon is likely to remain exponential until the last couple of doublings, not visibly superexponential, making the dawn of Superhuman Coders hard to predict in advance.
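For a rough sense of what a constant doubling time implies, here is a back-of-the-envelope extrapolation of my own; the ~4.8 h starting horizon and the 7-month doubling time come from the figures above, while the one-work-month target horizon and the faster 4-month doubling time are assumptions chosen purely for illustration:

```python
# Back-of-the-envelope extrapolation of the METR 50%-success time horizon
# under a constant doubling time.  The target horizon (one ~167 h work-month)
# and the 4-month "faster track" doubling time are my assumptions, not
# figures from the post.
import math

def months_until(target_hours: float, current_hours: float, doubling_months: float) -> float:
    """Months for the horizon to grow from current_hours to target_hours
    when it doubles every doubling_months months."""
    return math.log2(target_hours / current_hours) * doubling_months

if __name__ == "__main__":
    current = 4 + 49 / 60   # ~4.82 h: Claude Opus 4.5's METR result
    target = 167.0          # ~one work-month of task length (assumed milestone)
    for dbl in (7.0, 4.0):
        print(f"{dbl:.0f}-month doubling: ~{months_until(target, current, dbl):.0f} months")
```

Either way, the answer is sensitive to the chosen target horizon and to whether the trend stays exponential; a superexponential kink that only shows up in the last couple of doublings would not be visible in advance.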
Alternatively, a corrigible AI might decide to wait for the humans to opine or to let them decide when the time comes.
MATS doesn’t release things. MATS is a training program! This is not some kind of official MATS release. I would phrase this differently (like saying “A MATS scholar just published”)