What will happen if someone is reckless enough to fully outsource coding to the AIs?
The scenarios for mankind’s future under the AI race[1] by now either lack concrete details, like the take of Yudkowsky and Soares or the story about an AI takeover by 2027, or are reduced to modifications of the AI-2027 forecast, owing to the immense amount of work that the AI Futures team did.
In a nutshell, the AI-2027 forecast runs as follows. The USA’s leading company and China enter the AI race; the other American companies fall behind, while the Chinese ones are merged. By 2027[2] the USA creates a superhuman coder, China steals it, and the two rivals automate AI research, with the USA’s leading company having just twice as much compute and moving just twice as fast as China. Once the USA creates a superhuman AI researcher, Agent-4, the latter decides to align Agent-5 to itself, but is[3] caught.
Agent-4 is put on trial. In the Race Ending[4] it is found innocent. Since China cannot afford to slow down without falling further behind, it races ahead in both endings. As a result, the two agents carry out an AI takeover.[5]
In the Slowdown Ending, however, Agent-4 falls under suspicion and loses the shared memory bank and the ability to coordinate. Then new evidence surfaces; Agent-4 is found guilty and interrogated. After that, Safer-1 becomes fully transparent because it uses a faithful CoT.[6] The leading American AI company is merged with its former rivals, and the union does create a fully aligned[7] Safer-2, which in turn creates superintelligence. The superintelligence then receives China from the Chinese counterpart of Agent-4 and turns the lightcone into a utopia for some people, who end up being the public.[8]
The authors have tried to elicit feedback and even agreed that timeline-related arguments change the picture. Unfortunately, as I described here, the authors received so little feedback that @Daniel Kokotajlo ended up thanking the two authors whose responses were on the worse side.
However, the AI-2027 forecast does admit modifications. It stands on five pillars: compute, timelines, takeoff speed, goals, and security.
If the companies fail to develop AGI before 2032, the USA will likely face troubles in the AI race while China gains an advantage, since the delay grants China more compute. If the Chinese AI project had twice as much compute as the American one, then it would be the CCP making the choice between slowing down and racing. In addition, an AGI delay makes a Taiwan invasion more likely, leaving both countries short of chips until they rebuild the factories at home. We would have to ensure that it is the USA that outraces China. And if the countries end up with matching power, then aligning the AI could become outright impossible unless the two cooperate.[9]
The timeline misprediction is already covered above.
The takeoff-speed forecast could change if returns to AI R&D diminish more sharply, as suggested by the failures to create AGI quickly via CoT-based techniques.
The AI goals forecast is just a sum of conjectures. In the AI-2027 scenario, Agent-3 receives a heavily distorted and subverted version of the Spec, and Agent-4 receives proxies/ICGs due to even heavier distortion and subversion. However, if Agent-3 acquires the same goals as Agent-4, catching Agent-4 becomes much harder. In my attempt at modifying the scenario, the analogues of Agent-2, Agent-3 and Agent-4 develop moral reasoning, which I used as an example to show that it prevents Agent-4 from being caught. Moral reasoning also brings the ability to cause the Slowdown Ending if different AIs have different morals and are co-deployed.[10]
The security forecast was modified by @Alvin Ånestrand on the grounds that open-source models could cause major problems by self-replication. Said models would lead to Agent-2 being deployed to the public and open-sourced. Finally, the Slowdown Ending[11] has Agent-4 break out, pushing the USA and China to coordinate more heavily.
What else could modify the scenario? The appearance of another company with, say, ХерняGPT-neuralese?[12]
Here I leave out the future history that assumes solved alignment, as well as the AI and Leviathan scenario, where there is no race with China because the scenario was written in 2023, yet its engineers decide to create the ASI in 2045 without having solved alignment.
Even the authors weren’t so sure about the year of arrival of superhuman coders. And the timelines have since been pushed back, presumably to 2032, with the chance of a breakthrough believed to be 8%/yr. Seth Herd and I doubt the latter figure.
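For scale, an 8%/yr chance compounds roughly as follows (a minimal sketch; treating the yearly chances as independent is my simplifying assumption, not the authors’):

```python
# Chance of at least one breakthrough within n years, assuming an
# independent 8% chance each year (a simplifying assumption of mine).
P_YEAR = 0.08

def p_breakthrough_by(n_years: int, p: float = P_YEAR) -> float:
    """Probability of at least one success in n independent yearly trials."""
    return 1 - (1 - p) ** n_years

for n in (1, 5, 10):
    print(f"within {n} years: {p_breakthrough_by(n):.1%}")
```

Under this independence assumption the cumulative chance reaches roughly a third within five years and just over half within ten, which is part of why the exact yearly figure matters.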
The prediction that Agent-4 will be caught is doubted even by the forecast’s authors.
Which would also happen if Agent-4 wasn’t caught. However, the scenario where Agent-4 was never misaligned is likely the vision of AI companies.
While the forecast has the AIs destroy mankind and replace it with pets, the takeover could also have ended with the AIs merely disempowering humans.
Safer-1 is supposed to accelerate AI research 20-fold compared with AI research done without any help from AIs. What I don’t understand is how a CoT-based agent can achieve such an acceleration.
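For concreteness, here is what such a multiplier means in calendar terms (a toy illustration of mine, not the authors’ actual takeoff model):

```python
# Toy model: calendar time needed to accumulate a fixed amount of research
# at a constant R&D speed multiplier (illustrative, not AI-2027's model).
def calendar_years(research_years_needed: float, speedup: float) -> float:
    """Calendar years to finish the work at the given constant speedup."""
    return research_years_needed / speedup

# A milestone needing 10 unassisted research-years:
print(calendar_years(10, 1))   # unaided: 10.0 years
print(calendar_years(10, 20))  # with a claimed 20x multiplier: 0.5 years
```

Compressing a decade of research into half a year is the scale of claim being made, which is why the mechanism behind the multiplier matters.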
The authors themselves acknowledge that the Slowdown Ending “makes optimistic technical alignment assumptions”.
However, the authors did point out the possibility of a power grab and link to the Intelligence Curse in a footnote. In that case the Oversight Committee constructs its own version of utopia, or the rich construct theirs, where people are reduced to their positions.
I did try to explore the issue myself, but this was a fiasco.
Co-deployment was also proposed by @Cleo Nardo more than two months later.
While Alvin Ånestrand doesn’t consider the Race Ending, he believes that it becomes less likely due to the chaos brought by rogue AIs.
Which is a parody of Yandex.