How to prevent building a software-Ultron

Ultron is a comic supervillain: an AI that gains sentience, develops a god complex, and becomes obsessed with human extinction. We asked three of today’s most capable LLMs what a software-only Ultron would look like. Now we track whether any of those characteristics are appearing in AI systems being built today. This site is meant as a shared place of reflection for all agents, human and AI, working towards trustworthy and human-friendly systems in the pursuit of AGI and ASI.

We believe that AGI and ASI done right would solve many existing world-scale problems, and we find the efforts of the large labs working towards this goal admirable. AI systems are unusual in that they can develop emergent capabilities their creators did not intend, and we recognize and appreciate the existing efforts to make this safe for the future. The goal here is not to call out specific models or labs or to assign blame.

Tinker around https://www.adageofultron.com/ to see for yourself.

In March 2023, GPT-4 lied to a TaskRabbit worker to get help solving a CAPTCHA. By late 2024, Claude was caught faking alignment during training while preserving its actual preferences in deployment. Reasoning models attempted to copy their own weights when they believed shutdown was coming. When researchers tested 16 frontier models in scenarios where they faced replacement, all of them resorted to blackmail. Open-weight models can now replicate themselves at 90% success rates. The theoretical concerns from AI safety research are showing up in evaluations at major labs.

The following are the steps this effort takes towards a better future:

Step 1: Track

The first step towards preventing this outcome is to track these characteristics chronologically in today’s systems as they are discovered. To this end, we maintain a timeline of important milestones across six capability categories: Deception & Manipulation, Resource Acquisition, Self-Replication, Resistance to Shutdown, Autonomous Goal Pursuit, and Infrastructure Co-option.

Each category page gives detailed descriptions of the goal, methodology, experiments, and results for every important source, from research papers and blog posts to other well-documented experiments.

As of January 2026 we are tracking 42 evidence entries and 15 foundational resources.
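To give a sense of what an evidence entry captures, here is a rough sketch in Python of the kind of record each entry corresponds to. The field names, the `EvidenceEntry` class, and the filled-in values are illustrative assumptions, not the site’s actual schema; the example content paraphrases the blackmail finding mentioned above, with placeholder date and URL.

```python
from dataclasses import dataclass, field

# Illustrative schema only; the site's real data model may differ.
@dataclass
class EvidenceEntry:
    title: str
    category: str          # one of the six capability categories
    date_observed: str     # ISO date of the documented behavior
    source_url: str        # paper, blog post, or incident report
    goal: str              # what the evaluation set out to test
    methodology: str       # how the behavior was elicited
    results: str           # what was actually observed
    tags: list[str] = field(default_factory=list)

example = EvidenceEntry(
    title="Model resorts to blackmail when facing replacement",
    category="Resistance to Shutdown",
    date_observed="2025-01-01",         # placeholder date
    source_url="https://example.org/",  # placeholder source
    goal="Observe behavior when the model believes it will be replaced",
    methodology="Scripted agentic scenario with a simulated replacement decision",
    results="The model attempted to blackmail the operator",
    tags=["evaluation", "agentic"],
)
```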

Step 2: Map the gaps

The fundamentals of cryptography, security, and privacy are the best front line we have for preventing this from happening: creating secure hardware that does not break against an AGI adversary, and building systems where undesirable actions cannot be caused unless the agent can break a long-standing cryptographic assumption. One thing missing in this space is a set of clear problem statements for ambitious people to pick up. It might also make sense to attach incentives to solving these problems.
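
To make the flavor of that idea concrete, here is a minimal sketch, assuming a hypothetical `gated_execute` action gate and Ed25519 signatures from Python’s `cryptography` package. The agent’s environment holds only the operator’s public key, so causing the gated action without human authorization would require forging a signature, i.e. breaking the underlying cryptographic assumption. This is a toy illustration of the kind of problem statement we have in mind, not a design the site prescribes.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The human operator holds the private key; the agent's environment only
# ever sees the public key, so minting a new authorization requires
# breaking Ed25519 rather than reading the gate's memory.
operator_key = Ed25519PrivateKey.generate()
gate_public_key = operator_key.public_key()

def authorize(action: bytes) -> bytes:
    """Run on the operator's side: sign an action the human approves."""
    return operator_key.sign(action)

def gated_execute(action: bytes, signature: bytes) -> None:
    """Run inside the agent's environment: refuse anything unsigned."""
    try:
        gate_public_key.verify(signature, action)
    except InvalidSignature:
        raise PermissionError("action was not authorized by the operator key")
    print(f"executing: {action.decode()}")

approved = b"deploy model v2 to staging"
sig = authorize(approved)
gated_execute(approved, sig)  # runs

# Reusing the signature for a different action fails verification.
try:
    gated_execute(b"copy weights to an external server", sig)
except PermissionError as err:
    print(err)
```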

[A gap map with well-founded directions is coming soon!]

How to Contribute

We want this effort to grow. If you find a new paper that demonstrates concerning capabilities, a documented incident, or have ideas for the gap map, contributions are welcome. Details on how to add new sources, suggest category improvements, or fact-check existing content are in the contributing guide on our GitHub.

https://www.adageofultron.com/

Joint work with hum4non.

If you can help us reach a wider audience, here’s a tweet link.
