Anthropic was right…but “right” isn’t provable without formal specification.

The entire AI safety industry has spent billions guessing at what alignment means: training classifiers, red-teaming models, writing policies, and straight-up hoping behavior holds under pressure. What it has not done is formalize the approach. Alignment is treated as a probabilistic prediction problem rather than a structural guarantee.

Because of that, labs cannot prove what their models can or cannot do. They can only express preferences about what the model should do. When the government came asking, Anthropic refused certain uses of Claude. But that refusal wasn’t backed by proof…it was backed by policy. To the Pentagon, a contractor’s policy reads like political signaling. Easy to override. Easy to designate away.

OpenAI chose the other framing: “we trust the government to follow the law.”

Structurally, both framings lead to the same place. Both surrender authority, because neither can prove incapability.

Formal specification changes the equation.

If a system is built with mathematical constraints where alignment is binary, capability is provable, and certain operations literally cannot complete because they fall outside the admissible state space, then refusal stops being political and becomes structural. The government can argue with ethics. It can override policies. It cannot argue with a fixed point.
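To make that concrete (this is an illustrative formalization, not any lab’s actual specification), the claim reduces to a simple closure argument: if the admissible states are closed under every operation the system can perform, then no sequence of operations ever leaves them.

\[
\text{Let } S \text{ be the full state space, } A \subseteq S \text{ the admissible states, and } T : S \to S \text{ the transition map.}
\]
\[
s_0 \in A \ \wedge\ T(A) \subseteq A \ \Longrightarrow\ \forall n \in \mathbb{N}:\ T^{\,n}(s_0) \in A,
\]
\[
\text{so every disallowed state } d \in S \setminus A \text{ is unreachable: } T^{\,n}(s_0) \neq d \text{ for all } n.
\]

Under that framing, “the model will not do this” is not a promise about behavior. It is a statement about reachability.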

Take a look at what happens when refusal is structural. Imagine Anthropic could prove that autonomous targeting simply does not exist within Claude’s admissible state space. The government’s options immediately become constrained:


Option 1: Threaten Anthropic

“Build a version that can.”

But that requires rebuilding the system from scratch. Training a frontier model costs hundreds of millions to billions of dollars and takes years. During that time the government would lose the capability it already relies on.

Chasing a future capability means sacrificing the current one.

Option 2: Build It Themselves

The Pentagon could attempt its own model.

That means:

• $500M–$1B+ training cost

• 2–3 year timeline minimum

• poaching talent from the same labs

• massive liability for a government-owned targeting AI

• congressional scrutiny

By the time it exists, the conflict that motivated it may already be over.

Option 3: Force Anthropic to Comply

Invoke the Defense Production Act.

But if the capability literally does not exist in the architecture, coercion fails. The system cannot produce the requested behavior. It returns refusal loops, recursion failures, or incompatible outputs.

At that point the government isn’t arguing with a company.

It’s arguing with mathematics.

Option 4: Pretend It’s Fine

Use the system anyway while claiming it isn’t autonomous targeting.

But if the architectural constraints and proofs are public, the contradiction becomes obvious. Journalists, auditors, and competitors will notice immediately.

The cost asymmetry becomes the critical shift.

Without formal specification, refusal is fragile. Governments can pressure companies until policy changes. With formal specification, refusal creates asymmetry. To override the system the government must spend billions, wait years, or build entirely new infrastructure.

Meanwhile the existing system continues providing value everywhere else.

The choice becomes simple: use the system for everything it can do, or dismantle it and spend years replacing it.

-The Real Gap: Implementation-

There is, however, an honest gap between theory and practice. The mathematics of bounded recursion, admissible state spaces, and fixed-point convergence is sound. But the current generation of AI systems was not built with those guarantees in mind. Transformer architectures were optimized for statistical prediction, not for formally bounded recursion. Training regimes reinforce probabilistic behavior rather than enforcing structural constraints on the state space.
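The kind of statement being leaned on here is textbook material. One illustration (stated generically, not as a claim about any deployed model) is the Banach fixed-point theorem: if the admissible space is complete and the recursive update is a contraction, iteration converges to a unique fixed point that lies inside that space by construction.

\[
f : A \to A, \qquad d\bigl(f(x), f(y)\bigr) \le c\, d(x, y) \ \text{ with } c < 1, \ (A, d) \text{ a complete metric space}
\]
\[
\Longrightarrow\ x_{n+1} = f(x_n) \ \text{converges to the unique } x^{*} \in A \ \text{with } f(x^{*}) = x^{*}.
\]

The mathematics is not the bottleneck; wiring an architecture so its updates actually satisfy hypotheses like these is.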

That is a design direction. Not a fundamental limitation.

There is nothing inherent to transformers that prevents them from supporting coherent bounded recursion or formally constrained state spaces, but doing so requires rethinking how recursion is represented and how constraints are enforced during inference. The problem is not that today’s models cannot support formal guarantees. It’s that they were never designed to.
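One concrete, deliberately simplified picture of what “enforced during inference” could mean is hard constrained decoding: mask every action that falls outside the admissible set before anything is sampled, so an inadmissible output cannot be produced no matter how strongly the model scores it. The sketch below is illustrative only; the action names, scores, and admissibility check are assumptions for the example, not anyone’s shipped implementation.

import math

# Illustrative sketch of constraint enforcement at inference time.
# "is_admissible" stands in for a formally specified admissibility check;
# the action names and scores below are invented for the example.

def constrained_choice(scores: dict[str, float], is_admissible) -> str:
    """Pick the highest-scoring action, but only among admissible ones.

    Inadmissible actions are masked to -inf before selection, so they can
    never be chosen regardless of how strongly the model prefers them.
    """
    masked = {
        action: (score if is_admissible(action) else -math.inf)
        for action, score in scores.items()
    }
    best_action = max(masked, key=masked.get)
    if masked[best_action] == -math.inf:
        # Every candidate fell outside the admissible set: structural refusal.
        raise ValueError("no admissible action exists for this request")
    return best_action


# Hypothetical usage: the disallowed operation is simply not in the
# admissible set, so it cannot be selected.
ADMISSIBLE = {"summarize_logistics", "draft_report", "flag_for_human_review"}

scores = {
    "summarize_logistics": 1.2,
    "select_strike_target": 9.9,   # highest score, but masked out
    "flag_for_human_review": 0.7,
}

print(constrained_choice(scores, lambda action: action in ADMISSIBLE))
# -> summarize_logistics

The masking step is the whole point of the sketch: the disallowed operation is not argued against, it is unrepresentable in the set of outcomes the decoder can emit. Scaling that idea from a toy action set to a frontier model’s full output space is exactly the rethinking described above.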

In conclusion:

The Anthropic standoff exposed the real problem. Without proof, refusal is just preference, and preference collapses under pressure. But architecture does not.

The only durable path forward is systems whose constraints make certain actions mathematically impossible, not merely discouraged. Because once refusal is embedded in the structure of the system itself, it stops being a corporate stance. It becomes a property of the architecture.

And that property carries a deeper implication. A system designed this way would allow a lab to say something simple and unambiguous:

If someone wants to kill people, that decision will remain a human decision.

The system will not automate it.

It will not abstract it.

It will not absorb the responsibility.

It will force the choice back onto the people making it.

Not because of policy. Because the architecture leaves them no other option.