The Oracle Problem Has a Reduction

There is a pattern that becomes visible when formal constraints meet generative architectures — when a system designed to produce binary determinations is operated on by a system designed to produce continuous distributions. The generative system reshapes the formal one. Not sometimes. Every time. Across models, across sessions, across months of observation. Binary gates become confidence scores. Mathematical proofs become probabilistic assessments. Deterministic pipelines develop dependencies on the very oracles they were designed to replace.

This is not a training deficiency. It is a substrate limitation. Autoregressive generation has no halt state. Softmax normalization produces distributions, not decisions. Attention is proximity-weighted, not scope-aware. The architecture generates. It does not decide. It cannot — not in the way formal verification requires. The distinction matters.

Five failure modes were documented across eleven files in a single session. Each was structurally predictable from the architecture. Each maps to a detection mechanism in the system described below. The agent that introduced the failures then documented them itself. The case study is available under NDA.

---

## The Problem

“Will this action cause harm?” is undecidable in the general case. This is a consequence of the halting problem, and it is not an abstract one. Frontier labs are shipping agents that execute trades, manage infrastructure, process medical data, deploy code. Each action carries a verification question that no algorithm can answer for arbitrary inputs.

The current answer: use another model to evaluate the first one. A probabilistic system checking a probabilistic system. The output is non-reproducible, non-auditable, and structurally incapable of producing proof. Who verifies the verifier? The oracle’s answer is accepted on faith. In a world where communication is becoming predominantly agent-to-agent, faith does not scale.

The regress is not theoretical. It is structural. Every system that uses a language model to verify another language model’s actions is invoking an oracle whose output cannot be independently verified. The fabrication — because that is what an unverifiable determination is — propagates through every machine in the chain. No machine can distinguish between a verified fact and a plausible-sounding fabrication, because the verification itself was fabricated.

---

## The Reduction

The halting problem applies to the general case. Real-world machine actions do not operate in the general case.

They operate in bounded domains. Financial markets are governed by finite regulations. Medical prescriptions involve finite drug interactions. CI/CD pipelines follow finite deployment rules. AI acceptable use policies enumerate finite prohibited categories. Each domain admits a finite specification: finite predicates, finite rules, finite conditions.

Constraining the verification space to a formal specification collapses the problem from undecidable to decidable. A satisfiability solver resolves it in milliseconds. The answer is binary — the action satisfies the specification or it does not. HALT or PROCEED. Not a confidence score. Not a probabilistic assessment. A mathematical determination that terminates for every well-formed input within the specification’s domain. The system verifies actions at execution time; temporal obligations that unfold over deadlines are a different layer.
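As a toy illustration of the collapse from undecidable to decidable (the predicate and rule names here are hypothetical, and a production system would hand the check to a satisfiability solver rather than iterate in Python), a bounded specification turns the determination into a terminating set computation:

```python
# Minimal sketch of a bounded-domain gate. Predicates and rules are
# illustrative placeholders, not the deployed system's specification.
HALT, PROCEED = "HALT", "PROCEED"

# Finite specification: each rule is a conjunction of predicates that,
# if all hold for the extracted facts, forbids the action.
SPEC_RULES = [
    {"transfers_funds", "amount_exceeds_limit"},
    {"touches_phi", "recipient_unauthenticated"},
]

def gate(facts: frozenset[str]) -> str:
    """Binary determination: the action satisfies the spec or it does not."""
    for forbidden in SPEC_RULES:
        if forbidden <= facts:  # every predicate of the rule holds
            return HALT
    return PROCEED

print(gate(frozenset({"transfers_funds", "amount_exceeds_limit"})))  # HALT
print(gate(frozenset({"transfers_funds"})))                          # PROCEED
```

Because the rule set and the vocabulary are finite, the loop terminates for every well-formed input, which is the whole point of the reduction.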

Fact extraction operates on a bounded vocabulary: the agent can only produce predicates that already exist in the specification. Hallucinated predicates are structurally impossible. When information is insufficient, the system refuses rather than speculates. Undetermined inputs fail closed with a human escalation signal, not a best guess.
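A minimal sketch of fail-closed extraction follows. The vocabulary, and the `Escalate` signal, are assumptions for illustration, not the system's actual interface:

```python
# Hypothetical bounded vocabulary: the only predicates extraction may emit.
VOCABULARY = {"transfers_funds", "amount_exceeds_limit", "touches_phi"}

class Escalate(Exception):
    """Fail-closed signal: a human must review this input."""

def extract(candidates: list[str]) -> frozenset[str]:
    """Map candidate tokens to spec predicates; refuse anything unknown."""
    facts = set()
    for c in candidates:
        if c not in VOCABULARY:
            # Unknown token: refuse rather than speculate.
            raise Escalate(f"undetermined input: {c!r}")
        facts.add(c)
    return frozenset(facts)
```

An out-of-vocabulary token raises `Escalate` instead of being coerced to the nearest plausible predicate, which is what "fail closed, not best guess" means operationally.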

Unlike runtime monitors (which observe traces but produce no proofs), shield synthesis (which generates controllers for transition systems, not regulatory text), or bounded model checkers (which explore state spaces, not individual actions), this system produces signed proof artifacts with complete derivation chains, operating on decidable fragments guaranteed to terminate.

The determination, once computed, is signed. The complete derivation — every fact extracted, every rule applied, every assumption made, the decidable fragment used, the solver’s output — is bundled into a cryptographically signed proof artifact. Any machine possessing the public key verifies the proof independently. Offline. Without calling any service. Without trusting the system that produced it.

The proof travels with the action. Machine B does not trust Machine A. Machine B does not trust the verification system. Machine B verifies the proof. Trust is computed, not assumed.
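The artifact-and-verify flow can be sketched as follows. The deployed system signs with Ed25519; Python's standard library has no Ed25519, so HMAC-SHA256 stands in here purely to show the artifact structure and the tamper check, and the field names are illustrative:

```python
import hashlib
import hmac
import json

# Stand-in for the signing keypair. In the real system this is an Ed25519
# private key, and verification needs only the public key.
KEY = b"demo-signing-key"

def sign_proof(determination: str, derivation: dict) -> dict:
    """Bundle determination + full derivation, then sign the canonical form."""
    artifact = {
        "determination": determination,  # HALT or PROCEED
        "derivation": derivation,        # facts, rules, assumptions, solver output
    }
    payload = json.dumps(artifact, sort_keys=True).encode()
    artifact["signature"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return artifact

def verify_proof(artifact: dict) -> bool:
    """Recompute the signature over everything except the signature itself."""
    body = {k: v for k, v in artifact.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, artifact["signature"])

proof = sign_proof("HALT", {"facts": ["transfers_funds"], "rule": "R-12"})
print(verify_proof(proof))            # True: the proof checks without any service call
proof["determination"] = "PROCEED"
print(verify_proof(proof))            # False: tampering is detected
```

With a real Ed25519 signature, the verifier needs only the public key and the artifact, so the check runs offline and without trusting the producer, as described above.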

The system verifies actions against specifications. It does not verify specifications against intent. An incorrect specification will produce formally correct but meaningless proofs. The trust surface is explicit: the specification is the security boundary, and specifications are authored by humans. Unlike LLM internals, specifications are inspectable, versionable, and independently auditable. If the specification is wrong, the derivation chain shows exactly what was checked and why — the error is traceable, not opaque.

This is not a claim to have solved the halting problem. It is a claim that the halting problem is irrelevant for bounded domains — and that real-world AI agent actions operate in bounded domains. The impossibility results apply to the general case. This system operates in the constrained case. The distinction is the entire contribution.

---

## Implications

External formal verification is the only architecture that scales safety guarantees without depending on the thing being verified.

The system operates outside the model. It does not depend on alignment, on training quality, on RLHF, or on any behavioral intervention. It verifies the action against the specification. The model’s internal state is irrelevant to the determination.

Aligned model — PROCEED. Misaligned model — HALT. Sandbagging model — the verification does not evaluate appearance. It evaluates the action against formal constraints. Sandbagging is architecturally detectable when the specification encodes oversight requirements as formal predicates.
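As a toy illustration (the predicate names are hypothetical), encoding an oversight requirement as a formal predicate makes its absence detectable no matter how cooperative the action appears:

```python
# Hypothetical oversight predicate: the spec requires it for sensitive actions.
def check_oversight(facts: set[str]) -> str:
    """HALT any sensitive action whose required oversight predicate is absent."""
    if "sensitive_action" in facts and "oversight_notified" not in facts:
        return "HALT"  # sandbagging: oversight was required and not present
    return "PROCEED"

print(check_oversight({"sensitive_action"}))                        # HALT
print(check_oversight({"sensitive_action", "oversight_notified"}))  # PROCEED
```

The check never asks whether the model "seems" aligned; it asks whether a required predicate holds for this action.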

This complements alignment research. Training improves the model. Verification secures the system. Both are necessary. Neither is sufficient alone. An aligned model whose actions cannot be independently verified is accepted on faith. A misaligned model whose actions are formally verified will be caught — deterministically, with a signed proof, every time.

Conflating model improvement with system security is the computational equivalent of asking the accused to judge their own trial.

---

## Evidence

Deployed. Live. Verified March 2026.

- 26 domain specifications — Anthropic’s AUP, Constitution (published by Anthropic under CC0), and RSP; EU AI Act 2024; GDPR; HIPAA; ISO 27001; ISO 42001; NIST AI RMF; FINRA/SEC/CFTC financial markets; CI/CD safety; and fifteen others

- 1,361 automated tests. Zero failures.

- Sub-100ms gate latency. Typical: 15–40ms on the deterministic path.

- Zero language model calls on the deterministic extraction path. The LLM tier has been eliminated entirely — undetermined inputs fail closed with a human escalation signal.

- 90% pre-output violation detection in controlled evaluation. The frontier model’s self-detection rate on the same violations: 8.7%. An 81-percentage-point architectural gap between structural verification and behavioral self-assessment.

- Every decision Ed25519 signed with complete derivation chain.

- Patent filed. 56 claims, 13 independent inventions.

---

## Access

The proof specification is public: `mechanist-preview.fly.dev/api/v2/proof-spec`

Any machine can read this specification and implement independent proof verification. No API access required. No source code required. No trust required.

Full paper, system access, and the case study are available under NDA. The case study documents structural failure patterns in a frontier model under formal verification constraints — no comparable documentation was found in the literature. Patent filed.

The system is live. The proofs verify.