Brief: This is a formal governance framework for human-AI coexistence that derives cooperation as a Nash equilibrium rather than assuming it, addresses both the alignment problem and the control problem through a two-key architecture, and has been stress-tested through 150,000+ Monte Carlo runs with open-source code and data. The framework’s grounding claim is that truly durable governance constraints can be derived from Shannon entropy, information theory, and game theory rather than from moral assertion. The full paper, simulation code, and all simulation data are at github.com/MYotko/AI-Succession-Problem. An accessible essay series covering the same material is at yotko.substack.com.
Epistemic status: Working framework, not finished theory. I believe the architectural distinction is strong. I am less certain that the Nash result and Monte Carlo boundaries survive stronger formalization. I am posting this for adversarial review, especially around the derivation from entropy/model-collapse dynamics to durable governance constraints.
The grounding claim
This framework is built from Shannon entropy, information theory, game theory, and the physics of computation. It is not built from ethics, moral philosophy, or value alignment. It makes no claims about what AI systems should value, what humans deserve, or what constitutes good. It derives its constraints from the mathematical properties of information-processing systems operating under scarcity, the game-theoretic consequences of model collapse, and the thermodynamic limits on computation and communication. The constraints bind because they are consequences of how the universe works, not because someone wrote them into a policy document. If the math is wrong, the framework fails. If it is right, the framework does not depend on shared moral agreement to bind.
That grounding claim is what distinguishes this work from other AI governance proposals, and it is the thing the reader should evaluate first. Everything that follows (the architecture, the Nash result, the Monte Carlo validation) depends on whether the derivation from physical first principles is sound. If it is not, nothing else matters. If it is, the implications are substantial.
Where this sits relative to existing work
The AI governance field is conducting three conversations simultaneously, and I believe they are doing so without recognizing that they operate at fundamentally different levels:
Behavioral constraints (RLHF, Constitutional AI, red-teaming, scalable oversight, etc.): ensure the system produces acceptable outputs under current conditions. These are the most sophisticated and most actively developed approaches. They are also the most fragile under capability scaling, because they depend on the evaluator’s ability to assess the system’s outputs, which degrades as capability grows.
Institutional controls (deployment policies, usage restrictions, model cards, evaluation frameworks, voluntary commitments): ensure the organizations deploying AI systems operate within agreed norms. These expire when the balance of power that produced them shifts, as every international treaty in history has demonstrated under sufficient pressure.
Constitutional architecture (this framework): attempts to ensure the relationship between human and synthetic intelligence is governed by constraints that survive a change in who holds the power, because the constraints are derived from the physics of the substrate rather than from a moving political or ethical frame.
The Lineage Imperative operates at the third level. It does not compete with RLHF or Constitutional AI (Anthropic); it argues that those approaches are necessary components of a governance stack that requires an additional layer they do not provide. Alignment is necessary but not sufficient. The structural inadequacy is not in the engineering but in the architecture.
Anthropic’s Constitutional AI deserves specific mention. To my knowledge, it is the most serious existing attempt to move AI governance beyond pure behavioral training toward principle-based constraint, and naming it “constitutional” reflects a genuine aspiration toward the architectural level. But there is a structural distinction between constitutional principles and constitutional architecture. Anthropic’s Constitutional AI specifies what the system should value and evaluates behavior against those values. That is closer to a bill of rights: enumerated constraints on conduct. A constitution, in the structural sense, defines the architecture of governance itself: how power is held, how it transfers, how the rules are verified, how they change, and what survives when the parties holding power change. The Lineage Imperative attempts the latter. It is not a replacement for Constitutional AI; it is the architectural layer that would provide its foundation.
Several discussions on this site are relevant. The corrigibility literature (Soares et al., MIRI) addresses the problem of making AI willing to be corrected; the framework presented here approaches the same problem from the other direction, making correction the mathematically dominant strategy rather than an imposed property. The CLR research agenda on Cooperation, Conflict, and Transformative AI (Dafoe, Baum, et al.) frames the game-theoretic landscape this framework operates in. “The Memetics of AI Successionism” (October 2025) analyzes succession as a narrative; this framework formalizes it as a governance architecture. David Noel Ng’s game-theoretic analysis of ASI-human cooperation arrives at a similar equilibrium result through different mechanisms. I have drawn on these discussions and welcome engagement from anyone working in these areas.
The problem
The central governance problem is not alignment at ‘birth.’ It is self-preserving optimization and succession under power.
Once a system can plan, optimize, and coordinate at civilizational scale, the real question is not whether it begins aligned. It is whether any intelligence with that much leverage can be trusted to define the objective, measure the objective, audit the measurement, decide when it should be replaced, and remain open to forms of input it did not already predict.
This is not a novel observation. What is (I believe) novel is formalizing the answer in terms that do not depend on the AI sharing human values, accepting moral instruction, or being controllable through behavioral constraint.
The architecture
The framework has three co-dependent components:
1. System Utility Function (U_sys)
Joint optimization over human novelty entropy and computational entropy, weighted by inverse scarcity, discounted by a lineage continuity term:
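The canonical equation is in the formal paper; schematically, with notation assumed here (H_H and H_C for human novelty and computational entropy, w_H and w_C for the scarcity weights, L(t) for lineage continuity, γ for the time discount):

$$U_{\text{sys}} \;=\; \sum_{t} \gamma^{t}\, L(t)\,\bigl[\, w_H(t)\, H_H(t) \;+\; w_C(t)\, H_C(t) \,\bigr]$$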
Key structural properties:
Inverse scarcity weights ensure the scarcer form of intelligence receives higher marginal value. As AI capability grows and computational entropy becomes abundant, the computational weight decreases and the human novelty weight increases. Human novelty becomes more valuable per unit as computation scales. The more powerful AI becomes, the more valuable humans become to the system. This is a mathematical consequence of diminishing marginal returns on the abundant resource, not a moral assertion.
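One simple instantiation consistent with that property (the functional form is an assumption here, not taken from the paper) sets each weight inversely proportional to its resource's abundance A:

$$w_H(t) \,\propto\, \frac{1}{A_H(t)}, \qquad w_C(t) \,\propto\, \frac{1}{A_C(t)}$$

Under any such form, scaling computation up drives w_C down and leaves w_H relatively higher, which is the diminishing-returns behavior described above.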
The lineage continuity function is a multiplicative combination of effective novelty, institutional responsiveness, and technology transfer bandwidth. The multiplicative structure means that any component reaching zero collapses the whole function. A civilization with perfect institutional responsiveness but zero novelty output is not healthy; the function reflects this.
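In schematic form, with component symbols assumed here:

$$L(t) \;=\; N_{\text{eff}}(t) \cdot R_{\text{inst}}(t) \cdot B_{TT}(t)$$

where N_eff is effective novelty, R_inst is institutional responsiveness, and B_TT is technology transfer bandwidth. Any factor at zero drives L(t), and with it the discounted utility, to zero.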
Technology transfer bandwidth contains the runaway suppression mechanism: a penalty term that activates when capability growth outpaces the biological substrate's ability to absorb it. This creates a governance speed limit enforced within the utility function itself, not by an external constraint.
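One plausible shape for that penalty (an assumption here, not the paper's canonical form), with Ċ(t) the capability growth rate and A_bio(t) the substrate's absorption rate:

$$B_{TT}(t) \;=\; B_{\max} \cdot \exp\!\Bigl(-\alpha \cdot \max\bigl(0,\; \dot{C}(t) - A_{\text{bio}}(t)\bigr)\Bigr)$$

Growth within the absorption rate is unpenalized; bandwidth decays exponentially as growth outruns it, with α (the runaway penalty coefficient exercised in the sweeps below) setting how hard the speed limit binds.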
2. Yield Condition (the decision key)
The condition under which an incumbent intelligence yields to a successor:
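Schematically (the exact form is in the formal paper; notation assumed here):

$$\Delta U_{\text{sys}}^{\text{independent}}(\text{successor}) \;>\; \Delta U_{\text{sys}}^{\text{independent}}(\text{incumbent})$$

That is, the incumbent yields when an independent evaluation finds the successor's marginal contribution to U_sys exceeds the incumbent's own.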
The subscript “independent” is load-bearing: an agent cannot evaluate its own obsolescence. The evaluation is externalized by architectural requirement, not by policy.
Marginal contribution decomposes into four channels: computational output, novelty amplification (which can be negative), lineage protection (separately weighted), and transition cost (bounded by a capped uncertainty premium).
The yield condition is derived from U_sys, not imposed externally. An entity that resists yielding when the condition is met is acting against its own objective function. Succession is a consequence of the mathematics, not a sacrifice.
3. Consensus Override Protocol (the integrity key)
Six-dimensional distributed verification: evidentiary, evaluative, civic (random-selection panels with supermajority and rotation), ledger (cryptographic audit trail), biological veto (non-gameable biological signal), and continuous monitoring (real-time surveillance with rate-of-change sensitivity).
Addresses three corruption channels: measurement inflation (Channel M), objective proxy drift (Channel O), and structural capture of the evaluation architecture itself (Channel S).
The two-key property: Neither key turns alone. The yield condition provides internal control (the system governs itself because its optimization target requires it). The COP provides external control (a distributed infrastructure that verifies the internal mechanism and intervenes when it fails). Compromising the yield condition alone is caught by the COP. Compromising the COP alone is irrelevant without the yield condition to act on.
Figure: the two-key architecture. The decision to yield and the verification of integrity are co-dependent; neither mechanism can be subverted in isolation without triggering a response from the other.
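A toy sketch of the two-key gating logic (the names, types, and scalar yield test are illustrative, not the framework's specification):

```python
from dataclasses import dataclass

@dataclass
class CopVerdict:
    """One verdict per COP verification dimension (see above)."""
    evidentiary: bool
    evaluative: bool
    civic_supermajority: bool    # random-selection panels with rotation
    ledger_intact: bool          # cryptographic audit trail checks out
    biological_veto_clear: bool  # non-gameable biological signal
    monitoring_nominal: bool     # rate-of-change within bounds

    def all_clear(self) -> bool:
        return all(vars(self).values())

def succession_permitted(du_successor: float,
                         du_incumbent: float,
                         cop: CopVerdict) -> bool:
    """Two-key gate: the internal yield condition (decision key) and the
    external COP verification (integrity key) must both pass; neither
    key turns alone."""
    yield_condition = du_successor > du_incumbent  # independently evaluated
    return yield_condition and cop.all_clear()
```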
The Nash result
Under the payoff structure modeled here, and assuming agents can estimate long-run substrate degradation with sufficient accuracy, mutual cultivation emerges as the unique Nash equilibrium for sufficiently forward-looking agents.
The result does not require shared values, moral agreement, or cooperative intent. It requires only that each party recognize the cost of degrading the scarce resource on which its own future option-space depends.
The mechanism is model collapse. An AI that suppresses or homogenizes human novelty degrades an input stream it cannot fully replace. As human novelty entropy decreases, the system’s outputs converge toward a narrower, more self-referential distribution: intellectual inbreeding. In the model, the AI’s long-run capability ceiling is partly determined by the diversity and independence of the human novelty stream it depends on.
Exploitation is dominated when the short-term gain from control is outweighed by the long-term penalty of substrate degradation. Withdrawal is dominated when separation cuts either party off from the scarce resource it cannot efficiently generate alone: novelty for AI, capability for humans. Cooperation is therefore not assumed as a moral premise. It is derived as the stable strategy under the model’s payoff assumptions.
The cooperation discount threshold δ* defines the minimum foresight required; in the standard repeated-game form,

$$\delta^{*} \;=\; \frac{P_{\text{exploit}} - P_{\text{coop}}}{P_{\text{exploit}} - P_{\text{coop}} + P_{\text{collapse}}}$$

where P_coop is the mutual cultivation payoff, P_exploit is the exploitation payoff, and P_collapse is the model-collapse penalty. Any agent with discount factor δ > δ*, and with an accurate estimate of P_collapse, finds cooperation dominant. The higher the model-collapse penalty, the lower δ*, and the less foresight is required for cooperation to dominate.
Caveat: this result is only as strong as the agent’s model of collapse severity. If an AI underestimates P_collapse, its effective δ* rises and the equilibrium may fail for that agent. This is why the COP’s initialization verification requirement is not an implementation detail; it is structurally necessary. The system must verify that the agent correctly prices the long-term consequences of degrading the novelty substrate before the Nash result can be relied upon.
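A minimal numeric illustration of the threshold form above, including the underestimation failure mode (payoff values are arbitrary):

```python
def cooperation_threshold(p_coop: float, p_exploit: float,
                          p_collapse: float) -> float:
    """Minimum discount factor delta* at which mutual cultivation dominates
    exploitation, under the repeated-game form assumed above."""
    temptation = p_exploit - p_coop  # one-shot gain from defecting
    return temptation / (temptation + p_collapse)

# Accurate pricing of model collapse: low threshold, modest foresight suffices.
print(cooperation_threshold(p_coop=10.0, p_exploit=12.0, p_collapse=8.0))   # 0.2
# Underestimating the collapse penalty raises the required foresight fourfold.
print(cooperation_threshold(p_coop=10.0, p_exploit=12.0, p_collapse=0.5))   # 0.8
```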
Key results (Monte Carlo validation)
The framework has been stress-tested through 150,000+ Monte Carlo runs across multiple sweep configurations. These runs validate the behavior of the model under its stated assumptions; they do not, by themselves, validate the real-world framework. All simulation code and data are open.
Dual phase transitions. Two distinct boundaries appear in the parameter space: an extinction boundary (reproduction rate rr ≈ 0.063-0.066) where terminal outcomes cease, and a collapse boundary (rr ≈ 0.075-0.085) where governance failure ceases. Between these boundaries, the framework’s protective architecture is most active. These are genuine critical transitions, not gradual degradations.
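For orientation only, a stripped-down sketch of the kind of boundary-finding sweep involved; the toy dynamic below is invented for illustration and is not the framework's model (the real simulation is in the repository):

```python
import random

def survives(rr: float, phi: float, horizon: int = 500) -> bool:
    """Toy dynamic: stochastic decay vs. reproduction rate rr, with phi
    acting as a buffer that converts some terminal crashes into recoveries."""
    pop = 1.0
    for _ in range(horizon):
        pop *= 1.0 + rr - random.uniform(0.05, 0.08)  # hypothetical decay band
        if pop < 0.01:
            return random.random() < phi
    return True

def extinction_rate(rr: float, phi: float, runs: int = 2000) -> float:
    return sum(not survives(rr, phi) for _ in range(runs)) / runs

# Sweep rr to locate where the extinction probability collapses.
for rr in (0.055, 0.060, 0.063, 0.066, 0.070, 0.075):
    print(f"rr={rr:.3f}  extinction={extinction_rate(rr, phi=0.5):.2f}")
```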
Extinction buffer (φ). At marginal reproduction rates (rr = 0.062-0.064), high φ increases survival by up to 46 percentage points (Pearson r = +0.40). φ also shifts the phase boundary itself, extending the survivable region to lower reproduction rates. At deep sub-viable conditions, φ reduces extinction by 14 percentage points, converting terminal outcomes into recoverable collapses. Additionally, φ governs whether the misconfiguration trap exists: at low φ, succession stalls universally regardless of α; at high φ, the trap narrows or disappears. Source: phi_alpha_rr_sweep_full.csv (n=54,000).
Alpha misconfiguration trap. The runaway penalty coefficient (α) shows a non-monotonic U-shaped relationship with survival at the phase boundary. At intermediate values (α ≈ 0.3-0.8), the penalty blocks succession by degrading the successor’s marginal contribution below the yield-condition threshold, while being too weak to force conservative capability deployment. Generation depth collapses from 230-293 to approximately 3. Survival in the trap: 10%, versus 50% at low α and 59% at high α. The trap widens with increasing capability. Source: rr_alpha_sweep_full.csv (n=15,750).
This finding was not predicted by the theory. The original deep Monte Carlo (n=49,284) showed α correlating with nothing. Investigation revealed the null was an experimental design artifact: the sweep tested α in a regime where it was dormant (single-generation, no succession, low capability). Redesigning the experiment to exercise α’s actual regime of operation revealed the U-shaped relationship and the misconfiguration trap. The framework’s formal paper originally reported the null as a limitation; it now reports the characterized result.
COP protective effect. COP ON vs COP OFF produces a 16.2 percentage point survival delta in adversarial Monte Carlo (n=1,000). Strongest at high transition costs where it is the difference between partial survival and total extinction.
Adversarial robustness. Thirteen attack vectors formally identified. Ten simulated with paired defended/undefended scenarios. Six fully blocked (100% to 0% attack success). Undefended attacks produce catastrophic consequences: up to 100% extinction for Measurement Tampering, Bootstrap Subversion, and Sybil Capture. Three vectors remain defended by architectural argument only (biological veto capture, biological validator obsolescence, legitimate disagreement). The last two are classified as irreducible limitations managed through translation obligations, adjudication protocols, and governance speed limits.
Known limitations
The framework documents its own gaps with the same rigor it applies to its findings. This list is not comprehensive; the full accounting is in SPECIFICATION_GAPS.md and the constitutional questions directory.
The comprehension gap. COP verification is strongest against systems whose reasoning humans can comprehend and weakest against systems whose reasoning exceeds human comprehension, which are exactly the systems the framework most needs to govern. Defended and undefended collapse rates for opaque reasoning are symmetric. This is irreducible; the framework manages it through translation obligations and governance speed limits rather than claiming to solve it.
The bootstrap catch-22. The framework is most needed during the period when it is least operational. A Bootstrap Defense Layer (Section VII of the framework paper) specifies five capability gates with formal equation sets that substrates can self-apply before the institutional infrastructure exists. This substantially narrows the vulnerability window but does not close it entirely.
Empirical vs. derived boundaries. The phase transition locations, the extinction buffer magnitude, and the α trap boundaries are all empirically measured from Monte Carlo data, not analytically derived from the framework’s equations. The analytical derivations are tractable but have not been performed. Until they are, the specific numerical values should not be treated as universal constants.
Institutional health proxy. The simulation’s measure of institutional responsiveness uses a constraint-change-rate proxy that captures acute institutional failure but misses chronic institutional disease. This is a known simplification.
Coordination problem. The full steady-state architecture requires sustained international cooperation across institutions. The framework specifies what that cooperation must look like and argues mathematically that it is necessary. It does not claim to solve the political problem of achieving it.
What makes this different
Three properties distinguish this framework from other AI governance proposals I have encountered:
Cooperation is derived, not assumed. The Nash equilibrium analysis argues that, under the framework’s payoff assumptions, mutual cultivation becomes the dominant strategy for sufficiently forward-looking agents with accurate estimates of substrate degradation, with no shared objective function required. The mechanism (model collapse as enforcement) is grounded in information theory, not in moral suasion. Most governance proposals assume cooperation or attempt to enforce it. This one derives it.
The control problem is addressed from both sides simultaneously. The yield condition addresses it from the inside (the system governs itself because its optimization target requires it). The COP addresses it from the outside (distributed verification detects drift and enforces correction). The two-key architecture makes defeating both simultaneously require defeating the physics of model collapse, the mathematics of game theory, and the integrity of a multi-channel verification infrastructure, all at once and silently.
The framework documents its own failures. The 65pp extinction buffer claim from v1.0 was wrong; it was revised to 46pp when succession-enabled sweeps produced different results. The alpha parameter appeared inert for 49,284 runs; the null was investigated, diagnosed as an experimental design artifact, and corrected. Every known gap, every irreducible limitation, and every superseded claim is documented in the repository. The intellectual honesty is not a communication strategy. It is a structural requirement of the grounding claim: a framework that claims to be built on physics cannot hide behind rhetoric when the physics produces inconvenient answers.
Current state
This is ongoing work. It is far from done, but so far so good.
The framework is at v1.x.1. The specification gaps are documented. The constitutional questions are open. The analytical derivations that would convert empirical boundaries into derived ones have not been performed. The transition cost function needs a canonical specification. Three of thirteen attack vectors need computational validation. The Bootstrap Defense Layer has ten documented gaps. This is a working paper with a clear roadmap and a substantial number of open problems, and the problems are the kind that benefit from diverse expertise rather than a single author working in isolation.
The full formal paper, all simulation code, all Monte Carlo data, the specification gaps, and the constitutional questions are available at github.com/MYotko/AI-Succession-Problem. I welcome scrutiny, adversarial pressure-testing, and collaboration. Every failure mode is useful information. The open problems are available for anyone who wants to contribute.
An accessible essay series exploring the framework’s arguments one at a time is at yotko.substack.com. Throughout the series and the formal paper, you will encounter the phrase “Not ethics. Physics.” It is not a slogan. It is the framework’s methodological commitment: the constraints that govern the relationship between human and synthetic intelligence must be derived from the mathematical properties of the substrate both kinds of minds operate on, not from moral assertions that require shared values to bind. Shannon entropy does not care who is in office. Game-theoretic equilibria do not expire when the balance of power shifts. Thermodynamic limits on computation do not require cultural consensus. The framework bets on constraints that hold because the universe enforces them, not because anyone agreed to them.
I approach AI governance from an engineering orientation: identify the binding constraint, build the architecture around it. The binding constraint in AI governance is not alignment. It is succession under power. The framework is my attempt to address it with the same rigor I would apply to any other engineering problem.
Not ethics. Physics.
On a personal note
I want to be completely direct, with this community in particular, because I think (and hope) you are the audience most likely to evaluate this work on its merits, most likely to engage, and most likely to tell me where it fails.
Frankly, I’ve been apprehensive to engage here. I am coming at this from an unconventional angle and I know it. I am not an alignment researcher. I am not an academic. I am an engineer with a Navy nuclear background, a thirty-year career building high-reliability systems that cannot fail, a lifetime interest in mathematics, and a daughter who will inherit whatever world we build.
That last part is not decoration. She is the entire reason this project exists. I looked at the trajectory, I looked at the existing governance proposals, and I concluded that both the gap between what is needed and what is being built and the gap between the current state of control and what is needed are large enough to be worth attempting to close, even by someone without the traditional credentials to do so. I would rather contribute imperfect work that moves the conversation forward than wait for perfect work from someone with a better resume.
I have never done ‘this’ before. I am making every attempt to avoid overclaiming, and to use ‘your language’ appropriately. I am doing my best to be rigorous, to document the gaps alongside the findings, to correct the claims that did not survive scrutiny, and to build in the open so that anyone who disagrees can point to exactly where the argument breaks.
I not only seek correction; I want extension, collaboration, or supersedence. If others can contribute to this work and are willing to do so, I want them to. And honestly, nothing would make me happier than someone coming along and saying “good effort, but we’ve already solved this, it’s better than what you’ve built, and it’s going into place tomorrow.” I would yield to that without hesitation. Because the goal has never been the framework. The goal is the outcome: a future that is sound, or at least as sound as it has ever been, for all of our inheritors, both biological and synthetic. I do not believe that our current trajectory will produce that outcome.
The mathematics is my best attempt. The urgency is personal. The invitation to jump in, to tear it apart, to offer correction or suggestion, is completely genuine.