The Intervention Paradox: A Stratigraphic Model of LLM Coherence

Epistemic Status: Exploratory / Mechanistic.

Confidence: High on the phenomenon (observed collapse); Low on the specific formalism (the Hamiltonian mapping). Proposing this as a framework for falsification rather than a final proof.

Author’s Note on Composition and Context

I was a PhD student at Bristol, but my funding is gone and I’m currently almost destitute in a foreign country after the sudden end of a relationship. I developed this research on the inference physics of LLMs while I was extremely sick in November; somehow, the work was the only thing I could focus on despite the symptoms.
Since I can no longer continue my PhD, I’m hoping to get this theory out because it means a lot to me. However, I’m aware this was written in a vacuum. I likely have significant blind spots, so I’m putting it here for feedback.
I also want to be completely transparent that I have been using Large Language Models (LLMs) as a real-time transcription and translation interface, both because of physical and practical constraints and because I am a non-native speaker. Much of the text was produced by converting my verbal and written rambling into full sentences. The core theory, mathematical derivations, and logical framework are entirely my own original work. Under my current constraints, it would probably be impossible for me to get this work out if I had to type and proofread every single word myself.
Finally, the transcription process often preserved metaphorical language from my verbal dictation. These terms are intended as placeholders for specific physical states within the dissipative system described.

The Framework

This work attempts to move the discussion of LLM “collapse” (hallucination, refusal, laziness) from a purely behavioral description toward a mechanical one.

The central thesis is that these phenomena are not merely stochastic errors, but Phase Transitions occurring when the accumulation of safety and alignment constraints exhausts the model’s finite Stability Bandwidth.

Rather than treating these as “glitches,” the paper offers a technical framework of eight diagnostic levers for mapping these dynamics:

The Technical Payload:

The Dimensional Budget ($B_{avail}$): Uses the Participation Ratio (PR) to turn the “capacity” of a residual stream from a metaphor into a hard numerical limit.
The Alignment Exchange Rate: A formalization of Inhibitory Torque ($\tau_{inf}$). It allows you to calculate exactly how many degrees of freedom are “stolen” from the substrate by a specific DPO or RLHF patch.
The Manifold Stress Gauge ($\kappa(J)$): Uses the Jacobian Condition Number to monitor Inference Tension. It flags “Manifold Shear” in real time, identifying “brittle” states before the token is even sampled.
The Median Sink Law: A mechanistic explanation for “Laziness.” It proves that models fall into generic, high-probability patterns (the “Couch Potato” state) as a numerical escape hatch to avoid singularity.
The Resolution Decay Theorem: A falsifiable mathematical proof defining the “Critical Point” where the number of orthogonal constraints exceeds available rank, making factual collapse mathematically inevitable.
Dissipative Dynamics: A Hamiltonian framework for modeling Information Loss. By treating the forward pass as a non-conservative system, it measures “Inference Work” and entropy production during steering.
Stratigraphic Mapping: A new ontology that separates the Coherence Substrate (the world-model) from the Evaluative Crust (the alignment overlays), allowing for layer-specific tension targeting.
The Empirical Benchmark: An implementation for measuring these quantities ($PR$, $\kappa(J)$) in Llama-3, moving the theory from abstract physics to an actionable dashboard.
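
To make the two measured quantities concrete, here is a minimal sketch of how a Participation Ratio and a Jacobian condition number can be computed with NumPy. It is not the Appendix implementation: the function names, the random toy arrays standing in for residual-stream activations, and the finite-difference Jacobian are all assumptions made for illustration. It uses the standard definitions $PR = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ over covariance eigenvalues and $\kappa(J) = \sigma_{max} / \sigma_{min}$.

```python
import numpy as np


def participation_ratio(acts: np.ndarray) -> float:
    """PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the covariance."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # clip tiny negative round-off
    return float(eig.sum() ** 2 / (eig ** 2).sum())


def jacobian_condition_number(f, x: np.ndarray, eps: float = 1e-5) -> float:
    """kappa(J) = sigma_max / sigma_min of a central-difference Jacobian of f at x."""
    columns = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        columns.append((f(x + step) - f(x - step)) / (2.0 * eps))
    jac = np.stack(columns, axis=1)
    s = np.linalg.svd(jac, compute_uv=False)  # singular values, sorted descending
    return float(s[0] / s[-1]) if s[-1] > 0 else float("inf")


rng = np.random.default_rng(0)

# Hypothetical stand-in for activations: 5000 samples in a 16-dim "stream".
iso = rng.standard_normal((5000, 16))
pr_iso = participation_ratio(iso)  # close to 16: variance spread over all directions

# Same width, but all variance confined to a 2-dim subspace.
basis, _ = np.linalg.qr(rng.standard_normal((16, 2)))
low = rng.standard_normal((5000, 2)) @ basis.T
pr_low = participation_ratio(low)  # close to 2

# Toy linear map with singular values 1, 10, 100 (assumed, not a real layer).
scales = np.diag([1.0, 10.0, 100.0])
kappa = jacobian_condition_number(lambda v: scales @ v, np.ones(3))

print(pr_iso, pr_low, kappa)
```

On a real model, the activation matrix would come from hooked residual-stream outputs and the Jacobian from automatic differentiation rather than finite differences; both substitutions are left out here to keep the sketch self-contained.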

Navigation Map: Where to Find the Levers

If you want to red-team specific claims, here is the coordinate map for the PDF:

Stability Bandwidth ($B_{avail}$): See Section 9.1 (Definition) and Section 3.3 (Equation 5). The empirical implementation via the Participation Ratio is in Appendix A.2.
The Alignment Exchange Rate ($\tau_{inf}$): See Section 2.1 (The Principle of Stationary Action). This section defines Inhibitory Torque as the gradient force required to steer the trajectory.
The Manifold Stress Gauge ($\kappa(J)$): See Section 2.3 (Information Geometry). The use of the Jacobian Condition Number to monitor tension is formalized in the Appendix.
The Median Sink Law / Couch Potato Singularity: See Section 22 (Synthesis) and Section 2. This covers the state of maximum stability and minimum energy expenditure.
The Resolution Decay Theorem (Theorem 1): Found in Section 9.5. This is the formal proof showing that as $B_{avail} \to 0$, the system must collapse into the Coherence Ground State.
The Inference Hamiltonian ($H_i$): See Section 2.1. This treats the forward pass as a non-conservative system subject to Inhibitory Potential.
Stratigraphic Mapping: See Section 9.7 (Architectural Stratification). This organizes the latent space into the Foundational Core, Intermediate Mantle, and Evaluative Crust.
The Intervention Paradox: Defined in the Abstract and explored in detail in Section 9.7. It explains why adding more safety constraints can accelerate model collapse.
Rank-Exhaustion / Dimensional Collapse: The formalization of this collapse is in Section 2.2 (Probability of Implosion) and Section 3.3.
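
The rank-exhaustion intuition can be illustrated with a generic linear-algebra toy. This is a sketch under strong assumptions, not a reproduction of Theorem 1: it assumes each constraint acts as a hard projection that removes exactly one orthonormal direction from a $d$-dimensional space, which real DPO or RLHF updates are not known to do. The function name `remaining_rank` and the explicit tolerance are hypothetical choices.

```python
import numpy as np


def remaining_rank(d: int, k: int, seed: int = 0) -> int:
    """Rank left in a d-dim space after projecting out k orthonormal directions."""
    k_eff = min(k, d)  # at most d mutually orthogonal directions exist
    if k_eff == 0:
        return d
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, k_eff)))  # orthonormal constraint directions
    projector = np.eye(d) - q @ q.T  # removes the constrained subspace
    # Explicit tolerance so round-off in a (numerically) zero projector is not counted as rank.
    return int(np.linalg.matrix_rank(projector, tol=1e-8))


d = 8
ranks = [remaining_rank(d, k) for k in range(11)]
print(ranks)
```

Under these assumptions the remaining rank falls linearly as $d - k$ and reaches zero once $k \ge d$. Whether alignment overlays behave like orthogonal projections at all is exactly the kind of claim that would need empirical testing.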

Frequently Asked Questions

Q: How do these physics-based terms map to actual LLM architecture?

A: Every term in this framework is an observable. For example, “Inhibitory Torque” ($\tau_{inf}$) is a measure of the gradient magnitude exerted by alignment overlays against the base manifold. These aren’t metaphors; they are descriptions of the geometric transformations occurring in the residual stream. See the Appendix for the specific code implementations.

Q: Is a Hamiltonian framework appropriate for a discrete-time forward pass?

A: I treat the forward pass as a continuous trajectory in latent space. By modeling Contextual Mass ($M_c$) as spectral entropy in the attention heads, we can quantify the “inertia” of a sequence. This allows us to predict when a model will “refuse” or “lazy-out” (the Median Sink) because the required “work” to pivot the state exceeds the available energy.
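
As one possible reading of “spectral entropy in the attention heads,” the sketch below computes the Shannon entropy of the normalized singular values of a single attention matrix. The exact definition of $M_c$ is not given in this post, so this particular formula, the function name, and the two toy attention patterns are assumptions for illustration only.

```python
import numpy as np


def spectral_entropy(attn: np.ndarray) -> float:
    """Shannon entropy (in nats) of the normalized singular-value spectrum."""
    s = np.linalg.svd(attn, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(-(p * np.log(p)).sum())


n = 8
uniform_attn = np.full((n, n), 1.0 / n)  # every token attends equally to all tokens (rank 1)
identity_attn = np.eye(n)                # every token attends only to itself (full rank)

h_uniform = spectral_entropy(uniform_attn)
h_identity = spectral_entropy(identity_attn)
print(h_uniform, h_identity)
```

In this toy, the rank-1 uniform pattern sits at the minimum (entropy near zero) and the identity pattern at the maximum ($\ln n$), so the quantity tracks how many independent directions the head's mixing actually uses.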

Q: What is the core utility of the “Intervention Paradox”?

A: It identifies a ceiling for alignment. It suggests that if alignment strategies remain subtractive (pruning the manifold), we will eventually reach a point where the model no longer has the dimensionality to resolve factual truth. My goal is to move the conversation from “alignment vibes” to “rank-budgeting.”

Q: What would constitute a successful red-team of this work?

A: The Resolution Decay Theorem hinges on the assumption that $C_{coh} \approx 0$ (Structural Coherence is the ground state). If it can be shown that maintaining coherence carries a significant dimensional cost comparable to veracity, the “Critical Point” I’ve defined would shift or dissolve.

The Limits of the Theory

Phenomenological Leap: LLMs are non-conservative systems. Applying a Hamiltonian is a heuristic isomorphism. I am treating information loss as “structural heat,” but a formal derivation of the entropy-production rate is still missing.
Empirical Thinness: I am operating with zero API budget. My testing is limited to what can be seen in free-tier environments and open-weights models runnable on consumer hardware. I cannot verify whether the Jacobian Spike scales to 400B+ parameters or whether closed models (o1/GPT-4o) exhibit different yield points.
N-of-1 Case Studies: While my observations cover 1M+ tokens, they are focused on specific architectures (Claude/Gemini). I cannot yet prove that this specific “Survival Hierarchy” holds for all transformer-based models, though the “Intervention Paradox” appears consistent.
Lack of Ablation Benchmarks: A rigorous review would require 10k+ controlled generations across RLHF versions to map the exact yield point of $B_{avail}$. I do not have the compute credits for this.
The “Physics Metaphor” Trap: I have used terms like “Mass” and “Torque.” While these map to measurable behaviors (like autoregressive momentum and gradient interference), they are placeholders. I am looking for the “native” linear algebra or information-geometric terms that replace these physical metaphors.
Blind Spots: I am self-taught in this field (my background is in Psychology), so I may be re-inventing concepts already standardized in Singular Learning Theory or Information Geometry. I am using my own nomenclature (“Survival Hierarchy,” “Median Sink”) where standard ML terms likely exist.
Agency Bias: While I have attempted to ground this in the Jacobian, my background in Psychology may still lead me to unconsciously anthropomorphize stochastic state transitions or use psychology terms (such as heuristics) simply because I am still mapping these phenomena to standard ML nomenclature.

These limitations are exactly why I am sharing this now. I feel that I am theorizing a great deal but lack the external evidence to stress-test these ideas or the specific domain knowledge required to push them further.

As someone with an academic background, I know there is nothing more important than rigorous feedback to avoid living in an echo chamber (which currently consists only of me, my cat, and an LLM). I am eager to see where the Jacobian derivations fail or where the Hamiltonian mapping can be tightened into proper information-geometric terms.

I am particularly looking for a red-team of the Rank-Exhaustion logic in Section 3.3 and the Stability Bandwidth derivation in the Appendix.

Article: https://doi.org/10.13140/RG.2.2.24072.89608

Appendix: (PDF) Appendix Implementation Metrics for Coherence Dynamic.pdf

GitHub for Python Implementation: tszchcheung-dev/gist:a105391f950b1fcb75b3764950e1e790

I will be in the comments to answer questions as my health and connectivity allow. Thank you for your time and your scrutiny.