When Does the Local Learning Coefficient Track Circuit Formation?

There is a lot of justified hype right now around applying Singular Learning Theory (SLT) to mechanistic interpretability. We all desperately want a magic, noise-tolerant number that tells us, “Ah, yes, the model just sculpted a meaningful circuit here”.

Recently, the Local Learning Coefficient (LLC) has been floated as the prime candidate for this. The theory is elegant: compute the LLC across checkpoints, and watch it respond when the model undergoes a phase transition.

I wanted to actually pressure-test this. I ran the code, stared at the loss landscapes, and walked my colleagues through the core findings while drafting the manuscript. The TL;DR? Gradient descent is incredibly messy, and the strongest claims about the LLC being a standalone circuit-tracker are basically a mirage.

Here is what actually happens when you put LLC to the test, and why we need to seriously rethink how we build observability tooling.

The Setup: Hunting for the “Jump”

I went back to the absolute basics, where MLP circuit formation is well-understood: the Toy Models of Superposition setup (Elhage et al., 2022).

  • It is a one-hidden-layer ReLU MLP with tied weights.

  • The network is just trying to compress sparse features into a 2-dimensional bottleneck.

  • The hallmark event we are looking for is a discrete dimensionality jump, where the effective rank of the weight matrix suddenly pops from ~1.81 to 2.00 (sketched in code below).
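To pin down what is being measured, here is a minimal sketch of the model and the rank metric. I am assuming the standard exponential-of-spectral-entropy definition of effective rank (it matches the ~1.81 → 2.00 range quoted above); `n_features=5` and the init scale are illustrative stand-ins, not the exact configuration.

```python
import torch
import torch.nn as nn

class TiedToyModel(nn.Module):
    """Toy Models of Superposition: x_hat = ReLU(W^T W x + b)."""
    def __init__(self, n_features=5, d_hidden=2):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_hidden, n_features) * 0.1)
        self.b = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):                    # x: (batch, n_features)
        h = x @ self.W.T                     # compress into the 2-d bottleneck
        return torch.relu(h @ self.W + self.b)  # tied-weight reconstruction

def effective_rank(W):
    """exp(entropy of the normalized singular values): the quantity
    that jumps from ~1.81 to 2.00 at the phase transition."""
    s = torch.linalg.svdvals(W.detach())
    p = s / s.sum()
    p = p[p > 1e-12]                         # guard against log(0)
    return torch.exp(-(p * torch.log(p)).sum()).item()
```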

[Figure: The Polytope Gallery]

In theory, the LLC should cleanly track this phase transition. And to be fair, conditionally, it does: when a perfectly clean jump occurs, the LLC at the inflection point is consistently elevated above random cut-points within that same trajectory.
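For concreteness, here is roughly what "compute the LLC at a checkpoint" cashes out to: sample from a localized posterior around the checkpoint with SGLD and measure how far the average loss rises above the checkpoint loss (Lau et al., 2023). This is a bare-bones, full-batch sketch, not the exact estimator I ran; in practice you would use a dedicated library, and every hyperparameter below is illustrative.

```python
import copy
import math
import torch

def estimate_llc(model, loss_fn, data, n_burnin=200, n_draws=500,
                 lr=1e-4, gamma=100.0):
    """SGLD estimate of the LLC at w* (the model's current weights):
    lambda_hat = n * beta * (E_w[L_n(w)] - L_n(w*)), beta = 1 / log(n)."""
    x, y = data
    n = len(x)
    beta = 1.0 / math.log(n)

    anchor = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        init_loss = loss_fn(model(x), y).item()       # L_n(w*)

    chain = copy.deepcopy(model)                      # chain starts at w*
    draws = []
    for step in range(n_burnin + n_draws):
        loss = loss_fn(chain(x), y)
        chain.zero_grad()
        (n * beta * loss).backward()                  # tempered log-likelihood
        with torch.no_grad():
            for p, a in zip(chain.parameters(), anchor):
                p -= lr / 2 * (p.grad + gamma * (p - a))  # grad + localizer
                p += math.sqrt(lr) * torch.randn_like(p)  # Langevin noise
        if step >= n_burnin:
            draws.append(loss.item())

    return n * beta * (sum(draws) / len(draws) - init_loss)
```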

The problem? That “clean jump” is a statistical unicorn.

The Single-Seed Fallacy

We have a bad habit in interpretability of running a single seed, getting a beautiful graph, and assuming we’ve mapped the “ground truth” algorithm. These experiments show just how dangerous that assumption is.

I ran 20 seeds at the exact same canonical configuration. Only about 25% of them actually exhibited the textbook plateau-then-jump trajectory.

[Figure: Seed grid showing the sigmoid fit to effective-rank trajectories]
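The "textbook plateau-then-jump" call in that grid comes from fitting a sigmoid to each seed's effective-rank trajectory, roughly like this. The jump-detection thresholds below are illustrative stand-ins, not the exact values from the analysis:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(t, lo, hi, t0, k):
    return lo + (hi - lo) / (1.0 + np.exp(-k * (t - t0)))

def classify_trajectory(steps, ranks, min_gap=0.15, min_rate=0.005):
    """Fit a sigmoid to one seed's effective-rank curve; flag a clean
    jump only if the fitted step is both large enough and sharp enough."""
    steps, ranks = np.asarray(steps, float), np.asarray(ranks, float)
    p0 = [ranks.min(), ranks.max(), np.median(steps), 0.01]
    (lo, hi, t0, k), _ = curve_fit(sigmoid, steps, ranks, p0=p0, maxfev=10_000)
    clean_jump = (hi - lo) > min_gap and abs(k) > min_rate
    return clean_jump, t0        # t0 = inflection step if a jump exists
```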

If a tiny toy MLP gets permanently stuck in a sub-optimal basin 1 out of 5 times, imagine the path-dependency when training massive predictive architectures.

[Figure: Non-monotonic dynamics, shown via pairwise cosine-similarity graphs]
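The geometry behind that figure is cheap to reproduce: treat each column of W as a feature direction and track every pairwise cosine similarity across checkpoints. A sketch:

```python
import torch
import torch.nn.functional as F

def pairwise_cosines(W):
    """Cosine similarity for every pair of feature directions
    (columns of the d_hidden x n_features matrix W)."""
    Wn = F.normalize(W.detach(), dim=0)          # unit-norm columns
    sims = Wn.T @ Wn                             # n_features x n_features
    i, j = torch.triu_indices(sims.shape[0], sims.shape[0], offset=1)
    return sims[i, j]                            # one value per feature pair
```

Plotted over training, these curves are what the figure shows: the pairing structure rearranges non-monotonically rather than settling once and staying put.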

The Nail in the Coffin: The Sparsity Sweep

The most damning piece of evidence comes from sweeping the feature-sparsity factor, which outright falsifies the claim that “LLC drift tracks circuit formation”.

If LLC drift is actually tracking circuits, it should be relatively quiet when no circuits are forming. I ran a sweep at a sparsity of 0.3, a regime where the network keeps exactly two features and exhibits zero superposition (meaning no phase transition occurs).
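To be explicit about the knob being swept: under the standard Toy Models convention, which I assume here, "sparsity S" means each feature is independently zero with probability S. A batch-sampling sketch:

```python
import torch

def sample_batch(batch_size, n_features, sparsity):
    """TMS-style synthetic data: each feature is independently zero
    with probability `sparsity`, otherwise uniform on [0, 1]."""
    x = torch.rand(batch_size, n_features)
    mask = torch.rand(batch_size, n_features) < sparsity
    return x.masked_fill(mask, 0.0)
```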

[Figure: Sparsity sweep showing LLC drift across sparsities 0.3, 0.5, 0.7, and 0.9]

The data is brutal: LLC drift at sparsity 0.3 was roughly 3x larger than the drift at the canonical 0.7 sparsity.

LLC drift is not a reliable circuit-level signal. It is heavily dominated by general training-time evolution that varies across regimes in ways completely unrelated to circuit formation. Furthermore, I found that feature emergence and the rank-jump are entirely distinct events. Feature emergence happens early (around step 700), while the rank-jump happens late. The measurable LLC effect is actually tracking that late rank-jump, not the early feature emergence.
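For transparency about what "drift" means operationally in these comparisons, here is a hypothetical sketch: take the change in the LLC estimate across a window around a cut-point, then ask where a candidate step ranks against random cut-points from the same run. The window size and null-sample count are illustrative, not the exact analysis parameters.

```python
import numpy as np

def llc_drift(llc, center, window=5):
    """Change in the LLC series across a window around a cut-point."""
    return abs(llc[center + window] - llc[center - window])

def drift_percentile(llc, candidate, window=5, n_null=200, seed=0):
    """Fraction of random cut-points whose drift the candidate beats;
    the 'elevated above random cut-points' test described earlier."""
    rng = np.random.default_rng(seed)
    cuts = rng.integers(window, len(llc) - window, size=n_null)
    null = np.array([llc_drift(llc, int(c), window) for c in cuts])
    return float((llc_drift(llc, candidate, window) > null).mean())
```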

The Takeaway for DevInterp

LLC is mathematically beautiful, but it won’t save us from the messy reality of loss landscapes. If we want to build robust diagnostic protocols, we have to adjust our methods:

  1. Kill the single-seed analysis: Trajectory heterogeneity makes single-run validation a massive methodological flaw. If you validate a metric on one training run, you might just be measuring the quirks of a specific local basin.

  2. LLC is a passenger, not the driver: LLC drift is too noisy to be a standalone progress measure. It only provides real value as a confirmatory metric when paired tightly with other indicators like effective-rank trajectories or feature-norm dynamics.

We need multi-metric observability tooling that embraces the chaos, rather than hoping one single scalar will smooth it all out.
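Concretely, the tooling I have in mind is unglamorous: log several cheap, independent signals per checkpoint and only trust events where they move together. A sketch reusing the helper functions from the earlier blocks (so not self-contained on its own):

```python
import torch.nn as nn

def checkpoint_report(model, step, data, loss_fn=nn.MSELoss()):
    """One row of a multi-metric observability log. An 'event' is only
    trusted when several independent signals move together."""
    x, y = data
    return {
        "step": step,
        "loss": loss_fn(model(x), y).item(),
        "eff_rank": effective_rank(model.W),            # geometry of W
        "cosines": pairwise_cosines(model.W).tolist(),  # feature pairing
        "llc": estimate_llc(model, loss_fn, data),      # confirmatory only
    }
```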

References

Elhage, N., et al. (2022). Toy Models of Superposition. Transformer Circuits Thread. https://transformer-circuits.pub/2022/toy_model/index.html

Olsson, C., et al. (2022). In-context Learning and Induction Heads. Transformer Circuits Thread / arXiv. https://arxiv.org/abs/2209.11895

Wang, G., et al. (2025). Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient. ICLR. https://arxiv.org/abs/2410.02984

Hoogland, J., et al. (2024). Loss Landscape Degeneracy and Stagewise Development in Transformers. https://arxiv.org/abs/2402.02364

Lau, E., Murfet, D., Wei, S. (2023). The Local Learning Coefficient: A Singularity-Aware Complexity Measure. https://arxiv.org/abs/2308.12108

Sharkey, L., et al. (2025). Open Problems in Mechanistic Interpretability. https://arxiv.org/abs/2501.16496

Singh, A., et al. (2024). What Needs to Go Right for an Induction Head? A Mechanistic Study of In-Context Learning Circuits and Their Formation. ICML. https://arxiv.org/abs/2404.07129

Nanda, N., et al. (2023). Progress Measures for Grokking via Mechanistic Interpretability. ICLR. https://arxiv.org/abs/2301.05217

Biderman, S., et al. (2023). Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. https://arxiv.org/abs/2304.01373
