Hypothesis: Grokking is a Reachability Phase Transition driven by Reachable Mechanistic Description Length (RMDL)

TL;DR: I propose a framework (RMDL) that combines Minimum Description Length (MDL) with dynamical reachability. It suggests that Grokking is not just “learning,” but a thermodynamic transition from a high-entropy “memorization basin” to a low-RMDL “generalization basin,” gated by an effective temperature set by SGD noise.

The Core Argument:

We know Grokking happens. We have progress measures (Nanda et al.). But why does the model snap?

My paper [Link to Zenodo] proposes:

  1. Optimization Reachability: A solution can exist but be dynamically unreachable under current noise/stability constraints.

  2. RMDL Attractors: Training dynamics drift towards lower Reachable Mechanistic Description Length.

  3. Critical Dimension: There exists a critical effective capacity below which the generalization basin is topologically unreachable.
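
To make point 1 concrete, here is a minimal toy sketch, not the construction from the paper. It assumes two things that are not stated above: that the effective temperature scales like learning rate divided by batch size (a common heuristic for SGD noise), and that the expected time to cross a barrier follows an Arrhenius-style law, exp(barrier / T). All function names and numbers are hypothetical.

```python
import math


def effective_temperature(learning_rate: float, batch_size: int) -> float:
    # Assumption: SGD noise scale ~ learning_rate / batch_size.
    # This is a common heuristic, not a definition taken from the paper.
    return learning_rate / batch_size


def log_expected_escape_steps(barrier: float, temperature: float) -> float:
    # Arrhenius-style estimate tau ~ exp(barrier / T), kept in log space
    # so that very low temperatures do not overflow.
    return barrier / temperature


def is_reachable(barrier: float, temperature: float, step_budget: int) -> bool:
    # "Reachable" here means: the expected escape time fits in the training budget.
    return log_expected_escape_steps(barrier, temperature) <= math.log(step_budget)


if __name__ == "__main__":
    barrier = 1e-4     # hypothetical loss barrier between the two basins
    budget = 100_000   # hypothetical number of training steps
    for batch_size in (512, 64):
        t_eff = effective_temperature(1e-3, batch_size)
        print(batch_size, is_reachable(barrier, t_eff, budget))
```

In this toy the generalizing solution exists in both runs; only the noise level changes, and with these made-up numbers the larger batch size leaves it outside the step budget while the smaller one brings it inside.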

Why this matters for Mech Interp: It connects circuits (discrete mechanisms) to loss landscapes (continuous geometry). It predicts that “clean circuits” are just the low-energy states of the description length metric.

Falsifiable Predictions: I outline 7 specific predictions in the paper, including:

  • MDL proxies drop before the accuracy cliff.

  • Noise injection has a non-monotonic effect on plateau duration (Barrier Crossing).
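
One way the first prediction could be checked is sketched below. This is only an illustration under stated assumptions: the MDL proxy here (entropy of weights rounded to a grid) is a hypothetical stand-in, not necessarily the proxy used in the paper, and the thresholds and helper names are made up.

```python
import math
from collections import Counter
from typing import Callable, Optional, Sequence


def quantized_code_length_bits(weights: Sequence[float], bin_width: float) -> float:
    # Hypothetical MDL proxy: bits needed to encode the weights after rounding
    # them to a grid, using the empirical entropy of the grid symbols.
    symbols = [round(w / bin_width) for w in weights]
    n = len(symbols)
    counts = Counter(symbols)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return n * entropy


def first_index(series: Sequence[float], predicate: Callable[[float], bool]) -> Optional[int]:
    # Index of the first checkpoint whose value satisfies the predicate.
    for i, value in enumerate(series):
        if predicate(value):
            return i
    return None


def mdl_lead(mdl_curve: Sequence[float], acc_curve: Sequence[float],
             drop_fraction: float = 0.7, acc_threshold: float = 0.9) -> Optional[int]:
    # Number of checkpoints by which the MDL drop precedes the accuracy cliff.
    # Positive means the proxy dropped first; None means one event never happened.
    drop_at = first_index(mdl_curve, lambda v: v < drop_fraction * mdl_curve[0])
    cliff_at = first_index(acc_curve, lambda v: v >= acc_threshold)
    if drop_at is None or cliff_at is None:
        return None
    return cliff_at - drop_at
```

Usage would be to compute `quantized_code_length_bits` at each saved checkpoint, log test accuracy alongside it, and pass both curves to `mdl_lead`. The prediction would be falsified if the lead is consistently zero or negative across seeds.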

Request for Critique:

I am an undergraduate student, and this is an attempt to formalize intuitions from physics in the language of MI. I’m looking for reasons why this thermodynamic analogy might break down in high-dimensional transformers.

