Hypothesis: Grokking is a Reachability Phase Transition driven by Mechanistic Description Length (RMDL)
TL;DR: I propose a framework (RMDL) combining MDL and Dynamical Reachability. It suggests that Grokking is not just “learning,” but a thermodynamic transition from a high-entropy “memorization basin” to a low-RMDL “generalization basin,” gated by an effective temperature (Teff) from SGD noise.
The Core Argument:
We know Grokking happens. We have progress measures (Nanda et al.). But why does the model snap?
My paper [Link to Zenodo] proposes:
Optimization ≠ Reachability: A solution can exist but be dynamically unreachable under current noise/stability constraints.
RMDL Attractors: Training dynamics drift towards lower Reachable Mechanistic Description Length.
Critical Dimension (dRio): There exists a critical effective capacity below which the generalization basin is topologically unreachable.
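For readers less familiar with the physics, the barrier-crossing intuition behind these three points can be sketched with the textbook Kramers (Arrhenius-type) escape time. This is an assumed illustration, not the definition used in the paper: the barrier height \Delta E, the prefactor \tau_0, and the scaling of Teff with learning rate \eta and batch size B (a common heuristic when SGD is modeled as a stochastic differential equation) are all introduced here for illustration only.

```latex
% Assumed illustration, not taken from the paper.
% Expected time to escape a basin over a barrier of height \Delta E:
\tau_{\text{escape}} \sim \tau_0 \exp\!\left(\frac{\Delta E}{T_{\text{eff}}}\right),
\qquad
% Common heuristic for the SGD noise scale:
T_{\text{eff}} \propto \frac{\eta}{B}
```

Under this reading, "dynamically unreachable" would mean that \tau_{\text{escape}} far exceeds the training budget at the current Teff, even though the lower basin exists.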
Why this matters for Mech Interp: It connects circuits (discrete mechanisms) to loss landscapes (continuous geometry). It predicts that “clean circuits” are just the low-energy states of the description length metric.
Falsifiable Predictions: I outline 7 specific predictions in the paper, including:
MDL proxies drop before the accuracy cliff.
Noise injection has a non-monotonic effect on plateau duration (Barrier Crossing).
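To make the first prediction easier to test, here is a minimal sketch of one possible MDL proxy: a crude two-part code (bits for the weights under a Gaussian prior at fixed precision, plus bits for the training data given the model). This is not the RMDL quantity from the paper. The function names `two_part_mdl_bits` and `first_step_where`, and the parameters `sigma` and `precision`, are hypothetical choices made for illustration.

```python
import numpy as np


def two_part_mdl_bits(weights, per_example_nll_nats, sigma=1.0, precision=1e-2):
    """Crude two-part description length in bits (illustrative proxy only).

    weights: iterable of arrays holding the model parameters.
    per_example_nll_nats: negative log-likelihood of each training example, in nats.
    sigma: assumed standard deviation of a zero-mean Gaussian prior on weights.
    precision: assumed quantization bin width used to discretize each weight.
    """
    w = np.concatenate([np.ravel(np.asarray(p, dtype=float)) for p in weights])
    # Model cost: -log of the Gaussian density times the bin width, summed over weights.
    model_nats = (
        0.5 * np.sum(w ** 2) / sigma ** 2
        + w.size * (0.5 * np.log(2 * np.pi * sigma ** 2) - np.log(precision))
    )
    # Data cost: total negative log-likelihood of the training set.
    data_nats = float(np.sum(per_example_nll_nats))
    # Convert nats to bits.
    return float((model_nats + data_nats) / np.log(2))


def first_step_where(values, predicate):
    """Return the index of the first entry satisfying `predicate`, or None."""
    idx = np.flatnonzero(predicate(np.asarray(values)))
    return int(idx[0]) if idx.size else None
```

In use, one would log `two_part_mdl_bits` and test accuracy at each checkpoint, then compare `first_step_where(mdl_curve, lambda v: v < some_threshold)` against `first_step_where(acc_curve, lambda v: v > 0.9)`. The prediction holds on that run if the first index is smaller; both thresholds are free choices that should be fixed before looking at the curves.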
Request for Critique:
I am an undergrad student, and this is an attempt to formalize intuitions from physics into the language of MI. I’m looking for reasons why this thermodynamic analogy might break down in high-dimensional transformers.