Grokking (ML)

Tag · Last edit: 29 Feb 2024 5:58 UTC by Morpheus

A phenomenon in machine learning in which a model generalizes to a test set only long after it has fit the training set perfectly (reached near-zero training loss).
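To make the definition concrete, here is a minimal sketch, not drawn from any of the posts below, that measures the delay between fitting the training set and generalizing, given per-step accuracy curves. The function names `first_step_reaching` and `grokking_delay`, the 0.99 threshold, and the choice to threshold accuracy rather than loss are all assumptions for illustration. The toy curves are made up and are not results from any experiment.

```python
from typing import Optional, Sequence


def first_step_reaching(curve: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first entry >= threshold, or None if never reached."""
    for step, value in enumerate(curve):
        if value >= threshold:
            return step
    return None


def grokking_delay(
    train_acc: Sequence[float],
    test_acc: Sequence[float],
    threshold: float = 0.99,  # hypothetical cutoff for "fit" / "generalized"
) -> Optional[int]:
    """Steps between the training curve and the test curve first reaching `threshold`.

    A large positive value is the grokking signature: the model fits the
    training set long before it generalizes. Returns None if either curve
    never reaches the threshold.
    """
    train_step = first_step_reaching(train_acc, threshold)
    test_step = first_step_reaching(test_acc, threshold)
    if train_step is None or test_step is None:
        return None
    return test_step - train_step


# Made-up toy curves, one value per evaluation step (not real experimental data).
train = [0.30, 0.70, 0.99, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
test = [0.05, 0.06, 0.07, 0.08, 0.08, 0.09, 0.30, 0.80, 0.99, 1.00]
print(grokking_delay(train, test))  # 6
```

In practice these curves would be logged during training. Whether to threshold accuracy or loss, and how long a delay has to be before it counts as grokking, are judgment calls the definition leaves open.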

Grokking, memorization, and generalization — a discussion

Kaarel and Dmitry Vaintrob

29 Oct 2023 23:17 UTC

75 points

11 comments · 23 min read · LW link

Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius Hobbhahn · 4 Oct 2022 7:22 UTC

46 points

11 comments · 1 min read · LW link

(arxiv.org)

What if the dangerous moment isn’t when AI gets smarter, but when it starts trusting itself?

Qien Huang · 12 Jan 2026 19:44 UTC

1 point

0 comments · 5 min read · LW link

A Simple Method for Accelerating Grokking

josh :) · 24 Jan 2026 3:19 UTC

14 points

1 comment · 3 min read · LW link

AXRP Episode 29 - Science of Deep Learning with Vikrant Varma

DanielFilan · 25 Apr 2024 19:10 UTC

20 points

1 comment · 63 min read · LW link

A short project on Mamba: grokking & interpretability

Alejandro Tlaie · 18 Oct 2024 16:59 UTC

21 points

0 comments · 6 min read · LW link

Ambiguous out-of-distribution generalization on an algorithmic task

Wilson Wu and Louis Jaburi

13 Feb 2025 18:24 UTC

84 points

6 comments · 11 min read · LW link

A Mechanistic Interpretability Analysis of Grokking

Neel Nanda and Tom Lieberum

15 Aug 2022 2:41 UTC

378 points

48 comments · 36 min read · LW link · 1 review

(colab.research.google.com)

[Proposal] Isomorphic Consolidation: A Protocol for Continuous Entropy Reduction via Offline Topology Search

Valen · 28 Nov 2025 3:11 UTC

1 point

0 comments · 2 min read · LW link

LLMs Still Suck at Logical Reasoning

anovikov · 18 Jul 2025 18:35 UTC

1 point

0 comments · 2 min read · LW link

An interactive introduction to grokking and mechanistic interpretability

Adam Pearce and Asma Ghandeharioun

7 Aug 2023 19:09 UTC

23 points

3 comments · 1 min read · LW link

(pair.withgoogle.com)

Minor interpretability exploration #1: Grokking of modular addition, subtraction, multiplication, for different activation functions

Rareș Baron · 26 Feb 2025 11:35 UTC

5 points

13 comments · 4 min read · LW link

Hypothesis: Grokking is a Reachability Phase Transition driven by Mechanistic Description Length (RMDL)

Rio Tsukatsuki · 26 Jan 2026 10:43 UTC

1 point

0 comments · 1 min read · LW link

Explaining grokking through circuit efficiency

Vikrant Varma and Rohin Shah

8 Sep 2023 14:39 UTC

102 points

11 comments · 3 min read · LW link

(arxiv.org)

Irreducible representations versus cosets: a discriminating experiment on a same-character-table group pair

Brook Stefanou · 15 Jun 2026 6:09 UTC

1 point

0 comments · 18 min read · LW link

(brook-stefanou.github.io)

Grokking Beyond Neural Networks

Jack Miller · 30 Oct 2023 17:28 UTC

10 points

0 comments · 2 min read · LW link

(arxiv.org)

QAPR 5: grokking is maybe not that big a deal?

Quintin Pope · 23 Jul 2023 20:14 UTC

116 points

15 comments · 9 min read · LW link

Mesa-Optimizers via Grokking

orthonormal · 6 Dec 2022 20:05 UTC

36 points

4 comments · 6 min read · LW link

The slingshot helps with learning

Wilson Wu · 31 Oct 2024 23:18 UTC

33 points

0 comments · 8 min read · LW link
