Superposition

Tag · Last edit: 5 Dec 2023 20:41 UTC by duck_master

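Posts about superposition: the hypothesis that neural networks represent more features than they have neurons, by encoding each feature as a direction spread across many neurons rather than as a single dedicated neuron.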
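As a rough illustration of the idea, here is a minimal sketch that packs three features into a two-dimensional activation space. Everything in it (the 120-degree layout, the `encode` and `decode` helpers, the ReLU readout) is a hypothetical toy setup chosen for this illustration and is not taken from any of the posts listed below.

```python
import math

# Hypothetical toy setup: 3 feature directions in a 2-dimensional space,
# spaced 120 degrees apart so that no two directions are orthogonal.
N_FEATURES = 3
DIRECTIONS = [
    (math.cos(2 * math.pi * k / N_FEATURES), math.sin(2 * math.pi * k / N_FEATURES))
    for k in range(N_FEATURES)
]

def encode(features):
    """Sum the feature directions, each weighted by its feature's activation."""
    x = sum(f * d[0] for f, d in zip(features, DIRECTIONS))
    y = sum(f * d[1] for f, d in zip(features, DIRECTIONS))
    return (x, y)

def decode(hidden):
    """Project onto each direction, then apply ReLU to clip negative interference."""
    return [max(0.0, hidden[0] * d[0] + hidden[1] * d[1]) for d in DIRECTIONS]

# Sparse input: only one feature is active.
sparse = decode(encode([1.0, 0.0, 0.0]))

# Dense input: two features are active at once.
dense = decode(encode([1.0, 1.0, 0.0]))

print("sparse:", sparse)
print("dense: ", dense)
```

In this sketch the sparse input is read back correctly, because the interference on the inactive directions is negative and the ReLU removes it. The dense input comes back as roughly 0.5 for each active feature, which shows why this kind of packing is usually described as relying on features being sparse.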

[Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey, Dan Braun and beren

13 Dec 2022 15:41 UTC

155 points

23 comments · 22 min read · LW link · 2 reviews

Superposition is not “just” neuron polysemanticity

LawrenceC

26 Apr 2024 23:22 UTC

69 points

4 comments · 13 min read · LW link

Toy Models of Superposition in the dense regime

Morpheus and Andre Assis

25 Nov 2025 2:12 UTC

6 points

0 comments · 7 min read · LW link

Circuits in Superposition 2: Now with Less Wrong Math

Linda Linsefors and Lucius Bushnaq

30 Jun 2025 10:25 UTC

73 points

0 comments · 22 min read · LW link

Toward A Mathematical Framework for Computation in Superposition

Dmitry Vaintrob, jake_mendel and Kaarel

18 Jan 2024 21:06 UTC

213 points

19 comments · 63 min read · LW link

Circuits in Superposition: Compressing many small neural networks into one

Lucius Bushnaq and jake_mendel

14 Oct 2024 13:06 UTC

131 points

9 comments · 13 min read · LW link

Ping pong computation in superposition

Alex Gibson

29 Dec 2025 16:31 UTC

13 points

0 comments · 3 min read · LW link

Conditional Importance in Toy Models of Superposition

james__p

2 Feb 2025 20:35 UTC

9 points

4 comments · 10 min read · LW link

Some costs of superposition

Linda Linsefors

3 Mar 2024 16:08 UTC

46 points

11 comments · 3 min read · LW link

Rotations in Superposition

Linda Linsefors and Lucius Bushnaq

15 Dec 2025 14:58 UTC

54 points

6 comments · 11 min read · LW link

Thoughts on Toy Models of Superposition

james__p

2 Feb 2025 13:52 UTC

5 points

2 comments · 9 min read · LW link

AI alignment as a translation problem

Roman Leventov

5 Feb 2024 14:14 UTC

23 points

2 comments · 3 min read · LW link

From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models

Roman Leventov

6 Feb 2024 10:18 UTC

8 points

1 comment · 4 min read · LW link

(arxiv.org)

An OV-Coherent Toy Model of Attention Head Superposition

Lauren Greenspan and keith_wynroe

29 Aug 2023 19:44 UTC

26 points

2 comments · 6 min read · LW link

Sparse autoencoders find composed features in small toy models

Evan Anders, Clement Neo, Jason Hoelscher-Obermaier and Jessica N. Howard

14 Mar 2024 18:00 UTC

33 points

12 comments · 15 min read · LW link

Superposition through Active Learning Lens

akankshanc

17 Sep 2024 17:32 UTC

1 point

0 comments · 10 min read · LW link

Effects of Non-Uniform Sparsity on Superposition in Toy Models

Shreyans Jain

14 Nov 2024 16:59 UTC

4 points

3 comments · 6 min read · LW link

Scaling Laws and Superposition

Pavan Katta

10 Apr 2024 15:36 UTC

9 points

4 comments · 5 min read · LW link

(www.pavankatta.com)

Expanding the Scope of Superposition

Derek Larson

13 Sep 2023 17:38 UTC

10 points

0 comments · 4 min read · LW link

Untitled Draft

Me

19 Dec 2025 7:48 UTC

1 point

0 comments · 9 min read · LW link

Some open-source dictionaries and dictionary learning infrastructure

Sam Marks

5 Dec 2023 6:05 UTC

46 points

7 comments · 5 min read · LW link

Toy Models of Superposition

evhub

21 Sep 2022 23:48 UTC

69 points

4 comments · 5 min read · LW link · 1 review

(transformer-circuits.pub)

Alternative Models of Superposition

zroe1 and RGRGRG

11 Aug 2025 15:52 UTC

20 points

6 comments · 5 min read · LW link

Superposition and Dropout

Edoardo Pona

16 May 2023 7:24 UTC

21 points

5 comments · 6 min read · LW link

Growth and Form in a Toy Model of Superposition

Liam Carroll and Edmund Lau

8 Nov 2023 11:08 UTC

91 points

7 comments · 14 min read · LW link

Paper: Superposition, Memorization, and Double Descent (Anthropic)

LawrenceC

5 Jan 2023 17:54 UTC

53 points

11 comments · 1 min read · LW link

(transformer-circuits.pub)

200 COP in MI: Exploring Polysemanticity and Superposition

Neel Nanda

3 Jan 2023 1:52 UTC

34 points

6 comments · 16 min read · LW link

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds

5 Oct 2023 21:01 UTC

289 points

22 comments · 2 min read · LW link · 1 review

(transformer-circuits.pub)

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten

30 Jul 2024 16:36 UTC

6 points

0 comments · 9 min read · LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom

2 Feb 2024 6:54 UTC

103 points

37 comments · 15 min read · LW link

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI

7 Oct 2023 23:30 UTC

137 points

8 comments · 4 min read · LW link

Computational Superposition in a Toy Model of the U-AND Problem

Adam Newgas

27 Mar 2025 16:56 UTC

18 points

2 comments · 11 min read · LW link

Toy Models of Superposition: Simplified by Hand

Axel Sorensen

29 Sep 2024 21:19 UTC

9 points

3 comments · 8 min read · LW link

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné

23 Sep 2023 16:21 UTC

30 points

8 comments · 5 min read · LW link

Sparse MLP Distillation

slavachalnev

15 Jan 2024 19:39 UTC

30 points

3 comments · 6 min read · LW link

Crafting Polysemantic Transformer Benchmarks with Known Circuits

Evan Anders and Adrià Garriga-alonso

23 Aug 2024 22:03 UTC

17 points

0 comments · 25 min read · LW link

Interpretability with Sparse Autoencoders (Colab exercises)

CallumMcDougall

29 Nov 2023 12:56 UTC

80 points

9 comments · 4 min read · LW link
