Lucius Bushnaq

Karma: 5,196

AI notkilleveryoneism researcher, focused on interpretability.

Personal account, opinions are my own.

I have signed no contracts or agreements whose existence I cannot mention.

[Linkpost] Interpreting Language Model Parameters

Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors and Lee Sharkey

5 May 2026 17:37 UTC

162 points

2 comments · 2 min read

(www.goodfire.ai)

Rotations in Superposition

Linda Linsefors and Lucius Bushnaq

15 Dec 2025 14:58 UTC

54 points

6 comments · 11 min read

From SLT to AIT: NN generalisation out-of-distribution

Lucius Bushnaq

4 Sep 2025 15:20 UTC

116 points

8 comments · 14 min read

Circuits in Superposition 2: Now with Less Wrong Math

Linda Linsefors and Lucius Bushnaq

30 Jun 2025 10:25 UTC

73 points

0 comments · 22 min read

[Paper] Stochastic Parameter Decomposition

Lee Sharkey, Lucius Bushnaq and Dan Braun

27 Jun 2025 16:54 UTC

47 points

14 comments · 1 min read

(arxiv.org)

Proof idea: SLT to AIT

Lucius Bushnaq

10 Feb 2025 23:14 UTC

42 points

15 comments · 6 min read

[Question] Can we infer the search space of a local optimiser?

Lucius Bushnaq

3 Feb 2025 10:17 UTC

25 points

5 comments · 3 min read

Attribution-based parameter decomposition

Lucius Bushnaq, Dan Braun, StefanHex, jake_mendel and Lee Sharkey

25 Jan 2025 13:12 UTC

109 points

21 comments · 4 min read

(publications.apolloresearch.ai)

Activation space interpretability may be doomed

bilalchughtai and Lucius Bushnaq

8 Jan 2025 12:49 UTC

154 points

34 comments · 8 min read

Intricacies of Feature Geometry in Large Language Models

7vik, Lucius Bushnaq and Nandi

7 Dec 2024 18:10 UTC

72 points

2 comments · 12 min read

Deep Learning is cheap Solomonoff induction?

Lucius Bushnaq, Kaarel and Dmitry Vaintrob

7 Dec 2024 11:00 UTC

46 points

1 comment · 17 min read

Circuits in Superposition: Compressing many small neural networks into one

Lucius Bushnaq and jake_mendel

14 Oct 2024 13:06 UTC

131 points

9 comments · 13 min read

The Hessian rank bounds the learning coefficient

Lucius Bushnaq

8 Aug 2024 20:55 UTC

68 points

11 comments · 4 min read

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

Lee Sharkey, Lucius Bushnaq, Dan Braun, StefanHex and Nicholas Goldowsky-Dill

18 Jul 2024 14:15 UTC

126 points

18 comments · 18 min read

Lucius Bushnaq’s Shortform

Lucius Bushnaq

6 Jul 2024 9:08 UTC

8 points

105 comments · 1 min read

Apollo Research 1-year update

Marius Hobbhahn, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni, Jérémy Scheurer, Nicholas Goldowsky-Dill, StefanHex, jake_mendel, Alex Meinke and rusheb

29 May 2024 17:44 UTC

93 points

0 comments · 7 min read

Interpretability: Integrated Gradients is a decent attribution method

Lucius Bushnaq, jake_mendel, StefanHex and Kaarel

20 May 2024 17:55 UTC

24 points

7 comments · 6 min read

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

20 May 2024 17:53 UTC

108 points

4 comments · 3 min read

Charbel-Raphaël and Lucius discuss interpretability

Mateusz Bagiński, Charbel-Raphaël and Lucius Bushnaq

30 Oct 2023 5:50 UTC

112 points

7 comments · 21 min read

Announcing Apollo Research

Marius Hobbhahn, beren, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni and Jérémy Scheurer

30 May 2023 16:17 UTC

226 points

11 comments · 8 min read