Tom Lieberum

Karma: 1,069

Research Engineer at DeepMind, focused on mechanistic interpretability and large language models. Opinions are my own.

Announcing Gemma Scope 2

CallumMcDougall, Arthur Conmy, János Kramár, Tom Lieberum, Senthooran Rajamanoharan and Neel Nanda

22 Dec 2025 21:56 UTC

96 points

1 comment, 2 min read

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah and Neel Nanda

26 Mar 2025 19:07 UTC

117 points

15 comments, 29 min read

(deepmindsafetyresearch.medium.com)

JumpReLU SAEs + Early Access to Gemma 2 SAEs

Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár and Neel Nanda

19 Jul 2024 16:10 UTC

55 points

10 comments, 1 min read

(storage.googleapis.com)

Improving Dictionary Learning with Gated Sparse Autoencoders

Senthooran Rajamanoharan, Arthur Conmy, lewis smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah and Neel Nanda

25 Apr 2024 18:43 UTC

63 points

38 comments, 1 min read

(arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma

19 Apr 2024 19:06 UTC

80 points

10 comments, 8 min read

[Summary] Progress Update #1 from the GDM Mech Interp Team

Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma

19 Apr 2024 19:06 UTC

73 points

0 comments, 3 min read

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Neel Nanda, János Kramár, Tom Lieberum and Rohin Shah

18 Mar 2024 17:28 UTC

19 points

0 comments, 1 min read

(arxiv.org)

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah and Vlad Mikulik

20 Jul 2023 10:50 UTC

44 points

3 comments, 2 min read

(arxiv.org)

A Mechanistic Interpretability Analysis of Grokking

Neel Nanda and Tom Lieberum

15 Aug 2022 2:41 UTC

375 points

48 comments, 36 min read, 1 review

(colab.research.google.com)

Investigating causal understanding in LLMs

Marius Hobbhahn and Tom Lieberum

14 Jun 2022 13:57 UTC

28 points

6 comments, 13 min read

Thoughts on Formalizing Composition

Tom Lieberum

7 Jun 2022 7:51 UTC

13 points

0 comments, 7 min read

Understanding the tensor product formulation in Transformer Circuits

Tom Lieberum

24 Dec 2021 18:05 UTC

16 points

2 comments, 3 min read

[Question] How should my timelines influence my career choice?

Tom Lieberum

3 Aug 2021 10:14 UTC

13 points

10 comments, 1 min read