CallumMcDougall

Karma: 2,504

New ARENA material: 8 exercise sets on alignment science & interpretability

CallumMcDougall

27 Feb 2026 17:37 UTC

103 points

1 comment · 7 min read

ARENA 8.0 - Call for Applicants

JScriven, JamesH, David Quarel and CallumMcDougall

20 Feb 2026 18:28 UTC

31 points

1 comment · 6 min read

Announcing Gemma Scope 2

CallumMcDougall, Arthur Conmy, János Kramár, Tom Lieberum, Senthooran Rajamanoharan and Neel Nanda

22 Dec 2025 21:56 UTC

96 points

1 comment · 2 min read

Transmitting Misalignment with Subliminal Learning via Paraphrasing

Matthew Bozoukov, Taywon Min, CallumMcDougall and J Rosser

17 Dec 2025 19:34 UTC

39 points

0 comments · 10 min read

How Can Interpretability Researchers Help AGI Go Well?

Neel Nanda, Josh Engels, Senthooran Rajamanoharan, Arthur Conmy, bilalchughtai, CallumMcDougall, János Kramár and lewis smith

1 Dec 2025 13:05 UTC

67 points

1 comment · 14 min read

A Pragmatic Vision for Interpretability

Neel Nanda, Josh Engels, Arthur Conmy, Senthooran Rajamanoharan, bilalchughtai, CallumMcDougall, János Kramár and lewis smith

1 Dec 2025 13:05 UTC

136 points

39 comments · 27 min read

ARENA 7.0 - Call for Applicants

JScriven, JamesH, CallumMcDougall and David Quarel

30 Sep 2025 14:54 UTC

27 points

1 comment · 6 min read

ARENA 6.0 - Call for Applicants

JamesH, JScriven, David Quarel, CallumMcDougall and James Fox

4 Jun 2025 10:19 UTC

26 points

3 comments · 6 min read

New Cause Area Proposal

CallumMcDougall

1 Apr 2025 7:12 UTC

110 points

4 comments · 1 min read

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah and Neel Nanda

26 Mar 2025 19:07 UTC

117 points

15 comments · 29 min read

(deepmindsafetyresearch.medium.com)

ARENA 5.0 - Call for Applicants

JamesH, James Fox, CallumMcDougall, Chloe Li and David Quarel

30 Jan 2025 13:18 UTC

35 points

2 comments · 6 min read

Scaling Sparse Feature Circuit Finding to Gemma 9B

Diego Caples, Jatin Nainani, CallumMcDougall and rrenaud

10 Jan 2025 11:08 UTC

88 points

11 comments · 17 min read

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders

Can, Adam Karvonen, Johnny Lin, Curt Tigges, Joseph Bloom, chanind, Yeu-Tong Lau, Eoin Farrell, Arthur Conmy, CallumMcDougall, Kola Ayonrinde, Matthew Wearden, Sam Marks and Neel Nanda

11 Dec 2024 6:30 UTC

82 points

6 comments · 2 min read

(www.neuronpedia.org)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0

James Fox, Chloe Li, JamesH, Gracie Green and CallumMcDougall

6 Jul 2024 11:34 UTC

57 points

7 comments · 6 min read

How ARENA course material gets made

CallumMcDougall

2 Jul 2024 18:04 UTC

41 points

2 comments · 7 min read

A Selection of Randomly Selected SAE Features

CallumMcDougall and Joseph Bloom

1 Apr 2024 9:09 UTC

109 points

2 comments · 4 min read

SAE-VIS: Announcement Post

CallumMcDougall and Joseph Bloom

31 Mar 2024 15:30 UTC

74 points

8 comments · 1 min read

Mech Interp Challenge: January—Deciphering the Caesar Cipher Model

CallumMcDougall

1 Jan 2024 18:03 UTC

17 points

0 comments · 3 min read

Interpretability with Sparse Autoencoders (Colab exercises)

CallumMcDougall

29 Nov 2023 12:56 UTC

83 points

9 comments · 4 min read

AI Alignment Research Engineer Accelerator (ARENA): call for applicants

CallumMcDougall

7 Nov 2023 9:43 UTC

56 points

0 comments · 10 min read