jake_mendel

Karma: 1,478

technical AI safety program associate at OpenPhil

Three positive updates I made about technical grantmaking at Coefficient Giving (fka Open Phil)

jake_mendel26 Nov 2025 1:09 UTC

127 points

3 comments6 min readLW link

Research directions Open Phil wants to fund in technical AI safety

jake_mendel, maxnadeau and Peter Favaloro

8 Feb 2025 1:40 UTC

117 points

21 comments58 min readLW link

(www.openphilanthropy.org)

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

jake_mendel, maxnadeau and Peter Favaloro

6 Feb 2025 18:58 UTC

111 points

0 comments1 min readLW link

(www.openphilanthropy.org)

Attribution-based parameter decomposition

Lucius Bushnaq, Dan Braun, StefanHex, jake_mendel and Lee Sharkey

25 Jan 2025 13:12 UTC

108 points

21 comments4 min readLW link

(publications.apolloresearch.ai)

Circuits in Superposition: Compressing many small neural networks into one

Lucius Bushnaq and jake_mendel

14 Oct 2024 13:06 UTC

131 points

9 comments13 min readLW link

jake_mendel’s Shortform

jake_mendel19 Sep 2024 10:37 UTC

5 points

3 comments1 min readLW link

[Interim research report] Activation plateaus & sensitive directions in GPT2

StefanHex and jake_mendel

5 Jul 2024 17:05 UTC

65 points

2 comments5 min readLW link

SAE feature geometry is outside the superposition hypothesis

jake_mendel24 Jun 2024 16:07 UTC

229 points

18 comments11 min readLW link 1 review

Apollo Research 1-year update

Marius Hobbhahn, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni, Jérémy Scheurer, Nicholas Goldowsky-Dill, StefanHex, jake_mendel, AlexMeinke and rusheb

29 May 2024 17:44 UTC

93 points

0 comments7 min readLW link

Interpretability: Integrated Gradients is a decent attribution method

Lucius Bushnaq, jake_mendel, StefanHex and Kaarel

20 May 2024 17:55 UTC

23 points

7 comments6 min readLW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

20 May 2024 17:53 UTC

108 points

4 comments3 min readLW link

A starting point for making sense of task structure (in machine learning)

Kaarel, RP and jake_mendel

24 Feb 2024 1:51 UTC

51 points

2 comments12 min readLW link

Toward A Mathematical Framework for Computation in Superposition

Dmitry Vaintrob, jake_mendel and Kaarel

18 Jan 2024 21:06 UTC

213 points

19 comments63 min readLW link