Lucius Bushnaq

Karma: 4,530

AI notkilleveryoneism researcher, focused on interpretability.

Personal account, opinions are my own.

I have signed no contracts or agreements whose existence I cannot mention.

From SLT to AIT: NN generalisation out-of-distribution

Lucius Bushnaq · 4 Sep 2025 15:20 UTC
114 points
8 comments · 14 min read · LW link

Circuits in Superposition 2: Now with Less Wrong Math

30 Jun 2025 10:25 UTC
72 points
0 comments · 20 min read · LW link

[Paper] Stochastic Parameter Decomposition

27 Jun 2025 16:54 UTC
47 points
14 comments · 1 min read · LW link
(arxiv.org)

Proof idea: SLT to AIT

Lucius Bushnaq · 10 Feb 2025 23:14 UTC
42 points
15 comments · 6 min read · LW link

[Question] Can we infer the search space of a local optimiser?

Lucius Bushnaq · 3 Feb 2025 10:17 UTC
25 points
5 comments · 3 min read · LW link

Attribution-based parameter decomposition

25 Jan 2025 13:12 UTC
108 points
21 comments · 4 min read · LW link
(publications.apolloresearch.ai)

Activation space interpretability may be doomed

8 Jan 2025 12:49 UTC
152 points
35 comments · 8 min read · LW link

Intricacies of Feature Geometry in Large Language Models

7 Dec 2024 18:10 UTC
71 points
0 comments · 12 min read · LW link

Deep Learning is cheap Solomonoff induction?

7 Dec 2024 11:00 UTC
45 points
1 comment · 17 min read · LW link

Circuits in Superposition: Compressing many small neural networks into one

14 Oct 2024 13:06 UTC
131 points
9 comments · 13 min read · LW link

The Hessian rank bounds the learning coefficient

Lucius Bushnaq · 8 Aug 2024 20:55 UTC
68 points
11 comments · 4 min read · LW link

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

18 Jul 2024 14:15 UTC
124 points
18 comments · 18 min read · LW link

Lucius Bushnaq’s Shortform

Lucius Bushnaq · 6 Jul 2024 9:08 UTC
8 points
105 comments · 1 min read · LW link

Apollo Research 1-year update

29 May 2024 17:44 UTC
93 points
0 comments · 7 min read · LW link

Interpretability: Integrated Gradients is a decent attribution method

20 May 2024 17:55 UTC
23 points
7 comments · 6 min read · LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

20 May 2024 17:53 UTC
108 points
4 comments · 3 min read · LW link

Charbel-Raphaël and Lucius discuss interpretability

30 Oct 2023 5:50 UTC
112 points
7 comments · 21 min read · LW link

Announcing Apollo Research

30 May 2023 16:17 UTC
217 points
11 comments · 8 min read · LW link

Basin broadness depends on the size and number of orthogonal features

27 Aug 2022 17:29 UTC
36 points
21 comments · 6 min read · LW link

What Is The True Name of Modularity?

1 Jul 2022 14:55 UTC
39 points
10 comments · 12 min read · LW link