Lucius Bushnaq

Karma: 1,826

AI notkilleveryoneism researcher at Apollo, focused on interpretability.

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

18 Jul 2024 14:15 UTC
113 points
17 comments · 18 min read · LW link

Lucius Bushnaq’s Shortform

Lucius Bushnaq · 6 Jul 2024 9:08 UTC
6 points
14 comments · 1 min read · LW link

Apollo Research 1-year update

29 May 2024 17:44 UTC
92 points
0 comments · 7 min read · LW link

Interpretability: Integrated Gradients is a decent attribution method

20 May 2024 17:55 UTC
16 points
7 comments · 6 min read · LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

20 May 2024 17:53 UTC
103 points
4 comments · 3 min read · LW link