Arthur Conmy

Karma: 1,232

Interpretability

Views my own

Extracting SAE task features for in-context learning

12 Aug 2024 20:34 UTC
30 points
1 comment · 9 min read · LW link

Self-explaining SAE features

5 Aug 2024 22:20 UTC
57 points
13 comments · 10 min read · LW link

JumpReLU SAEs + Early Access to Gemma 2 SAEs

19 Jul 2024 16:10 UTC
48 points
10 comments · 1 min read · LW link
(storage.googleapis.com)

SAEs (usually) Transfer Between Base and Chat Models

18 Jul 2024 10:29 UTC
51 points
0 comments · 10 min read · LW link

Attention Output SAEs Improve Circuit Analysis

21 Jun 2024 12:56 UTC
31 points
0 comments · 19 min read · LW link