RSS

robertzk

Karma: 466

We In­spected Every Head In GPT-2 Small us­ing SAEs So You Don’t Have To

6 Mar 2024 5:03 UTC
56 points
0 comments12 min readLW link

At­ten­tion SAEs Scale to GPT-2 Small

3 Feb 2024 6:50 UTC
76 points
4 comments8 min readLW link

Sparse Au­toen­coders Work on At­ten­tion Layer Outputs

16 Jan 2024 0:26 UTC
82 points
5 comments19 min readLW link

Train­ing Pro­cess Trans­parency through Gra­di­ent In­ter­pretabil­ity: Early ex­per­i­ments on toy lan­guage models

21 Jul 2023 14:52 UTC
56 points
1 comment1 min readLW link

Get­ting up to Speed on the Speed Prior in 2022

robertzk28 Dec 2022 7:49 UTC
36 points
5 comments65 min readLW link

Emily Brontë on: Psy­chol­ogy Re­quired for Se­ri­ous™ AGI Safety Research

robertzk14 Sep 2022 14:47 UTC
2 points
0 comments1 min readLW link