robertzk

Karma: 810

Resisting Reality

robertzk, 22 Jan 2026 13:50 UTC
26 points
3 comments, 6 min read

robertzk’s Shortform

robertzk, 8 Dec 2025 17:27 UTC
5 points
27 comments, 1 min read

SAEs are highly dataset dependent: a case study on the refusal direction

7 Nov 2024 5:22 UTC
67 points
4 comments, 14 min read

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing

27 Oct 2024 18:46 UTC
48 points
4 comments, 5 min read

Base LLMs refuse too

29 Sep 2024 16:04 UTC
61 points
20 comments, 10 min read

SAEs (usually) Transfer Between Base and Chat Models

18 Jul 2024 10:29 UTC
67 points
0 comments, 10 min read

Attention Output SAEs Improve Circuit Analysis

21 Jun 2024 12:56 UTC
33 points
3 comments, 19 min read

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To

6 Mar 2024 5:03 UTC
63 points
0 comments, 12 min read