Patrick Leask

Karma: 406

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

10 Jun 2026 17:58 UTC
115 points
0 comments
4 min read

The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness

14 May 2026 17:05 UTC
58 points
2 comments
3 min read

Can models gradient hack SFT elicitation?

11 Mar 2026 18:18 UTC
50 points
5 comments
3 min read

Learning Multi-Level Features with Matryoshka SAEs

19 Dec 2024 15:59 UTC
46 points
6 comments
11 min read

Showing SAE Latents Are Not Atomic Using Meta-SAEs

24 Aug 2024 0:56 UTC
73 points
10 comments
20 min read

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs

17 Aug 2024 1:16 UTC
54 points
0 comments
5 min read

BatchTopK: A Simple Improvement for TopK-SAEs

20 Jul 2024 2:20 UTC
62 points
0 comments
4 min read

Stitching SAEs of different sizes

13 Jul 2024 17:19 UTC
39 points
12 comments
12 min read

Concept extrapolation for hypothesis generation

12 Dec 2022 22:09 UTC
20 points
2 comments
3 min read