Neel Nanda

Karma: 15,374

LLM-Driven Feature Discovery

22 Jun 2026 22:26 UTC
10 points
0 comments · 5 min read · LW link

How transparent is DiffusionGemma (and why it matters)

20 Jun 2026 20:05 UTC
79 points
2 comments · 4 min read · LW link

Synthetic document finetuning for instilling positive traits

16 Jun 2026 0:04 UTC
60 points
1 comment · 10 min read · LW link

Why Do Naive SFT Filters For Safety Properties Fail?

14 Jun 2026 19:45 UTC
51 points
7 comments · 10 min read · LW link

SFT Drives Gemini’s Safety Properties

13 Jun 2026 15:31 UTC
78 points
4 comments · 1 min read · LW link

Building and evaluating model diffing agents

12 Jun 2026 17:14 UTC
61 points
2 comments · 12 min read · LW link

Models May Behave Worse When Eval Aware

11 Jun 2026 9:28 UTC
87 points
8 comments · 13 min read · LW link

Building Better Activation Oracles

4 Jun 2026 18:34 UTC
62 points
1 comment · 7 min read · LW link