SebastianP

Karma: 301

How to reduce capability degradation from off-model SFT

8 Jun 2026 16:24 UTC
21 points
0 comments, 3 min read, LW link

Advice for making robust-to-training model organisms

28 May 2026 17:26 UTC
36 points
8 comments, 12 min read, LW link
(blog.redwoodresearch.org)

Why does off-model SFT degrade capabilities?

21 May 2026 0:35 UTC
40 points
9 comments, 6 min read, LW link

Incriminating misaligned AI models via distillation

15 May 2026 21:43 UTC
115 points
12 comments, 5 min read, LW link

Sleeper Agent Backdoor Results Are Messy

28 Apr 2026 1:55 UTC
81 points
4 comments, 7 min read, LW link