Dylan Xu

Karma: 290

How to reduce capability degradation from off-model SFT

8 Jun 2026 16:24 UTC
21 points
0 comments, 3 min read, LW link

Advice for making robust-to-training model organisms

28 May 2026 17:26 UTC
37 points
8 comments, 12 min read, LW link
(blog.redwoodresearch.org)

Why does off-model SFT degrade capabilities?

21 May 2026 0:35 UTC
40 points
9 comments, 6 min read, LW link

Sleeper Agent Backdoor Results Are Messy

28 Apr 2026 1:55 UTC
81 points
4 comments, 7 min read, LW link

An Empirical Study of Methods for SFTing Opaque Reasoning Models

24 Apr 2026 17:26 UTC
17 points
0 comments, 6 min read, LW link

How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?

20 Apr 2026 16:58 UTC
62 points
5 comments, 20 min read, LW link

Model organisms researchers should check whether high LRs defeat their model organisms

10 Apr 2026 0:07 UTC
40 points
0 comments, 5 min read, LW link

dx26's Shortform

Dylan Xu, 16 Feb 2025 21:31 UTC
2 points
3 comments, 1 min read, LW link

Measuring Coherence and Goal-Directedness in RL Policies

Dylan Xu, 22 Apr 2024 18:26 UTC
10 points
0 comments, 7 min read, LW link