Alek Westover

Karma: 369

Sleeper Agent Backdoor Results Are Messy

28 Apr 2026 1:55 UTC
79 points
4 comments · 7 min read

An Empirical Study of Methods for SFTing Opaque Reasoning Models

24 Apr 2026 17:26 UTC
17 points
0 comments · 6 min read

How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?

20 Apr 2026 16:58 UTC
61 points
5 comments · 20 min read

Five approaches to evaluating training-based control measures

18 Apr 2026 1:07 UTC
19 points
0 comments · 6 min read

Model organisms researchers should check whether high LRs defeat their model organisms

10 Apr 2026 0:07 UTC
40 points
0 comments · 5 min read

How will we do SFT on models with opaque reasoning?

21 Feb 2026 0:00 UTC
32 points
17 comments · 7 min read

Three visions for diffuse control

9 Feb 2026 6:41 UTC
5 points
0 comments · 3 min read