Thomas Read

Karma: 204

Reproducing steering against evaluation awareness in a large open-weight model

10 Apr 2026 10:45 UTC
89 points
17 comments · 15 min read · LW link

We found an open weight model that games alignment honeypots

16 Mar 2026 12:57 UTC
70 points
1 comment · 10 min read · LW link

[Research sprint] Single-model crosscoder feature ablation and steering

6 Apr 2025 14:42 UTC
11 points
0 comments · 12 min read · LW link

[Replication] Crosscoder-based Stage-Wise Model Diffing

22 Mar 2025 18:35 UTC
25 points
0 comments · 7 min read · LW link

Birmingham ACX Everywhere meetup 2022

23 Aug 2022 18:57 UTC
2 points
0 comments · 1 min read · LW link

Coventry ACX Schelling Meetup

17 Apr 2022 9:05 UTC
2 points
0 comments · 1 min read · LW link