Thomas Read
Karma: 283
Thomas Read’s Shortform
Thomas Read
11 Jun 2026 16:34 UTC, 4 points, 1 comment, 1 min read
Loss of Oversight: How AI Systems May Become Harder to Audit, Monitor, and Investigate
Jordan Taylor, Max H, Ed Fage, Thomas Read and Joseph Bloom
21 May 2026 14:52 UTC, 83 points, 0 comments, 6 min read (www.aisi.gov.uk)
Reproducing steering against evaluation awareness in a large open-weight model
Thomas Read, Bronson Schoen, Santiago Aranguri and Joseph Bloom
10 Apr 2026 10:45 UTC, 89 points, 17 comments, 15 min read
We found an open weight model that games alignment honeypots
Thomas Read and Joseph Bloom
16 Mar 2026 12:57 UTC, 79 points, 2 comments, 10 min read
[Research sprint] Single-model crosscoder feature ablation and steering
Thomas Read
6 Apr 2025 14:42 UTC, 11 points, 0 comments, 12 min read
[Replication] Crosscoder-based Stage-Wise Model Diffing
Anna Soligo, Thomas Read, Oliver Clive-Griffin, dmanningcoe, Chun Hei Yip, rajashree and Jason Gross
22 Mar 2025 18:35 UTC, 25 points, 0 comments, 7 min read
Birmingham ACX Everywhere meetup 2022
Thomas Read
23 Aug 2022 18:57 UTC, 2 points, 0 comments, 1 min read
Coventry ACX Schelling Meetup
Thomas Read
17 Apr 2022 9:05 UTC, 2 points, 0 comments, 1 min read