schroederdewitt
Karma: 112
Detecting collusion through multi-agent interpretability
schroederdewitt, aaronrose227 and carissacullen
3 Apr 2026 9:17 UTC, 15 points, 1 comment, 6 min read
Detecting collusion through multi-agent interpretability
schroederdewitt, aaronrose227 and carissacullen
2 Apr 2026 22:20 UTC, 2 points, 0 comments, 5 min read
Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi and Angira Sharma
16 Sep 2024 16:07 UTC, 66 points, 8 comments, 31 min read
Take SCIFs, it’s dangerous to go alone
latterframe, Jeffrey Ladish and schroederdewitt
1 May 2024 8:02 UTC, 43 points, 1 comment, 3 min read