joanv
Karma: 155
www.joanvelja.com
joanv’s Shortform
joanv
8 Apr 2026 9:21 UTC
2 points, 10 comments, 1 min read
[Paper] When can we trust untrusted monitoring? A safety case sketch across collusion strategies
Nelson Gardner-Challis, Morgan S, J Bostock, BarAltiva, joanv and Charlie Griffin
10 Mar 2026 17:28 UTC
46 points, 2 comments, 6 min read
[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew, joanv, robert mccarthy, ollie, Nandi and Dylan Cope
25 Sep 2024 14:52 UTC
37 points, 2 comments, 4 min read
(arxiv.org)