joanv

Karma: 152

www.joanvelja.com

joanv’s Shortform

joanv · 8 Apr 2026 9:21 UTC
2 points
10 comments · 1 min read · LW link

[Paper] When can we trust untrusted monitoring? A safety case sketch across collusion strategies

10 Mar 2026 17:28 UTC
45 points
2 comments · 6 min read · LW link

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs

25 Sep 2024 14:52 UTC
37 points
2 comments · 4 min read · LW link
(arxiv.org)