RSS

Bartosz Cywiński

Karma: 152

MATS 8.0 scholar with Arthur Conmy and Sam Marks

Cen­sored LLMs as a Nat­u­ral Testbed for Se­cret Knowl­edge Elicitation

9 Mar 2026 18:50 UTC
30 points
2 comments5 min readLW link

Can we in­ter­pret la­tent rea­son­ing us­ing cur­rent mechanis­tic in­ter­pretabil­ity tools?

22 Dec 2025 16:56 UTC
44 points
1 comment9 min readLW link

Cur­rent LLMs seem to rarely de­tect CoT tampering

19 Nov 2025 15:27 UTC
56 points
0 comments20 min readLW link

Elic­it­ing se­cret knowl­edge from lan­guage models

2 Oct 2025 20:57 UTC
68 points
3 comments2 min readLW link
(arxiv.org)