Bartosz Cywiński
Karma: 102
MATS 8.0 scholar with Arthur Conmy and Sam Marks
Current LLMs seem to rarely detect CoT tampering
Bartosz Cywiński, Bart Bussmann, Arthur Conmy, Neel Nanda, Senthooran Rajamanoharan and Josh Engels
19 Nov 2025 15:27 UTC, 50 points, 0 comments, 20 min read
Eliciting secret knowledge from language models
Bartosz Cywiński, Arthur Conmy and Sam Marks
2 Oct 2025 20:57 UTC, 68 points, 3 comments, 2 min read
(arxiv.org)