ChrisCundy
Karma: 70
Avoiding AI Deception: Lie Detectors can either Induce Honesty or Evasion
ChengCheng, ChrisCundy, smallsilo and AdamGleave
5 Jun 2025 23:07 UTC, 22 points, 2 comments, 5 min read, LW link (far.ai)
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
ChengCheng, Brendan Murphy, Adrià Garriga-alonso, Yashvardhan Sharma, dsbowen, smallsilo, Yawen Duan, ChrisCundy, Hannah Betts, AdamGleave and Kellin Pelrine
7 Feb 2025 3:57 UTC, 37 points, 0 comments, 10 min read, LW link