Archive
Sequences
About
Search
Log In
Questions
Events
Shortform
Alignment Forum
AF Comments
Home
Featured
All
Tags
Recent
Comments
RSS
Yawen Duan
Karma:
155
All
Posts
Comments
New
Top
Old
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
ChengCheng
,
Brendan Murphy
,
Adrià Garriga-alonso
,
Yashvardhan Sharma
,
dsbowen
,
smallsilo
,
Yawen Duan
,
ChrisCundy
,
Hannah Betts
,
AdamGleave
and
Kellin Pelrine
7 Feb 2025 3:57 UTC
37
points
0
comments
10
min read
LW
link
Even Superhuman Go AIs Have Surprising Failure Modes
AdamGleave
,
EuanMcLean
,
Tony Wang
,
Kellin Pelrine
,
Tom Tseng
,
Yawen Duan
,
Joseph Miller
and
MichaelDennis
20 Jul 2023 17:31 UTC
131
points
22
comments
10
min read
LW
link
(far.ai)
Back to top