Arno Libert
Karma: 114
Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists
Daan Henselmans, Arno Libert and LennardZ · 9 Feb 2026 12:55 UTC · 118 points · 13 comments · 8 min read
Published Safety Prompts May Create Evaluation Blind Spots
Daan Henselmans and Arno Libert · 30 Jan 2026 18:27 UTC · 2 points · 0 comments · 4 min read