Arno Libert
Karma: 135
Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling
Daan Henselmans, Arno Libert and LennardZ
28 May 2026 21:26 UTC, 8 points, 13 comments, 2 min read
No frontier model has acceptable levels of compliance with the EU AI Act and privacy legislation.
Daan Henselmans, Arno Libert, Amber Koelfat and LennardZ
27 May 2026 7:35 UTC, 29 points, 0 comments, 9 min read
Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists
Daan Henselmans, Arno Libert and LennardZ
9 Feb 2026 12:55 UTC, 118 points, 13 comments, 8 min read
Published Safety Prompts May Create Evaluation Blind Spots
Daan Henselmans and Arno Libert
30 Jan 2026 18:27 UTC, 2 points, 0 comments, 4 min read