Daan Henselmans
Karma: 150
Computational linguist, writer, AI dev. Currently running AI safety research.
Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists
Daan Henselmans, Arno Libert and LennardZ
9 Feb 2026 12:55 UTC, 118 points, 13 comments, 8 min read
Published Safety Prompts May Create Evaluation Blind Spots
Daan Henselmans and Arno Libert
30 Jan 2026 18:27 UTC, 2 points, 0 comments, 4 min read
Minor Wording Changes Produce Major Shifts in AI Behavior
Daan Henselmans and Derck Prinzhorn
26 Nov 2025 12:52 UTC, 2 points, 0 comments, 6 min read
Low-Temperature Evaluations Can Mask Critical AI Behaviors
Daan Henselmans and Derck Prinzhorn
13 Nov 2025 20:12 UTC, 8 points, 1 comment, 4 min read
Thin Alignment Can’t Solve Thick Problems
Daan Henselmans
27 Apr 2025 22:42 UTC, 11 points, 2 comments, 9 min read
Alignment Can Reduce Performance on Simple Ethical Questions
Daan Henselmans
3 Feb 2025 19:35 UTC, 16 points, 7 comments, 6 min read