Daan Henselmans
Karma: 31
Computational linguist, writer, AI dev. Currently running AI safety research.
Minor Wording Changes Produce Major Shifts in AI Behavior
Daan Henselmans and Derck Prinzhorn
26 Nov 2025 12:52 UTC
2 points, 0 comments, 6 min read
Low-Temperature Evaluations Can Mask Critical AI Behaviors
Daan Henselmans and Derck Prinzhorn
13 Nov 2025 20:12 UTC
7 points, 0 comments, 4 min read
Thin Alignment Can’t Solve Thick Problems
Daan Henselmans
27 Apr 2025 22:42 UTC
11 points, 2 comments, 9 min read
Alignment Can Reduce Performance on Simple Ethical Questions
Daan Henselmans
3 Feb 2025 19:35 UTC
16 points, 7 comments, 6 min read