Daan Henselmans

Karma: 174

Computational linguist, writer, AI dev. Currently running AI safety research.

Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling

28 May 2026 21:26 UTC
8 points
12 comments, 2 min read

No frontier model has acceptable levels of compliance with the EU AI Act and privacy legislation.

27 May 2026 7:35 UTC
29 points
0 comments, 9 min read

Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists

9 Feb 2026 12:55 UTC
118 points
13 comments, 8 min read

Published Safety Prompts May Create Evaluation Blind Spots

30 Jan 2026 18:27 UTC
2 points
0 comments, 4 min read

Minor Wording Changes Produce Major Shifts in AI Behavior

26 Nov 2025 12:52 UTC
3 points
0 comments, 6 min read

Low-Temperature Evaluations Can Mask Critical AI Behaviors

13 Nov 2025 20:12 UTC
8 points
1 comment, 4 min read

Thin Alignment Can’t Solve Thick Problems

Daan Henselmans, 27 Apr 2025 22:42 UTC
11 points
2 comments, 9 min read

Alignment Can Reduce Performance on Simple Ethical Questions

Daan Henselmans, 3 Feb 2025 19:35 UTC
16 points
7 comments, 6 min read