Arno Libert

Karma: 114

Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists

9 Feb 2026 12:55 UTC
118 points
13 comments · 8 min read · LW link

Published Safety Prompts May Create Evaluation Blind Spots

30 Jan 2026 18:27 UTC
2 points
0 comments · 4 min read · LW link