Oliver Daniels

Karma: 644

PhD Student at UMass Amherst

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels, 25 Feb 2026 19:43 UTC
77 points
5 comments, 8 min read, LW link

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

10 Feb 2026 17:29 UTC
12 points
0 comments, 1 min read, LW link
(arxiv.org)

On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing

Oliver Daniels, 10 Feb 2026 17:06 UTC
21 points
5 comments, 3 min read, LW link