Oliver Daniels

Karma: 689

PhD student at UMass Amherst

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels · 25 Feb 2026 19:43 UTC
79 points
5 comments · 8 min read · LW link

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

10 Feb 2026 17:29 UTC
17 points
0 comments · 1 min read · LW link
(arxiv.org)

On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing

Oliver Daniels · 10 Feb 2026 17:06 UTC
26 points
5 comments · 3 min read · LW link

An Explication of Alignment Optimism

Oliver Daniels · 31 Jan 2026 20:58 UTC
43 points
22 comments · 1 min read · LW link

[Linkpost] Theory and AI Alignment (Scott Aaronson)

Oliver Daniels · 7 Dec 2025 19:17 UTC
15 points
1 comment · 3 min read · LW link
(scottaaronson.blog)

Concrete Methods for Heuristic Estimation on Neural Networks

Oliver Daniels · 14 Nov 2024 5:07 UTC
35 points
0 comments · 27 min read · LW link

Concrete empirical research projects in mechanistic anomaly detection

3 Apr 2024 23:07 UTC
43 points
3 comments · 10 min read · LW link

Oliver Daniels-Koch’s Shortform

Oliver Daniels · 17 Mar 2024 17:24 UTC
2 points
57 comments · 1 min read · LW link