Oliver Daniels

Karma: 721

PhD Student at Umass Amherst

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels, 25 Feb 2026 19:43 UTC

82 points

5 comments, 8 min read

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

Oliver Daniels, Perusha Moodley and David Lindner

10 Feb 2026 17:29 UTC

16 points

0 comments, 1 min read

(arxiv.org)

On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing

Oliver Daniels, 10 Feb 2026 17:06 UTC

27 points

5 comments, 3 min read

An Explication of Alignment Optimism

Oliver Daniels, 31 Jan 2026 20:58 UTC

43 points

22 comments, 1 min read

[Linkpost] Theory and AI Alignment (Scott Aaronson)

Oliver Daniels, 7 Dec 2025 19:17 UTC

15 points

1 comment, 3 min read

(scottaaronson.blog)

Concrete Methods for Heuristic Estimation on Neural Networks

Oliver Daniels, 14 Nov 2024 5:07 UTC

35 points

0 comments, 27 min read

Concrete empirical research projects in mechanistic anomaly detection

Erik Jenner, Viktor Rehnberg and Oliver Daniels

3 Apr 2024 23:07 UTC

43 points

3 comments, 10 min read

Oliver Daniels-Koch’s Shortform

Oliver Daniels, 17 Mar 2024 17:24 UTC

2 points

59 comments, 1 min read