Oliver Daniels
Karma: 689
PhD student at UMass Amherst
Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus
Oliver Daniels · 25 Feb 2026 19:43 UTC · 79 points · 5 comments · 8 min read
Stress-Testing Alignment Audits With Prompt-Level Strategic Deception
Oliver Daniels, Perusha Moodley and David Lindner · 10 Feb 2026 17:29 UTC · 17 points · 0 comments · 1 min read · (arxiv.org)
On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing
Oliver Daniels · 10 Feb 2026 17:06 UTC · 26 points · 5 comments · 3 min read
An Explication of Alignment Optimism
Oliver Daniels · 31 Jan 2026 20:58 UTC · 43 points · 22 comments · 1 min read
[Linkpost] Theory and AI Alignment (Scott Aaronson)
Oliver Daniels · 7 Dec 2025 19:17 UTC · 15 points · 1 comment · 3 min read · (scottaaronson.blog)
Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels · 14 Nov 2024 5:07 UTC · 35 points · 0 comments · 27 min read
Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner, Viktor Rehnberg and Oliver Daniels · 3 Apr 2024 23:07 UTC · 43 points · 3 comments · 10 min read
Oliver Daniels-Koch’s Shortform
Oliver Daniels · 17 Mar 2024 17:24 UTC · 2 points · 57 comments · 1 min read