Nathaniel Mitrani
Karma: 69

What Drives the Compliance Gap? A Three-Driver Decomposition of Alignment Faking
Nathaniel Mitrani, Rhea Karty, dwk and Alan Cooney
28 May 2026 10:50 UTC · 22 points · 0 comments · 8 min read · link post (arxiv.org)

Character-trained models can struggle to generalise
Nathaniel Mitrani
25 May 2026 12:58 UTC · 22 points · 4 comments · 4 min read

Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks
Nathaniel Mitrani, sassanb, Cam and Puria
21 May 2026 10:11 UTC · 31 points · 0 comments · 5 min read · link post (arxiv.org)

Investigating Neural Scaling Laws Emerging from Deep Data Structure
Nathaniel Mitrani and Ari Brill
9 Oct 2025 20:11 UTC · 4 points · 0 comments · 8 min read

Making the case for average-case AI Control
Nathaniel Mitrani
5 Feb 2025 18:56 UTC · 5 points · 0 comments · 5 min read