Nathaniel Mitrani

Karma: 69

What Drives the Compliance Gap? A Three-Driver Decomposition of Alignment Faking

Nathaniel Mitrani, Rhea Karty, dwk and Alan Cooney

28 May 2026 10:50 UTC

22 points

0 comments · 8 min read · LW link

(arxiv.org)

Character-trained models can struggle to generalise

Nathaniel Mitrani

25 May 2026 12:58 UTC

22 points

4 comments · 4 min read · LW link

Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks

Nathaniel Mitrani, sassanb, Cam and Puria

21 May 2026 10:11 UTC

31 points

0 comments · 5 min read · LW link

(arxiv.org)

Investigating Neural Scaling Laws Emerging from Deep Data Structure

Nathaniel Mitrani and Ari Brill

9 Oct 2025 20:11 UTC

4 points

0 comments · 8 min read · LW link

Making the case for average-case AI Control

Nathaniel Mitrani

5 Feb 2025 18:56 UTC

5 points

0 comments · 5 min read · LW link