Nathaniel Mitrani

Karma: 69

What Drives the Compliance Gap? A Three-Driver Decomposition of Alignment Faking

28 May 2026 10:50 UTC
22 points
0 comments · 8 min read · LW link
(arxiv.org)

Character-trained models can struggle to generalise

Nathaniel Mitrani · 25 May 2026 12:58 UTC
22 points
4 comments · 4 min read · LW link

Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks

21 May 2026 10:11 UTC
31 points
0 comments · 5 min read · LW link
(arxiv.org)

Investigating Neural Scaling Laws Emerging from Deep Data Structure

9 Oct 2025 20:11 UTC
4 points
0 comments · 8 min read · LW link

Making the case for average-case AI Control

Nathaniel Mitrani · 5 Feb 2025 18:56 UTC
5 points
0 comments · 5 min read · LW link