Julian Stastny

Karma: 890

Member of technical staff @ Redwood Research

Research Sabotage in ML Codebases

egan, Vivek Hebbar and Julian Stastny

30 Apr 2026 0:26 UTC

62 points

2 comments · 6 min read · LW link

Sleeper Agent Backdoor Results Are Messy

SebastianP, Alek Westover, Dylan Xu, Vivek Hebbar and Julian Stastny

28 Apr 2026 1:55 UTC

79 points

4 comments · 7 min read · LW link

An Empirical Study of Methods for SFTing Opaque Reasoning Models

SebastianP, Alek Westover, Vivek Hebbar, Dylan Xu and Julian Stastny

24 Apr 2026 17:26 UTC

17 points

0 comments · 6 min read · LW link

How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?

Dylan Xu, Alek Westover, Vivek Hebbar, SebastianP, frisby and Julian Stastny

20 Apr 2026 16:58 UTC

61 points

5 comments · 20 min read · LW link

Five approaches to evaluating training-based control measures

Alek Westover, SebastianP, Julian Stastny and Vivek Hebbar

18 Apr 2026 1:07 UTC

19 points

0 comments · 6 min read · LW link

Logit ROCs: Monitor TPR is linear in FPR in logit space

Kerrick Staley, Aryan Bhatt and Julian Stastny

15 Apr 2026 1:57 UTC

25 points

0 comments · 7 min read · LW link

(blog.redwoodresearch.org)

Model organisms researchers should check whether high LRs defeat their model organisms

Dylan Xu, SebastianP, Alek Westover, Vivek Hebbar and Julian Stastny

10 Apr 2026 0:07 UTC

40 points

0 comments · 5 min read · LW link

How do we (more) safely defer to AIs?

ryan_greenblatt and Julian Stastny

12 Feb 2026 16:55 UTC

82 points

5 comments · 72 min read · LW link

Methodological considerations in making malign initializations for control research

Alek Westover, Vivek Hebbar and Julian Stastny

24 Dec 2025 1:18 UTC

16 points

0 comments · 13 min read · LW link

Prospects for studying actual schemers

ryan_greenblatt and Julian Stastny

19 Sep 2025 14:11 UTC

40 points

2 comments · 58 min read · LW link

Research Areas in AI Control (The Alignment Project by UK AISI)

Julian Stastny, Tomek Korbak, Mojmir, Buck and Alan Cooney

1 Aug 2025 10:27 UTC

25 points

0 comments · 18 min read · LW link

(alignmentproject.aisi.gov.uk)

Recent Redwood Research project proposals

ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt and Joey Yudelson

14 Jul 2025 22:27 UTC

93 points

0 comments · 3 min read · LW link

Linkpost: Redwood Research reading list

Julian Stastny

10 Jul 2025 18:39 UTC

50 points

0 comments · 1 min read · LW link

(redwoodresearch.substack.com)

What’s worse, spies or schemers?

Buck and Julian Stastny

9 Jul 2025 14:37 UTC

51 points

2 comments · 5 min read · LW link

Two proposed projects on abstract analogies for scheming

Julian Stastny

4 Jul 2025 16:03 UTC

49 points

0 comments · 3 min read · LW link

Making deals with early schemers

Julian Stastny, Olli Järviniemi and Buck

20 Jun 2025 18:21 UTC

129 points

41 comments · 15 min read · LW link

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

Buck and Julian Stastny

8 May 2025 19:06 UTC

80 points

3 comments · 15 min read · LW link

7+ tractable directions in AI control

Julian Stastny and ryan_greenblatt

28 Apr 2025 17:12 UTC

93 points

1 comment · 13 min read · LW link

Disentangling four motivations for acting in accordance with UDT

Julian Stastny

5 Nov 2023 21:26 UTC

46 points

4 comments · 7 min read · LW link