Julian Stastny
Karma: 890
Member of technical staff @ Redwood Research
Posts
Research Sabotage in ML Codebases
egan, Vivek Hebbar and Julian Stastny
30 Apr 2026 0:26 UTC · 62 points · 2 comments · 6 min read · LW link
Sleeper Agent Backdoor Results Are Messy
SebastianP, Alek Westover, Dylan Xu, Vivek Hebbar and Julian Stastny
28 Apr 2026 1:55 UTC · 79 points · 4 comments · 7 min read · LW link
An Empirical Study of Methods for SFTing Opaque Reasoning Models
SebastianP, Alek Westover, Vivek Hebbar, Dylan Xu and Julian Stastny
24 Apr 2026 17:26 UTC · 17 points · 0 comments · 6 min read · LW link
How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?
Dylan Xu, Alek Westover, Vivek Hebbar, SebastianP, frisby and Julian Stastny
20 Apr 2026 16:58 UTC · 61 points · 5 comments · 20 min read · LW link
Five approaches to evaluating training-based control measures
Alek Westover, SebastianP, Julian Stastny and Vivek Hebbar
18 Apr 2026 1:07 UTC · 19 points · 0 comments · 6 min read · LW link
Logit ROCs: Monitor TPR is linear in FPR in logit space
Kerrick Staley, Aryan Bhatt and Julian Stastny
15 Apr 2026 1:57 UTC · 25 points · 0 comments · 7 min read · LW link (blog.redwoodresearch.org)
Model organisms researchers should check whether high LRs defeat their model organisms
Dylan Xu, SebastianP, Alek Westover, Vivek Hebbar and Julian Stastny
10 Apr 2026 0:07 UTC · 40 points · 0 comments · 5 min read · LW link
How do we (more) safely defer to AIs?
ryan_greenblatt and Julian Stastny
12 Feb 2026 16:55 UTC · 82 points · 5 comments · 72 min read · LW link
Methodological considerations in making malign initializations for control research
Alek Westover, Vivek Hebbar and Julian Stastny
24 Dec 2025 1:18 UTC · 16 points · 0 comments · 13 min read · LW link
Prospects for studying actual schemers
ryan_greenblatt and Julian Stastny
19 Sep 2025 14:11 UTC · 40 points · 2 comments · 58 min read · LW link
Research Areas in AI Control (The Alignment Project by UK AISI)
Julian Stastny, Tomek Korbak, Mojmir, Buck and Alan Cooney
1 Aug 2025 10:27 UTC · 25 points · 0 comments · 18 min read · LW link (alignmentproject.aisi.gov.uk)
Recent Redwood Research project proposals
ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt and Joey Yudelson
14 Jul 2025 22:27 UTC · 93 points · 0 comments · 3 min read · LW link
Linkpost: Redwood Research reading list
Julian Stastny
10 Jul 2025 18:39 UTC · 50 points · 0 comments · 1 min read · LW link (redwoodresearch.substack.com)
What’s worse, spies or schemers?
Buck and Julian Stastny
9 Jul 2025 14:37 UTC · 51 points · 2 comments · 5 min read · LW link
Two proposed projects on abstract analogies for scheming
Julian Stastny
4 Jul 2025 16:03 UTC · 49 points · 0 comments · 3 min read · LW link
Making deals with early schemers
Julian Stastny, Olli Järviniemi and Buck
20 Jun 2025 18:21 UTC · 129 points · 41 comments · 15 min read · LW link
Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking
Buck and Julian Stastny
8 May 2025 19:06 UTC · 80 points · 3 comments · 15 min read · LW link
7+ tractable directions in AI control
Julian Stastny and ryan_greenblatt
28 Apr 2025 17:12 UTC · 93 points · 1 comment · 13 min read · LW link
Disentangling four motivations for acting in accordance with UDT
Julian Stastny
5 Nov 2023 21:26 UTC · 46 points · 4 comments · 7 min read · LW link