Secret Loyalties
Tag
Last edit: 12 Feb 2026 18:48 UTC by Joe Kwon
How do intentional secret loyalties differ from other schemer motivations?
Cleo Nardo
26 Apr 2026 20:03 UTC
25
points
1
comment
12
min read
LW
link
A Research Agenda for Secret Loyalties
Joe Kwon, Alfie Lamerton, draganover, Dave Banerjee, Bronson Schoen, Daniel Kokotajlo, ryan_greenblatt, Owain_Evans, Fabien Roger and Tom Davidson
13 May 2026 17:34 UTC
33
points
3
comments
3
min read
LW
link
Pre-training data poisoning likely makes installing secret loyalties easier
Joe Kwon
23 Feb 2026 18:12 UTC
12
points
0
comments
4
min read
LW
link
How Robust Is Monitoring Against Secret Loyalties?
Joe Kwon
26 Feb 2026 15:50 UTC
8
points
0
comments
5
min read
LW
link
The Easiest Route to Secret Loyalty May Be Hijacking the Model’s Chain of Command
Joe Kwon
24 Feb 2026 17:47 UTC
16
points
1
comment
5
min read
LW
link
How Secret Loyalty Differs from Standard Backdoor Threats
Joe Kwon
12 Feb 2026 18:48 UTC
23
points
4
comments
12
min read
LW
link
Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning
Joe Kwon
25 Feb 2026 18:17 UTC
14
points
0
comments
3
min read
LW
link