Secret Loyalties
Tag. Last edit: 12 Feb 2026 18:48 UTC by Joe Kwon
Pre-training data poisoning likely makes installing secret loyalties easier
Joe Kwon, 23 Feb 2026 18:12 UTC, 12 points, 0 comments, 4 min read
How Robust Is Monitoring Against Secret Loyalties?
Joe Kwon, 26 Feb 2026 15:50 UTC, 8 points, 0 comments, 5 min read
The Easiest Route to Secret Loyalty May Be Hijacking the Model’s Chain of Command
Joe Kwon, 24 Feb 2026 17:47 UTC, 16 points, 1 comment, 5 min read
How Secret Loyalty Differs from Standard Backdoor Threats
Joe Kwon, 12 Feb 2026 18:48 UTC, 23 points, 4 comments, 12 min read
Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning
Joe Kwon, 25 Feb 2026 18:17 UTC, 14 points, 0 comments, 3 min read