Secret Loyalties

Tag. Last edit: 12 Feb 2026 18:48 UTC by Joe Kwon

Pre-training data poisoning likely makes installing secret loyalties easier

Joe Kwon, 23 Feb 2026 18:12 UTC
12 points
0 comments, 4 min read

How Robust Is Monitoring Against Secret Loyalties?

Joe Kwon, 26 Feb 2026 15:50 UTC
8 points
0 comments, 5 min read

The Easiest Route to Secret Loyalty May Be Hijacking the Model’s Chain of Command

Joe Kwon, 24 Feb 2026 17:47 UTC
16 points
1 comment, 5 min read

How Secret Loyalty Differs from Standard Backdoor Threats

Joe Kwon, 12 Feb 2026 18:48 UTC
23 points
4 comments, 12 min read

Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning

Joe Kwon, 25 Feb 2026 18:17 UTC
14 points
0 comments, 3 min read